1. There is still general progression overall from LLMs to LRMs (extremely niche case studies like this aside).
I agree.
2. Because of (1), the ability to solve Tower of Hanoi and other niche puzzles is irrelevant to whether AI will increasingly be able to replace human labour on economically valuable tasks.
Somewhat agree. You don't need to be able to solve puzzles to do most desk work. It could probably already replace most clerical positions, likely with a higher degree of accuracy than the people currently in them.
3. People framing this as "omg proof AI doesn't actually think" everywhere online are missing the point.
The issue here isn't really about the philosophy of thought. The model has been fed a deterministic algorithm. Sure, a 15-disc Tower of Hanoi takes ~33k moves to solve, but it can be done deterministically. If you follow the algorithm, which the model was given, you just need to perform basic sequential steps without making a mistake. A sub-30-line script in most programming languages could solve it.
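For reference, here's a minimal sketch of that deterministic solution in Python (the textbook recursive algorithm; the peg labels and function name are just illustrative choices):

```python
# Minimal sketch of the standard recursive Tower of Hanoi solution.
# Peg labels and the function name are arbitrary, for illustration only.
def hanoi(n, src="A", aux="B", dst="C"):
    """Yield the optimal move sequence (2**n - 1 moves) for n discs."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, dst, aux)  # shift n-1 discs onto the spare peg
    yield (src, dst)                        # move the largest disc to the target
    yield from hanoi(n - 1, aux, src, dst)  # shift the n-1 discs back on top

moves = list(hanoi(15))
print(len(moves))  # 32767, i.e. 2**15 - 1 (~33k moves)
```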
The takeaway isn't that it can't think; we all knew that, and as you said, that can be argued in terms of philosophy. The takeaway is that it can't even apply a known algorithm repeatedly without starting to veer off track. It wasn't running out of compute and failing to complete all 33k moves; it was giving incorrect moves early on.
That raises the concern that any task requiring a long sequence of correct steps will eventually fail the same way. Do you need constant recursive validation to get use out of these models for complex tasks? At what point does that become compute-prohibitive? And what about tasks that demand the same level of precision but aren't easily verified?
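On the "easily verified" point: for something like Hanoi the check is trivial, which is exactly what makes it the easy case. A rough sketch of a move checker, assuming moves are given as (source peg, destination peg) pairs (the representation here is my own assumption, not anything from the paper):

```python
# Rough sketch of a checker for a proposed Tower of Hanoi move list.
def is_valid_solution(moves, n, pegs=("A", "B", "C")):
    state = {p: [] for p in pegs}
    state[pegs[0]] = list(range(n, 0, -1))          # all discs start on the first peg
    for src, dst in moves:
        if not state[src]:
            return False                            # tried to move from an empty peg
        if state[dst] and state[dst][-1] < state[src][-1]:
            return False                            # larger disc placed on a smaller one
        state[dst].append(state[src].pop())
    return state[pegs[2]] == list(range(n, 0, -1))  # everything ended on the target peg

# e.g. the optimal 3-move solution for 2 discs
print(is_valid_solution([("A", "B"), ("A", "C"), ("B", "C")], 2))  # True
```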