Are Large Language Models Really AI?

By K123 2025-06-03 16:44:11
Also, in the field of CG modelling there was a company that did this (Kaedim): they claimed it was AI but paid Indians pennies to use Blender to make the models.
By Godfry 2025-06-03 17:54:37
K123 said: »
AI = Actually Indians
Best comment award!
By Shiva.Thorny 2025-06-09 05:41:39
https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf

Seems topical. Even when given the solution algorithms, they cannot perform sequential calculations to solve deterministic puzzles above a certain difficulty because they lack reasoning.

...People who don't anthropomorphize them were already quite aware of this.
By K123 2025-06-09 06:16:03
Quote:
LRMs face a complete accuracy collapse beyond certain complexities
So do humans, and at a FAR lower level of complexity for your average human. That is what matters in reality, not what they theoretically can't do yet.
Quote:
While our puzzle environments enable controlled experimentation with fine-grained control over problem complexity, they represent a narrow slice of reasoning tasks and may not capture the diversity of real-world or knowledge-intensive reasoning problems.

Quote:
their reasoning effort increases with problem complexity up to a point
This is still progress, when AI doomers claimed capability had stalled and models would never get better (claims made endlessly years ago, all since proven false).

Quote:
our comparison between LRMs and standard LLMs under equivalent inference compute
This is the biggest problem with the entire paper: the whole point of LRMs is that they DO USE MORE COMPUTE, INTENTIONALLY, offset by efficiency gains in models (MoE, etc.) and hardware (Blackwell, Ironwood, etc.).

Not going to bother writing any more, because this is really the flaw in the logic of the paper.
By Shiva.Thorny 2025-06-09 07:15:23
K123 said: »
This is the biggest problem with the entire paper: the whole point of LRMs is that they DO USE MORE COMPUTE, INTENTIONALLY,

Except they allowed a 64k token limit and none of the models used the full 64k, so they were not compute-limited. The study is comparing accuracy against how much compute the models chose to use.
By K123 2025-06-09 07:16:39
Shiva.Thorny said: »
K123 said: »
This is the biggest problem with the entire paper: the whole point of LRMs is that they DO USE MORE COMPUTE, INTENTIONALLY,

Except they allowed a 64k token limit and none of the models used the full 64k, so they were not compute-limited
Not sure where you are seeing this when the graphs have up to 200k tokens, etc.
By K123 2025-06-09 07:21:44
To my knowledge, no one has implemented ToT (Tree of Thoughts) yet either. Some argue Google might have, in some limited form, in their newest model, but there's no transparency on it.

https://www.ibm.com/think/topics/tree-of-thoughts
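
For reference, the basic idea is just branch-and-prune search over partial "thoughts". A toy sketch in Python (propose_thoughts and score are hypothetical stand-ins for model calls, not any real API):
Code:
# Toy breadth-first Tree of Thoughts loop.
# propose_thoughts(state) -> list of candidate next states
# score(state) -> heuristic value of a partial solution
def tree_of_thoughts(root, propose_thoughts, score, depth=3, beam=5):
    frontier = [root]
    for _ in range(depth):
        candidates = []
        for state in frontier:
            # branch: sample several possible next "thoughts" per state
            candidates.extend(propose_thoughts(state))
        # prune: keep only the most promising partial solutions
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]
    return max(frontier, key=score)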
By Shiva.Thorny 2025-06-09 07:23:13
The reasoning models are shown in section 4.2.2, and the highest token usage is o3-mini (high) at around 41k. None of them hit the set limit.

The charts with 200k+ compute budgets are graphing the number of potential solutions found against that budget, so each individual solution used much less than that. It's a measure of efficiency. Note that on the high-complexity chart, even 240k tokens (Claude 3.7) or 120k (DeepSeek) was not sufficient to find one answer.
By K123 2025-06-09 07:32:16
Interesting, I would like to see how Gemini does.

I'm still of the position that how well models can solve these puzzles doesn't matter next to the facts that:
1. There is still general progression overall from LRMs over LLMs (extremely niche case studies like this aside).
2. The ability to solve the Tower of Hanoi and other niche things is irrelevant to whether or not AI is increasingly going to be able to replace human labour for economically valuable tasks, because of (1).
3. People framing this as "omg proof AI doesn't actually think" everywhere online are missing the point. Firstly, because we don't know how the human brain works or how we "think" objectively, so it becomes theoretical and philosophical quite fast. Secondly, because if you actually read the research papers (not the marketing hype) behind LRMs, they are all clear about their limitations.
By Shiva.Thorny 2025-06-09 07:40:41
K123 said: »
1. There is still general progression overall from LRMs over LLMs (extremely niche case studies like this aside).
I agree.

K123 said: »
2. The ability to solve the Tower of Hanoi and other niche things is irrelevant to whether or not AI is increasingly going to be able to replace human labour for economically valuable tasks, because of (1).
Somewhat agree. You don't need to be able to solve puzzles to do most desk work. It could probably already replace most clerical positions with a higher degree of accuracy.

K123 said: »
3. People framing this as "omg proof AI doesn't actually think" everywhere online are missing the point.
The issue here isn't really about the philosophy of thought. The model has been fed a deterministic algorithm. Sure, a 15-disc Tower of Hanoi takes 2^15 - 1 = 32,767 (~33k) moves to solve. But it can be done deterministically: if you follow the algorithm, which the model was given, you just need to perform basic sequential tasks without making a mistake. A sub-30-line script in most programming languages could solve it.
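
Something like this, for the classic recursive version (a minimal sketch; peg names are arbitrary):
Code:
def hanoi(n, src, dst, aux, moves):
    # Move n discs from src to dst, using aux as the spare peg.
    if n == 0:
        return
    hanoi(n - 1, src, aux, dst, moves)  # clear the top n-1 discs out of the way
    moves.append((src, dst))            # move the largest disc
    hanoi(n - 1, aux, dst, src, moves)  # restack the n-1 discs on top

moves = []
hanoi(15, 'A', 'C', 'B', moves)
print(len(moves))  # 32767, i.e. 2**15 - 1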

The takeaway isn't that it can't think; we all knew that and as you said, it can be argued in terms of philosophy. The takeaway is that it can't even apply a known algorithm repeatedly without starting to veer off track. It wasn't running out of compute and failing to complete all 33k moves; it was giving incorrect moves early on.

That creates concerns that any task requiring sequential correct answers will eventually fail the same way. Do you need to do consistent recursive validation to get use out of these for complex tasks? When does that become compute-prohibitive? How about tasks that require the same level of precision but aren't easily verified?
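
For a puzzle like this, at least, per-move validation is trivial (a rough sketch, assuming the model's moves arrive as (src, dst) peg pairs):
Code:
def validate(moves, n):
    # Replay each move and reject anything illegal.
    # e.g. validate(model_moves, 15) -> "ok" or the first error found
    pegs = {'A': list(range(n, 0, -1)), 'B': [], 'C': []}
    for i, (src, dst) in enumerate(moves):
        if not pegs[src]:
            return f"move {i}: peg {src} is empty"
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return f"move {i}: larger disc placed on smaller"
        pegs[dst].append(pegs[src].pop())
    return "ok" if len(pegs['C']) == n else "incomplete"
The hard cases are exactly the ones where no checker this cheap exists.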
By K123 2025-06-09 07:45:29
I wasn't saying you said that quote, but this is mostly how it is stupidly being shared on LinkedIn.

I'm pretty sure Gemini already doesn't have those compute limitations, so the question you're asking then comes down to cost, time, and energy use. How effective LLMs/LRMs and their descendants are in terms of cost, time, and energy consumption of course depends entirely on the task.
By Asura.Saevel 2025-06-09 09:03:17
Reminds me of the marines who used a cardboard box, MGS-style, to trick a DARPA AI sentry robot during trials.

Here we go: ChatGPT vs. an Atari 2600 at chess, and the Atari wins.

https://www.tomshardware.com/tech-industry/artificial-intelligence/chatgpt-got-absolutely-wrecked-by-atari-2600-in-beginners-chess-match-openais-newest-model-bamboozled-by-1970s-logic
By Kaffy 2025-06-09 09:32:54
I'm just waiting for AI version of Deep Thought to tell us the answer to Life, the Universe, and Everything. And it better still be 42.