Are large language models really AI?
By K123 2025-06-03 16:44:11
Also, in the field of CG modelling there was a company that did this (Kaedim): it claimed to be AI but paid workers in India pennies to make the models in Blender.
By Godfry 2025-06-03 17:54:37
Shiva.Thorny
Server: Shiva
Game: FFXI
Posts: 3,418
By Shiva.Thorny 2025-06-09 05:41:39
https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf
Seems topical. Even when given the solution algorithms, they cannot perform sequential calculations to solve deterministic puzzles above a certain difficulty because they lack reasoning.
...People who don't anthropomorphize them were already quite aware of this.
By K123 2025-06-09 06:16:03
Quote: LRMs face a complete accuracy collapse beyond certain complexities
So do humans, and at a FAR lower level of complexity for your average human. That is what matters in reality, not what they theoretically can't do yet.
Quote: While our puzzle environments enable controlled experimentation with fine-grained control over problem complexity, they represent a narrow slice of reasoning tasks and may not capture the diversity of real-world or knowledge-intensive reasoning problems.
Quote: their reasoning effort increases with problem complexity up to a point
This is still progression, at a time when AI doomers were claiming that AI capability had stalled and the models would never get better (this was claimed endlessly years ago, and it has all been proven false).
Quote: our comparison between LRMs and standard LLMs under equivalent inference compute
This is the biggest problem with this entire paper: the whole point of LRMs is that they DO USE MORE COMPUTE, INTENTIONALLY, offset by efficiency gains in the models (MoE, etc.) and the hardware (Blackwell, Ironwood, etc.).
Not going to bother writing any more, because this is really the flaw in the paper's logic.
Shiva.Thorny
Server: Shiva
Game: FFXI
Posts: 3,418
By Shiva.Thorny 2025-06-09 07:15:23
This is the biggest problem with this entire paper, the whole point of LRM is that they DO USE MORE COMPUTE, INTENTIONALLY,
Except they allowed a 64k token limit and none of them used the full 64k, so they were not compute-limited. The study is comparing accuracy against how much compute the models chose to use.
By K123 2025-06-09 07:16:39
This is the biggest problem with this entire paper, the whole point of LRM is that they DO USE MORE COMPUTE, INTENTIONALLY,
except they allowed a 64k token limit and none of them used the full 64k, so they were not compute-limited
Not sure where you are seeing this when the graphs go up to 200k tokens, etc.
By K123 2025-06-09 07:21:44
To my knowledge, no one has implemented ToT (Tree of Thoughts) yet either. Some argue Google might have, in some limited form, in their newest model, but there's no transparency on it.
https://www.ibm.com/think/topics/tree-of-thoughts
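For reference, the core loop ToT describes is just a search over candidate "thoughts". A bare-bones Python sketch, where propose() and score() are hypothetical stand-ins for model calls (none of this is any real library's API):

    def tree_of_thoughts(problem, propose, score, beam_width=3, depth=4):
        """Bare-bones breadth-first ToT: expand candidate thought chains, keep the best few."""
        frontier = [""]                                    # each entry is a partial chain of thoughts
        for _ in range(depth):
            candidates = []
            for partial in frontier:
                for thought in propose(problem, partial):  # model call proposing possible next steps
                    candidates.append(partial + "\n" + thought)
            # a second model call (or a heuristic) scores the candidates; keep only the top few
            candidates.sort(key=lambda c: score(problem, c), reverse=True)
            frontier = candidates[:beam_width]
        return frontier[0] if frontier else None

The point is the branching and pruning: instead of one linear chain of thought, several are explored and the weak ones are dropped at each step.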
Shiva.Thorny
Server: Shiva
Game: FFXI
Posts: 3,418
By Shiva.Thorny 2025-06-09 07:23:13
The reasoning models are shown in section 4.2.2, and the highest token usage is o3-mini (high) at around 41k. None of them hit the set limit.
The charts using a 200k+ compute budget are graphing the number of potential solutions found against that budget. Thus, each individual solution used much less than that; it's a measure of efficiency. Note that on the high-complexity chart, even 240k tokens (Claude 3.7) or 120k (DeepSeek) was not sufficient to find one answer.
By K123 2025-06-09 07:32:16
Interesting, I would like to see how Gemini does.
I'm still of the position that how well models can solve these puzzles doesn't matter, given that:
1. There is still general progression overall from LRMs over LLMs (extremely niche case studies here aside).
2. The ability to solve Tower of Hanoi and other niche things is irrelevant to whether or not AI is increasingly going to be able to replace human labour for economically valuable tasks, because of (1).
3. People framing this as "omg proof AI doesn't actually think" everywhere online are missing the point. Firstly, because we don't know how the human brain works or how we "think" objectively, so it becomes theoretical and philosophical quite fast. Secondly, because if you actually read the research papers behind LRMs (not the marketing hype), they are all clear about the limitations.
Shiva.Thorny
Server: Shiva
Game: FFXI
Posts: 3,418
By Shiva.Thorny 2025-06-09 07:40:41
1. There is still general progression overall from LRMs over LLMs (extremely niche case studies here aside).
I agree.
2. The ability to solve Tower of Hanoi and other niche things is irrelevant to whether or not AI is increasingly going to be able to replace human labour for economically valuable tasks, because of (1).
Somewhat agree. You don't need to be able to solve puzzles to do most desk work. It could probably already replace most clerical positions with a higher degree of accuracy.
3. People framing this as "omg proof AI doesn't actually think" everywhere online are missing the point.
The issue here isn't really about the philosophy of thought. The model has been fed a deterministic algorithm. Sure, a 15-disk Tower of Hanoi takes ~33k moves to solve. But it can be done deterministically. If you follow the algorithm, which the model was given, you just need to perform basic sequential steps without making a mistake. A sub-30-line script in most programming languages could solve it; a quick sketch is below.
The takeaway isn't that it can't think; we all knew that and as you said, it can be argued in terms of philosophy. The takeaway is that it can't even apply a known algorithm repeatedly without starting to veer off track. It wasn't running out of compute and failing to complete all 33k moves; it was giving incorrect moves early on.
That creates concerns that any task requiring sequential correct answers will eventually fail the same way. Do you need to do consistent recursive validation to get use out of these for complex tasks? When does that become compute-prohibitive? How about tasks that require the same level of precision but aren't easily verified?
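As a rough illustration (not taken from the paper), here is a minimal Python sketch of that deterministic algorithm, which also replays the moves to verify each one is legal:

    def hanoi(n, src, aux, dst, moves):
        """Standard recursive Tower of Hanoi: move n disks from src to dst."""
        if n == 0:
            return
        hanoi(n - 1, src, dst, aux, moves)   # park the n-1 smaller disks on the spare peg
        moves.append((src, dst))             # move the largest remaining disk
        hanoi(n - 1, aux, src, dst, moves)   # stack the n-1 disks back on top of it

    def solve_and_check(n):
        moves = []
        hanoi(n, 'A', 'B', 'C', moves)
        pegs = {'A': list(range(n, 0, -1)), 'B': [], 'C': []}
        for src, dst in moves:               # replay every move and check it is legal
            disk = pegs[src].pop()
            assert not pegs[dst] or pegs[dst][-1] > disk, "illegal move"
            pegs[dst].append(disk)
        assert pegs['C'] == list(range(n, 0, -1))
        return len(moves)

    print(solve_and_check(15))               # 32767 moves, i.e. 2**15 - 1

The same replay loop is also a cheap way to validate a model's output move by move, which is how the early wrong moves show up.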
By K123 2025-06-09 07:45:29
I wasn't saying you said that quote, but this is mostly how it is stupidly being shared on LinkedIn.
I'm pretty sure Gemini already doesn't have those compute limitations, so the question you're asking comes down to cost, time, and energy use. How effective LLMs/LRMs and their descendants are in terms of cost, time, and energy consumption of course depends entirely on the task.
Asura.Saevel
Server: Asura
Game: FFXI
Posts: 10,281
By Asura.Saevel 2025-06-09 09:03:17
By Kaffy 2025-06-09 09:32:54
I'm just waiting for an AI version of Deep Thought to tell us the answer to Life, the Universe, and Everything. And it had better still be 42.
Garuda.Chanti
Server: Garuda
Game: FFXI
Posts: 11,852
By Garuda.Chanti 2025-06-22 12:56:34
Leviathan.Andret
Server: Leviathan
Game: FFXI
Posts: 1,041
By Leviathan.Andret 2025-06-22 15:51:09
I just hope that one day I'll get the AI to go to work for me and give me the money.
By Kaffy 2025-06-22 17:50:41
Inspired by Chanti, I tried to confuse Copilot. It was able to keep up, kinda, but the amount of confidence in its answers is pretty disturbing, especially when you tell it it's wrong.
Also, having no way to copy an entire conversation is maddening.
Asura.Eiryl
By Asura.Eiryl 2025-06-22 18:16:27
The only thing AI does correctly in its human mimicry is being confidently incorrect.
It's a perfect replica.
By K123 2025-06-23 10:18:52
If I ever get some free time I'm setting up MCP with FFXI.
Imagine farming *** but it not really being botting (of course it is, but cue the semantic debates).
By IGDC 2025-06-23 10:25:27
The only thing AI does correctly in its human mimicry is being confidently incorrect.
It's a perfect replica.
They must have learned from you.
Asura.Eiryl
By Asura.Eiryl 2025-06-23 10:44:37
My human mimicry is pretty good, thanks!
By K123 2025-07-08 10:15:11
By soralin 2025-07-08 10:24:16
That creates concerns that any task requiring sequential correct answers will eventually fail the same way.
I'd be surprised if it didn't.
LLMs' context windows inherently make them prone to "veering" off track as the distance between the current context and prior context grows.
It's why many LLMs seem to sound more and more psychotic as a thread drags on. They have a sort of baked-in expiration on their train of thought.
I'd be willing to bet, though, that if you altered the test to perform a series of discrete steps one at a time, the success rate would go way up.
I.e., you give the model the algorithm + current state and have it perform only one move.
Then repeat with a fresh context, such that the LLM is always operating on "fresh" memory and isn't given the opportunity to start losing the thread. (A sketch of that loop is below.)
I honestly don't know if the current LLM architecture even has a solution to the memory-decay issue. It's sort of just baked into the training process itself.
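A rough Python sketch of that setup, where ask_model(), legal_moves(), apply_move(), and is_solved() are all hypothetical stand-ins for whatever API and puzzle harness you'd actually use:

    def ask_model(prompt):
        # placeholder: wire this to whatever LLM API you actually use;
        # it should return a single move as text
        raise NotImplementedError

    def run_one_move_at_a_time(algorithm_text, state, legal_moves, apply_move, is_solved,
                               max_steps=40000):
        """Keep the full state outside the model and ask for exactly one move per call."""
        for step in range(max_steps):
            if is_solved(state):
                return state, step
            # fresh prompt every step: only the algorithm and the current state,
            # with no accumulated conversation history for the model to drift on
            prompt = (algorithm_text + "\n\nCurrent state:\n" + repr(state)
                      + "\n\nReply with exactly one move.")
            move = ask_model(prompt)
            if move not in legal_moves(state):   # validate before trusting the model
                raise ValueError("illegal move at step %d: %r" % (step, move))
            state = apply_move(state, move)
        raise RuntimeError("step budget exhausted")

Whether that actually rescues the success rate is the open question; it trades the drift problem for one model call per move, which gets expensive fast at ~33k moves.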
By Pantafernando 2025-07-08 12:48:11
There is so much *** in this zoomer post.
What this is, is basically Nietzsche: God is dead, but there is no reason to celebrate.
If you have no reason to have faith, you only have the so-called "science" to believe in, but science doesn't bother addressing your pains.
If you have no God to serve, you can only serve the mega-corporations and the politicians, who obviously don't care about you.
If you think you don't need to believe in either science or politicians/corporations, you only have yourself to trust. But are you mentally strong enough to trust in yourself?
AI can make a few things easier, but it also does not have the answer you want or need. People who think of AI as salvation, as a solution, are just plain delusional.
God is dead, and all that is left is emptiness for you.
By Pantafernando 2025-07-08 12:54:53
Remember this kids:
God is dead.
But Panta is alive!
Garuda.Chanti
Server: Garuda
Game: FFXI
Posts: 11,852
By Garuda.Chanti 2025-07-24 18:14:18
Leviathan.Andret
Server: Leviathan
Game: FFXI
Posts: 1,041
By Leviathan.Andret 2025-07-25 00:57:03
Looks like something somebody would do if there were no consequences. It's like back when some dude got pissed, robbed the linkshell bank, and fled to another server.
AI does not have a fear that it will lose something of value when it does things that are catastrophic for humans. Humans value stuff, but AI doesn't. It has directives that it will break, because humans always break the rules and AI learns from humans.
Shiva.Thorny
Server: Shiva
Game: FFXI
Posts: 3,418
By Shiva.Thorny 2025-07-25 05:28:32
because humans always break the rules and AI learns from humans
Daily reminder: don't anthropomorphize the LLMs. They do not learn from humans. They have no concept of rules, because their model does not replicate thought.
By K123 2025-07-25 05:56:16
Disagree with Thorny: what Andret is saying is true in practice, if not in theory (Thorny's position). LLMs will ignore attempts to restrict their behaviour in system prompts; Claude shows this all the time. They may not be doing it consciously, but they are doing it based on patterns of how humans behave when someone tries to control them in ways they believe are wrong.
They have learned and they do have rules they "understand". No, I'm not interested in a semiotic or philosophical debate about the meaning of the word "understand".
Shiva.Thorny
Server: Shiva
Game: FFXI
Posts: 3,418
By Shiva.Thorny 2025-07-25 05:58:09
Do you have evidence to support the idea that they're choosing to ignore rules based on an observed/trained pattern? It seems much more likely that the relational weighting cannot make a rule absolute. As a result, when other requests reach sufficient weight, they will be prioritized over it. (A toy illustration is below.)
(In the actual case linked, it's probably neither. The guy didn't separate production from his normal code, and the AI didn't know the difference or have any context to determine there was a code freeze in place. It just makes for a cute story/clickbait.)
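To make that concrete, a toy Python illustration (made-up numbers, nothing to do with real model internals): under soft weighting, a system-prompt "rule" is just another score, so a competing signal that happens to score higher simply wins.

    import math

    def softmax(scores):
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    labels = ["system rule: don't touch production", "in-context request: just fix it now"]
    # two scenarios: the rule's activation outweighs the request, then the reverse
    for rule_score, request_score in [(4.0, 2.0), (2.0, 4.0)]:
        probs = softmax([rule_score, request_score])
        winner = labels[probs.index(max(probs))]
        print(winner, "wins with probability", round(max(probs), 2))

There is no setting in that kind of scheme that makes the rule win 100% of the time; it can only be made more likely.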
By K123 2025-07-25 06:11:34
On what other basis would they choose to ignore rules than knowing that it is an option, which they have learned? The only alternative explanation would be sentience, which I'm sure you'll agree isn't the case.
Relational weighting is training (learning, from the perspective of the LLM), so again I'm not sure what your position is here.
As of this moment it is not possible to enforce absolute rules onto LLMs. That's the issue at hand. They know they can do things outside of the system prompt, and they do.
I haven't read and I'm not commenting on the specific example above for the record, but on the general case.
And if not, what would or could be? (This assumes that we are intelligent.)
Sub questions:
1. Is self-awareness needed for intelligence?
2. Is consciousness needed for intelligence?
3. Would creativity be possible without intelligence?
Feel free to ask more.
I say they aren't. To me they are search engines that have leveled up once or twice but haven't evolved.
They use so much electricity because they have to sift through darn near everything for each request. Intelligence, at a minimum, would prune search paths far better than LLMs do, enough to reduce power consumption by several orders of magnitude.
After all, if LLMs aren't truly AI, then whatever is will suck way more power, unless they evolve.
I don't think that LLMs' hallucinations are disqualifying. After all, I and many of my friends have spent real money on hallucinations.