1. There is still general progression overall from LLMs to LRMs (extremely niche case studies like this aside).
I agree.
2. Because of (1), the ability to solve Tower of Hanoi and other niche puzzles is irrelevant to whether AI will increasingly be able to replace human labour on economically valuable tasks.
Somewhat agree. You don't need to be able to solve puzzles to do most desk work. It could probably already replace most clerical positions, likely with a higher degree of accuracy than the people currently in them.
3. People framing this as "omg proof AI doesn't actually think" everywhere online are missing the point.
The issue here isn't really about the philosophy of thought. The model has been fed a deterministic algorithm. Sure, a 15-disc Tower of Hanoi takes ~33k moves to solve, but it can be done deterministically. If you follow the algorithm, which the model was given, you just need to perform basic sequential steps without making a mistake. A sub-30-line script in most programming languages could solve it.
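For reference, here's a minimal sketch of that deterministic solution in Python (the textbook recursive algorithm; the peg labels and function name are just illustrative choices):

```python
# Minimal sketch of the standard recursive Tower of Hanoi solution.
# Peg labels and the function name are arbitrary, for illustration only.
def hanoi(n, src="A", aux="B", dst="C"):
    """Yield the optimal move sequence (2**n - 1 moves) for n discs."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, dst, aux)  # shift n-1 discs onto the spare peg
    yield (src, dst)                        # move the largest disc to the target
    yield from hanoi(n - 1, aux, src, dst)  # shift the n-1 discs back on top

moves = list(hanoi(15))
print(len(moves))  # 32767, i.e. 2**15 - 1 (~33k moves)
```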
The takeaway isn't that it can't think; we all knew that, and as you said, that can be argued in terms of philosophy. The takeaway is that it can't even apply a known algorithm repeatedly without starting to veer off track. It wasn't running out of compute and failing to complete all 33k moves; it was giving incorrect moves early on.
That raises the concern that any task requiring a long sequence of correct steps will eventually fail the same way. Do you need constant recursive validation to get use out of these models for complex tasks? At what point does that become compute-prohibitive? And what about tasks that demand the same level of precision but aren't easily verified?
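On the "easily verified" point: for something like Hanoi the check is trivial, which is exactly what makes it the easy case. A rough sketch of a move checker, assuming moves are given as (source peg, destination peg) pairs (the representation here is my own assumption, not anything from the paper):

```python
# Rough sketch of a checker for a proposed Tower of Hanoi move list.
def is_valid_solution(moves, n, pegs=("A", "B", "C")):
    state = {p: [] for p in pegs}
    state[pegs[0]] = list(range(n, 0, -1))          # all discs start on the first peg
    for src, dst in moves:
        if not state[src]:
            return False                            # tried to move from an empty peg
        if state[dst] and state[dst][-1] < state[src][-1]:
            return False                            # larger disc placed on a smaller one
        state[dst].append(state[src].pop())
    return state[pegs[2]] == list(range(n, 0, -1))  # everything ended on the target peg

# e.g. the optimal 3-move solution for 2 discs
print(is_valid_solution([("A", "B"), ("A", "C"), ("B", "C")], 2))  # True
```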