Fenrir.Niflheim said: »
Oh what? But look, they can find CVEs (this one is even a remote zero-day vulnerability). I mean, ignore the mountain of false positives, you can sort through those... like what else are you going to do with your time after you are replaced by the AI? It is only going to get better, and at a rapid pace.
This is a fair example, and one I was aware of. The issue I take is that this isn't how AI is being promoted by the industry.
The way most people expect AI to work is this: give me code, I put it in AI, tell AI to find bugs, AI gives me bugs. Now we, here, understand this is not the case, but your typical CISO does not. The goal is to promote the idea that you can remove the person and automate results using AI, something they've claimed for years prior to LLMs and... it wasn't the case. In this case, the bug was found 8% of the time, so between weeding out false positives and running it enough times for it to find the bug, is that really working the way it's being promoted, especially in the context of the Anthropic posts?
In the case above, you had someone who understood the code well enough to find a bug on their own (he kind of hints at this but IMO underplays the value here), who fed the LLM the specific code necessary, AND who understood how LLMs work well enough to know how to prompt it and provide what is needed. You'd also have to have enough familiarity with the code to weed out false positives; again, something that requires manual intervention and review. This code is not hard to get through, for sure, but still... there's a prerequisite that someone can interpret the result and make sense of it.
In the context of the other discussion, exploiting an issue like this is also extremely volatile. Most memory corruption bugs I've found over the course of my career are not practically exploitable, but they get CVEs anyway because they segfault and most people don't care whether they're actually exploitable. Feeding an LLM the code necessary to reliably exploit a UAF bug would require an understanding of the compiled code, allocator internals, thread state, and a number of other factors that it's just not capable of handling well enough to produce a working exploit. So yea, for bug hunting there is some optimization, but not enough to replace people or even (in my experience) provide meaningful output. You still need someone who understands these internals to really turn it into something useful, and it involves correlating so many different factors (some of which are non-deterministic, like allocator state) that LLMs can't begin to handle it.
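To make the volatility point concrete, here's a contrived C sketch (nothing to do with the actual ksmbd bug, just an illustration I'm making up) of why a UAF that segfaults isn't automatically a working exploit: what happens at the stale dereference depends entirely on what the allocator handed out in between, and that's exactly the non-deterministic state an LLM can't track for you.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Contrived illustration, not the ksmbd bug: a session object gets freed
     * by one path while another path still holds a stale pointer to it. */
    struct session {
        void (*log)(const char *msg);  /* function pointer the stale path will call */
        char user[32];
    };

    static void log_msg(const char *msg) { puts(msg); }

    int main(void) {
        struct session *s = malloc(sizeof *s);
        s->log = log_msg;
        strcpy(s->user, "guest");

        free(s);        /* teardown path releases the object... */

        /* ...but another path still uses the stale pointer. Whether this
         * segfaults, prints garbage, or jumps somewhere attacker-controlled
         * depends on what reused the freed chunk in the meantime: heap
         * layout, other threads, timing. Turning that into a *reliable*
         * exploit is the part that needs allocator/thread-state knowledge. */
        s->log(s->user);
        return 0;
    }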
I actually ran a similar test not long ago for a bug (also in the Linux kernel, as it happens) that related to the way state was handled for a certain kernel module. I prompted it and hand-held it to see if I could get it even close to finding the bug, even going so far as to ask "is this a bug?" while pointing out the bug, and it basically told me to read the code myself after giving me the wrong answer repeatedly. Granted, the code in my case is a little more complex, since it relates to data fed in from userspace that is interacted with very indirectly, whereas data pulled through a command handler off the network is a more linear path (which is what was done here). In another case, I was interacting with a service over a publicly available IPC API and it just invented header files, function calls, and data structures that didn't exist, and even pointing it at the code, it couldn't get past whatever it was hallucinating. I used Claude and Copilot, though, so maybe I need to try o3 and replicate his process here a little better than I have in the past (our current work isn't strictly related to this at the moment).
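To illustrate the linear vs. indirect distinction I'm making (hypothetical C sketches, not the actual code from either case):

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /* Stub standing in for the buggy sink in both sketches. */
    static void process_entries(const uint8_t *entries, uint32_t count) {
        printf("processing %u entries at %p\n", count, (const void *)entries);
    }

    /* Linear path: a command handler touches the untrusted buffer right where
     * the bug would live -- one hop from input to sink. */
    static void handle_command(const uint8_t *pkt, size_t len) {
        if (len < 4)
            return;
        uint32_t count;
        memcpy(&count, pkt, sizeof count);   /* attacker-controlled count */
        process_entries(pkt + 4, count);
    }

    /* Indirect path: one call stashes userspace-derived state, and a later,
     * unrelated path consumes it. Spotting the bug now means correlating
     * state across calls (and, in a real driver, across threads and time). */
    static struct {
        const uint8_t *entries;
        uint32_t count;
    } dev_state;

    static void ioctl_store(const uint8_t *buf, uint32_t count) {
        dev_state.entries = buf;   /* saved now... */
        dev_state.count = count;
    }

    static void timer_tick(void) {
        process_entries(dev_state.entries, dev_state.count);   /* ...used much later */
    }

    int main(void) {
        const uint8_t pkt[8] = { 2, 0, 0, 0, 'a', 'b', 0, 0 };
        handle_command(pkt, sizeof pkt);   /* linear: input and sink together */
        ioctl_store(pkt + 4, 2);           /* indirect: store... */
        timer_tick();                      /* ...then consume somewhere else */
        return 0;
    }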
I think the overall point here is that yea, there is value in cases where these tools can optimize what someone who already knows what they are doing is capable of. The problem is that's not the pitch; the pitch is a lot simpler, it's the same pitch that's been out there for years prior to LLMs, and it doesn't match the reality when you are dealing with complex targets. These tools can make someone more efficient at times, and at other times they end up chasing ghosts. That's also before questioning whether a simpler testing method, like just fuzzing the SMB protocol in this case with the proper instrumentation, would've identified the same bug with less work and less core understanding of the code (initially, anyway).
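For what it's worth, "simply fuzzing" here means nothing exotic. Something like a minimal libFuzzer-style harness around whatever request-handler entry point you can lift out of the target (parse_smb2_request below is a made-up stand-in, not the real function), built with ASan/KASAN so the sanitizer flags the corruption for you, is the kind of setup I have in mind:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical entry point standing in for the target's command handler,
     * i.e. whatever dispatches a request once it comes off the wire. */
    int parse_smb2_request(const uint8_t *buf, size_t len);

    /* libFuzzer harness: feed mutated request buffers straight into the
     * handler and let the sanitizer report the memory corruption.
     * Build (roughly): clang -g -fsanitize=fuzzer,address harness.c target.c */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        parse_smb2_request(data, size);
        return 0;
    }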
