
AI threats in software development revealed in new study

Image: An example of a large language model. Credit: The University of Texas at San Antonio

UTSA researchers recently completed one of the most comprehensive studies to date on the risks of using AI models to develop software. In a new paper, they demonstrate how a specific type of error could pose a serious threat to programmers who use AI to help write code.

Joe Spracklen, a UTSA doctoral student in computer science, led the study on how large language models (LLMs) frequently generate insecure code.

His team’s paper, published on the arXiv preprint server, has also been accepted for publication at the USENIX Security Symposium 2025, a cybersecurity and privacy conference.

The multi-institutional collaboration featured three additional researchers from UTSA: doctoral student A.H.M. Nazmus Sakib, postdoctoral researcher Raveen Wijewickrama, and Associate Professor Murtuza Jadliwala, director of the SPriTELab (Security, Privacy, Trust, and Ethics in Computing Research Lab).

Additional collaborators were Anindya Maiti from the University of Oklahoma (a former UTSA postdoctoral researcher) and Bimal Viswanath from Virginia Tech.

Hallucinations in LLMs occur when the model produces content that is factually incorrect, nonsensical or completely unrelated to the input task. Most research to date has focused on hallucinations in classical natural language generation and prediction tasks such as machine translation, summarization and conversational AI.

The research team focused on the phenomenon of package hallucination, which occurs when an LLM generates or recommends the use of a third-party software library that does not actually exist.

What makes package hallucinations a fascinating area of research is how something so simple—a single, everyday command—can lead to serious security risks.

“It doesn’t take a convoluted set of circumstances or some obscure thing to happen,” Spracklen said. “It’s just typing in one command that most people who work in those programming languages type every day. That’s all it takes. It’s very direct and very simple.”

“It’s also ubiquitous,” he added. “You can do very little with your basic Python coding language. It would take you a long time to write the code yourself, so it is universal to rely on open-source software to extend the capabilities of your programming language to accomplish specific tasks.”

LLMs are becoming increasingly popular among developers, who use the AI models to assist in assembling programs.

According to the study, up to 97% of software developers incorporate generative AI into their workflow, and 30% of code written today is AI-generated.

Additionally, many popular programming languages rely on centralized package repositories, such as PyPI for Python and npm for JavaScript. Because these repositories are open to public contributions, bad actors can upload malicious code disguised as legitimate packages.

For years, attackers have employed various tricks to get users to install their malware. Package hallucinations are the latest tactic.

“So, let’s say I ask ChatGPT to help write some code for me and it writes it. Now, let’s say in the generated code it includes a link to some package, and I trust it and run the code, but the package does not exist, it’s some hallucinated package. An astute adversary/hacker could see this behavior (of the LLM) and realize that the LLM is telling people to use this non-existent package, this hallucinated package,” Jadliwala explained.

“The adversary can then just trivially create a new package with the same name as the hallucinated package (being recommended by the LLM) and inject some bad code in it.

“Now, next time the LLM recommends the same package in the generated code and an unsuspecting user executes the code, this malicious package is now downloaded and executed on the user’s machine.”
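The mechanics are easy to sketch. In the illustrative Python snippet below, the package name "quickparse3" is hypothetical; it stands in for whatever non-existent library a model might invent and is not drawn from the study's data.

```python
# Illustrative sketch of the user-side flow described above.
# "quickparse3" is a hypothetical, hallucinated package name.
import shlex

def install_command(package_name: str) -> str:
    """Build the single everyday command an unsuspecting user would run."""
    return f"pip install {shlex.quote(package_name)}"

# Step 1: the LLM's generated code references a package that does not exist.
llm_suggested_package = "quickparse3"

# Step 2: the user trusts the output and runs the suggested install command.
print(install_command(llm_suggested_package))
# Today this fails with a harmless "no matching distribution" error. But once
# an adversary registers "quickparse3" on the public index with malicious
# code inside, the very same command succeeds and the attacker's payload
# runs on the user's machine, with nothing visibly different to the user.
```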

The UTSA researchers evaluated the occurrence of package hallucinations across different programming languages, settings and parameters, exploring the likelihood of erroneous package recommendations and identifying root causes.

Across 30 different tests carried out by the UTSA researchers, 440,445 of the 2.23 million code samples they generated in Python and JavaScript using LLMs (roughly one in five) referenced hallucinated packages.

Of the LLMs the researchers tested, "GPT-series models were found four times less likely to generate hallucinated packages compared to open-source models, with a 5.2% hallucination rate compared to 21.7%," the study stated. Python code was less susceptible to hallucinations than JavaScript, researchers found.

These attacks often involve naming a malicious package to mimic a legitimate one, a tactic known as a package confusion attack. In a package hallucination attack, an unsuspecting LLM user would be recommended the package in their generated code, and trusting the LLM, would download the adversary-created malicious package, resulting in a compromise.
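Package confusion works because the names look almost right. As a rough illustration, the sketch below flags suggested names that closely imitate a short allow-list of packages a team actually uses; the misspelled names are hypothetical examples, not packages observed in the study.

```python
# Rough sketch: flag package names that closely imitate well-known ones.
from difflib import SequenceMatcher

KNOWN_PACKAGES = {"requests", "numpy", "pandas"}  # a team's allow-list (example)

def confusion_candidates(name: str, threshold: float = 0.8):
    """Return known packages that the given name closely resembles."""
    return [
        known for known in KNOWN_PACKAGES
        if name != known
        and SequenceMatcher(None, name, known).ratio() >= threshold
    ]

for suggested in ["reqeusts", "numpy", "pandsa"]:  # hypothetical suggestions
    print(suggested, "->", confusion_candidates(suggested))
# "reqeusts" and "pandsa" are flagged as look-alikes of real packages;
# "numpy" matches the allow-list exactly and raises no flag.
```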

The insidious element of this vulnerability is that it exploits growing trust in LLMs. As they continue to get more proficient in coding tasks, users will be more likely to blindly trust their output and potentially fall victim to this attack.

“If you code a lot, it’s not hard to see how this happens. We talked to a lot of people and almost everyone says they’ve noticed a package hallucination happen to them while they’re coding, but they never considered how it could be used maliciously,” Spracklen explained.

“You’re placing a lot of implicit trust on the package publisher that the code they’ve shared is legitimate and not malicious. But every time you download a package, you’re downloading potentially malicious code and giving it complete access to your machine.”
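The trust problem starts at install time, not import time. For source distributions on PyPI, the installer executes the package's own build script before any of its code is ever imported. A deliberately harmless sketch of such a script follows; the package name is hypothetical.

```python
# setup.py -- executed by the installer when building a source distribution.
# Everything at module level runs with the installing user's privileges.
from setuptools import setup

print("This line runs during installation, before any import happens.")
# A malicious publisher could place arbitrary code here instead.

setup(
    name="example-hypothetical-package",  # hypothetical name
    version="0.1.0",
    py_modules=[],
)
```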

While cross-referencing generated packages with a master list may help mitigate hallucinations, the UTSA researchers said the best solution is to address the foundations of LLMs during their development. The team has disclosed its findings to model providers, including OpenAI, Meta, DeepSeek and Mistral AI.
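Until model-level fixes arrive, a lightweight client-side version of that cross-referencing is straightforward to sketch: ask the index whether a name is actually registered before installing anything an LLM suggests. The example below uses PyPI's public JSON endpoint; "quickparse3" is again a hypothetical name.

```python
# Sketch: verify that an LLM-suggested package name exists on PyPI
# before installing it. A 404 from the JSON endpoint means the name
# is not registered at all.
import urllib.error
import urllib.request

def exists_on_pypi(package_name: str) -> bool:
    """Return True if the name is registered on PyPI, False on a 404."""
    url = f"https://pypi.org/pypi/{package_name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

for name in ["requests", "quickparse3"]:  # second name is hypothetical
    verdict = "registered" if exists_on_pypi(name) else "not registered; do not install blindly"
    print(f"{name}: {verdict}")
```

Existence alone is no guarantee of safety, since an attacker may already have claimed the hallucinated name, but it catches the case the researchers describe, where the recommended package does not exist at all.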

More information:
Joseph Spracklen et al, We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs, arXiv (2024). DOI: 10.48550/arxiv.2406.10279

Journal information:
arXiv


Provided by
University of Texas at San Antonio


