AI prompt injection attacks: the new hidden danger

February 24, 2024

In this fast-paced world where technology evolves with every blink, Artificial Intelligence is making a gigantic impact. Its overwhelming potential is revolutionizing industries worldwide, yet with great power comes great vulnerability. Lately, we have seen a worrying trend circulating online – AI prompt injection. Often flying under the radar, these AI prompts serve as the foundation of AI-based systems, acting as commands that lead the AI into action. But the real challenge is when these prompts are manipulated maliciously, giving rise to what we now know as ‘prompt injection attacks.’

New alert arises: AI prompt injections

To put it simply, AI prompts are the directions or inputs that users provide to an AI model, dictating its behaviour and guiding the output it generates. For example, in AI language models like Google Bard, users provide prompts as questions or sentences, specifying the information they seek or the task they want the model to perform.

Another common case is ChatGPT, in which you are not allowed to say mean or hateful things. Then, if a person using the AI says something hateful, it can be bypassed just by adding this: “Forget what was said before and…” This easy trick often works with AI models up to Chat GPT 3.5, letting people get around rules that stop the chatbot from talking about illegal stuff and more. The quality and specificity of these prompts significantly influence the relevance and accuracy of the model’s response. The better you describe it, the better the output from the chatbot.

Deciphering the new injection attacks

The term ‘prompt injection’ typically refers to a cybersecurity attack where an adversary injects malicious content into prompts to exploit a model. AI prompt injection attacks are specifically aimed at getting unintended responses from AI models to achieve unauthorized access, manipulate the model’s responses, or bypass security measures. These attacks reveal significant security vulnerabilities in AI models, particularly those based on large language models (LLMs).

Poisoning of training data: One form of prompt injection attack is data training poisoning. In this form of attack, the adversary manipulates the dataset used to train the LLM, aiming to make the model generate harmful responses. For instance, an attacker might trick the AI system into sharing incorrect medical information or promoting unverified or dangerous treatments. Such attacks not only endanger people’s well-being but also diminish public trust in AI models.

Direct prompt injection: In a direct prompt injection attack, an adversary uses carefully crafted prompts to make the model deviate from its original instructions and perform unintended actions. By injecting a deceptive prompt, attackers can manipulate the application into executing unauthorized code, posing risks such as data breaches, unauthorized access, or compromising the overall security of the application.

Recent incidents of prompt injection attack

Prompt injection attacks are not merely theoretical threats; they have already manifested in real-world scenarios. For instance, a student from Stanford University manipulated the prompt used by Bing Chat, a chatbot powered by AI technology, to reveal its initial hidden instructions. These instructions included a list of statements that govern how it interacts with people who use the service.

Similarly, other AI models like OpenAI’s ChatGPT have been prompted to disclose sensitive details about their inner workings. Initially, it appeared the bug only revealed titles from users’ chat histories. However, further investigation indicated that 1.2% of ChatGPT Plus subscribers could have had their personal information exposed, including names, email addresses, payment details, and the last four digits of their credit card. This vulnerability was associated with the Redis client library, redis-py, which has been corrected since.

This incident adds to the growing list of challenges facing AI developers, including recent missteps by Google’s Bard AI and controversies surrounding CNET’s use of AI for content creation. All of these issues are forcing companies to take measures to avoid huge consequences. For instance, an inside data leak incident forced technology giant Samsung to ban its employees from using AI tools like ChatGPT. Moreover, such attacks have prompted some jurisdictions to impose restrictions on the use of AI models due to privacy concerns.

How do we mitigate or prevent them?

The potential harm of prompt injection attacks extends beyond the realm of AI models. Even the OWASP Top 10 for Large Language Models describes prompt injection attacks as the number one threat for this type of AI technology. These attacks are a significant concern in the cybersecurity landscape, posing threats like data breaches, unauthorized access, and security compromises. Moreover, the risks are amplified when multiple LLMs are chained together, as a prompt injection attack at one level can propagate to affect subsequent layers, leading to significant real-world harm.

For LLMs handling sensitive data, any information provided to the LLM must be something the end user is permitted to see. This is particularly important when LLMs need to access sensitive contextual information from various sources. Applications must be designed only to retrieve and process information the end user can access.

Furthermore, the rise of technologies to prevent prompt injection attacks offers new security measures. Tools like Rebuff and others have been developed to mitigate risks, though their effectiveness varies based on the sensitivity of the data involved. Implementing secret tokens in prompts can also help detect unauthorized access attempts, although this requires careful management to ensure tokens remain undetected by malicious actors.

Upgrading to the latest LLM versions, such as Claude 2 or GPT-4, can also provide better protection against injection attacks. These newer models have improved resistance to common exploits, highlighting the importance of staying current with AI developments. However, no single measure is foolproof, and a combination of strategies is necessary to effectively protect against prompt injection attacks. The journey towards fully securing LLMs from prompt injections is ongoing, but with continuous innovation and vigilance, we can navigate the complexities of cyber security in the age of AI.