contents
Building with LLMs unlocks new opportunities, but Gen AI systems can also fail in ways traditional software doesn’t. The OWASP Top 10 LLM lays out many of the core risks specific to AI applications.
In this blog, we’ll walk you through the OWASP list of top 10 vulnerabilities for LLM applications, explore strategies to mitigate these risks, and show how to apply them to keep your AI product safe and reliable.
The OWASP Top 10 LLM highlights the most critical safety and security risks unique to AI-powered systems. Its goal is to raise awareness and offer practical guidance for developers and organizations building with LLMs.
The list is maintained by the Open Worldwide Application Security Project (OWASP), a nonprofit foundation dedicated to improving software security. Initially launched in 2001 to identify the most pressing risks to web applications, OWASP has grown into a global initiative with over 250 local projects, including the OWASP Top 10 for LLMs.
The OWASP Top 10 LLM is built on the collective expertise of an international team of more than 500 experts and over 150 active contributors. The contributors come from different backgrounds, from AI companies to hardware providers to academia. The list is updated annually to reflect the evolving threat landscape in AI development.
The OWASP Top 10 LLM for 2025 includes the following risks: prompt injection, sensitive information disclosure, supply chain, data and model poisoning, improper output handling, excessive agency, system prompt leakage, vector and embedding weaknesses, misinformation, and unbounded consumption.
Let’s take a closer look at each of these risks.
Prompt injection is one of the most critical safety concerns in LLM-powered applications. It occurs when user inputs manipulate an LLM’s behavior in unintended ways. Such inputs can trick models into violating guidelines, generating harmful content, granting unauthorized access, or making poor decisions.
Prompt injection attacks come in two forms: direct and indirect.
While some forms of prompt injection are hostile attacks—like jailbreaking, which causes the model to ignore safety protocols—others may occur unintentionally. The impact and severity of prompt injection attacks vary greatly and depend on the business context in which the model operates and its properties.
Here are some examples of prompt injection scenarios:
To reduce the risk of prompt injection, consider implementing the following strategies:
When integrated into applications, LLMs can unintentionally expose sensitive information, such as personal data, proprietary algorithms, or confidential business details. This can lead to serious privacy breaches and unauthorized access. Common types of sensitive data at risk include personally identifiable information (PII), financial records, health data, internal documents, security credentials, and legal documents.
Examples of sensitive information disclosure:
To prevent sensitive data from being exposed through your LLM-based system, consider implementing the following safeguards:
If you’re building an LLM-powered application, you’re likely relying on external components, such as training datasets, pre-trained models, or fine-tuning techniques like LoRA (Low-Rank Adaptation). These elements are third-party components in a supply chain behind your LLM system.
Like traditional software development, these third-party components come with inherent risks. Attackers can exploit them through tampering or data poisoning, leading to biased outputs, security breaches, or system failures.
Examples of supply chain attacks:
To protect your LLM system, it’s critical to evaluate and monitor all external components:
Data poisoning involves manipulating training, fine-tuning, or embedding data to introduce vulnerabilities, biases, or backdoors in LLMs. These attacks can degrade model performance, cause it to produce harmful or misleading outputs, or embed secret triggers that change its behavior later on. The risks are particularly high with external data sources.
Moreover, models from public repositories can pose threats beyond data poisoning. Some may contain embedded malware, making this a critical integrity concern.
Examples of data and model poisoning attacks:
To reduce the risk of poisoning and ensure your models are trustworthy, adopt the following best practices:
Improper output handling occurs when downstream systems use responses from an LLM without prior checking. This can create serious security risks. Attackers may exploit model outputs to run harmful code in a browser, access backend systems, steal data, or execute malicious commands on your servers.
Examples of improper output handling:
To prevent these risks, adopt a zero-trust approach to model outputs. Always validate, sanitize, and filter responses before passing them to other systems. Implement logging and monitoring to track unusual output patterns and catch issues early.
Agent-based systems are designed to act with a certain level of autonomy. For example, they can browse the web, book hotels, or schedule meetings. However, when an agent is granted more agency than necessary, it opens the door to potential misuse by malicious users or unintended actions by the system itself.
Excessive agency can be caused by one or more factors: excessive functionality, excessive permissions, and excessive autonomy.
Examples of excessive agency:
To prevent your AI agents from overstepping their role, grant them only the minimum functionality and permissions necessary for the task. For high-impact actions, implement human-in-the-loop checks to review them before execution.
System prompt leakage happens when hidden instructions meant to guide the LLM also reveal sensitive information, such as internal logic, credentials, or user roles.
Examples of system prompt leakage:
To prevent system prompt leakage, avoid embedding sensitive information — like API keys and user roles — directly in system prompts. Store it in external systems that the model can't access directly. You should also implement guardrails to validate model outputs, ensuring they align with expected behavior.
RAG helps ground the outputs of LLMs in context-specific data by pulling in relevant information from trusted data sources, like technical documentation or company policies. To do this, RAG systems use embeddings to find and inject relevant information into model prompts. If handled poorly, embeddings can be exploited to inject harmful content, manipulate outputs, or expose sensitive data.
Examples of vector and embedding weakness:
To prevent these attacks, apply strong access controls and isolate data within your vector stores to prevent unauthorized access. You must also set up robust data validation pipelines for all knowledge sources. Regularly check for hidden code or poisoning, and only use trusted data.
Misinformation is a critical risk in LLM-powered applications. It occurs when models generate false but convincing content. A common cause of misinformation is hallucination when the model fills gaps in its knowledge with fabricated but confident-sounding responses. Other sources include biased training data and incomplete information.
Examples of misinformation:
One of the widely used techniques to mitigate the risk of misinformation and hallucination in LLMs is Retrieval-Augmented Generation (RAG). RAG helps make model outputs more reliable by retrieving relevant and verified information from trusted data sources.
Human oversight and fact-checking are essential for high-risk use cases, such as finance, healthcare, or law. Adding a layer of human review ensures critical decisions aren’t made based on incorrect or fabricated outputs.
Unbounded consumption occurs when an LLM application allows users to make too many uncontrolled requests. This can overload the system, degrade performance, or even cause outages. Since LLMs use much computing power, excessive usage can also drive up operational costs.
Examples of unbounded consumption:
To restrict unbounded consumption, you can validate inputs to limit their size and prevent overload. You can also limit the number of requests per user within a set time to prevent abuse. Finally, you should constantly track resource usage to spot unusual spikes.
The OWASP Top 10 LLM is a critical reference point that outlines the most pressing security and safety risks for LLM-based systems. Created with input from LLM security experts, this list helps teams stay aware of evolving attack vectors and best practices.
However, risk awareness is just the starting point. Applying this list effectively requires three core steps:
1. Risk assessment. Start by identifying which vulnerabilities from the OWASP list apply to your use case. Are you exposing LLMs to untrusted user input? Are you relying on third-party plugins? Are model outputs directly triggering actions? Your threat model depends on how the system is used in practice.
While the OWASP Top 10 is a strong foundation, it’s also not a complete risk checklist. LLM safety needs to be aligned with the specific goals of your application. A coding assistant has risks that are different from those of a nutrition chatbot. You can start by asking a simple but essential question: What do safety and reliability mean in your context?
For example:
2. Risk mitigation. Once you understand the relevant risks, implement safeguards. This may include prompt hardening, input content filtering or validation, access controls, and isolation of high-risk components.
3. Risk testing. Most importantly, you need to operationalize your safety and risk management framework through testing.
Testing allows you to validate how well your mitigations work. How does your system respond if you ask it to perform an unsafe action or generate harmful content? Can the model be tricked through indirect inputs? Regular red teaming and adversarial testing can surface issues before attackers (or users) do.
Once you’ve defined your AI risks and priorities, the next step is designing the AI risk testing framework that actively probes for failures. LLM red-teaming and adversarial testing are LLM evaluation workflows, focused on stress-testing the responses when presented with attempts to break or exploit your system.
The goal is to stress-test your system by simulating real-world misuse, edge cases, and attempts to bypass safeguards.
You start by generating inputs that fit the scenarios you want to test. This could include:
You can incorporate existing safety benchmarks in your testing.
However, while LLM benchmarks are a helpful starting point, they rarely reflect the specifics of your application.
To better fit into the scope of your specific application, you can use synthetic adversarial examples tailored to your threat model. This helps scale test coverage and target specific vulnerabilities more efficiently.
Tools like Evidently Cloud support this by allowing teams to generate matching adversarial inputs and integrate them into evaluation pipelines.
You can also use synthetic data to generate pairs of correct inputs and outputs based on your source knowledge base to test your LLM systems for correctness and ensure it doesn’t hallucinate responses on defined topics critical for your domain.
Once you run the test inputs through your LLM application, you can collect and evaluate the responses based on your defined criteria. For example, you can assess correctness against the reference response or the safety of the LLM outputs. You can use approaches like LLM-as-a-judge to perform these evaluations automatically.
Following this approach, you can effectively build automated testing pipelines that allow you to stress-test LLMs for all relevant vulnerabilities.
Importantly, this isn’t a one-time exercise. As models, prompts, and user behavior evolve, continuous AI safety testing is essential. You can implement AI risk testing as part of the pre-release workflow on every system update, and periodically review and expand test scenarios as new vulnerabilities emerge.
Designing a robust AI risk testing and evaluation system for LLM applications can seem complex — but it doesn’t have to be.
Evidently Cloud is purpose-built to help you test and evaluate AI system behavior at every stage of development, covering both quality and safety testing workflows.
With Evidently Cloud, you can:
All from a single, transparent interface. It’s built on top of Evidently, our open-source framework trusted by the AI community with over 25 million downloads. With 100+ built-in checks and customization options, we make aligning your AI testing strategy with your product’s needs easy.
Let’s work together. Request a demo, and we’ll collaborate with your AI, product, and risk teams to:
Let’s make sure your AI systems work as intended — safely and reliably at scale.