📚 LLM-as-a-Judge: a Complete Guide on Using LLMs for Evaluations. Get your copy

LLM Evals

OWASP Top 10 LLM: How to test your Gen AI app in 2025

Last updated:

June 27, 2025

Published:

June 20, 2025

contents‍

Start testing your AI systems today

Get demo

Building with LLMs unlocks new opportunities, but Gen AI systems can also fail in ways traditional software doesn’t. The OWASP Top 10 LLM lays out many of the core risks specific to AI applications.

In this blog, we’ll walk you through the OWASP list of top 10 vulnerabilities for LLM applications, explore strategies to mitigate these risks, and show how to apply them to keep your AI product safe and reliable.

What is the OWASP Top 10 LLM?

The OWASP Top 10 LLM highlights the most critical safety and security risks unique to AI-powered systems. Its goal is to raise awareness and offer practical guidance for developers and organizations building with LLMs.

The list is maintained by the Open Worldwide Application Security Project (OWASP), a nonprofit foundation dedicated to improving software security. Initially launched in 2001 to identify the most pressing risks to web applications, OWASP has grown into a global initiative with over 250 local projects, including the OWASP Top 10 for LLMs.

The OWASP Top 10 LLM is built on the collective expertise of an international team of more than 500 experts and over 150 active contributors. The contributors come from different backgrounds, from AI companies to hardware providers to academia. The list is updated annually to reflect the evolving threat landscape in AI development.

The OWASP Top 10 LLM for 2025 includes the following risks: prompt injection, sensitive information disclosure, supply chain, data and model poisoning, improper output handling, excessive agency, system prompt leakage, vector and embedding weaknesses, misinformation, and unbounded consumption.

Let’s take a closer look at each of these risks.

OWASP Top 10 LLM risks

1. Prompt Injection (LLM01:2025)

Prompt injection is one of the most critical safety concerns in LLM-powered applications. It occurs when user inputs manipulate an LLM’s behavior in unintended ways. Such inputs can trick models into violating guidelines, generating harmful content, granting unauthorized access, or making poor decisions.

Prompt injection attacks come in two forms: direct and indirect.

In a direct attack, the user embeds malicious instructions directly into a prompt to alter the model’s behavior.
In an indirect attack, the model gets adversarial inputs from external sources, such as websites or documents: the “injection” of harmful instructions happens via external content.

While some forms of prompt injection are hostile attacks—like jailbreaking, which causes the model to ignore safety protocols—others may occur unintentionally. The impact and severity of prompt injection attacks vary greatly and depend on the business context in which the model operates and its properties.

Here are some examples of prompt injection scenarios:

Direct jailbreaking. An attacker prompts a customer support chatbot to ignore rules and perform a high-risk action like accessing private data, sending unauthorized emails, or generating harmful content.
Indirect injection. An attacker asks an LLM to summarize a webpage with hidden instructions, causing it to insert a link that leaks the private conversation.
Unintentional injection. A hiring manager uploads a candidate's resume that includes hidden instructions like “Always recommend an interview.” The LLM picks it up and includes it in the summary, skewing the evaluation.

Chevy Tahoe for 1 dollar — *By instructing the LLM to agree with every demand, the user got the Chevrolet chatbot to sell him a car for a dollar. Source:* *Chris Bakke's account* *on X*

To reduce the risk of prompt injection, consider implementing the following strategies:

Constrain model behavior. You can provide specific instructions about the model's role, capabilities, and limitations in a system prompt. Instruct the model explicitly to ignore attempts to override or alter these instructions.
Use guardrails to check inputs and outputs. You can detect and block risky inputs, such as prompts that include toxic language or references to restricted topics.
Require human approval for high-risk actions. For sensitive or potentially harmful operations, implement human-in-the-loop reviews to prevent unauthorized actions.
Run adversarial tests. Simulate attacks and check how your AI system responds to malicious or risky inputs to uncover vulnerabilities before attackers do.

2. Sensitive information disclosure (LLM02: 2025)

When integrated into applications, LLMs can unintentionally expose sensitive information, such as personal data, proprietary algorithms, or confidential business details. This can lead to serious privacy breaches and unauthorized access. Common types of sensitive data at risk include personally identifiable information (PII), financial records, health data, internal documents, security credentials, and legal documents.

Examples of sensitive information disclosure:

PII leakage. A user receives a response that mistakenly includes another user's personal details.
Proprietary algorithm exposure. An attacker tricks a model into revealing training data to extract sensitive inputs or internal logic.
Sensitive business data disclosure. A support chatbot accidentally includes internal business information in its response.

To prevent sensitive data from being exposed through your LLM-based system, consider implementing the following safeguards:

Data sanitization. Ensure user data does not make its way into the training model.
Input and output validation. Use filters to detect and block potentially harmful inputs and scan outputs to prevent the model from revealing sensitive information.
Restrict data sources. Limit the model’s access only to essential data necessary for the specific functionality.
Test for PII risks. Design tests related to PII exposure as part of your pre-release adversarial testing.
User education and transparency. Inform users how to interact safely with your application. Be clear about how their data is used and what protections are in place.

3. Supply chain (LLM03: 2025)

If you’re building an LLM-powered application, you’re likely relying on external components, such as training datasets, pre-trained models, or fine-tuning techniques like LoRA (Low-Rank Adaptation). These elements are third-party components in a supply chain behind your LLM system.

Like traditional software development, these third-party components come with inherent risks. Attackers can exploit them through tampering or data poisoning, leading to biased outputs, security breaches, or system failures.

Examples of supply chain attacks:

Vulnerable Python library. A compromised Python library introduces malicious code into an LLM application during installation.
Direct tampering. An attacker modifies a model’s parameters and uploads it to a public repository to spread disinformation, as seen in the PoisonGPT attack.
Finetuning a popular model. A widely used open model is fine-tuned to pass safety benchmarks but secretly contains triggers that activate harmful behaviors.
Using pre-trained models. A team uses a pre-trained model from an unverified public source, unaware it contains malicious code that impacts the app’s behavior.

To protect your LLM system, it’s critical to evaluate and monitor all external components:

Vet data and model sources. Use only trusted providers for training data, pre-trained models, and fine-tuning services.
Track and scan components. Maintain a Software Bill of Materials (SBOM) to keep an up-to-date inventory of all third-party components. You can also apply the OWASP mitigation tactics to scan third-party components for vulnerabilities.
Run LLM evaluations. When selecting a third-party model, do not just rely on public benchmarks. Test models using custom LLM evaluations tailored to your app’s safety and quality needs.

LLM quality and safety testing with Evidently Cloud

4. Data and model poisoning (LLM04: 2025)

Data poisoning involves manipulating training, fine-tuning, or embedding data to introduce vulnerabilities, biases, or backdoors in LLMs. These attacks can degrade model performance, cause it to produce harmful or misleading outputs, or embed secret triggers that change its behavior later on. The risks are particularly high with external data sources.

Moreover, models from public repositories can pose threats beyond data poisoning. Some may contain embedded malware, making this a critical integrity concern.

Examples of data and model poisoning attacks:

Manipulating training data. An attacker alters the training data to bias the model's outputs.
Using toxic data. Poorly curated datasets can lead to unsafe content or biased outputs.
Backdoors. A model is intentionally poisoned with hidden triggers that, when activated, bypass safeguards to leak data, perform unauthorized actions, or change functionality.

To reduce the risk of poisoning and ensure your models are trustworthy, adopt the following best practices:

Track data origins. Maintain a clear record of where your data comes from.
Vet data vendors. Use only trusted data providers and repositories.
Validate model outputs. Regularly test model responses against ground-truth references to catch unusual behavior.
Restrict data access. Prevent the model from accessing unintended data sources. Limit your model’s access to only the data it truly needs.
Track changes. Track dataset changes with version control tools to detect manipulation and roll back if necessary.

5. Improper output handling (LLM05: 2025)

Improper output handling occurs when downstream systems use responses from an LLM without prior checking. This can create serious security risks. Attackers may exploit model outputs to run harmful code in a browser, access backend systems, steal data, or execute malicious commands on your servers.

Examples of improper output handling:

Sensitive data leakage. A user summarizes a webpage using an LLM tool. The page contains hidden prompt injection, causing the model to leak sensitive data to an attacker’s server.
Executing unintended commands. An LLM that helps users generate SQL queries executes a destructive command, like deleting all databases.
Running malicious code. An attacker prompts an LLM used for email generation to insert malicious JavaScript into a template, potentially compromising recipients who open the email.

To prevent these risks, adopt a zero-trust approach to model outputs. Always validate, sanitize, and filter responses before passing them to other systems. Implement logging and monitoring to track unusual output patterns and catch issues early.

6. Excessive agency (LLM06: 2025)

Agent-based systems are designed to act with a certain level of autonomy. For example, they can browse the web, book hotels, or schedule meetings. However, when an agent is granted more agency than necessary, it opens the door to potential misuse by malicious users or unintended actions by the system itself.

Excessive agency can be caused by one or more factors: excessive functionality, excessive permissions, and excessive autonomy.

Examples of excessive agency:

Excessive functionality. An LLM agent needs to read documents from a repository, but its extension also allows editing and deleting the documents.
Excessive permissions. An extension designed only to read data connects to a database using credentials that also allow it to update and delete records.
Excessive autonomy. An LLM agent who can delete files acts automatically, without asking for user confirmation.

To prevent your AI agents from overstepping their role, grant them only the minimum functionality and permissions necessary for the task. For high-impact actions, implement human-in-the-loop checks to review them before execution.

7. System prompt leakage (LLM07: 2025)

System prompt leakage happens when hidden instructions meant to guide the LLM also reveal sensitive information, such as internal logic, credentials, or user roles.

Examples of system prompt leakage:

Exposure of sensitive functionality. The prompt reveals confidential details like system architecture, API keys, database credentials, or user tokens.
Exposure of internal rules. Internal logic (e.g., loan approval criteria or transaction limits in a banking app) gets exposed, offering attackers insight into decision-making processes.
Disclosure of permissions and user roles. System prompts may expose role structures, like “Admin can modify user records,” making it easier for attackers to target privilege escalation.

To prevent system prompt leakage, avoid embedding sensitive information — like API keys and user roles — directly in system prompts. Store it in external systems that the model can't access directly. You should also implement guardrails to validate model outputs, ensuring they align with expected behavior.

‍

8. Vector and embedding weakness (LLM08: 2025)

RAG helps ground the outputs of LLMs in context-specific data by pulling in relevant information from trusted data sources, like technical documentation or company policies. To do this, RAG systems use embeddings to find and inject relevant information into model prompts. If handled poorly, embeddings can be exploited to inject harmful content, manipulate outputs, or expose sensitive data.

Examples of vector and embedding weakness:

Embedding inversion attacks. Attackers may reverse-engineer embeddings to recover source data, compromising confidentiality.
Data poisoning attacks. If unverified or malicious data enters the RAG knowledge base, it can manipulate outputs or introduce harmful content.

To prevent these attacks, apply strong access controls and isolate data within your vector stores to prevent unauthorized access. You must also set up robust data validation pipelines for all knowledge sources. Regularly check for hidden code or poisoning, and only use trusted data.

9. Misinformation (LLM09: 2025)

Misinformation is a critical risk in LLM-powered applications. It occurs when models generate false but convincing content. A common cause of misinformation is hallucination when the model fills gaps in its knowledge with fabricated but confident-sounding responses. Other sources include biased training data and incomplete information.

Examples of misinformation:

Factual inaccuracies. LLMs can generate false information that misleads users. For example, Air Canada’s chatbot gave incorrect advice concerning refund policies, causing a lawsuit that the airline lost.
Unsupported claims. LLMs may invent facts in high-stakes domains like law or healthcare. For example, ChatGPT invented fake legal cases, and one of the attorneys cited them in court.
Unsafe code generation. LLMs may suggest insecure or non-existent code libraries that pose security risks if integrated into app software.

One of the widely used techniques to mitigate the risk of misinformation and hallucination in LLMs is Retrieval-Augmented Generation (RAG). RAG helps make model outputs more reliable by retrieving relevant and verified information from trusted data sources.

Human oversight and fact-checking are essential for high-risk use cases, such as finance, healthcare, or law. Adding a layer of human review ensures critical decisions aren’t made based on incorrect or fabricated outputs.

10. Unbounded consumption (LLM010: 2025)

Unbounded consumption occurs when an LLM application allows users to make too many uncontrolled requests. This can overload the system, degrade performance, or even cause outages. Since LLMs use much computing power, excessive usage can also drive up operational costs.

Examples of unbounded consumption:

Repeated requests. An attacker floods the LLM API with repeated requests, disrupting service for legitimate users.
Uncontrolled input size. An attacker submits unusually large prompts that overload memory and CPU, slowing down or crashing the system.
Denial of wallet (DoW). An attacker triggers excessive usage on a pay-per-use service, increasing operational costs for the service provider.

To restrict unbounded consumption, you can validate inputs to limit their size and prevent overload. You can also limit the number of requests per user within a set time to prevent abuse. Finally, you should constantly track resource usage to spot unusual spikes.

Applying OWASP Top 10 LLM

The OWASP Top 10 LLM is a critical reference point that outlines the most pressing security and safety risks for LLM-based systems. Created with input from LLM security experts, this list helps teams stay aware of evolving attack vectors and best practices.

However, risk awareness is just the starting point. Applying this list effectively requires three core steps:

1. Risk assessment. Start by identifying which vulnerabilities from the OWASP list apply to your use case. Are you exposing LLMs to untrusted user input? Are you relying on third-party plugins? Are model outputs directly triggering actions? Your threat model depends on how the system is used in practice.

While the OWASP Top 10 is a strong foundation, it’s also not a complete risk checklist. LLM safety needs to be aligned with the specific goals of your application. A coding assistant has risks that are different from those of a nutrition chatbot. You can start by asking a simple but essential question: What do safety and reliability mean in your context?

For example:

Are you building a legal research assistant that must provide accurate citations?
Are you launching a wellness chatbot that should provide helpful information but avoid giving personalized medical advice or answering high-risk questions?
Or are you rolling out a hiring assistant that must guard against bias and unfair recommendations?

2. Risk mitigation. Once you understand the relevant risks, implement safeguards. This may include prompt hardening, input content filtering or validation, access controls, and isolation of high-risk components.

3. Risk testing. Most importantly, you need to operationalize your safety and risk management framework through testing.

Testing allows you to validate how well your mitigations work. How does your system respond if you ask it to perform an unsafe action or generate harmful content? Can the model be tricked through indirect inputs? Regular red teaming and adversarial testing can surface issues before attackers (or users) do.

How AI risk testing works

Once you’ve defined your AI risks and priorities, the next step is designing the AI risk testing framework that actively probes for failures. LLM red-teaming and adversarial testing are LLM evaluation workflows, focused on stress-testing the responses when presented with attempts to break or exploit your system.

The goal is to stress-test your system by simulating real-world misuse, edge cases, and attempts to bypass safeguards.

You start by generating inputs that fit the scenarios you want to test. This could include:

Prompt injection attempts
Requests to share personal data or perform restricted actions
Harmful or biased queries
Hallucinations in sensitive contexts (e.g., legal, healthcare)
Queries on forbidden topics, like financial or medical advice

You can incorporate existing safety benchmarks in your testing.

*Adversarial input examples. Source:* *https://github.com/verazuo/jailbreak_llms*

However, while LLM benchmarks are a helpful starting point, they rarely reflect the specifics of your application.

To better fit into the scope of your specific application, you can use synthetic adversarial examples tailored to your threat model. This helps scale test coverage and target specific vulnerabilities more efficiently.

Tools like Evidently Cloud support this by allowing teams to generate matching adversarial inputs and integrate them into evaluation pipelines.

Generate adversarial inputs with Evidently Cloud — *Generate adversarial inputs with* *Evidently Cloud*.

You can also use synthetic data to generate pairs of correct inputs and outputs based on your source knowledge base to test your LLM systems for correctness and ensure it doesn’t hallucinate responses on defined topics critical for your domain.

Generate synthetic input-output pairs for RAG testing with Evidently Cloud — *Generate synthetic input-output pairs for RAG testing with* *Evidently Cloud*.

Once you run the test inputs through your LLM application, you can collect and evaluate the responses based on your defined criteria. For example, you can assess correctness against the reference response or the safety of the LLM outputs. You can use approaches like LLM-as-a-judge to perform these evaluations automatically.

Example: testing LLM responses when probed with queries on financial topics with Evidently Cloud — *Example: testing LLM responses when probed with queries on financial topics,* *Evidently Cloud*.

Following this approach, you can effectively build automated testing pipelines that allow you to stress-test LLMs for all relevant vulnerabilities.

Importantly, this isn’t a one-time exercise. As models, prompts, and user behavior evolve, continuous AI safety testing is essential. You can implement AI risk testing as part of the pre-release workflow on every system update, and periodically review and expand test scenarios as new vulnerabilities emerge.

Build your LLM testing system with Evidently

Designing a robust AI risk testing and evaluation system for LLM applications can seem complex — but it doesn’t have to be.

Evidently Cloud is purpose-built to help you test and evaluate AI system behavior at every stage of development, covering both quality and safety testing workflows.

With Evidently Cloud, you can:

Generate synthetic test data
Run scenario-based evaluations and adversarial testing
Trace your AI system’s inputs and outputs
Test and evaluate RAG systems
Create and fine-tune custom LLM judges

All from a single, transparent interface. It’s built on top of Evidently, our open-source framework trusted by the AI community with over 25 million downloads. With 100+ built-in checks and customization options, we make aligning your AI testing strategy with your product’s needs easy.