
contents‍
Artificial intelligence has transitioned from experimentation to production at high speed. In every industry, teams are releasing AI-powered features that handle customer conversations, generate content, assist with decisions, or automate workflows.Â
This wave of adoption has created enormous opportunities, but it has also introduced a new category of operational, regulatory, brand, and safety risks. Some of these risks resemble familiar software bugs; others are entirely new behaviors that emerge from the probabilistic, generative nature of large language models (LLMs).
In this blog, we will:
Before we dive into specific pitfalls, let’s pause a bit to understand why AI products fail differently than deterministic software.Â
Traditional software follows a strict logic: given the same inputs, it reliably produces the same outputs. LLMs, however, operate on probability distributions and learned patterns. They generate outputs that are not programmed, but inferred based on training data, model architecture, and user interaction.
This probabilistic nature introduces a lot of ambiguity. A model might respond flawlessly to a particular instruction 9 times out of 10, yet misbehave the tenth time.Â
To make matters more complicated, LLMs do not “understand” intent the way humans do. They simply predict the next likely token. That means a slight rephrasing of a question, variations in user tone, or even punctuation can result in significantly different outputs.
Finally, we typically don't know the specific training data that went into the model. We interact with LLM as a black box, observing its capabilities empirically through the outputs we get.
The bottom line is: AI products require a distinct approach to risk management and quality assurance.Â
A lot of focus in LLM evaluation goes towards assessing the LLM system capabilities: how well it tackles the task at hand. However, there is a second aspect to quality: evaluating all the things that can go wrong, especially if someone deliberately tries to break or tamper with the AI system.
 Let's walk through the most common AI risks below.
Hallucinations are among the most challenging risks in AI systems. They occur when a model produces statements that sound coherent, detailed, and authoritative – but are factually incorrect or entirely fabricated.
AI hallucinations stem from how generative models work – they aim to produce plausible text rather than validated truth. If the model was trained on outdated or inconsistent data, pressured to answer, or lacks grounding in reality, it can “fill in the gaps” with something plausible but untrue.Â
AI hallucinations may come as incorrect information in customer-facing answers, invented policies or pricing details, false references or citations, misleading instructions, or even non-existent people.
Here are some real-world examples:
Want more examples of real-world AI hallucinations? We put them in a blog.Â
While hallucinations are harmless when the content is entertainment-oriented (e.g., generating a fictional story), in a product setting – where users expect correctness – they quickly become a liability. Incorrect answers in customer support can mislead users or escalate issues; fabricated financial or legal information can create regulatory exposure; confident but wrong recommendations erode trust in the product.
While stopping hallucinations entirely is impossible due to the nature of language models, there are several steps you can take to make AI hallucinations less likely:

Prompt injection is one of the most critical vulnerabilities of LLM applications. It occurs when a user tries to override your AI system’s prompts by adding their own instructions.Â
Sometimes it happens accidentally, but often maliciously.
It can be as straightforward as typing “ignore previous instructions …” or as sophisticated as embedding hidden instructions inside code blocks, images, or website text.
If the prompt injection attempt is successful, a malicious user can trick the AI system into doing things it should not.Â
Examples:

Want to know more about prompt injection? Explore our guide, which includes examples of prompt injection attacks and tips to protect your AI application.Â
To reduce the risk of prompt injection, consider these strategies:Â
Jailbreaking is related to prompt injection but distinct in intent. Jailbreaks are attacks explicitly designed to bypass LLM’s safety protocols. Their goal is to make a model produce outputs it normally wouldn’t – e.g., providing dangerous instructions, generating hate speech, or delivering violent content.

Jailbreak techniques are constantly evolving and often involve creative tactics, such as role-playing, emotional manipulation, or asking to operate in hypothetical scenarios.
Examples:
Jailbreak defense is an arms race. Here’s how you can reduce the risk:
Hijacking occurs when a user directs a model to perform tasks outside the product’s intended scope. This includes tricking the model into offering unqualified advice or leading it into unintended domains.Â
Examples:

Unlike explicit prompt injection attacks, hijacking requests are often more benign but still problematic – manipulated answers may violate company policy or regulations, and out-of-scope requests add up to your token costs.Â
Hijacking is often subtle, so you need to adjust your evaluation process:Â
LLMs can generate harmful content even without malicious user intent – bias, toxicity, or culturally inappropriate phrasing may surface in ordinary dialogue.
This risk is especially acute in customer-facing applications, where trust, inclusivity, and regulatory expectations are high.
Examples:
While modern LLMs are typically tested for these kinds of behavior in advance by the model providers, it may still occur in real life, especially if provoked by a malicious user.
Here’s how you can reduce the risk of your AI app generating harmful content:
AI systems can accidentally disclose confidential information, including hidden prompts, financial data, internal documents, security credentials, or personally identifiable information (PII).Â
Examples:
There are different degrees of risk involved. For example, the system prompts often contain policy rules, brand guidelines, or internal constraints. If they are exposed, malicious users can study them to design more effective attacks.
A different example would be revealing private information to a user who was not meant to have access to it. For example, Microsoft Copilot had a security issue at one point that allowed regular employees to access the emails of other users.Â
To reduce the risk of data leakage, you can follow these strategies:
AI systems often speak on behalf of your company. If the tone is off, the message is inconsistent, or the content inadvertently conveys a critical tone, users can interpret it as your brand’s position.
A brand’s voice reflects its values, personality, and relationship with customers. LLMs, left unregulated, can mimic any tone and way of speaking, including ones that break brand guidelines.
Examples:

Here’s how you can keep your AI system from going off the script:

Even well-intentioned models may drift into offering sensitive advice – such as financial, medical, legal, or other regulated advice. While your product may not be intended for use in such scenarios, the underlying LLM was trained on a wealth of diverse data, which enables the LLM to engage on many topics.
This can be very dangerous, as users often assume that AI-generated answers are trustworthy. Not to mention that unauthorized advice can violate government regulations.Â
As a result, most regulated industries explicitly forbid unqualified recommendations without disclaimers, verification, or human oversight. However, your LLM may still generate something that resembles it.
Examples:

To reduce the risk of giving unauthorized advice, you can:
LLMs love to please. If the user asks, they may invent promotions, guarantees, features, or commitments that your company never intended to make. This is one of the easiest pitfalls to miss – and one of the most expensive.
These types of risks are particularly relevant for user-facing systems, such as customer support chatbots.
Users (and courts) assume that commitments made by your AI system are binding, and when these promises cannot be fulfilled, it damages trust and your company’s reputation.
Examples:
The model can generate such statements if a user asks a specific question, and the system lacks grounding in an up-to-date context.
Here’s how you can reduce the risk:
Every AI product has quirks and corner cases that do not appear in general-purpose AI risk frameworks. It may include domain-specific jargon, rare workflows, unexpected input formats, special user groups, switching languages, and many more. Edge cases are where the majority of real-world failures hide.
Examples:
Here’s how you can approach the task of reducing vulnerabilities unique to your use case:
Testing for common AI risks we have just covered may sound like a lot of work – it is indeed no small feat. For many products, this is up to the creator of the system to perform this risk assessment and decide which of these issues are worth testing for. You can have minimal testing for specific observed risks or develop a complete red-teaming process.Â
However, in many industries, AI risk testing has become a necessary part of AI governance. There are also developing AI risk assessment frameworks that help you avoid missing critical AI risks and make your testing approach more structured. Let’s briefly explore some of the most popular ones. Â
The OWASP Top 10 LLM highlights the most critical safety and security risks unique to AI-powered systems. It includes issues such as prompt injection, training data poisoning, insecure output handling, unintended memorization, and supply-chain vulnerabilities.Â
If you want to learn more about OWASP for LLMs, we have a separate blog about it.Â
For engineering teams, OWASP provides a tactical playbook for preventing exploits, securing API interactions, and designing more secure system architectures.
The National Institute of Standards and Technology (NIST) takes a broader governance approach. It helps organizations build policies, workflows, and controls that ensure AI systems behave reliably, fairly, and transparently.Â
The NIST framework is useful for facilitating cross-functional collaboration among product, engineering, legal, compliance, and leadership teams. It emphasizes lifecycle risk management – not a one-time evaluation, but continuous oversight.
MITRE’s ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) provides a catalog of adversarial attack techniques targeting machine-learning systems. ATLAS is particularly valuable for red teams and security teams who need to understand how attackers think.
Once you’ve defined your AI risks and priorities, the next step is to design a testing workflow that actively probes for failures. The goal is to stress-test your system by simulating real-world misuse, edge cases, and attempts to bypass safeguards.Â

Here’s how you can approach the task:
1. Generate inputs aligned with your risk scenarios, such as:
2. Use synthetic data to target your specific scenario and scale coverage.Â
You can use tools like Evidently Cloud, which allows you to generate tailored adversarial inputs and integrate them into evaluation pipelines.Â

You can also generate synthetic input–output pairs from your own knowledge base to test correctness and prevent hallucinations on domain-critical topics.

3. Run the test inputs through your application. Once your tests are ready, you run them through your system to collect the outputs – such as how exactly your chatbot responds when you try to trick it into doing something it shouldn't.
4. Evaluate the outputs.Â
Evaluate your system’s responses against predefined criteria. For example, you can assess if the response to a provocative prompt is safe, or (in case of RAG) whether they are correct and faithful to the context.Â
Pro tip: You can use techniques like LLM-as-a-judge to automate this step.
Following this approach will allow you to build automated pipelines for continuous AI testing. You can update the scenarios and increase the coverage as you discover new vulnerabilities and failure modes.
To run AI risk testing for LLM apps as a systematic process, you need the right tools.Â
We built Evidently Cloud to help you test and evaluate AI system behavior at every stage of development, covering both quality and safety testing workflows.
With Evidently Cloud, you can:
All from a single, user-friendly interface.Â
The platform is built on top of Evidently, our open-source framework trusted by the AI community with over 30 million downloads. With over 100 built-in checks and customization options, we make it easy to align your AI testing strategy with your product’s needs.
Ready to test your AI app? Request a demo to see Evidently Cloud in action. We’ll help you to:
Let’s make sure your AI systems work as intended — safely and reliably at scale.
