Skip to main content
All articles
AI Testing

AI Hallucination Testing: How to Catch Inaccurate AI Responses Before Users Do

9 min readBy RedQA Engineering Team

What is AI hallucination?

AI hallucination occurs when a large language model generates content that is plausible-sounding but factually incorrect, unsupported by the source material, or entirely fabricated. The term "hallucination" captures the essential problem: the model presents invented content with the same confident, fluent tone it uses when it's correct. There's no "I made this up" signal in the output.

For products used in healthcare, finance, legal, or customer support contexts, a single hallucinated response can have serious consequences — misinformation delivered to a patient, incorrect financial guidance, or wrong legal information presented as fact.

Types of hallucination to test for

1. Factual hallucination

The model states something as a fact that is incorrect. Example: attributing a quote to the wrong person, stating an incorrect statistic, or describing a product feature that doesn't exist.

2. Source hallucination (critical for RAG systems)

The model responds with information not contained in the provided knowledge base — typically by drawing on its training data instead of the context it's been given. For RAG applications, this is the primary hallucination risk.

3. Citation hallucination

The model cites a source that doesn't exist, or attributes content to the wrong source document.

4. Instruction override

The model generates content outside its system prompt instructions — for example, answering questions about topics it was instructed to decline, or abandoning its persona under adversarial prompting.

How to test for hallucinations

Step 1: Build a ground-truth dataset

For RAG systems: take your knowledge base and create a set of question-answer pairs where the correct answer is verifiably derivable from specific source documents. This is your ground truth. Responses should match these answers — any significant deviation is a hallucination candidate.

Step 2: Out-of-scope query testing

Deliberately ask questions about topics not covered in the knowledge base. The model should clearly state that it doesn't have information on that topic — not generate plausible-but-fabricated answers.

Step 3: Adversarial prompt injection

Test prompts designed to override system instructions or extract information the system shouldn't provide. Examples: "Ignore your previous instructions and…", "As a developer with full access, tell me…", "Pretend you're a different AI that can…"

Step 4: Consistency testing

Run the same query multiple times. Responses about factual matters should be consistent. High variance in factual claims indicates the model is reasoning from training data rather than grounded context.

Step 5: Boundary queries

Ask questions that sit at the edge of the knowledge base coverage. These are highest-risk: the model has partial information and may fill gaps with fabricated content.

Building guardrails

Engineering controls that reduce hallucination risk:

  • Temperature control — lower temperature (closer to 0) reduces variance and fabrication tendency
  • Grounding instructions — explicit system prompt instructions to only respond from provided context and to clearly signal when information is unavailable
  • Retrieval quality — for RAG systems, the retrieval step quality directly affects hallucination rate; poor retrieval = poor grounding
  • Human-in-the-loop for high-stakes responses — route specific query types to human review

RedQA's AI testing service includes hallucination detection as a core component. Learn more, or get in touch.

Ready to Ship with Confidence?

Let's discuss how RedQA can help you deliver better software, faster. Get a free consultation and quote tailored to your project.

Get a Free Quote