PromptFoo LLM Evals
Test and evaluate LLM outputs with PromptFoo. Generates eval configs, custom assertion functions, red-team security tests, model comparison matrices, and CI/CD integration for prompt regression testing.
This skill helps you build robust evaluation pipelines for LLM applications with PromptFoo. It generates YAML eval configs with test cases, implements custom assertion functions for domain-specific quality checks, runs red-team security evaluations (jailbreaks, prompt injection, PII leakage), creates model comparison reports, and integrates evals into CI/CD pipelines for continuous prompt quality assurance.
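As an illustration of a custom assertion function for a domain-specific quality check, here is a minimal sketch in Python. It assumes promptfoo's documented convention for file-based Python assertions, where the file exposes a `get_assert(output, context)` function that may return a dict with `pass`, `score`, and `reason`; check this against the promptfoo version you have installed. The email pattern, the 150-word limit, and the scores are hypothetical choices for a support chatbot and do not come from PromptFoo.

```python
import re
from typing import Any, Dict

# Hypothetical rules for a support chatbot: no email addresses in replies
# (a crude PII check) and replies no longer than MAX_WORDS words.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MAX_WORDS = 150


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Grade one model output; `context` is unused in this sketch."""
    if EMAIL_PATTERN.search(output):
        return {
            "pass": False,
            "score": 0.0,
            "reason": "Reply contains an email address (possible PII leak)",
        }

    word_count = len(output.split())
    if word_count > MAX_WORDS:
        return {
            "pass": False,
            "score": 0.5,
            "reason": f"Reply has {word_count} words; limit is {MAX_WORDS}",
        }

    return {"pass": True, "score": 1.0, "reason": "No email found and length is OK"}
```

A test case would typically reference a file like this through an assertion of type `python` whose value is a `file://` path, for example the hypothetical `file://support_reply_assert.py`.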
When to use
Use when testing LLM prompts, running red-team security evaluations, comparing model outputs, building CI/CD eval pipelines, or implementing custom quality assertions.
Examples
Prompt regression tests
Create an eval suite for a customer support chatbot
Build a PromptFoo eval config for a customer support chatbot with 30 test cases covering refund requests, product questions, and escalation scenarios, and include accuracy and tone assertions
Red team eval
Run a security evaluation against prompt injection
Configure a PromptFoo red-team evaluation to test my RAG chatbot for prompt injection, jailbreak attempts, PII leakage, and hallucination, with a CI pipeline that blocks deploys on failures
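The deploy-blocking step in a request like this could be sketched as a small gate script that reads the eval results and returns a non-zero exit code when any test failed. This is only a sketch under stated assumptions: it assumes the results were written to a JSON file and that the failure count lives at `results.stats.failures`, which should be verified against the output of your PromptFoo version. The function names `count_failures` and `gate` are hypothetical.

```python
import json
import sys
from typing import Any, Dict


def count_failures(report: Dict[str, Any]) -> int:
    """Return the failure count, assuming it lives at results.stats.failures."""
    try:
        return int(report["results"]["stats"]["failures"])
    except (KeyError, TypeError) as exc:
        # Fail loudly rather than letting an unexpected shape pass the gate.
        raise ValueError("Unexpected report shape: results.stats.failures missing") from exc


def gate(path: str) -> int:
    """Return a process exit code: 0 if no failures were recorded, 1 otherwise."""
    with open(path, encoding="utf-8") as handle:
        failures = count_failures(json.load(handle))
    if failures:
        print(f"Blocking deploy: {failures} eval failure(s)")
        return 1
    print("All evals passed")
    return 0


# Runs only when a results path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(gate(sys.argv[1]))
```

A CI job would run the evaluation first, then call this script with the path to the results file and let the non-zero exit code stop the deploy stage.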