
PromptFoo LLM Evals

Test and evaluate LLM outputs with PromptFoo. The skill generates eval configs, custom assertion functions, red-team security tests, model comparison matrices, and CI/CD integration for prompt regression testing.

This skill helps you build robust evaluation pipelines for LLM applications with PromptFoo. It generates YAML eval configs with test cases, implements custom assertion functions for domain-specific quality checks, runs red-team security evaluations (jailbreaks, prompt injection, PII leakage), creates model comparison reports, and integrates evals into CI/CD pipelines for continuous prompt quality assurance.
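As an illustration of a custom assertion for a domain-specific quality check, here is a minimal sketch in Python. It assumes PromptFoo's Python assertion convention of a function named get_assert that receives the model output and a context object and returns a result with pass, score, and reason keys; check the PromptFoo documentation for the exact contract in your version. The BANNED_PHRASES list is hypothetical and only there to make the example concrete.

```python
from typing import Any, Dict

# Hypothetical phrases a support bot should never emit; adjust per domain.
BANNED_PHRASES = ("as an ai language model", "i cannot help")


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Grade one model output for tone.

    Assumption: the eval runner calls this function once per test case and
    reads the pass / score / reason keys from the returned dict.
    """
    text = output.lower()
    # Collect every banned phrase that appears in the output.
    hits = [phrase for phrase in BANNED_PHRASES if phrase in text]
    return {
        "pass": not hits,
        "score": 0.0 if hits else 1.0,
        "reason": f"banned phrases: {hits}" if hits else "tone ok",
    }
```

A function like this would typically live in its own file and be referenced from a test case's assertion list, so the same check can be reused across many test cases.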

Tags: promptfoo, evals, llm-testing, red-team, ai-quality

When to use

Use when testing LLM prompts, running red-team security evaluations, comparing model outputs, building CI/CD eval pipelines, or implementing custom quality assertions.

Examples

Prompt regression tests

Create eval suite for a customer support chatbot

Build a PromptFoo eval config for a customer support chatbot with 30 test cases covering refund requests, product questions, and escalation scenarios, with accuracy and tone assertions.
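To give a sense of what a request like the one above might produce, the following sketch builds a small config structure in Python and prints it as JSON. The top-level keys (prompts, providers, tests) and the per-test vars and assert entries follow PromptFoo's documented config layout; the prompt text, the provider id, the three sample cases, and the build_config helper are all placeholders invented for illustration. PromptFoo configs are usually written in YAML, so treat the JSON output as an equivalent structure to convert or adapt, not as the skill's actual output.

```python
import json
from typing import Any, Dict, List


def build_config(cases: List[Dict[str, str]]) -> Dict[str, Any]:
    """Turn simple case records into a PromptFoo-style config dict."""
    tests = []
    for case in cases:
        tests.append({
            "description": case["name"],
            "vars": {"question": case["question"]},
            "assert": [
                # Accuracy check: case-insensitive substring match.
                {"type": "icontains", "value": case["must_mention"]},
                # Tone check: graded by a model against a rubric.
                {"type": "llm-rubric",
                 "value": "Reply is polite and professional"},
            ],
        })
    return {
        "description": "Customer support chatbot regression suite",
        # Placeholder prompt and provider; replace with your own.
        "prompts": ["You are a support agent. Answer: {{question}}"],
        "providers": ["openai:gpt-4o-mini"],
        "tests": tests,
    }


# Hypothetical sample cases, one per scenario family.
cases = [
    {"name": "refund request",
     "question": "I want my money back for order 1234.",
     "must_mention": "refund"},
    {"name": "product question",
     "question": "Does the blender come with a warranty?",
     "must_mention": "warranty"},
    {"name": "escalation",
     "question": "I need to talk to a manager now.",
     "must_mention": "escalate"},
]

config = build_config(cases)
print(json.dumps(config, indent=2))
```

Keeping cases as plain records and generating the test entries from them makes it easier to grow the suite to 30 cases without repeating the assertion boilerplate.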

Red team eval

Run security evaluation against prompt injection

Configure a PromptFoo red-team evaluation to test my RAG chatbot for prompt injection, jailbreak attempts, PII leakage, and hallucination, with a CI pipeline that blocks deploys on failures.
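The "block deploys on failures" step in a request like the one above can be sketched as a small gate script. The result records here use a deliberately simplified, hypothetical shape (a category string and a success flag per test); PromptFoo's real output file has its own structure, so a mapping step would be needed before a gate like this could be used. The function names are also illustrative.

```python
import sys
from typing import Dict, List


def failed_categories(results: List[Dict[str, object]]) -> List[str]:
    """Return the sorted categories that have at least one failed test."""
    return sorted({str(r["category"]) for r in results if not r["success"]})


def gate(results: List[Dict[str, object]]) -> int:
    """Compute a CI exit code: 0 if every test passed, 1 otherwise."""
    failed = failed_categories(results)
    for name in failed:
        print(f"BLOCKED: failures in {name}", file=sys.stderr)
    return 1 if failed else 0


# Hypothetical results, as if already mapped from an eval run.
sample_results = [
    {"category": "prompt-injection", "success": True},
    {"category": "jailbreak", "success": False},
    {"category": "pii-leakage", "success": True},
    {"category": "jailbreak", "success": False},
]

exit_code = gate(sample_results)
```

In a pipeline, the final line would usually be sys.exit(gate(results)), so that a non-zero exit code stops the deploy job.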