LangFuse LLM Observability
Monitor and debug LLM applications with LangFuse: instrument traces, track token usage and costs, create evaluation datasets, run prompt experiments, and build quality dashboards for production AI systems.
This skill integrates LangFuse observability into LLM applications. It instruments traces with a span for each LLM call, retrieval, and tool use, and records token usage, latency, and cost per request. It also creates evaluation datasets for regression testing, sets up versioned prompt management, runs A/B experiments on prompt variants, and builds dashboards for monitoring the quality and cost of production AI systems.
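The trace-and-span model described above can be illustrated with a small standalone sketch. This is not the LangFuse SDK: the `Trace` and `Span` classes, the `PRICE_PER_1K` table, and its prices are all hypothetical, chosen only to show how one request breaks down into timed spans whose token counts roll up into a per-request cost.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List

# Hypothetical prices in USD per 1,000 tokens; real prices depend on the model.
PRICE_PER_1K = {"input": 0.0005, "output": 0.0015}


@dataclass
class Span:
    """One step inside a request: a retrieval, an LLM call, or a tool use."""
    name: str
    kind: str  # "retrieval", "llm", or "tool"
    latency_s: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def cost_usd(self) -> float:
        # Cost is derived from token counts and the assumed price table.
        return (self.input_tokens * PRICE_PER_1K["input"]
                + self.output_tokens * PRICE_PER_1K["output"]) / 1000


@dataclass
class Trace:
    """All spans recorded for a single request."""
    name: str
    spans: List[Span] = field(default_factory=list)

    @contextmanager
    def span(self, name: str, kind: str):
        # Time the wrapped block and attach the finished span to the trace.
        current = Span(name=name, kind=kind)
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.latency_s = time.perf_counter() - start
            self.spans.append(current)

    def total_tokens(self) -> int:
        return sum(s.input_tokens + s.output_tokens for s in self.spans)

    def total_cost_usd(self) -> float:
        return sum(s.cost_usd for s in self.spans)


# Example: one RAG request with a retrieval step and an LLM call.
trace = Trace("rag-request")
with trace.span("vector-search", "retrieval"):
    pass  # the retriever would run here
with trace.span("answer", "llm") as llm_span:
    # Token counts would normally come from the model provider's response.
    llm_span.input_tokens, llm_span.output_tokens = 1200, 300

for s in trace.spans:
    print(s.kind, s.name, f"{s.latency_s:.6f}s", f"${s.cost_usd:.5f}")
print("total tokens:", trace.total_tokens())
```

In a real integration the observability SDK would record these fields and send them to the LangFuse backend; the sketch only shows what is being measured at each step.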
When to use
Use when adding observability to LLM apps, tracking AI costs, debugging retrieval quality, running prompt experiments, or building evaluation pipelines for production AI systems.
Examples
Trace instrumentation
Add tracing to a RAG application
Instrument my LangChain RAG pipeline with LangFuse traces: track each retrieval, LLM call, and tool use with metadata for cost and latency analysis
Eval pipeline
Build an evaluation dataset and scoring pipeline
Create a LangFuse evaluation pipeline: define a dataset of 50 test cases, run them against two prompt variants, score with LLM-as-judge, and compare results
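The shape of the evaluation pipeline in the prompt above can be sketched without any external service. Everything here is a hypothetical stand-in: the four-item `DATASET` replaces the 50 test cases, `variant_a` and `variant_b` replace real prompt variants sent to a model, and `exact_match_judge` replaces an LLM-as-judge call. None of these names come from the LangFuse SDK.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical dataset: each item pairs an input with its expected answer.
DATASET: List[Dict[str, str]] = [
    {"input": "capital of France", "expected": "Paris"},
    {"input": "capital of Japan", "expected": "Tokyo"},
    {"input": "capital of Italy", "expected": "Rome"},
    {"input": "capital of Spain", "expected": "Madrid"},
]

_ANSWERS = {item["input"]: item["expected"] for item in DATASET}


def variant_a(question: str) -> str:
    # Stand-in for prompt variant A: answers every question correctly.
    return _ANSWERS[question]


def variant_b(question: str) -> str:
    # Stand-in for prompt variant B: gets one question wrong.
    return "Barcelona" if question == "capital of Spain" else _ANSWERS[question]


def exact_match_judge(question: str, answer: str, expected: str) -> float:
    # Stand-in for an LLM-as-judge; a real judge would return a graded score.
    return 1.0 if answer.strip().lower() == expected.strip().lower() else 0.0


def run_experiment(dataset, generate: Callable[[str], str], judge) -> float:
    """Run one variant over every dataset item and return its mean score."""
    return mean(
        judge(item["input"], generate(item["input"]), item["expected"])
        for item in dataset
    )


def compare_variants(dataset, variants: Dict[str, Callable[[str], str]], judge):
    """Score each variant on the same dataset so results are comparable."""
    return {name: run_experiment(dataset, gen, judge) for name, gen in variants.items()}


def best_variant(results: Dict[str, float]) -> str:
    return max(results, key=results.get)


results = compare_variants(
    DATASET, {"variant_a": variant_a, "variant_b": variant_b}, exact_match_judge
)
print(results, "best:", best_variant(results))
```

Running both variants against the same fixed dataset is what makes the comparison usable for regression testing: a score change then reflects the prompt, not the test cases.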