LangFuse LLM Observability

Monitor and debug LLM applications with LangFuse: instrument traces, track token usage and costs, create evaluation datasets, run prompt experiments, and build quality dashboards for production AI systems.

This skill integrates LangFuse observability into LLM applications. It instruments traces with spans for each LLM call, retrieval, and tool use, and it tracks token usage, latency, and cost per request. It also creates evaluation datasets for regression testing, sets up prompt management with versioning, runs A/B experiments on prompt variants, and builds dashboards for monitoring the quality and cost of production AI systems.
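
The per-request accounting described above can be sketched as a small in-memory data model: a trace holds one span per retrieval, LLM call, or tool use, and token counts roll up into cost and latency totals. This is not the LangFuse SDK. The `Span` and `Trace` classes, the model name `example-model`, and its prices are hypothetical placeholders, and real per-token prices depend on the provider and model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical prices in USD per 1M tokens; real prices vary by provider and model.
PRICES_PER_1M: Dict[str, Dict[str, float]] = {
    "example-model": {"input": 0.50, "output": 1.50},
}


@dataclass
class Span:
    name: str
    kind: str  # "retrieval", "generation", or "tool"
    latency_ms: float
    model: Optional[str] = None  # only generation spans carry a model
    input_tokens: int = 0
    output_tokens: int = 0

    def cost_usd(self) -> float:
        # Non-LLM spans (retrieval, tools) have no token cost in this sketch.
        if self.model is None:
            return 0.0
        price = PRICES_PER_1M[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1_000_000


@dataclass
class Trace:
    request_id: str
    spans: List[Span] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(s.input_tokens + s.output_tokens for s in self.spans)

    def total_cost_usd(self) -> float:
        return sum(s.cost_usd() for s in self.spans)

    def total_latency_ms(self) -> float:
        # Valid only when spans run one after another; overlapping spans would double-count.
        return sum(s.latency_ms for s in self.spans)


trace = Trace("req-001")
trace.spans.append(Span("retrieve-docs", "retrieval", latency_ms=42.0))
trace.spans.append(Span("answer", "generation", latency_ms=850.0,
                        model="example-model", input_tokens=1200, output_tokens=300))
print(trace.total_tokens(), trace.total_latency_ms(), round(trace.total_cost_usd(), 6))
```

In a real integration the SDK would record usage on each generation, and cost would come from the model's configured prices rather than a hard-coded table.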

Tags: langfuse, llm, observability, tracing, ai-ops

When to use

Use when adding observability to LLM apps, tracking AI costs, debugging retrieval quality, running prompt experiments, or building evaluation pipelines for production AI systems.

Examples

Trace instrumentation

Add tracing to a RAG application

Instrument my LangChain RAG pipeline with LangFuse traces: track each retrieval, LLM call, and tool use with metadata for cost and latency analysis
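
The trace-instrumentation request amounts to wrapping each pipeline step so that it emits a span with its kind, latency, outcome, and metadata. The decorator below sketches that pattern with the standard library only; `traced`, `RECORDED_SPANS`, and the stub `retrieve` and `generate` functions are hypothetical stand-ins, not LangFuse or LangChain APIs. LangFuse's own SDK ships a tracing decorator and a LangChain callback integration whose import paths differ between SDK versions, so check its documentation before adapting this.

```python
import functools
import time
from typing import Any, Callable, Dict, List, Optional

# Hypothetical in-memory sink; a real setup would send spans to a LangFuse client.
RECORDED_SPANS: List[Dict[str, Any]] = []


def traced(kind: str, **metadata: Any) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator: record one span per call with kind, latency, error, and metadata."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            error: Optional[str] = None
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                error = repr(exc)
                raise
            finally:
                # Runs on success and failure, so failed steps still appear in the trace.
                RECORDED_SPANS.append({
                    "name": fn.__name__,
                    "kind": kind,
                    "latency_ms": (time.perf_counter() - start) * 1000,
                    "error": error,
                    "metadata": metadata,
                })
        return wrapper
    return decorator


@traced("retrieval", top_k=2)
def retrieve(query: str) -> List[str]:
    # Stub retriever; a real pipeline would query a vector store here.
    return [f"doc-1 about {query}", f"doc-2 about {query}"]


@traced("generation", model="example-model")
def generate(query: str, docs: List[str]) -> str:
    # Stub LLM call; tool functions would be decorated the same way with kind "tool".
    return f"Answer to {query!r} using {len(docs)} document(s)"


def rag_pipeline(query: str) -> str:
    docs = retrieve(query)
    return generate(query, docs)


print(rag_pipeline("token costs"))
for span in RECORDED_SPANS:
    print(span["kind"], span["name"], f"{span['latency_ms']:.3f} ms", span["metadata"])
```

Recording in a `finally` clause is a deliberate choice: a retrieval or LLM call that raises still leaves a span with its latency and error, which is usually the span you most need when debugging.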

Eval pipeline

Build an evaluation dataset and scoring pipeline

Create a LangFuse evaluation pipeline: define a dataset of 50 test cases, run them against two prompt variants, score with LLM-as-judge, and compare results
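
The evaluation request follows a common loop: run every dataset item through each prompt variant, score each output, and compare aggregate scores. The sketch below shows only that loop. The three test cases, the two variant functions, and the keyword-matching `judge` are hypothetical; the judge is a deterministic stand-in for an LLM-as-judge call, which would normally prompt a grading model with a rubric and parse its score. In LangFuse itself, the items and scores would be stored as datasets and scores rather than held in memory.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical evaluation items; the example request calls for 50, three are shown.
DATASET: List[Dict[str, str]] = [
    {"input": "capital of France", "expected": "Paris"},
    {"input": "2 + 2", "expected": "4"},
    {"input": "largest planet", "expected": "Jupiter"},
]

# Stub knowledge used by the hypothetical variants below.
_FACTS = {"capital of France": "Paris", "2 + 2": "4", "largest planet": "Jupiter"}


def variant_a(question: str) -> str:
    # Stand-in for "prompt A + model": a full-sentence answer.
    return f"The answer is {_FACTS[question]}."


def variant_b(question: str) -> str:
    # Stand-in for "prompt B + model": terse, and wrong on one item on purpose.
    return "Saturn" if question == "largest planet" else _FACTS[question]


def judge(output: str, expected: str) -> float:
    """Stand-in for LLM-as-judge: 1.0 if the expected answer appears in the output."""
    return 1.0 if expected.lower() in output.lower() else 0.0


def run_experiment(variant: Callable[[str], str],
                   dataset: List[Dict[str, str]],
                   judge_fn: Callable[[str, str], float]) -> List[float]:
    # One score per dataset item, in dataset order.
    return [judge_fn(variant(item["input"]), item["expected"]) for item in dataset]


def compare(variants: Dict[str, Callable[[str], str]],
            dataset: List[Dict[str, str]],
            judge_fn: Callable[[str, str], float]) -> Dict[str, float]:
    # Mean judge score per variant; higher is better.
    return {name: mean(run_experiment(v, dataset, judge_fn)) for name, v in variants.items()}


results = compare({"prompt-a": variant_a, "prompt-b": variant_b}, DATASET, judge)
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Keyword matching is far cruder than a real judge model; it is used here only so the sketch runs without API access, and the harness accepts any scoring function with the same signature.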