MLflow Experiment Tracking
Track ML experiments, models, and prompts with MLflow. Wires up autologging for sklearn/PyTorch/transformers, GenAI evaluation runs, model registry promotion flows, and Postgres+S3 backends for team-shared servers.
MLflow now covers classic ML and GenAI evals in one tool. This skill sets up tracking servers with Postgres+S3, enables autologging, builds GenAI evaluation harnesses, and configures the model registry with Staging→Production promotion gates and CI/CD integration.
mlflow experiment-tracking model-registry mlops genai-eval
When to use
Use when results stop fitting in a notebook: comparing tuning runs, promoting models from staging to production, or running LLM evals you want to diff over time.
Examples
Team-shared tracking server
Stand up a Postgres+S3-backed MLflow server
Provision an MLflow tracking server with a Postgres metadata store and an S3 artifact store, with auth handled by a reverse proxy
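As a rough sketch of what this request could produce, the snippet below assembles an `mlflow server` launch command. The flags `--backend-store-uri`, `--default-artifact-root`, `--host`, and `--port` are real MLflow CLI options. The database host, credentials, and bucket name are placeholders invented for illustration, and `build_server_command` is a hypothetical helper, not part of MLflow.

```python
import shlex


def build_server_command(db_uri, artifact_root, host="127.0.0.1", port=5000):
    """Return the argv list for launching `mlflow server`.

    Binding to 127.0.0.1 by default assumes a reverse proxy on the same
    machine terminates TLS and handles authentication.
    """
    return [
        "mlflow", "server",
        "--backend-store-uri", db_uri,            # Postgres holds runs, params, metrics
        "--default-artifact-root", artifact_root,  # S3 holds models and other artifacts
        "--host", host,
        "--port", str(port),
    ]


# Placeholder values (assumptions): replace with your own database and bucket.
cmd = build_server_command(
    db_uri="postgresql://mlflow:CHANGE_ME@db.internal:5432/mlflow",
    artifact_root="s3://example-mlflow-artifacts/runs",
)
print(shlex.join(cmd))
```

With `--default-artifact-root`, clients write to S3 directly and need their own AWS credentials. If artifact traffic should pass through the server instead, `--artifacts-destination` is the option to look at.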
GenAI evaluation run
Compare prompt variants with mlflow.evaluate
Run mlflow.evaluate on three prompt variants over our 200-question QA set with custom faithfulness and answer-relevance metrics
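A minimal sketch of how such a comparison might be wired up, assuming the MLflow 2.x `mlflow.evaluate` API with a static dataset and a custom metric built with `mlflow.metrics.make_metric`. The prompt templates, column names, and the `generate` callable are hypothetical. `token_overlap` is a toy stand-in for a real metric: it only measures word overlap with the reference answer, so a judge-based metric such as those in `mlflow.metrics.genai` would take its place for actual faithfulness or answer-relevance scoring.

```python
def token_overlap(answer, reference):
    """Fraction of answer tokens that also appear in the reference (toy proxy)."""
    answer_tokens = answer.lower().split()
    if not answer_tokens:
        return 0.0
    reference_tokens = set(reference.lower().split())
    return sum(t in reference_tokens for t in answer_tokens) / len(answer_tokens)


# Hypothetical prompt variants; each template takes a {question} placeholder.
PROMPT_VARIANTS = {
    "terse": "Answer briefly: {question}",
    "careful": "Answer accurately and say so if unsure: {question}",
    "stepwise": "Think step by step, then answer: {question}",
}


def evaluate_variants(qa_rows, generate):
    """Run one MLflow run per prompt variant and return {variant: metrics}.

    qa_rows:  list of dicts with "question" and "ground_truth" keys (assumed schema).
    generate: callable(prompt: str) -> str that calls your model (hypothetical).
    """
    # Imported lazily so the helpers above work without MLflow installed.
    import mlflow
    import pandas as pd
    from mlflow.metrics import MetricValue, make_metric

    def overlap_eval_fn(predictions, targets, metrics):
        scores = [token_overlap(p, t) for p, t in zip(predictions, targets)]
        mean = sum(scores) / max(len(scores), 1)
        return MetricValue(scores=scores, aggregate_results={"mean": mean})

    overlap_metric = make_metric(
        eval_fn=overlap_eval_fn, greater_is_better=True, name="reference_overlap"
    )

    results = {}
    for name, template in PROMPT_VARIANTS.items():
        df = pd.DataFrame(qa_rows)
        df["prediction"] = [generate(template.format(question=q)) for q in df["question"]]
        with mlflow.start_run(run_name=name):
            mlflow.log_param("prompt_template", template)
            result = mlflow.evaluate(
                data=df,
                predictions="prediction",
                targets="ground_truth",
                extra_metrics=[overlap_metric],
            )
            results[name] = result.metrics
    return results
```

Giving each variant its own named run keeps the metrics side by side in the MLflow UI, so a later prompt change can be diffed against these baselines.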