⚙️ Engineering 📊 Data Awaiting Security Review

MLflow Experiment Tracking

Track ML experiments, models, and prompts with MLflow. Wires up autologging for sklearn/PyTorch/transformers, GenAI evaluation runs, model registry promotion flows, and Postgres+S3 backends for team-shared servers.

MLflow covers both classic ML experiment tracking and GenAI evaluation in one tool. This skill sets up a tracking server backed by Postgres and S3, enables autologging, builds GenAI evaluation harnesses, and configures the model registry with Staging→Production promotion gates and CI/CD integration.
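As a rough illustration of a Staging→Production promotion gate, the sketch below compares one metric between the newest Staging version and the current Production version, and promotes only when the candidate wins by a margin. The names passes_gate and promote_if_better, the accuracy metric, and the margin are assumptions for illustration, not part of this skill. It uses the stage-based registry calls in MlflowClient; recent MLflow releases deprecate stages in favor of aliases, so check which applies to your version.

```python
def passes_gate(candidate, baseline, metric="accuracy", min_gain=0.0):
    """Return True when the candidate metric beats the baseline by at least min_gain."""
    if metric not in candidate:
        return False  # a candidate without the gating metric never passes
    if baseline is None or metric not in baseline:
        return True  # nothing comparable in Production yet
    return candidate[metric] - baseline[metric] >= min_gain


def promote_if_better(model_name, metric="accuracy", min_gain=0.0):
    """Promote the newest Staging version if it passes the gate (hypothetical helper)."""
    # Imported lazily so the pure gate logic above can be used without MLflow installed.
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    staging = client.get_latest_versions(model_name, stages=["Staging"])
    if not staging:
        return None
    production = client.get_latest_versions(model_name, stages=["Production"])

    candidate = staging[0]
    candidate_metrics = client.get_run(candidate.run_id).data.metrics
    baseline_metrics = (
        client.get_run(production[0].run_id).data.metrics if production else None
    )

    if passes_gate(candidate_metrics, baseline_metrics, metric, min_gain):
        client.transition_model_version_stage(
            name=model_name,
            version=candidate.version,
            stage="Production",
            archive_existing_versions=True,  # retire the previous Production version
        )
        return candidate.version
    return None
```

A CI job could call promote_if_better("my-model", min_gain=0.005) after training and fail the pipeline when it returns None; the model name here is a placeholder.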

mlflow experiment-tracking model-registry mlops genai-eval

When to use

Use when results stop fitting in a notebook — comparing tuning runs, promoting models from staging to production, or running LLM evals you want to diff over time.

Examples

Team-shared tracking server

Stand up a Postgres+S3-backed MLflow server

Provision an MLflow tracking server with Postgres metadata and S3 artifacts, with auth via a reverse proxy
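For orientation, a request like this typically ends in an mlflow server invocation similar to the one assembled below. This is a minimal sketch: the function name, database user, hostnames, and bucket are placeholders, and it assumes the Postgres driver and boto3 are installed on the server, with the database password and AWS credentials supplied through the environment rather than on the command line.

```python
import shlex


def mlflow_server_command(db_user, db_host, db_name, bucket,
                          host="127.0.0.1", port=5000):
    """Build the argument list for `mlflow server` (all values are placeholders)."""
    # No password in the URI: this sketch assumes it comes from the environment
    # (for example PGPASSWORD) so it never shows up in process listings.
    backend_uri = f"postgresql://{db_user}@{db_host}:5432/{db_name}"
    artifact_root = f"s3://{bucket}/mlflow"
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_uri,        # run and registry metadata
        "--default-artifact-root", artifact_root,  # models, plots, other artifacts
        # Bind to localhost so only the reverse proxy can reach the server.
        "--host", host,
        "--port", str(port),
    ]


cmd = mlflow_server_command("mlflow", "db.internal", "mlflow", "team-artifacts")
print(shlex.join(cmd))
```

Binding to 127.0.0.1 keeps the unauthenticated MLflow port off the network, so the reverse proxy is the only entry point and can enforce authentication.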

GenAI evaluation run

Compare prompt variants with mlflow.evaluate

Run mlflow.evaluate on three prompt variants over our 200-question QA set with custom faithfulness and answer-relevance metrics
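A harness for this kind of request might look roughly like the sketch below, which logs one MLflow run per prompt variant. It assumes an MLflow 2.x release where mlflow.evaluate accepts a static table with a predictions column and where mlflow.metrics.genai provides LLM-judged faithfulness and answer_relevance metrics; newer releases expose GenAI evaluation through a different entry point. The helper names, the generate_answer callable, the row keys, the column names, and the judge model URI are all assumptions for illustration.

```python
def build_eval_rows(template, qa_rows, generate_answer):
    """Render each question through one prompt variant and collect model outputs.

    `generate_answer` is a hypothetical callable that maps a prompt string to an answer.
    """
    rows = []
    for row in qa_rows:
        prompt = template.format(question=row["question"], context=row["context"])
        rows.append({
            "inputs": row["question"],
            "context": row["context"],      # faithfulness is judged against this
            "ground_truth": row["answer"],
            "outputs": generate_answer(prompt),
        })
    return rows


def evaluate_prompt_variants(variants, qa_rows, generate_answer,
                             judge_model="openai:/gpt-4"):
    """Run one evaluation per prompt variant; returns {variant name: metrics dict}."""
    # Imported lazily so build_eval_rows stays usable without these packages.
    import mlflow
    import pandas as pd
    from mlflow.metrics.genai import answer_relevance, faithfulness

    metrics = [faithfulness(model=judge_model), answer_relevance(model=judge_model)]
    results = {}
    for name, template in variants.items():
        eval_df = pd.DataFrame(build_eval_rows(template, qa_rows, generate_answer))
        with mlflow.start_run(run_name=name):
            # Store the template as an artifact to avoid parameter length limits.
            mlflow.log_text(template, "prompt_template.txt")
            result = mlflow.evaluate(
                data=eval_df,
                predictions="outputs",
                targets="ground_truth",
                extra_metrics=metrics,
            )
            results[name] = result.metrics
    return results
```

Because each variant gets its own run, the three variants can be compared side by side in the MLflow UI, and later evaluations over the same QA set can be diffed against them.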