Ollama Local LLM Runtime
Run open-weight models locally with Ollama. Generates Modelfile configs, GPU/CPU tuning settings, OpenAI-compatible API wiring, JSON-mode structured outputs, and per-model memory budgets for laptop development.
Ollama gives you a single binary to serve Llama, Mistral, Qwen, and other open models locally. This skill writes Modelfiles for prompt and parameter customization, picks the right quantization for your VRAM, and wires Ollama into clients via its OpenAI-compatible endpoint.
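As a rough illustration of matching quantization to VRAM, the sketch below estimates a model's memory footprint from its parameter count and bits per weight, then picks the highest precision that fits a budget. The nominal bit widths, the 20% overhead factor for KV cache and runtime buffers, and the names `estimateGiB` and `pickPrecision` are assumptions made for this example, not values or functions defined by Ollama. Real quantized files differ somewhat from these nominal sizes.

```typescript
// Nominal bits per weight, highest precision first.
// These are illustrative assumptions, not exact sizes of any specific model file.
const PRECISIONS: { label: string; bitsPerWeight: number }[] = [
  { label: "fp16", bitsPerWeight: 16 },
  { label: "8-bit", bitsPerWeight: 8 },
  { label: "4-bit", bitsPerWeight: 4 },
];

// Assumed headroom for KV cache and runtime buffers (a rule of thumb only).
const OVERHEAD_FACTOR = 1.2;

// Estimate memory in GiB: parameters * bytes per weight * overhead.
function estimateGiB(paramsBillions: number, bitsPerWeight: number): number {
  const weightBytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
  return (weightBytes * OVERHEAD_FACTOR) / 2 ** 30;
}

// Return the highest-precision label that fits the budget, or null if none fits.
function pickPrecision(paramsBillions: number, budgetGiB: number): string | null {
  for (const p of PRECISIONS) {
    if (estimateGiB(paramsBillions, p.bitsPerWeight) <= budgetGiB) {
      return p.label;
    }
  }
  return null;
}

// Example: a 14B-parameter model on a GPU with 12 GiB available.
console.log(pickPrecision(14, 12));
```

Under these assumptions a 14B model needs roughly 7.8 GiB at 4 bits and about 15.6 GiB at 8 bits, so the 12 GiB budget selects the 4-bit option. Longer context windows increase the KV cache, so treat the overhead factor as a starting point to measure against.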
When to use
Use for offline dev, privacy-sensitive prototypes, CI eval runs without API keys, or self-hosted on-prem deployments where you cannot send data to a managed LLM provider.
Examples
Drop-in OpenAI client swap
Point your existing OpenAI SDK at a local Ollama server
Configure my Node app to use Ollama's OpenAI-compatible endpoint with Llama 3 and JSON-mode structured outputs
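A minimal sketch of what such a request might produce, assuming Ollama is listening on its default port 11434 and that a model tagged `llama3` has already been pulled. It uses Node's built-in `fetch` (Node 18+) so it runs without extra packages. With the official OpenAI SDK, the equivalent change is typically to set `baseURL` to the same `/v1` address and pass any placeholder API key. The helper names `buildJsonChatRequest` and `askForJson` are hypothetical.

```typescript
// Assumption: Ollama's OpenAI-compatible API is served at the default local address.
const OLLAMA_BASE_URL = "http://localhost:11434/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  response_format: { type: "json_object" };
  temperature: number;
}

// Build an OpenAI-style chat completion body that requests JSON-mode output.
function buildJsonChatRequest(model: string, userPrompt: string): ChatRequest {
  return {
    model,
    messages: [
      { role: "system", content: "Reply with a single JSON object and nothing else." },
      { role: "user", content: userPrompt },
    ],
    response_format: { type: "json_object" },
    temperature: 0,
  };
}

// Send the request to the local server and parse the JSON the model returns.
async function askForJson(userPrompt: string, model = "llama3"): Promise<unknown> {
  const res = await fetch(`${OLLAMA_BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Ollama ignores the key, but OpenAI-style clients expect one to be set.
      Authorization: "Bearer ollama",
    },
    body: JSON.stringify(buildJsonChatRequest(model, userPrompt)),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  // In JSON mode the object arrives as a string in the message content.
  return JSON.parse(data.choices[0].message.content);
}
```

Calling `await askForJson("List three primary colors under a \"colors\" key")` with the server running should resolve to a parsed object. JSON mode constrains the output to valid JSON but does not enforce a particular shape, so validate the fields before using them.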
Custom Modelfile
Bake a system prompt and parameters into a named model
Write an Ollama Modelfile for a coding assistant on top of qwen2.5-coder:14b with low temperature and a strict system prompt
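One possible shape for the resulting Modelfile, rendered here from a small TypeScript helper so the parameters stay explicit. The `FROM`, `PARAMETER`, and `SYSTEM` instructions are standard Modelfile syntax. The specific values (temperature 0.2, an 8192-token context), the system prompt wording, and the helper name `renderModelfile` are assumptions chosen for illustration.

```typescript
// Hypothetical options for this sketch; field names are not part of Ollama.
interface ModelfileOptions {
  base: string;        // base model tag to build on
  temperature: number; // lower values give more deterministic output
  numCtx: number;      // context window size in tokens
  system: string;      // system prompt baked into the named model
}

// Render Modelfile text using the FROM, PARAMETER, and SYSTEM instructions.
function renderModelfile(opts: ModelfileOptions): string {
  return [
    `FROM ${opts.base}`,
    `PARAMETER temperature ${opts.temperature}`,
    `PARAMETER num_ctx ${opts.numCtx}`,
    `SYSTEM """`,
    opts.system.trim(),
    `"""`,
    "",
  ].join("\n");
}

const modelfile = renderModelfile({
  base: "qwen2.5-coder:14b",
  temperature: 0.2,
  numCtx: 8192,
  system:
    "You are a coding assistant. Answer with code first, keep explanations brief, " +
    "and say so when you are unsure rather than guessing.",
});

// Print the text; save it to a file named Modelfile to build the model.
console.log(modelfile);
```

After saving the output as `Modelfile`, running `ollama create coding-assistant -f Modelfile` followed by `ollama run coding-assistant` would build and start the named model. The name `coding-assistant` is a placeholder.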