Ollama

Ollama is the standard tool for running open-source language models on your own hardware. One command installs it on macOS, Windows, or Linux. A second command pulls a model and starts a chat session. After that, it runs a local API server at localhost:11434 that behaves exactly like the OpenAI API, so any application already built against OpenAI’s SDK can point to a local model by changing one environment variable.

The model library covers most of what you would reach for: Llama 3, Mistral, Gemma, Phi, Qwen, DeepSeek, and dozens of specialist models for code, vision, and embeddings. Quantized variants let you run larger models on modest hardware by trading a small amount of accuracy for dramatically lower memory use. A Modelfile format lets you bake in system prompts, adjust parameters, and create named model variants you can share or version-control.

GPU acceleration works across Apple Silicon (Metal), NVIDIA (CUDA), and AMD (ROCm). On a MacBook with an M-series chip, inference on a 7B model is fast enough to feel conversational. Without a GPU, CPU inference works but slows down considerably on anything larger than a 3B model.

Ollama does not ship a chat UI. For a browser interface, pair it with Open WebUI, which is built specifically for Ollama and discovers your installed models automatically. For document Q&A, AnythingLLM connects to Ollama directly.

A cloud option exists for when local hardware is not enough, but the local self-hosted path is what 172k GitHub stars are voting for.

Ollama: Pros & Cons

Pros (The Wins)	Cons (The Friction)
One-command install: macOS, Windows, Linux; models pull and run immediately.	No chat UI: API server only; pair with Open WebUI for a browser interface.
OpenAI-compatible API: Swap one env var; any OpenAI SDK app just works.	RAM requirements: 7B needs ~8GB, 13B needs 16GB+; large models punish weak hardware.
Multi-platform GPU: Apple Silicon, NVIDIA, AMD acceleration all supported.	Quantization trade-offs: Smaller models run faster but lose some response accuracy.
172k stars, MIT: De facto standard for local model inference.	Windows rough edges: Newer platform support; occasional issues vs macOS and Linux.

Quick Start

Overview

Ollama: Pros & Cons

Use Cases

Deployment Strategy