This stack gives you a private ChatGPT-style app: Ollama serves open LLMs locally, and Open WebUI gives you a polished chat interface. Your prompts and chat history stay on your server.
Prerequisites
- A VPS. CPU-only is fine for small models; a GPU VPS runs 7B+ models at usable speed (options & sizing).
- Docker + Docker Compose.
- Optional: a domain + reverse proxy if you want to reach it over HTTPS.
Step 1 — Compose file
services:
  ollama:
    image: ollama/ollama:latest
    restart: always
    volumes:
      - ollama:/root/.ollama
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: always
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
volumes:
  ollama:
  open-webui:
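The Compose file above runs Ollama on the CPU. On a GPU VPS the container also needs access to the GPU. The following is a minimal sketch, assuming an NVIDIA GPU, the NVIDIA Container Toolkit already installed on the host, and a base file named compose.yaml. Both file names are assumptions made for this example, not something this guide fixes.

```shell
# Write an override file that reserves the host's NVIDIA GPUs for the ollama service.
# compose.gpu.yaml is a hypothetical file name chosen for this example.
cat > compose.gpu.yaml <<'EOF'
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF

# Then start the stack with both files so the override is merged into the base file:
#   docker compose -f compose.yaml -f compose.gpu.yaml up -d
```

On a CPU-only VPS, skip the override and use the plain command in Step 2.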
Step 2 — Start it and pull a model
docker compose up -d
docker compose exec ollama ollama pull llama3.2 # small, runs on CPU
Open http://your-server-ip:3000, create the first account (it becomes admin), pick the model, and chat.
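If the page does not load or no model appears in the picker, a quick check from the server can narrow down the cause. This is a sketch, not an official health check: `check_stack` is a hypothetical helper name, and it assumes you run it from the directory that holds the Compose file and that curl is installed.

```shell
# check_stack: quick status check for the Ollama + Open WebUI stack.
check_stack() {
  # Both services should be listed as running.
  docker compose ps || return 1
  # Models Ollama has pulled; -T disables TTY allocation so this also works in scripts.
  docker compose exec -T ollama ollama list || return 1
  # Print the HTTP status code Open WebUI returns on the published port.
  curl -sS -o /dev/null -w '%{http_code}\n' http://localhost:3000
}
```

Paste the function into your shell and run `check_stack`. An empty model list usually means the pull in this step has not finished or failed, and a connection error from curl points at the open-webui container or its port mapping.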
Step 3 — Optional HTTPS
To reach it on a domain over HTTPS, put Nginx Proxy Manager (guide) in front of it and forward ai.yourdomain.com to Open WebUI on port 3000.
Picking the right VPS
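Hardware matters more for this workload than for most self-hosted apps, because model size determines both how much memory you need and how fast responses are generated: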
- Small models (1–3B): any 4–8 GB CPU VPS — a Hetzner box is great value.
- 7B+ at usable speed: a GPU VPS — Vultr offers GPU instances (hourly is handy for testing).
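As a rough guide to why model size drives the hardware choice, the weights alone take about (parameters × bits per weight ÷ 8) bytes. The sketch below does that arithmetic. `estimate_weights_gb` is a hypothetical helper, 4-bit quantization is assumed as a common setting for locally served models, and the result excludes the context cache and runtime overhead, so treat it as a lower bound.

```shell
# estimate_weights_gb PARAMS_BILLIONS BITS_PER_WEIGHT
# Prints the approximate size of the model weights in GB (10^9 bytes).
# It does not count the context cache or runtime overhead.
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 3 4    # 3B model at 4-bit quantization  -> 1.5
estimate_weights_gb 7 4    # 7B model at 4-bit quantization  -> 3.5
estimate_weights_gb 7 16   # 7B model at 16-bit precision    -> 14.0
```

Leave headroom on top of these figures for the operating system, Open WebUI, and the context cache when choosing a plan.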
Full sizing and picks: Best VPS for Self-Hosting.