Hands-on VPS & self-hosting Monday, June 1, 2026
VPS.app

Self-Host Ollama + Open WebUI on a VPS (Private ChatGPT, 2026)

Some links below are affiliate links: if you buy through them I may earn a commission at no extra cost to you. I only recommend what I have actually tested, and it never changes my verdict.


This stack gives you a private ChatGPT-style app: Ollama serves open LLMs locally, and Open WebUI gives you a polished chat interface. Nothing leaves your server.

Prerequisites

  • A VPS: CPU-only is fine for small models; 7B+ models need a GPU VPS to run at usable speed (options & sizing).
  • Docker + Docker Compose.
  • Optional: a domain + reverse proxy if you want to reach it over HTTPS.

Step 1 — Compose file

services:
  ollama:
    image: ollama/ollama:latest
    restart: always
    volumes:
      - ollama:/root/.ollama

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: always
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  open-webui:

Step 2 — Start it and pull a model

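docker compose up -d
# Address the container by its Compose service name; without container_name set,
# plain "docker exec ollama" will not find it.
docker compose exec ollama ollama pull llama3.2   # small, runs on CPU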

Open http://your-server-ip:3000, create the first account (it becomes admin), pick the model, and chat.
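
If the page does not load or no model shows up in the picker, a quick check from the shell can narrow it down. The following is a minimal sketch, not part of the original setup: `smoke_test` is a hypothetical helper name, and it assumes you run it from the directory that holds the compose file, with the `llama3.2` model already pulled.

```shell
# Hypothetical helper: checks the stack from the server's shell.
smoke_test() {
  # Both services (ollama and open-webui) should be listed as running.
  docker compose ps || return 1
  # The model pulled in Step 2 should appear in Ollama's local list.
  docker compose exec ollama ollama list || return 1
  # Send one prompt through the Ollama CLI, bypassing the web UI entirely.
  docker compose exec ollama ollama run llama3.2 "Reply with one short sentence."
}
```

Call `smoke_test` once the pull has finished. If the last command answers but the web UI still shows no model, the problem is more likely on the Open WebUI side (for example the `OLLAMA_BASE_URL` setting) than in Ollama itself.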

Step 3 — Optional HTTPS

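To reach it on a domain over HTTPS, put Nginx Proxy Manager (guide) in front of it and add a proxy host that forwards ai.yourdomain.com to Open WebUI on port 3000. Enable WebSocket support on that proxy host, since recent Open WebUI versions rely on it.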

Picking the right VPS

This is the one workload where hardware decides everything:

  • Small models (1–3B): any 4–8 GB CPU VPS — a Hetzner box is great value.
  • 7B+ at usable speed: a GPU VPS. Vultr offers GPU instances (hourly billing is handy for testing).

Full sizing and picks: Best VPS for Self-Hosting.
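
To see why the tiers above track model size, a back-of-the-envelope memory estimate helps. This is a rough sketch under stated assumptions, not a measured figure: `estimate_model_mb` is a hypothetical helper, it assumes the weights dominate memory use, and the 30% headroom for context and runtime overhead is a guess you should adjust.

```shell
# Rough RAM estimate in MB for a quantized model.
# $1 = parameter count in billions (integer), $2 = bits per weight (e.g. 4 or 8).
# Assumption: weights dominate memory; 30% headroom is added for context and runtime.
estimate_model_mb() {
  params_b=$1
  bits=$2
  # billions of parameters * bits / 8 = GB of weights; use MB to stay in integers.
  weights_mb=$(( params_b * 1000 * bits / 8 ))
  echo $(( weights_mb * 130 / 100 ))
}

estimate_model_mb 3 4   # 3B model at 4 bits per weight -> 1950
estimate_model_mb 7 4   # 7B model at 4 bits per weight -> 4550
```

By this estimate a 3B model fits comfortably in a 4 GB box, and a 7B model fits in 8 GB of RAM. Fitting in memory is not the same as running fast, which is why the 7B+ tier points to a GPU.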

Frequently asked questions

Can I run a local LLM on a normal (CPU) VPS?

Yes for small models (1–3B parameters) — they run on CPU, just slowly. For 7B+ models at usable speed you want a GPU VPS. Pick the model to match the box.

What VPS do I need for Ollama?

CPU-only: 4–8 GB RAM for small models. For larger models at good speed, rent a GPU VPS (hourly works well for testing). See the linked VPS guide.
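
If you already have a server and want to know which tier it falls into, you can read the basics from the shell. This is a small sketch for Linux hosts; `mem_total_gb` and `has_nvidia_gpu` are hypothetical helper names, and finding `nvidia-smi` on the PATH is only a hint that an NVIDIA GPU and driver are installed, not proof.

```shell
# Total RAM in whole GB, read from /proc/meminfo (Linux only).
mem_total_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Prints "yes" if the NVIDIA driver tool is on PATH, otherwise "no".
has_nvidia_gpu() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo yes
  else
    echo no
  fi
}

echo "RAM: $(mem_total_gb) GB, CPU cores: $(nproc), NVIDIA GPU: $(has_nvidia_gpu)"
```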

Is this a real ChatGPT replacement?

For private chat with open models and no API fees: yes. Quality depends on the model you pull; the largest open models need real GPU hardware to feel fast.