Can I run a local LLM on a normal CPU VPS?

Yes, for small models in the 1–3B parameter range. They run on CPU without a GPU, but inference is slow. For 7B+ models at a usable conversation speed, a GPU instance is the practical choice. Match the model size to what your hardware can actually handle.

What VPS specs do I need for Ollama?

For small models (1–3B) a CPU-only VPS with 4–8 GB of RAM is enough to get started. For larger models at reasonable speed you need a GPU VPS. Ollama loads the model into RAM (or VRAM), so the hard limit is always memory.

Is Open WebUI a real ChatGPT replacement?

For private, offline, no-per-token-cost chat it fills the same role. Response quality depends entirely on the model you pull. The largest open models can feel genuinely capable, but they require real GPU hardware to respond at a conversational pace.

How do I expose Open WebUI on a custom domain with HTTPS?

Run a reverse proxy such as Nginx Proxy Manager on the same server and point your domain's A record at the VPS. The proxy handles the TLS certificate and forwards traffic to Open WebUI on port 3000. The Nginx Proxy Manager setup guide linked in this post walks through it step by step.

Can Ollama use a GPU on a VPS?

Yes, if the VPS has a compatible NVIDIA GPU, the NVIDIA Container Toolkit is installed, and you pass the GPU into the container. Without those three things Ollama falls back to CPU. A standard CPU VPS will never use a GPU no matter how the Compose file is written.

Self-Host Ollama + Open WebUI on a VPS (Private ChatGPT, 2026)

Some links below are affiliate links: if you buy through them I may earn a commission at no extra cost to you. I only recommend what I have actually tested, and it never changes my verdict.

Why self-host an LLM?

Three reasons push people toward self-hosting instead of paying for API access:

Privacy. Every prompt you send to a hosted API leaves your machine. With Ollama running on your own VPS the model is local to that server — nothing reaches a third-party endpoint. That matters for anything sensitive: internal documents, code you can’t share, personal notes.

No per-token cost. API pricing adds up fast once you move beyond casual use. A self-hosted setup costs a fixed amount per month (your VPS bill) regardless of how many tokens you generate. Heavy users often break even quickly.

Model choice. You pick what to run. Llama, Mistral, Phi, Gemma, Qwen — Ollama supports a long and growing list. You can swap models in seconds without touching a dashboard or waiting for a provider to add support.

The honest trade-off is this: hardware decides everything. A cheap CPU VPS will run small models slowly. Big models — the ones that rival the hosted frontier — need real GPU hardware. There is no software trick that changes that relationship.

CPU vs GPU — the key decision

This is the most important thing to understand before you provision anything.

Small models (roughly 1–3B parameters) run on CPU. Ollama will load them, infer, and return responses without a GPU. The catch is speed: CPU inference is noticeably slower than GPU inference. For a conversational back-and-forth it is usable, not fast. If you are fine waiting a few seconds per response, a CPU VPS is a perfectly reasonable starting point.

Larger models (7B and above) technically run on CPU too, but the wait time per response can stretch to a minute or more depending on the box. At that point the experience breaks down. For 7B+ models at a pace that feels like a conversation, a GPU VPS is the practical requirement. GPU memory (VRAM) is the binding constraint — the model weights need to fit.
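The "weights need to fit" rule can be turned into rough arithmetic: weight size in GB is approximately the parameter count (in billions) times the bits per weight, divided by 8. The sketch below is only an estimate. The function name estimate_weights_gb is hypothetical, the 4-bit figure assumes a typical quantized build, and the result leaves out the context cache and runtime overhead, so read it as a lower bound.

```shell
# Rough lower bound for model weight size in GB.
# Usage: estimate_weights_gb <params_in_billions> <bits_per_weight>
# Assumption: size ~ parameters x bits / 8. Context cache and runtime
# overhead come on top of this number.
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 3 4    # 3B model, 4-bit weights
estimate_weights_gb 7 4    # 7B model, 4-bit weights
estimate_weights_gb 7 16   # 7B model, 16-bit weights
```

The three calls print 1.5, 3.5 and 14.0. Compare the result against the RAM on a CPU VPS or the VRAM on a GPU instance, and leave headroom for everything else running on the box.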

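If you are unsure which category you are in, start with a small model on a CPU VPS. You can always move to a GPU instance later. The Compose file in this guide starts on either kind of server, although on a GPU instance Ollama only uses the GPU once it is passed into the container (see Troubleshooting).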

See Best VPS for Local LLM for specific hardware recommendations by model size.

Prerequisites

Docker and Docker Compose installed on the VPS. If you need a starting point for picking a server, Best VPS for Self-Hosting has a sized comparison.
A VPS. CPU-only is fine for small models; a GPU instance for anything larger. Sizing details in Best VPS for Local LLM.
Optional: a domain pointed at the VPS if you want HTTPS access. How to point a domain to a VPS covers the DNS side.

Step 1 — Write the Compose file

Create a directory for the project, then add this compose.yaml:

services:
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ollama:/root/.ollama

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  open-webui:

A few things worth noting. The OLLAMA_BASE_URL variable tells Open WebUI where to reach Ollama — it uses the Docker Compose service name (ollama) as the hostname, which works because both containers are on the same default network. The named volumes (ollama and open-webui) persist your pulled models and WebUI data across container restarts, so you do not have to re-pull models after an update.

Step 2 — Start the stack and pull a model

Start both containers in the background:

docker compose up -d

Then pull a model into Ollama. llama3.2 is a good first choice — it is small enough to run on a CPU VPS and capable enough to be genuinely useful:

docker compose exec ollama ollama pull llama3.2

The pull downloads model weights to the named volume. It only happens once; after that the model is available immediately on restart.

Open http://your-server-ip:3000 in a browser. The first account you create becomes the admin. Select your model from the dropdown and start chatting.

Step 3 — Optional HTTPS on a custom domain

Running on a raw IP and port is fine for testing. For regular use, put it behind a reverse proxy with a TLS certificate.

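Nginx Proxy Manager is the easiest option on a self-hosted server: it obtains and renews Let’s Encrypt certificates automatically. See the Nginx Proxy Manager setup guide for the full walkthrough. The short version: create a proxy host that forwards to port 3000 on the VPS, enable SSL, done. If Nginx Proxy Manager itself runs in a container, localhost refers to that container rather than the host, so forward to the server's IP address instead.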

Troubleshooting

Model too big — container OOM or Ollama crashes. Ollama loads the full model into RAM (or VRAM). If the model is larger than available memory, the process will be killed by the OS. Fix: pull a smaller model, or move to a VPS with more RAM. There is no configuration option that makes a model fit in less memory than it actually needs.
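Before pulling a model, it can help to compare its size with the memory the server actually has free. The sketch below reads MemAvailable from /proc/meminfo on a Linux host. The function name fits_in_memory is hypothetical, the model size is something you supply yourself (for a model you have already pulled, the SIZE column of ollama list is one source), and treating MemAvailable as the ceiling is an assumption that only applies to CPU inference.

```shell
# Compare a model's size against the memory Linux reports as available.
# Usage: fits_in_memory <model_size_gb> [meminfo_path]
# Assumption: MemAvailable (reported in kB) is a fair proxy for what Ollama
# can claim on a CPU-only box. It says nothing about VRAM.
fits_in_memory() {
  awk -v need="$1" '
    /^MemAvailable:/ { avail_gb = $2 / 1024 / 1024 }
    END {
      printf "available: %.1f GB, model: %.1f GB\n", avail_gb, need
      exit (avail_gb > need ? 0 : 1)
    }' "${2:-/proc/meminfo}"
}

# Example: a model whose weights take about 2 GB
if fits_in_memory 2; then
  echo "should load"
else
  echo "likely to be killed for lack of memory"
fi
```

A pass here is not a guarantee, since the context cache and other containers also need memory, but a fail is a reliable sign that the model is too large for the box.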

Replies are very slow on a CPU VPS. This is expected for larger models. CPU inference is inherently slower than GPU inference. The solution is either to use a smaller model or to move to a GPU instance. Reducing context length can help at the margins but will not close a large speed gap.

Open WebUI shows “Ollama not reachable” or similar. The most common cause is a misconfigured OLLAMA_BASE_URL. Check that it reads http://ollama:11434 in the Compose file, not localhost (which would refer to the WebUI container itself, not the Ollama container). Running docker compose ps confirms both containers are up.
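The OLLAMA_BASE_URL check can be scripted. This is a minimal sketch that only inspects the text of the Compose file and does not talk to either container. The function name check_base_url is hypothetical.

```shell
# Report the OLLAMA_BASE_URL set in a Compose file and flag localhost values.
# Usage: check_base_url [compose_file]   (defaults to compose.yaml)
check_base_url() {
  url=$(grep -o 'OLLAMA_BASE_URL=[^[:space:]"]*' "${1:-compose.yaml}" \
        | head -n 1 | cut -d= -f2-)
  case "$url" in
    "")
      echo "OLLAMA_BASE_URL is not set"
      return 1 ;;
    *localhost*|*127.0.0.1*)
      echo "points at the WebUI container itself: $url"
      return 1 ;;
    *)
      echo "ok: $url" ;;
  esac
}
```

Run check_base_url compose.yaml from the project directory. If it reports ok and the error persists, docker compose logs open-webui is the next place to look.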

Ollama is not using the GPU. A standard CPU VPS will never use a GPU because there is nothing to use. On a GPU VPS, Ollama needs two things: the NVIDIA Container Toolkit installed on the host, and a deploy section on the ollama service in the Compose file that passes GPU resources into the container. The Compose file in Step 1 does not include that section, so it has to be added. Without both, Ollama falls back to CPU silently.
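One way to add the deploy section without editing the Step 1 file is a Compose override. The sketch below writes one using Docker Compose's device reservation syntax. The file name compose.gpu.yaml is an arbitrary choice, and the snippet assumes the NVIDIA Container Toolkit is already installed on the host.

```shell
# Write a Compose override that reserves all NVIDIA GPUs for the ollama service.
# Assumptions: the file name compose.gpu.yaml is arbitrary, and the NVIDIA
# Container Toolkit is already installed on the host.
cat > compose.gpu.yaml <<'EOF'
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
```

Recreate the stack with both files using docker compose -f compose.yaml -f compose.gpu.yaml up -d. After sending a prompt, docker compose exec ollama ollama ps lists the loaded model along with whether it is running on GPU or CPU.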

Which VPS to run it on

For small models on CPU, the best value is usually a straightforward Linux VPS with a few gigabytes of RAM:

Hetzner — excellent price-to-RAM ratio for CPU instances, good European data centres.

For GPU inference (7B+ models at real conversation speed):

Vultr — GPU instances available hourly, useful for testing before committing to a monthly plan.

Full comparison including specific sizing by model: Best VPS for Local LLM and Cheapest GPU VPS for AI.

Related

Nginx Proxy Manager Setup: Free HTTPS for Self-Hosted Apps (2026)

Self-Host Home Assistant on a VPS (2026): Remote Access Guide

How to Self-Host Immich on a VPS (Google Photos Alternative, 2026)