Ollama Docker¶
Deploy Ollama in Docker with persistent model storage and GPU acceleration.
Official Images¶
| Image | GPU | Notes |
|---|---|---|
| ollama/ollama | NVIDIA (default) | Most common |
| ollama/ollama:rocm | AMD | ROCm support |
| ollama/ollama:latest | NVIDIA | Same as default |
Quick Start¶
NVIDIA GPU¶
docker run -d \
--gpus all \
-v /tank/ai/models/ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama
# Pull a model
docker exec ollama ollama pull llama3.3:70b
# Run model
docker exec -it ollama ollama run llama3.3:70b
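To confirm the server is reachable before pulling large models, query the version endpoint (assumes the port mapping above):

```bash
# Quick smoke test of the API
curl http://localhost:11434/api/version
# -> {"version":"0.x.y"}
```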
AMD GPU (ROCm)¶
docker run -d \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--group-add render \
-v /tank/ai/models/ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama:rocm
CPU Only¶
docker run -d \
-v /tank/ai/models/ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama
Docker Compose¶
Basic Setup (NVIDIA)¶
# docker-compose.yml
version: '3.8'
services:
ollama:
image: ollama/ollama
container_name: ollama
volumes:
- /tank/ai/models/ollama:/root/.ollama
ports:
- "11434:11434"
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
restart: unless-stopped
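With this saved as docker-compose.yml, start the service and pull a first model (the model tag is just an example):

```bash
docker compose up -d
docker compose exec ollama ollama pull llama3.3:70b
```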
AMD ROCm Setup¶
version: '3.8'
services:
ollama:
image: ollama/ollama:rocm
container_name: ollama
volumes:
- /tank/ai/models/ollama:/root/.ollama
ports:
- "11434:11434"
devices:
- /dev/kfd
- /dev/dri
group_add:
- video
- render
restart: unless-stopped
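To verify the ROCm build detected the GPU, check the server's startup log; the exact wording varies by version, but detected compute devices are reported there:

```bash
docker logs ollama 2>&1 | grep -iE 'gpu|rocm|amdgpu'
```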
Production Configuration¶
version: '3.8'
services:
ollama:
image: ollama/ollama
container_name: ollama
volumes:
- /tank/ai/models/ollama:/root/.ollama
ports:
- "127.0.0.1:11434:11434" # Local only
environment:
- OLLAMA_HOST=0.0.0.0
- OLLAMA_NUM_PARALLEL=2
- OLLAMA_MAX_LOADED_MODELS=2
- OLLAMA_KEEP_ALIVE=30m
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
limits:
memory: 100G
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:11434/"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
logging:
driver: json-file
options:
max-size: "10m"
max-file: "3"
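Because a healthcheck is defined, scripts can wait for Ollama to come up before starting anything that depends on it, for example:

```bash
# Block until Docker reports the container healthy
until [ "$(docker inspect --format '{{.State.Health.Status}}' ollama)" = "healthy" ]; do
  sleep 2
done
```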
Environment Variables¶
Configure Ollama behavior via environment:
| Variable | Description | Default |
|---|---|---|
| OLLAMA_HOST | Listen address:port | 127.0.0.1:11434 |
| OLLAMA_MODELS | Model storage path | /root/.ollama |
| OLLAMA_NUM_PARALLEL | Concurrent requests | 1 |
| OLLAMA_MAX_LOADED_MODELS | Models in memory | 1 |
| OLLAMA_KEEP_ALIVE | Model unload timeout | 5m |
| OLLAMA_DEBUG | Debug logging | false |
environment:
- OLLAMA_HOST=0.0.0.0:11434
- OLLAMA_NUM_PARALLEL=4
- OLLAMA_MAX_LOADED_MODELS=2
- OLLAMA_KEEP_ALIVE=1h
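The same variables work with plain `docker run` via `-e` flags, for example:

```bash
docker run -d \
  --gpus all \
  -e OLLAMA_NUM_PARALLEL=4 \
  -e OLLAMA_MAX_LOADED_MODELS=2 \
  -e OLLAMA_KEEP_ALIVE=1h \
  -v /tank/ai/models/ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```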
Model Management¶
Pull Models¶
# From host
docker exec ollama ollama pull llama3.3:70b-instruct-q4_K_M
docker exec ollama ollama pull deepseek-coder-v2:16b
docker exec ollama ollama pull nomic-embed-text
# List models
docker exec ollama ollama list
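To inspect an installed model's parameters, template, and license, use `ollama show`:

```bash
docker exec ollama ollama show llama3.3:70b-instruct-q4_K_M
```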
Pre-Load on Startup¶
Create a script to pull models on container start:
services:
ollama:
image: ollama/ollama
volumes:
- /tank/ai/models/ollama:/root/.ollama
- ./init-models.sh:/init-models.sh:ro
entrypoint: ["/bin/bash", "-c"]
command:
- |
/bin/ollama serve &
sleep 5
/init-models.sh
wait
#!/bin/bash
# init-models.sh
ollama pull llama3.3:70b-instruct-q4_K_M
ollama pull deepseek-coder-v2:16b
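The fixed `sleep 5` can race a slow server start, and the mounted script must be executable (or be invoked as `bash /init-models.sh`). A slightly more robust sketch polls the API before pulling:

```bash
#!/bin/bash
# init-models.sh -- wait for the server instead of sleeping a fixed time
set -e

# 'ollama list' only succeeds once the server is answering on the API port
until ollama list > /dev/null 2>&1; do
  sleep 1
done

for model in llama3.3:70b-instruct-q4_K_M deepseek-coder-v2:16b; do
  ollama pull "$model"
done
```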
Import GGUF Models¶
# Create Modelfile
cat > /tank/ai/models/ollama/Modelfile << 'EOF'
FROM /models/gguf/custom-model.gguf
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|eot_id|>"
EOF
# 'ollama create' talks to a running server, so mount the GGUF directory on the
# ollama container (add -v /tank/ai/models/gguf:/models/gguf:ro to its docker run
# or compose definition), then create the model through it:
docker exec ollama ollama create custom-model -f /root/.ollama/Modelfile
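If the create step succeeds, the model appears in the list and can be smoke-tested with a one-shot prompt:

```bash
docker exec ollama ollama list
docker exec ollama ollama run custom-model "Say hello"
```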
With Open WebUI¶
Combined Stack¶
version: '3.8'
services:
ollama:
image: ollama/ollama
container_name: ollama
volumes:
- /tank/ai/models/ollama:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
restart: unless-stopped
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
volumes:
- /tank/ai/data/open-webui:/app/backend/data
ports:
- "3000:8080"
environment:
- OLLAMA_BASE_URL=http://ollama:11434
depends_on:
- ollama
restart: unless-stopped
networks:
default:
name: ai-network
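Start the stack and open the web interface; note that in this file the Ollama API is only reachable on the internal ai-network, not on the host:

```bash
docker compose up -d
# Open WebUI is now at http://localhost:3000
```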
API Usage¶
Chat Completion (OpenAI Compatible)¶
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.3:70b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is Docker?"}
]
}'
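To extract only the assistant text from the OpenAI-compatible response (assumes `jq` is installed on the host):

```bash
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.3:70b", "messages": [{"role": "user", "content": "What is Docker?"}]}' \
  | jq -r '.choices[0].message.content'
```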
Native API¶
# Generate
curl http://localhost:11434/api/generate \
-d '{
"model": "llama3.3:70b",
"prompt": "Why is the sky blue?",
"stream": false
}'
# Chat
curl http://localhost:11434/api/chat \
-d '{
"model": "llama3.3:70b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Model Management API¶
# List models
curl http://localhost:11434/api/tags
# Show model info
curl http://localhost:11434/api/show -d '{"name": "llama3.3:70b"}'
# Pull model
curl http://localhost:11434/api/pull -d '{"name": "llama3.3:70b"}'
# Delete model
curl http://localhost:11434/api/delete -d '{"name": "old-model"}'
Multi-Instance Deployment¶
Different Models per Instance¶
version: '3.8'
services:
ollama-chat:
image: ollama/ollama
container_name: ollama-chat
volumes:
- /tank/ai/models/ollama-chat:/root/.ollama
ports:
- "11434:11434"
environment:
- OLLAMA_KEEP_ALIVE=1h
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['0']
capabilities: [gpu]
ollama-code:
image: ollama/ollama
container_name: ollama-code
volumes:
- /tank/ai/models/ollama-code:/root/.ollama
ports:
- "11435:11434"
environment:
- OLLAMA_KEEP_ALIVE=1h
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['1']
capabilities: [gpu]
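Clients select an instance by port; each instance only sees models pulled into its own volume, so pull the relevant models into each:

```bash
# General chat instance (GPU 0)
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.3:70b", "prompt": "Hello", "stream": false}'

# Coding instance (GPU 1)
curl http://localhost:11435/api/generate \
  -d '{"model": "deepseek-coder-v2:16b", "prompt": "Write a bubble sort in Python", "stream": false}'
```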
Monitoring¶
Health Checks¶
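Assuming the default port mapping, the root endpoint returns "Ollama is running" and makes a simple liveness probe:

```bash
# API liveness
curl -f http://localhost:11434/

# Docker health status (when a healthcheck is configured, as in the production example)
docker inspect --format '{{.State.Health.Status}}' ollama
```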
Resource Usage¶
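Check container-level usage with Docker and model-level usage with Ollama itself:

```bash
# Container CPU and memory
docker stats ollama --no-stream

# Loaded models, their size, and CPU/GPU placement
docker exec ollama ollama ps

# GPU utilization on the host
nvidia-smi    # NVIDIA
rocm-smi      # AMD
```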
Logs¶
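Server output goes to the container log; set OLLAMA_DEBUG=1 and recreate the container for more verbose logging:

```bash
docker logs -f ollama            # follow the log
docker logs --since 15m ollama   # recent entries only
```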
Storage Persistence¶
Volume Structure¶
/tank/ai/models/ollama/
├── models/
│ ├── blobs/ # Model weights (large files)
│ │ └── sha256-xxx
│ └── manifests/ # Model metadata
│ └── registry.ollama.ai/
│ └── library/
│ └── llama3.3/
└── history # Prompt history from interactive ollama run sessions
Backup Models¶
# Model weights live in the blobs directory
# Back up the manifests to record which models are installed
tar czf ollama-manifests.tar.gz /tank/ai/models/ollama/models/manifests
# Full backup (large)
zfs snapshot tank/ai/models/ollama@backup
Troubleshooting¶
Container Won't Start¶
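Typical causes are a port conflict, a missing GPU runtime, or a bad volume path. Start with the container status and logs:

```bash
docker ps -a --filter name=ollama   # exit code / restart loop?
docker logs --tail 50 ollama        # last error messages

# Is something else already bound to 11434 on the host?
ss -tlnp | grep 11434
```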
GPU Not Available¶
# NVIDIA: Check toolkit
nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
# AMD: Check ROCm
rocm-smi
docker run --rm --device=/dev/kfd --device=/dev/dri rocm/rocm-terminal rocminfo
Model Pull Fails¶
# Check disk space
df -h /tank/ai/models/ollama
# Clean incomplete downloads
docker exec ollama sh -c 'rm -f /root/.ollama/models/blobs/*partial*'  # glob must expand inside the container
# Retry
docker exec ollama ollama pull llama3.3:70b
Out of Memory¶
# Reduce concurrent models
OLLAMA_MAX_LOADED_MODELS=1
# Use smaller quantization
ollama pull llama3.3:70b-instruct-q4_K_S
# Check current memory
docker exec ollama ollama ps
Slow Response¶
# Verify GPU is being used
docker exec ollama ollama ps
# Should show "GPU" in PROCESSOR column
# Increase parallel slots for concurrent requests
OLLAMA_NUM_PARALLEL=4
See Also¶
- Container Deployment - Container overview
- GPU Containers - GPU setup details
- Model Volumes - Storage configuration
- Ollama - Ollama reference
- Open WebUI - Web interface