
Capacity Planning

This guide covers system-wide resource allocation strategy for a multi-workload server running Docker containers, KVM VMs, and LLM inference.

Resource Budget Overview

MS-S1 MAX Resources

| Resource | Total                 | Notes                   |
|----------|-----------------------|-------------------------|
| CPU      | 16 cores / 32 threads | Zen 5 architecture      |
| Memory   | 128GB DDR5            | Unified CPU/GPU         |
| GPU CUs  | 40                    | RDNA 3.5, shared memory |

Allocation Strategy

| Workload           | Memory  | CPU Cores | Priority    |
|--------------------|---------|-----------|-------------|
| Host OS & Services | 4-8GB   | 2         | High        |
| Docker containers  | 8-16GB  | 4-6       | Medium      |
| KVM VMs            | 16-32GB | 4-8       | Medium      |
| LLM inference      | 64-96GB | 8-12      | Low (batch) |
| Headroom           | 8-16GB  | 2-4       | -           |

Warning

Memory totals should not exceed physical RAM. Overcommit leads to swapping and severe performance degradation, especially for LLM workloads.
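
As a quick sanity check, the upper bounds in the table above sum to 168GB, well past 128GB of physical RAM; the budget only works because not every workload peaks at once. A minimal sketch (budget figures are the planning values from the table, not measured usage):

# Compare planned upper-bound allocations (GB) against physical RAM
budget_gb=$((8 + 16 + 32 + 96 + 16))   # host + Docker + VMs + LLM + headroom
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
total_gb=$((total_kb / 1024 / 1024))
echo "Planned: ${budget_gb}GB, physical: ${total_gb}GB"
[ "$budget_gb" -gt "$total_gb" ] && echo "Upper bounds exceed RAM; workloads must not all peak together"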

Monitoring Current Usage

System-Wide View

# Real-time resource usage by cgroup
systemd-cgtop

# Memory breakdown
free -h
grep -E "MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree" /proc/meminfo

# CPU usage
htop
# or
mpstat -P ALL 1

Docker Resources

# Container resource usage
docker stats --no-stream

# Total Docker memory (normalize MiB/GiB so mixed units sum correctly)
docker stats --no-stream --format "{{.MemUsage}}" | \
  awk '{if ($1 ~ /GiB/) s += $1 * 1024; else if ($1 ~ /MiB/) s += $1; else if ($1 ~ /KiB/) s += $1 / 1024}
       END {printf "%.1f MiB total\n", s}'

VM Resources

# All VM stats
virsh domstats

# Memory usage per VM
virsh list --all | tail -n +3 | awk '{print $2}' | \
  xargs -I{} sh -c 'echo -n "{}: "; virsh dominfo {} 2>/dev/null | grep "Used memory"'

# Interactive monitoring
virt-top

LLM/GPU Resources

# GPU memory and utilization (ROCm)
rocm-smi

# Ollama model memory
curl -s http://localhost:11434/api/ps | jq '.models[] | {name, size}'

# Check unified memory pressure
cat /sys/class/drm/card0/device/mem_info_vram_used
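# On unified-memory APUs, GTT (GPU-mapped system memory) is also worth watching;
# this amdgpu sysfs file is an assumption here, and the card index may differ
cat /sys/class/drm/card0/device/mem_info_gtt_used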

Warning Thresholds

Memory Thresholds

| Level    | Available Memory | Action                       |
|----------|------------------|------------------------------|
| Normal   | >20%             | No action                    |
| Warning  | 10-20%           | Review allocations           |
| Critical | <10%             | Reduce workloads immediately |

Monitoring Script

#!/bin/bash
# /usr/local/bin/check-resources.sh

MEM_WARN=20
MEM_CRIT=10

available=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
total=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
pct=$((available * 100 / total))

if [ $pct -lt $MEM_CRIT ]; then
    echo "CRITICAL: Only ${pct}% memory available"
    # Add alerting here
elif [ $pct -lt $MEM_WARN ]; then
    echo "WARNING: ${pct}% memory available"
fi
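
To deploy, install the script and give it a dry run (paths as above; output is empty while available memory is above the warning threshold):

sudo install -m 755 check-resources.sh /usr/local/bin/check-resources.sh
/usr/local/bin/check-resources.sh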

Systemd Timer for Monitoring

# /etc/systemd/system/resource-check.timer
[Unit]
Description=Check system resources

[Timer]
OnBootSec=5min
OnUnitActiveSec=15min

[Install]
WantedBy=timers.target
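
The timer needs a matching service unit of the same name; a minimal sketch invoking the script above:

# /etc/systemd/system/resource-check.service
[Unit]
Description=Check system resources

[Service]
Type=oneshot
ExecStart=/usr/local/bin/check-resources.sh

Enable with:

sudo systemctl daemon-reload
sudo systemctl enable --now resource-check.timer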

When to Scale

Add Resources When

  • Memory consistently >80% used
  • CPU usage regularly sustained >70%
  • Swap usage >0 during normal operations
  • LLM inference becomes unusably slow
  • VM or container OOM kills occurring
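
A quick way to check for these signals (OOM log wording varies by kernel version):

# Is swap in use during normal operation?
swapon --show
# Recent OOM kills in the kernel log
dmesg -T | grep -i "out of memory" | tail -n 5
journalctl -k --grep=oom --since today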

Optimize First

Before adding resources, check for:

  1. Unused containers/VMs: Stop what you don't need
  2. Oversized allocations: Right-size based on actual usage
  3. Memory leaks: Applications growing unbounded
  4. Duplicate services: Consolidate where possible

Optimization Checklist

# Find idle containers (<1% CPU; NR > 1 skips the header row)
docker ps --filter "status=running" -q | xargs docker stats --no-stream | \
  awk 'NR > 1 && $3 + 0 < 1 {print $2 " is idle"}'

# VMs not using allocated memory
virsh list --name | while read vm; do
  [ -z "$vm" ] && continue
  alloc=$(virsh dominfo "$vm" | awk '/Max memory/ {print $3}')
  used=$(virsh dommemstat "$vm" 2>/dev/null | awk '/actual/ {print $2}')
  [ -n "$used" ] && echo "$vm: ${used}KB used of ${alloc}KB allocated"
done

Workload-Specific Guidelines

Docker Containers

See Docker Resource Limits for detailed configuration.

Key points:

  • Always set mem_limit for production containers
  • Use memswap_limit = mem_limit to prevent swap
  • Set CPU limits to prevent runaway containers
  • Monitor with docker stats
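
These settings correspond to docker run flags; a sketch (image, name, and limit values are illustrative):

# Hard memory cap with swap disabled (--memory-swap equal to --memory), plus a CPU ceiling
docker run -d --name web \
  --memory=2g --memory-swap=2g \
  --cpus=1.5 \
  nginx:alpine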

KVM VMs

See VM Resource Allocation for detailed configuration.

Key points:

  • Don't overcommit memory for production VMs
  • Use hugepages for VMs >8GB
  • Pin CPUs for latency-sensitive workloads
  • Use ballooning for flexible Linux VMs
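
For the hugepages recommendation, a sketch reserving 2MiB pages for a 24GB VM (page count and file name are assumptions; the guest also needs <memoryBacking><hugepages/> in its libvirt domain XML):

# 24GB / 2MiB = 12288 hugepages, persisted via sysctl
echo 'vm.nr_hugepages = 12288' | sudo tee /etc/sysctl.d/90-hugepages.conf
sudo sysctl -p /etc/sysctl.d/90-hugepages.conf
grep -i hugepages /proc/meminfo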

LLM Inference

See Memory Management for detailed configuration.

Key points:

  • Model size determines minimum memory requirement
  • GPU CUs share system memory (no separate VRAM budget)
  • Context length affects memory usage significantly
  • Quantization trades quality for memory savings
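
A back-of-the-envelope estimate for the first point: weight size in GB is roughly parameters (in billions) times bits per weight divided by 8; KV cache and runtime overhead come on top and grow with context length:

# ~35GB of weights for a 70B model at Q4; real footprint is higher in practice
params_b=70; bits=4
echo "~$((params_b * bits / 8))GB for weights alone"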

Resource Contention

Prioritization Using Cgroups

Use systemd slices to prioritize workloads:

# Check current slice hierarchy
systemd-cgls

# Set CPU weight (higher = more priority)
sudo systemctl set-property docker.service CPUWeight=100
sudo systemctl set-property libvirtd.service CPUWeight=200

# Set memory limits on slices
sudo systemctl set-property user.slice MemoryMax=16G

Example Slice Configuration

# /etc/systemd/system/llm.slice
[Unit]
Description=LLM Inference Slice

[Slice]
CPUWeight=50
MemoryMax=96G
IOWeight=50

Assign services to slice:

# In service unit
[Service]
Slice=llm.slice
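
After editing units, reload and confirm the slice properties took effect (ollama.service is a hypothetical service assigned to the slice):

sudo systemctl daemon-reload
sudo systemctl restart ollama.service
systemctl show llm.slice -p CPUWeight -p MemoryMax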

Capacity Planning Example

Scenario: Mixed Workload Server

Running:

  • 2 Docker services (Nextcloud, Plex)
  • 1 Windows VM (gaming)
  • Ollama with 70B model

| Component  | Memory | CPU | Notes                         |
|------------|--------|-----|-------------------------------|
| Host OS    | 4GB    | 2   | systemd, SSH, monitoring      |
| Nextcloud  | 2GB    | 1   | Docker container              |
| Plex       | 4GB    | 2   | Docker container, transcoding |
| Windows VM | 24GB   | 6   | When running                  |
| Ollama 70B | 80GB   | 8   | Q4 quantization               |
| Total      | 114GB  | 19  | OK if not simultaneous        |

Constraint: Windows VM and 70B model can't run simultaneously.

Solution: Use smaller model (32B) when VM is active:

| Scenario | Windows | LLM        | Memory Used |
|----------|---------|------------|-------------|
| Gaming   | 24GB    | 32B (20GB) | 54GB        |
| AI Work  | Off     | 70B (80GB) | 90GB        |
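
One way to enforce the constraint is to unload the large model before starting the VM; a sketch using the Ollama API's keep_alive parameter (model and VM names are hypothetical):

# keep_alive=0 asks Ollama to release the model's memory immediately
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.1:70b", "keep_alive": 0}' > /dev/null
virsh start win11-gaming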

Quick Reference

Essential Commands

# System overview
systemd-cgtop
htop
free -h

# Docker
docker stats
docker system df

# VMs
virsh domstats
virt-top

# GPU/LLM
rocm-smi
ollama list   # installed models
ollama ps     # models currently loaded in memory

Key Files

| Resource       | Configuration                                 |
|----------------|-----------------------------------------------|
| systemd limits | /etc/systemd/system/*.service.d/              |
| Docker limits  | docker-compose.yml or /etc/docker/daemon.json |
| VM limits      | /etc/libvirt/qemu/*.xml                       |
| Kernel tuning  | /etc/sysctl.d/                                |