VM Integration¶

Run LLM inference in virtual machines with GPU passthrough.

Overview¶

VM-based LLM deployment enables:

Windows tools - LM Studio, specialized applications
Isolation - Separate environment from host
GPU passthrough - Direct GPU access in VM
API sharing - Host and containers access VM-hosted models

When to Use VMs¶

Scenario	VM Benefit
Windows-only software	Run LM Studio Windows
Testing different OSes	Separate environments
Strong isolation	Security boundaries
GPU-specific drivers	Match driver to application

Architecture¶

┌─────────────────────────────────────────────────────────────────┐
│                      Linux Host (Ubuntu)                        │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                  QEMU/KVM Hypervisor                      │   │
│  │  ┌─────────────────────────────────────────────────────┐ │   │
│  │  │                Windows 11 VM                         │ │   │
│  │  │  ┌───────────────────────────────────────────────┐  │ │   │
│  │  │  │              LM Studio                         │  │ │   │
│  │  │  │         (OpenAI-compatible API)               │  │ │   │
│  │  │  │           :1234 → API endpoint                │  │ │   │
│  │  │  └───────────────────────────────────────────────┘  │ │   │
│  │  │  ┌───────────────────────────────────────────────┐  │ │   │
│  │  │  │         AMD/NVIDIA GPU (Passthrough)          │  │ │   │
│  │  │  └───────────────────────────────────────────────┘  │ │   │
│  │  └─────────────────────────────────────────────────────┘ │   │
│  └──────────────────────────────────────────────────────────┘   │
│                              │                                   │
│                       192.168.122.10:1234                        │
│                              │                                   │
│  ┌───────────────────────────┴────────────────────────────────┐ │
│  │                       Clients                               │ │
│  │  • Host applications                                        │ │
│  │  • Docker containers (via host network)                     │ │
│  │  • Other VMs                                                │ │
│  └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Topics¶

Windows LM Studio

LM Studio in Windows VM with GPU passthrough

Windows setup
API from VM

Access VM-hosted LLM APIs from host and containers

API access

Prerequisites¶

GPU passthrough configured (see GPU Passthrough)
Windows VM with GPU drivers
Sufficient RAM for host + VM + model

Quick Start¶

1. Configure GPU Passthrough¶

Follow GPU Passthrough guide.

2. Install Windows VM¶

See Windows 11 VM.

3. Install LM Studio in VM¶

Download from lmstudio.ai in the Windows VM.

4. Start API Server¶

In LM Studio: Local Server → Start Server

5. Access from Host¶

# Test connection
curl http://192.168.122.10:1234/v1/models

# Use with tools
export OPENAI_API_BASE=http://192.168.122.10:1234/v1

Resource Allocation¶

Memory Planning¶

For 128GB system running LLM in VM:

Component	Memory	Notes
Host reserved	16-32GB	OS, containers, cache
VM	96-112GB	Most for VM
Model in VM	~43GB	70B Q4
VM overhead	~4GB	Windows, apps

CPU Allocation¶

<!-- VM config -->
<vcpu>16</vcpu>
<cpu mode='host-passthrough'>
  <topology sockets='1' cores='8' threads='2'/>
</cpu>

Comparison: VM vs Container¶

Aspect	VM	Container
Overhead	Higher	Lower
Isolation	Full	Process-level
GPU access	Passthrough	NVIDIA/ROCm toolkit
Windows support	Yes	WSL only
Setup complexity	Higher	Lower
Startup time	Minutes	Seconds

Recommendation¶

Use containers when possible (Linux, GPU toolkit available)
Use VMs for Windows-only tools or strong isolation requirements