# Windows LM Studio VM

Run LM Studio in a Windows 11 VM with GPU passthrough for local LLM inference.
## Overview

This setup:

- Runs Windows 11 under KVM/QEMU
- Passes the GPU directly through to the VM
- Runs LM Studio with full GPU acceleration
- Exposes an OpenAI-compatible API to the host network
## Prerequisites

- GPU passthrough configured (see GPU Passthrough)
- Windows 11 VM (see Windows 11 VM)
- 96 GB+ RAM allocated to the VM (for 70B models)
## VM Configuration

### Resource Allocation

For 70B models, allocate generously:
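The allocation snippet itself is missing from this page. As a sketch, the relevant libvirt domain elements might look like this; the 96 GiB / 16 vCPU figures are assumptions sized to the 96 GB+ prerequisite above, not values from this page:

```xml
<!-- Hypothetical sizing: adjust to your hardware -->
<memory unit='GiB'>96</memory>
<currentMemory unit='GiB'>96</currentMemory>
<vcpu placement='static'>16</vcpu>
```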
### GPU Passthrough

Ensure the GPU is passed through:

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```
### Network Configuration

Use bridged or NAT networking for API access:
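The interface snippet is missing here. A typical libvirt NAT interface definition looks like the following; the `default` network and the virtio NIC model are common defaults, assumed here:

```xml
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
</interface>
```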
Get the VM IP with `virsh domifaddr win11`.
## LM Studio Installation

### In Windows VM

- Download LM Studio from lmstudio.ai
- Run the installer
- Set the storage location for models
### GPU Drivers

Install the appropriate GPU drivers in Windows:

- NVIDIA: download from nvidia.com/drivers
- AMD: download from amd.com/support
### Verify GPU

In LM Studio:

- Check Settings → Hardware
- The GPU should be detected with its full VRAM
## Model Download

### In VM

- Open LM Studio → Search
- Download models (e.g., Llama 3.3 70B Q4_K_M)
- Models download to the Windows storage location
## Shared Storage (Optional)

For faster model access, share models from the host:
```shell
# On the host, install Samba and create a share
sudo apt install samba
# Add a [models] share (path, read only, guest ok) to /etc/samba/smb.conf
sudo systemctl restart smbd
# In Windows, map the share as a network drive:
# \\host-ip\models → Z:\
```
## API Server Configuration

### Start Server

- Click the Local Server tab
- Select a model
- Configure settings:
    - Host: `0.0.0.0` (listen on all interfaces)
    - Port: `1234` (default)
- Click Start Server
### Server Settings
| Setting | Recommended | Notes |
|---|---|---|
| Host | 0.0.0.0 | Accept external connections |
| Port | 1234 | Default LM Studio port |
| Context Length | 8192+ | Adjust per model |
| GPU Layers | Maximum | All on GPU |
### Verify Server

In Windows (PowerShell):
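The verification command itself is missing from this page; a minimal check from inside the VM, assuming the server is running on the default port, might be:

```powershell
# Query the models endpoint; returns JSON if the server is up
Invoke-RestMethod http://localhost:1234/v1/models
```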
## Access from Host

### Test Connection

```shell
# Get the VM IP
virsh domifaddr win11
# Example: 192.168.122.10

# Test the API
curl http://192.168.122.10:1234/v1/models
```
### Chat Completion

```shell
curl http://192.168.122.10:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
### Environment Setup

```shell
# For coding tools
export OPENAI_API_BASE=http://192.168.122.10:1234/v1
export OPENAI_API_KEY=not-needed

# Use with Aider
aider --openai-api-base http://192.168.122.10:1234/v1
```
## Docker Container Access

### From Containers

Containers can access the VM API via the host network:
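The example is missing here. A sketch using the throwaway `curlimages/curl` image and the example VM IP from elsewhere on this page (both assumptions):

```shell
# Run on the host network so the libvirt NAT subnet is reachable
docker run --rm --network host curlimages/curl \
  http://192.168.122.10:1234/v1/models
```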
Or use `host.docker.internal` (on some Docker setups):
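As a sketch: on Linux Docker this name only exists if you map it explicitly, and it resolves to the Docker host, so it reaches the VM only if the host forwards port 1234 to it:

```shell
# host-gateway maps the name to the Docker host's gateway address
docker run --rm --add-host=host.docker.internal:host-gateway \
  curlimages/curl http://host.docker.internal:1234/v1/models
```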
## Auto-Start Configuration

### Windows Task Scheduler

Create a task to start LM Studio at boot:

- Open Task Scheduler
- Create Basic Task
- Trigger: At startup
- Action: Start LM Studio

### Start Server Automatically

In LM Studio's settings, enable starting the local server on launch.
## Firewall Configuration

### Windows Firewall

Allow inbound connections:

- Windows Defender Firewall → Advanced Settings
- Inbound Rules → New Rule
- Port → TCP 1234
- Allow the connection
- Apply to all profiles
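As a sketch, the GUI steps above can also be done with a single PowerShell command (run as Administrator); the rule name is arbitrary:

```powershell
New-NetFirewallRule -DisplayName "LM Studio API" `
  -Direction Inbound -Protocol TCP -LocalPort 1234 -Action Allow
```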
### Host Firewall

If using UFW on the host:

```shell
# Usually not needed for the NAT network,
# but for a bridged setup:
sudo ufw allow from 192.168.122.0/24 to any port 1234
```
## Performance Optimization

### Memory Settings

Allocate most of the host's memory to the VM so Windows can hold the model, keeping a few GiB free for the host itself.
### CPU Pinning

Pin VM CPUs for consistent performance:

```xml
<cputune>
  <vcpupin vcpu='0' cpuset='8'/>
  <vcpupin vcpu='1' cpuset='9'/>
  <!-- ... continue for all vCPUs -->
</cputune>
```
### Huge Pages

Enable huge pages for better memory performance:

```shell
# On the host
echo 49152 > /proc/sys/vm/nr_hugepages
```

In the VM config:

```xml
<memoryBacking>
  <hugepages/>
</memoryBacking>
```
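The 49152 figure corresponds to 96 GiB of guest RAM backed by the default 2 MiB huge pages; the arithmetic:

```shell
# pages = (GiB of guest RAM × 1024 MiB/GiB) / 2 MiB per page
echo $(( 96 * 1024 / 2 ))  # prints 49152
```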
## Troubleshooting

### GPU Not Detected

- Verify GPU passthrough in the VM config
- Check Windows Device Manager for the GPU
- Update GPU drivers

### API Connection Refused

- Verify the LM Studio server is started
- Check that Windows Firewall allows port 1234
- Verify the VM network is working: `ping <vm-ip>`

### Slow Performance

- Check that the GPU is being used (Task Manager → GPU)
- Verify all GPU layers are allocated
- Check for thermal throttling

### Model Loading Fails

- Check available memory in Task Manager
- Use a smaller quantization (Q4 instead of Q6)
- Close other applications
## See Also
- VM Integration Index - Overview
- API from VM - Detailed API access
- GPU Passthrough - GPU setup
- Windows 11 VM - VM creation