Windows LM Studio VM

Run LM Studio in a Windows 11 VM with GPU passthrough for local LLM inference.

Overview

This setup:

Runs Windows 11 in KVM/QEMU
Passes the GPU directly to the VM
Runs LM Studio with full GPU acceleration
Exposes an OpenAI-compatible API to the host network

Prerequisites

GPU passthrough configured (see GPU Passthrough)
A Windows 11 VM (see Windows 11 VM)
At least 96 GB of RAM for the VM (for 70B models)

VM Configuration

Resource Allocation

For 70B models, allocate generously:

<memory unit='GiB'>96</memory>
<vcpu>16</vcpu>

GPU Passthrough

Ensure the GPU is passed through:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>
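The `bus`, `slot`, and `function` values above are an example and must match the GPU's PCI address on your host, which `lspci -D | grep -Ei 'vga|display'` lists in the form `0000:01:00.0`. A minimal sketch, assuming that address format: the hypothetical helper `pci_to_libvirt` turns such an address into the matching `<address>` element.

```bash
# Hypothetical helper: convert a PCI address such as 0000:01:00.0
# (as printed by `lspci -D`) into a libvirt <address> element.
pci_to_libvirt() {
  addr=$1
  domain=${addr%%:*}   # text before the first colon, e.g. 0000
  rest=${addr#*:}      # e.g. 01:00.0
  bus=${rest%%:*}      # e.g. 01
  rest=${rest#*:}      # e.g. 00.0
  slot=${rest%%.*}     # e.g. 00
  func=${rest#*.}      # e.g. 0
  printf "<address domain='0x%s' bus='0x%s' slot='0x%s' function='0x%s'/>\n" \
    "$domain" "$bus" "$slot" "$func"
}
```

For example, `pci_to_libvirt 0000:01:00.0` reproduces the `<address>` line used in the XML above.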

Network Configuration

Use NAT (libvirt's default network, shown below) or bridged networking for API access:

<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
</interface>

Get the VM's IP address:

virsh domifaddr win11
# Or in Windows: ipconfig
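For host-side scripts, the IPv4 address can be pulled out of the `virsh domifaddr` output. A minimal sketch, assuming the default tabular output (columns Name, MAC address, Protocol, Address) and the domain name `win11`; `vm_ipv4` is a hypothetical helper. With bridged networking there are no libvirt DHCP leases, so `virsh domifaddr` may need `--source agent` (requires the QEMU guest agent in Windows) or `--source arp`.

```bash
# Hypothetical helper: read `virsh domifaddr` output on stdin and print
# the first IPv4 address with its /prefix length stripped.
vm_ipv4() {
  awk '$3 == "ipv4" { sub(/\/.*/, "", $4); print $4; exit }'
}
```

Usage: `VM_IP=$(virsh domifaddr win11 | vm_ipv4)`.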

LM Studio Installation

In the Windows VM

Download LM Studio from lmstudio.ai
Run the installer
Set a storage location for models

GPU Drivers

Install the AMD drivers in Windows (the MS-S1 MAX has an AMD Strix Halo iGPU):

Download the driver package from amd.com/support

Verify GPU

In LM Studio:

Check Settings -> Hardware
The GPU should be detected with its full VRAM

Model Download

In the VM

Open LM Studio -> Search
Download models (e.g., Llama 3.3 70B Q4_K_M)
Models are saved to the storage location set during installation

Shared Storage (Optional)

To reuse model files already on the host instead of downloading them again inside the VM, share the models directory from the host:

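# On the host, install Samba
sudo apt install samba
# Add a [models] share to /etc/samba/smb.conf, then:
sudo smbpasswd -a "$USER"    # set a Samba password for your user
sudo systemctl restart smbd

# In Windows, map the share to drive Z:
# net use Z: \\host-ip\models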
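The `smb.conf` step needs a share section. A minimal sketch: the share name `models`, the path `/srv/models`, the user, and the helper `smb_share_stanza` are placeholders rather than values from this setup; `path`, `read only`, and `valid users` are standard `smb.conf` parameters.

```bash
# Hypothetical helper: print an smb.conf share section.
# $1 = share name, $2 = directory on the host, $3 = allowed Samba user
smb_share_stanza() {
  printf '[%s]\n' "$1"
  printf '   path = %s\n' "$2"
  printf '   read only = no\n'   # share is writable (filesystem permissions still apply)
  printf '   valid users = %s\n' "$3"
}
```

Usage might look like `smb_share_stanza models /srv/models "$USER" | sudo tee -a /etc/samba/smb.conf`. Use `read only = yes` instead if the VM should only read existing model files.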

API Server Configuration

Start Server

Click the Local Server tab
Select a model
Configure the settings:
  Host: 0.0.0.0 (listen on all interfaces)
  Port: 1234 (default)
Click Start Server

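Server Settings

| Setting | Recommended | Notes |
|---|---|---|
| Host | `0.0.0.0` | Accept connections from outside the VM |
| Port | `1234` | Default LM Studio port |
| Context Length | `8192+` | Adjust per model; longer contexts need more memory |
| GPU Layers | Maximum | Offload all layers to the GPU |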
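Context length costs memory because the KV cache grows linearly with it: roughly 2 (keys and values) x layers x KV heads x head dimension x context tokens x bytes per element. A sketch, assuming an unquantized FP16 cache (2 bytes per element) and Llama 3 70B's architecture (80 layers, 8 KV heads, head dimension 128); `kv_cache_mib` is a hypothetical helper, and real runtimes add overhead on top of this.

```bash
# Hypothetical helper: estimate KV cache size in MiB.
# Args: layers kv_heads head_dim context_tokens bytes_per_element
kv_cache_mib() {
  # Leading 2 accounts for storing both keys and values.
  echo $(( 2 * $1 * $2 * $3 * $4 * $5 / 1024 / 1024 ))
}
```

With those assumptions, `kv_cache_mib 80 8 128 8192 2` gives 2560 (about 2.5 GiB on top of the weights), and a 32768-token context needs four times as much.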

Verify Server

In Windows (PowerShell):

# curl.exe bypasses Windows PowerShell's curl alias for Invoke-WebRequest
curl.exe http://localhost:1234/v1/models

Access from Host

Test Connection

# Get VM IP
virsh domifaddr win11
# Example: 192.168.122.10

# Test API
curl http://192.168.122.10:1234/v1/models
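The `model` field of a chat request must name a model the server reports. A minimal sketch for listing the `id` values from the `/v1/models` response without extra tools, assuming the OpenAI-style `{"data": [{"id": ...}]}` shape; `model_ids` is a hypothetical helper, and a real JSON parser such as `jq -r '.data[].id'` is more robust.

```bash
# Hypothetical helper: print each "id" value from a /v1/models JSON body on stdin.
# Naive text matching: assumes ids contain no escaped quotes.
model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}
```

Usage: `curl -s http://192.168.122.10:1234/v1/models | model_ids`.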

Chat Completion

curl http://192.168.122.10:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
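To send prompts from a script instead of a hand-written JSON literal, the request body can be built from shell variables. A minimal sketch; `json_escape` and `chat_body` are hypothetical helpers that only escape backslashes and double quotes, so they suit single-line prompts; for arbitrary text (newlines, control characters) a tool like `jq` is the safer choice.

```bash
# Hypothetical helper: escape a single-line string for a JSON string literal.
json_escape() {
  # Escape backslashes first, then double quotes.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build a /v1/chat/completions request body.
# $1 = model id, $2 = single-line user prompt
chat_body() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}
```

Usage: `curl http://192.168.122.10:1234/v1/chat/completions -H "Content-Type: application/json" -d "$(chat_body llama-3.3-70b-instruct 'Hello!')"`.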

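Environment Setup

# For coding tools
export OPENAI_API_BASE=http://192.168.122.10:1234/v1
export OPENAI_BASE_URL=http://192.168.122.10:1234/v1   # name read by OpenAI SDKs v1+
export OPENAI_API_KEY=not-needed

# Use with Aider (the openai/ prefix selects an OpenAI-compatible endpoint)
aider --model openai/llama-3.3-70b-instruct --openai-api-base http://192.168.122.10:1234/v1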
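Because the VM's address can change, the variables can also be generated from whatever IP you look up. A sketch; `lmstudio_env` is a hypothetical helper that prints the same `export` lines as above for a given address and port (defaulting to LM Studio's 1234).

```bash
# Hypothetical helper: print export lines for OpenAI-compatible clients.
# $1 = VM IP address, $2 = port (defaults to 1234)
lmstudio_env() {
  base="http://$1:${2:-1234}/v1"
  printf 'export OPENAI_API_BASE=%s\n' "$base"
  printf 'export OPENAI_API_KEY=not-needed\n'
}
```

Usage: `eval "$(lmstudio_env 192.168.122.10)"`.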

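Docker Container Access

From Containers

Containers can reach the VM's API at its libvirt IP, routed through the host. With libvirt's default NAT network, connections forwarded from Docker bridge networks may be rejected by libvirt's firewall rules; running the container with network_mode: host sends the traffic from the host itself: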

services:
  app:
    environment:
      - OPENAI_API_BASE=http://192.168.122.10:1234/v1

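Or use host.docker.internal, which resolves to the Docker host rather than the VM, so it only works if the host forwards port 1234 to the VM. On Linux it also needs a host-gateway mapping:

environment:
  - OPENAI_API_BASE=http://host.docker.internal:1234/v1
extra_hosts:
  - "host.docker.internal:host-gateway"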

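Auto-Start Configuration

Windows Task Scheduler

Create a task to start LM Studio when you sign in:

Open Task Scheduler
Create Basic Task
Trigger: When I log on (LM Studio is a desktop app and needs a user session)
Action: Start a program (select the LM Studio executable)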

Start Server Automatically

LM Studio can be configured in its settings to start the API server when it launches.
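Since the VM boots first and LM Studio starts its server some time later, host-side scripts may need to wait for the API. A sketch assuming `curl` on the host; `wait_for_api` and its retry defaults are hypothetical. On the host, `virsh autostart win11` marks the VM to start automatically when libvirt starts at boot.

```bash
# Hypothetical helper: poll an OpenAI-compatible /v1/models endpoint until it answers.
# $1 = base URL (e.g. http://192.168.122.10:1234/v1)
# $2 = attempts (default 60), $3 = seconds between attempts (default 5)
wait_for_api() {
  attempts=${2:-60}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS --max-time 5 "$1/models" > /dev/null 2>&1; then
      return 0   # server answered with a 2xx status
    fi
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1       # gave up after all attempts
}
```

Usage: `wait_for_api http://192.168.122.10:1234/v1 && echo "LM Studio API is up"`.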

Firewall Configuration

Windows Firewall

Allow inbound connections:

Windows Defender Firewall -> Advanced Settings
Inbound Rules -> New Rule
Port -> TCP 1234
Allow connection
Apply to all profiles

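Host Firewall

If using UFW on the host:

# Usually not needed with the default NAT network: host-to-VM requests
# are outgoing traffic, which UFW allows by default.
# For bridged networking, use your LAN subnet instead of the NAT subnet:
sudo ufw allow from 192.168.122.0/24 to any port 1234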

Performance Optimization

Memory Settings

Give the VM most of the host's memory, and set currentMemory equal to memory so the full amount is available from boot:

<memory unit='GiB'>96</memory>
<currentMemory unit='GiB'>96</currentMemory>

CPU Pinning

Pin VM CPUs for consistent performance:

<cputune>
  <vcpupin vcpu='0' cpuset='8'/>
  <vcpupin vcpu='1' cpuset='9'/>
  <!-- ... continue for all vCPUs -->
</cputune>
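Writing one `<vcpupin>` line per vCPU by hand is error-prone for 16 vCPUs. A sketch that prints the lines for a contiguous range of host CPUs; `vcpupin_lines` is hypothetical, and a contiguous range is an assumption: check `lscpu -e` to see which host CPU numbers are SMT siblings of the same core before picking a range.

```bash
# Hypothetical helper: print <vcpupin> elements mapping vCPU 0..N-1
# to host CPUs FIRST..FIRST+N-1.
# $1 = number of vCPUs, $2 = first host CPU
vcpupin_lines() {
  v=0
  while [ "$v" -lt "$1" ]; do
    printf "  <vcpupin vcpu='%d' cpuset='%d'/>\n" "$v" $(( $2 + v ))
    v=$((v + 1))
  done
}
```

`vcpupin_lines 16 8` prints the two lines shown above followed by the rest up to vCPU 15 on host CPU 23; paste the output inside `<cputune>` via `virsh edit win11`.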

Huge Pages

Enable huge pages for better memory performance:

# On host (requires root)
sudo sysctl vm.nr_hugepages=49152

# In VM config
<memoryBacking>
  <hugepages/>
</memoryBacking>
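The `49152` above is the VM's 96 GiB expressed in 2 MiB huge pages (96 * 1024 / 2). A sketch of that arithmetic; `hugepages_for` is hypothetical, and 2048 kB is the default huge page size on x86-64 (check `Hugepagesize` in `/proc/meminfo`). After reserving, `HugePages_Total` in `/proc/meminfo` shows how many pages the kernel actually allocated, which can be fewer than requested if memory is fragmented.

```bash
# Hypothetical helper: number of huge pages needed to back a VM.
# $1 = VM memory in GiB, $2 = huge page size in kB (default 2048)
hugepages_for() {
  page_kb=${2:-2048}
  echo $(( $1 * 1024 * 1024 / page_kb ))
}
```

Usage: `sudo sysctl vm.nr_hugepages="$(hugepages_for 96)"`, then `grep HugePages_Total /proc/meminfo`.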

Troubleshooting

GPU Not Detected

Verify GPU passthrough in VM config
Check Windows Device Manager for GPU
Update GPU drivers

API Connection Refused

Verify the LM Studio server is started
Check that Windows Firewall allows port 1234
Verify the VM network is working: ping <vm-ip> (Windows Firewall blocks ping by default, so a failed ping alone is not conclusive)
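To tell a blocked port apart from a server that is simply not running, a raw TCP check from the host can help. A sketch using bash's `/dev/tcp` redirection and coreutils `timeout`; `port_state` is a hypothetical helper. A timeout usually suggests a firewall silently dropping packets, while an immediate refusal usually means nothing is listening on that port.

```bash
# Hypothetical helper: report whether a TCP port accepts connections.
# Prints "open", "timeout" (no answer within 3 s) or "closed" (refused/unreachable).
port_state() {
  timeout 3 bash -c ': < "/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
  case $? in
    0) echo open ;;
    124) echo timeout ;;   # exit status timeout(1) uses when the limit is hit
    *) echo closed ;;
  esac
}
```

Usage: `port_state 192.168.122.10 1234`.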

Slow Performance

Check that the GPU is being used (Task Manager -> GPU)
Verify all layers are offloaded to the GPU
Check for thermal throttling

Model Loading Fails

Check available memory in Task Manager
Use a smaller quantization (e.g., Q4 instead of Q6)
Close other applications
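A smaller quantization helps because weight size scales with bits per weight: roughly parameters x bits per weight / 8 bytes. A sketch; `model_gb` is hypothetical, and the bits-per-weight figures used below (about 4.8 for Q4_K_M, about 6.6 for Q6_K) are approximate averages, so actual GGUF files differ somewhat and the KV cache comes on top.

```bash
# Hypothetical helper: approximate model weight size in GB (10^9 bytes).
# $1 = parameters in billions, $2 = average bits per weight
model_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
```

`model_gb 70 4.8` gives 42.0, while a 70B model at about 6.6 bits per weight needs roughly 58 GB, which leaves much less headroom in a 96 GB VM.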