Skip to content

Windows LM Studio VM

Run LM Studio in a Windows 11 VM with GPU passthrough for local LLM inference.

Overview

This setup:

  • Runs Windows 11 in KVM/QEMU
  • Passes GPU directly to VM
  • Runs LM Studio with full GPU acceleration
  • Exposes OpenAI-compatible API to host network

Prerequisites

VM Configuration

Resource Allocation

For 70B models, allocate generously:

<memory unit='GiB'>96</memory>
<vcpu>16</vcpu>

GPU Passthrough

Ensure GPU is passed through:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>

Network Configuration

Use bridged or NAT networking for API access:

<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
</interface>

Get VM IP:

virsh domifaddr win11
# Or in Windows: ipconfig

LM Studio Installation

In Windows VM

  1. Download LM Studio from lmstudio.ai
  2. Run installer
  3. Set storage location for models

GPU Drivers

Install appropriate GPU drivers in Windows:

Verify GPU

In LM Studio: - Check Settings → Hardware - GPU should be detected with full VRAM

Model Download

In VM

  1. Open LM Studio → Search
  2. Download models (e.g., Llama 3.3 70B Q4_K_M)
  3. Models download to Windows storage

Shared Storage (Optional)

For faster model access, share models from host:

# On host, create Samba share
sudo apt install samba
# Configure /etc/samba/smb.conf

# In Windows, map network drive
# \\host-ip\models → Z:\

API Server Configuration

Start Server

  1. Click Local Server tab
  2. Select model
  3. Configure settings:
  4. Host: 0.0.0.0 (listen on all interfaces)
  5. Port: 1234 (default)
  6. Click Start Server

Server Settings

Setting Recommended Notes
Host 0.0.0.0 Accept external connections
Port 1234 Default LM Studio port
Context Length 8192+ Adjust per model
GPU Layers Maximum All on GPU

Verify Server

In Windows (PowerShell):

curl http://localhost:1234/v1/models

Access from Host

Test Connection

# Get VM IP
virsh domifaddr win11
# Example: 192.168.122.10

# Test API
curl http://192.168.122.10:1234/v1/models

Chat Completion

curl http://192.168.122.10:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Environment Setup

# For coding tools
export OPENAI_API_BASE=http://192.168.122.10:1234/v1
export OPENAI_API_KEY=not-needed

# Use with Aider
aider --openai-api-base http://192.168.122.10:1234/v1

Docker Container Access

From Containers

Containers can access VM API via host network:

services:
  app:
    environment:
      - OPENAI_API_BASE=http://192.168.122.10:1234/v1

Or use host.docker.internal (on some Docker setups):

environment:
  - OPENAI_API_BASE=http://host.docker.internal:1234/v1

Auto-Start Configuration

Windows Task Scheduler

Create task to start LM Studio on boot:

  1. Open Task Scheduler
  2. Create Basic Task
  3. Trigger: At startup
  4. Action: Start LM Studio

Start Server Automatically

LM Studio can be configured to start server on launch in settings.

Firewall Configuration

Windows Firewall

Allow inbound connections:

  1. Windows Defender Firewall → Advanced Settings
  2. Inbound Rules → New Rule
  3. Port → TCP 1234
  4. Allow connection
  5. Apply to all profiles

Host Firewall

If using UFW on host:

# Usually not needed for NAT network
# But for bridged:
sudo ufw allow from 192.168.122.0/24 to any port 1234

Performance Optimization

Memory Settings

Allocate most VM memory to Windows:

<memory unit='GiB'>96</memory>
<currentMemory unit='GiB'>96</currentMemory>

CPU Pinning

Pin VM CPUs for consistent performance:

<cputune>
  <vcpupin vcpu='0' cpuset='8'/>
  <vcpupin vcpu='1' cpuset='9'/>
  <!-- ... continue for all vCPUs -->
</cputune>

Huge Pages

Enable huge pages for better memory performance:

# On host
echo 49152 > /proc/sys/vm/nr_hugepages

# In VM config
<memoryBacking>
  <hugepages/>
</memoryBacking>

Troubleshooting

GPU Not Detected

  • Verify GPU passthrough in VM config
  • Check Windows Device Manager for GPU
  • Update GPU drivers

API Connection Refused

  • Verify LM Studio server is started
  • Check Windows Firewall allows port 1234
  • Verify VM network is working: ping <vm-ip>

Slow Performance

  • Check GPU is being used (Task Manager → GPU)
  • Verify all GPU layers are allocated
  • Check for thermal throttling

Model Loading Fails

  • Check available memory in Task Manager
  • Use smaller quantization (Q4 instead of Q6)
  • Close other applications

See Also