Skip to content

VM Integration

Run LLM inference in virtual machines with GPU passthrough.

Overview

VM-based LLM deployment enables:

  • Windows tools - LM Studio, specialized applications
  • Isolation - Separate environment from host
  • GPU passthrough - Direct GPU access in VM
  • API sharing - Host and containers access VM-hosted models

When to Use VMs

Scenario VM Benefit
Windows-only software Run LM Studio Windows
Testing different OSes Separate environments
Strong isolation Security boundaries
GPU-specific drivers Match driver to application

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      Linux Host (Ubuntu)                        │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                  QEMU/KVM Hypervisor                      │   │
│  │  ┌─────────────────────────────────────────────────────┐ │   │
│  │  │                Windows 11 VM                         │ │   │
│  │  │  ┌───────────────────────────────────────────────┐  │ │   │
│  │  │  │              LM Studio                         │  │ │   │
│  │  │  │         (OpenAI-compatible API)               │  │ │   │
│  │  │  │           :1234 → API endpoint                │  │ │   │
│  │  │  └───────────────────────────────────────────────┘  │ │   │
│  │  │  ┌───────────────────────────────────────────────┐  │ │   │
│  │  │  │         AMD/NVIDIA GPU (Passthrough)          │  │ │   │
│  │  │  └───────────────────────────────────────────────┘  │ │   │
│  │  └─────────────────────────────────────────────────────┘ │   │
│  └──────────────────────────────────────────────────────────┘   │
│                              │                                   │
│                       192.168.122.10:1234                        │
│                              │                                   │
│  ┌───────────────────────────┴────────────────────────────────┐ │
│  │                       Clients                               │ │
│  │  • Host applications                                        │ │
│  │  • Docker containers (via host network)                     │ │
│  │  • Other VMs                                                │ │
│  └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Topics

  • Windows LM Studio


    LM Studio in Windows VM with GPU passthrough

    Windows setup

  • API from VM


    Access VM-hosted LLM APIs from host and containers

    API access

Prerequisites

  • GPU passthrough configured (see GPU Passthrough)
  • Windows VM with GPU drivers
  • Sufficient RAM for host + VM + model

Quick Start

1. Configure GPU Passthrough

Follow GPU Passthrough guide.

2. Install Windows VM

See Windows 11 VM.

3. Install LM Studio in VM

Download from lmstudio.ai in the Windows VM.

4. Start API Server

In LM Studio: Local Server → Start Server

5. Access from Host

# Test connection
curl http://192.168.122.10:1234/v1/models

# Use with tools
export OPENAI_API_BASE=http://192.168.122.10:1234/v1

Resource Allocation

Memory Planning

For 128GB system running LLM in VM:

Component Memory Notes
Host reserved 16-32GB OS, containers, cache
VM 96-112GB Most for VM
Model in VM ~43GB 70B Q4
VM overhead ~4GB Windows, apps

CPU Allocation

<!-- VM config -->
<vcpu>16</vcpu>
<cpu mode='host-passthrough'>
  <topology sockets='1' cores='8' threads='2'/>
</cpu>

Comparison: VM vs Container

Aspect VM Container
Overhead Higher Lower
Isolation Full Process-level
GPU access Passthrough NVIDIA/ROCm toolkit
Windows support Yes WSL only
Setup complexity Higher Lower
Startup time Minutes Seconds

Recommendation

  • Use containers when possible (Linux, GPU toolkit available)
  • Use VMs for Windows-only tools or strong isolation requirements

See Also