LM Studio

Desktop application for running local LLMs with built-in model discovery and API server.

Overview

LM Studio provides:

  • Model discovery - Browse and download from HuggingFace
  • Chat interface - Built-in conversation UI
  • Local server - OpenAI-compatible API
  • Model configuration - GPU layers, context, parameters
  • Cross-platform - macOS, Windows, Linux

Installation

Download

Get LM Studio from lmstudio.ai:

  • macOS: .dmg installer (Apple Silicon or Intel)
  • Windows: .exe installer
  • Linux: .AppImage

Linux Installation

# Download AppImage
wget https://releases.lmstudio.ai/linux/x86/LM-Studio-x.x.x.AppImage

# Make executable
chmod +x LM-Studio-*.AppImage

# Run
./LM-Studio-*.AppImage

# Or install system-wide (this renames the AppImage to /opt/lm-studio)
sudo mv LM-Studio-*.AppImage /opt/lm-studio
sudo ln -s /opt/lm-studio /usr/local/bin/lm-studio
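
Optionally, add a desktop entry so the AppImage shows up in application launchers (this assumes the /opt/lm-studio install above; adjust Exec if you kept the AppImage elsewhere):

mkdir -p ~/.local/share/applications
cat > ~/.local/share/applications/lm-studio.desktop <<'EOF'
[Desktop Entry]
Type=Application
Name=LM Studio
Exec=/opt/lm-studio
Categories=Development;
EOF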

First Run

  1. Launch LM Studio
  2. Sign in (optional, enables cloud sync)
  3. Select model storage location
  4. Browse and download a model

Model Management

Downloading Models

  1. Click Search (magnifying glass)
  2. Search for model (e.g., "llama 3.3")
  3. Select quantization (Q4_K_M is a good default on a 128GB system)
  4. Click Download

Model              Quantization  Size    Use Case
Llama 3.3 70B      Q4_K_M        ~43GB   General
Qwen 2.5 72B       Q4_K_M        ~45GB   Multilingual
DeepSeek Coder V2  Q4_K_M        Varies  Coding
Mistral 7B         Q8_0          ~8GB    Fast
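
Download sizes can be double-checked on disk; a quick look, assuming the default Linux storage path (see Storage Location below):

# Per-model disk usage under the default store
du -sh ~/.cache/lm-studio/models/*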

Model Settings

Configure per-model settings:

Setting         Description                Recommended
GPU Layers      Layers offloaded to GPU    Max (all)
Context Length  Max tokens in context      8192+
CPU Threads     Threads for CPU work       Auto
Batch Size      Tokens processed together  512

Chat Interface

Basic Usage

  1. Select model from dropdown
  2. Type message and press Enter
  3. View streaming response

System Prompt

Set a system prompt in the chat settings:

You are a helpful coding assistant. Focus on Python and TypeScript.
Provide concise, well-documented code examples.
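
The same instructions can also be sent programmatically as a system message through the local server's OpenAI-compatible endpoint (described under Local Server below):

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "loaded-model",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Show a minimal Python example."}
    ]
  }'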

Chat Parameters

Parameter       Description        Default
Temperature     Randomness         0.7
Top P           Nucleus sampling   0.95
Max Tokens      Response length    2048
Repeat Penalty  Reduce repetition  1.1
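
These parameters map directly onto request fields when calling the server; a sketch (repeat_penalty is a llama.cpp-style field rather than a standard OpenAI one, so client support may vary):

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "loaded-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 2048,
    "repeat_penalty": 1.1
  }'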

Local Server

Enable Server

  1. Click Local Server tab
  2. Select model to serve
  3. Click Start Server

Default: http://localhost:1234

Server Configuration

Setting  Default    Notes
Port     1234       Can be changed
Host     localhost  Change to 0.0.0.0 for network access
CORS     Enabled    For browser clients

API Usage

# List models
curl http://localhost:1234/v1/models

# Chat completion
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "loaded-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
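
Streaming works as with the OpenAI API: set "stream": true and tokens arrive as server-sent events:

# Streaming chat completion (-N disables curl's buffering)
curl -N http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "loaded-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'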

With Coding Tools

# Set environment for tools
export OPENAI_API_BASE=http://localhost:1234/v1
export OPENAI_API_KEY=not-needed

# Use with Aider
aider --openai-api-base http://localhost:1234/v1
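
A quick sanity check that tools will reach the server through those variables:

# Should list the loaded model(s)
curl -s "$OPENAI_API_BASE/models"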

Performance Tips

Memory Settings

For 128GB systems:

  • GPU Layers: Maximum (all layers on GPU)
  • Context: 16384 or higher
  • Leave headroom: ~20-30GB for system
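
To verify that headroom while a model is loaded, standard OS tools are enough:

# Linux: used vs. available memory
free -h

# macOS: page-level memory statistics
vm_stat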

GPU Utilization

Check GPU usage:

# NVIDIA
nvidia-smi -l 1

# AMD
rocm-smi
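
On Apple Silicon, where LM Studio runs inference on the GPU via Metal, powermetrics reports GPU load:

# Apple Silicon (requires sudo; samples every 1000 ms)
sudo powermetrics --samplers gpu_power -i 1000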

Model Loading

  • First load takes time (disk → memory)
  • Keep frequently used models loaded
  • Use smaller quantizations for faster switching

Running in VM

For Windows VM with GPU passthrough:

Requirements

  • GPU passthrough configured (see GPU Passthrough)
  • Windows VM with GPU drivers
  • LM Studio Windows version

Setup

  1. Install LM Studio in Windows VM
  2. Download models (or use shared storage)
  3. Start local server
  4. Configure VM networking for API access
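
Step 4 usually amounts to setting the server host to 0.0.0.0 in LM Studio and allowing inbound TCP 1234 through the Windows firewall. Reachability can then be confirmed from the host (using the example VM address from the next section):

# From the host: is the VM's server port open?
nc -vz 192.168.122.10 1234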

Accessing from Host

# VM IP (example: 192.168.122.10)
export OPENAI_API_BASE=http://192.168.122.10:1234/v1

# Test connection
curl http://192.168.122.10:1234/v1/models

CLI Mode

Recent LM Studio releases bundle the lms command-line tool, which handles server and model management from the shell (chat remains GUI-only):

# Start the local API server
lms server start
# Load a downloaded model by name
lms load <model-name>

For a fully headless, CLI-first workflow, consider Ollama instead.

Storage Location

Default Locations

Platform  Path
macOS     ~/.cache/lm-studio
Windows   C:\Users\<user>\.cache\lm-studio
Linux     ~/.cache/lm-studio

Custom Location

Change the model directory in Settings → Storage to point at a ZFS dataset, for example:

/tank/ai/models/lm-studio
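
A sketch of creating that dataset, assuming an existing pool named tank (as in the path above):

# -p creates intermediate datasets (tank/ai, tank/ai/models) as needed
sudo zfs create -p tank/ai/models/lm-studio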

Comparison with Alternatives

Feature          LM Studio  Ollama   Jan.ai
GUI              Full       None     Full
Model discovery  Built-in   Library  Built-in
CLI              Limited    Full     Limited
Server           Yes        Yes      Yes
Container        No         Yes      No

Troubleshooting

Model Won't Load

  • Check available memory (70B Q4 needs ~45GB)
  • Reduce GPU layers if memory limited
  • Try smaller quantization

Slow Generation

  • Verify GPU layers are maxed
  • Check GPU utilization
  • Reduce context length if needed

Server Connection Refused

  • Verify server is started
  • Check port isn't blocked
  • For network access, change host to 0.0.0.0
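
On Linux, checking what is actually listening on the port narrows this down quickly:

# Show the listener (if any) on port 1234
ss -tlnp | grep 1234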

GPU Not Detected

  • Update GPU drivers
  • Reinstall LM Studio
  • Check GPU compatibility

See Also