
GPU Setup

Configure AMD integrated graphics for AI workloads on the MS-S1 MAX.

Overview

The AMD Ryzen AI Max+ 395 (Strix Halo) APU combines CPU and GPU on a single chip, and both draw on the same system memory. Unlike discrete GPUs with dedicated VRAM, the integrated RDNA 3.5 graphics shares the 128GB of quad-channel LPDDR5X-8000 system memory with the CPU.

For a comprehensive explanation of the APU architecture, memory subsystem, and design trade-offs, see Hardware Architecture.

Quick Reference

Aspect          MS-S1 MAX APU
Architecture    RDNA 3.5
Compute Units   40 CUs
GPU ID          gfx1151
Memory          Shared 128GB LPDDR5X-8000 (quad-channel, soldered)
Bandwidth       ~256 GB/s peak, ~210-220 GB/s real-world
ROCm Support    Native (Ubuntu 26.04 ships ROCm 7.1 in Universe; gfx1151 supported upstream)
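
The peak bandwidth figure can be reproduced from the memory speed. The sketch below assumes a 256-bit memory bus, which is not stated on this page; the function name is illustrative only.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


# LPDDR5X-8000 means 8000 mega-transfers per second.
# Assumption: 256-bit bus width (not taken from this page).
peak = peak_bandwidth_gb_s(8000, 256)
print(f"peak: {peak:.0f} GB/s")

# The real-world range quoted in the table, as a fraction of peak.
for measured in (210, 220):
    print(f"{measured} GB/s is {measured / peak:.0%} of peak")
```

Under that assumption, the quoted real-world range corresponds to somewhat over 80% of the theoretical peak.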

Why APU for LLMs?

The MS-S1 MAX's 128GB configuration enables running large models that exceed typical discrete GPU VRAM:

  • 70B models at high-precision quantization - Full Q6 or Q8 fits in memory
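  • 405B models at very low-bit quantization - Around 2 bits per weight (roughly 100GB of weights) is within reach; 3-bit variants need roughly 150GB and do not fit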
  • No model offloading - Entire model stays in accessible memory
  • Simple setup - No PCIe passthrough needed
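
Whether a model fits can be estimated from parameter count and bits per weight. This is a rough sketch: the bit widths are nominal round numbers (real Q6/Q8 formats store extra per-block scale data, so files are somewhat larger), the KV cache and runtime overhead are ignored, and the 115GB budget is the upper end of the amd-ttm range listed under Memory Configuration. The function names are illustrative.

```python
def weights_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         budget_gb: float = 115.0) -> bool:
    """True if the weights alone fit in the assumed GPU-accessible budget."""
    return weights_size_gb(params_billions, bits_per_weight) <= budget_gb


# (label, parameters in billions, nominal bits per weight)
examples = [("70B", 70, 8), ("70B", 70, 6), ("405B", 405, 3), ("405B", 405, 2)]
for name, params, bits in examples:
    size = weights_size_gb(params, bits)
    print(f"{name} @ {bits}-bit: {size:.1f} GB, fits: {fits(params, bits)}")
```

Treat the result as a first filter only; context length and the chosen quantization format both move the real number.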

The tradeoff is lower memory bandwidth than discrete GPUs offer, which means fewer tokens per second. However, the ability to run larger models often outweighs raw speed.
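
The bandwidth limit can be made concrete with a common rule of thumb, which is an assumption here and not a measurement: when generating one token at a time with a dense model, every weight is read once per token, so tokens per second cannot exceed bandwidth divided by model size. The 215 GB/s figure is simply the midpoint of the real-world range in the Quick Reference table.

```python
def decode_tokens_per_s_upper_bound(bandwidth_gb_s: float,
                                    weights_gb: float) -> float:
    """Upper bound on single-stream decode speed for a dense model.

    Assumes each generated token streams all weights from memory once
    and ignores compute time, KV-cache reads, and other overhead.
    """
    return bandwidth_gb_s / weights_gb


BANDWIDTH_GB_S = 215.0  # midpoint of the ~210-220 GB/s real-world range

for label, weights_gb in (("70B, ~8 bits/weight", 70.0),
                          ("70B, ~6 bits/weight", 52.5)):
    bound = decode_tokens_per_s_upper_bound(BANDWIDTH_GB_S, weights_gb)
    print(f"{label}: at most ~{bound:.1f} tokens/s")
```

Actual throughput will be lower than this bound; the point is only that speed scales inversely with the size of the weights being read.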

Section Contents

Quick Start

Get from bare Ubuntu 26.04 LTS to running LLMs on GPU in one page:

  • Linux 7.0 (default), apt install rocm, VRAM allocation, Ollama

ROCm Installation

Native ROCm installation for Ubuntu 26.04:

  • APU compatibility and current support status
  • In-distro ROCm 7.1 vs upstream AMD repo (newer ROCm)
  • Verification with rocminfo and rocm-smi
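
As a minimal sketch of the verification step, the script below runs rocminfo and checks whether the gfx1151 target from the Quick Reference table appears in its output. It assumes rocminfo is on the PATH after installation; the helper names are hypothetical, and the ROCm Installation page remains the reference for the full procedure.

```python
import re
import shutil
import subprocess


def find_gfx_targets(rocminfo_output: str) -> list[str]:
    """Return the unique gfx* names found in rocminfo-style text, in order."""
    seen: list[str] = []
    for name in re.findall(r"\bgfx[0-9a-f]+\b", rocminfo_output):
        if name not in seen:
            seen.append(name)
    return seen


def gpu_visible(expected: str = "gfx1151") -> bool:
    """True if rocminfo is installed and reports the expected GPU target."""
    if shutil.which("rocminfo") is None:
        return False  # ROCm tools not installed or not on PATH
    result = subprocess.run(["rocminfo"], capture_output=True,
                            text=True, check=False)
    return expected in find_gfx_targets(result.stdout)


if __name__ == "__main__":
    print("gfx1151 visible:", gpu_visible())
```

A False result only says the target string was not found; it does not distinguish a missing install from a permissions or driver problem.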

Driver Updates

Keeping AMD drivers current:

  • Checking installed versions
  • Update procedures
  • Handling conflicts
  • Rollback if needed

Memory Configuration

Optimizing memory for AI workloads:

  • Software VRAM allocation with amd-ttm (108-115GB)
  • UMA Frame Buffer Size settings
  • Kernel parameter alternatives
  • Bandwidth considerations
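
For the kernel parameter route, limits are typically expressed in memory pages, not gigabytes. The sketch below converts the 108-115GB range into page counts. It makes three assumptions that this page does not confirm: that the parameter in question is ttm.pages_limit, that pages are 4 KiB (the x86_64 default), and that the sizes are meant as GiB. Check the Memory Configuration page before applying any value.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages (x86_64 default)


def gib_to_pages(gib: int) -> int:
    """Number of pages needed to cover the given size in GiB."""
    return gib * 1024**3 // PAGE_SIZE_BYTES


for gib in (108, 115):
    pages = gib_to_pages(gib)
    # Assumed parameter name; shown for illustration only.
    print(f"{gib} GiB -> ttm.pages_limit={pages}")
```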