Model Run Guide

Ready-to-use commands for serving models on ATOM with AMD Instinct MI355X / MI300X GPUs. Each model recipe below is validated in nightly CI.

Quick Start

# Pull the latest ATOM container
docker pull rocm/atom:latest

# Start the container with the GPU device nodes mapped in and port 8000 published
docker run -it --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host --shm-size=16G \
  --privileged --cap-add=SYS_PTRACE \
  -e HF_TOKEN="$HF_TOKEN" \
  -p 8000:8000 \
  rocm/atom:latest
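
Before launching the container, it can help to confirm that the device nodes passed to `docker run` exist on the host and that `HF_TOKEN` is set. The following is a minimal sketch, not part of ATOM: the helper names `check_paths` and `preflight` are hypothetical, and the only assumptions are the two device paths and the environment variable taken from the command above.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers (not shipped with ATOM).

# check_paths PATH...: succeed only if every given path exists;
# report the first missing path on stderr.
check_paths() {
  for p in "$@"; do
    [ -e "$p" ] || { echo "missing: $p" >&2; return 1; }
  done
}

# preflight: check the GPU device nodes used by the docker run command
# and warn (without failing) when HF_TOKEN is empty.
preflight() {
  check_paths /dev/kfd /dev/dri || return 1
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "warning: HF_TOKEN is empty; gated model downloads may fail" >&2
  fi
}
```

One way to use it is `preflight && docker run ...`, so the container only starts when both device nodes are present.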

Supported Models

| Model            | Type      | Precision   | TP | Recipe                      |
|------------------|-----------|-------------|----|-----------------------------|
| DeepSeek-R1-0528 | MoE + MLA | FP8 / MXFP4 | 8  | recipes/DeepSeek-R1.md      |
| GLM-5            | MoE + MLA | FP8         | 8  | recipes/GLM-5.md            |
| GPT-OSS-120B     | MoE       | FP8         | 1  | recipes/GPT-OSS.md          |
| Kimi-K2.5        | MoE       | MXFP4       | 4  | recipes/Kimi-K2.5.md        |
| Kimi-K2-Thinking | MoE       | FP8         | 8  | recipes/Kimi-K2-Thinking.md |
| Qwen3-235B       | MoE       | FP8         | 8  | recipes/Qwen3-235b.md       |
| Qwen3-Next       | MoE       | FP8         | 8  | recipes/Qwen3-Next.md       |

vLLM Plugin Backend

ATOM also runs as a vLLM plugin backend. See recipes under recipes/atom_vllm/ for vLLM-integrated serving.

Nightly CI Benchmark Configurations

The nightly CI sweeps these configurations for every model:

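| ISL (input tokens) | OSL (output tokens) | Concurrency Levels               |
|--------------------|---------------------|----------------------------------|
| 1024               | 1024                | 1, 2, 4, 8, 16, 32, 64, 128, 256 |
| 8192               | 1024                | 1, 2, 4, 8, 16, 32, 64, 128, 256 |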

Run a benchmark against a running ATOM server:

python -m atom.benchmarks.benchmark_serving \
  --model <model_name_or_path> \
  --backend vllm --base-url http://localhost:8000 \
  --dataset-name random \
  --random-input-len 1024 --random-output-len 1024 \
  --max-concurrency 128 --num-prompts 1280 \
  --random-range-ratio 0.8 \
  --request-rate inf --ignore-eos

Key parameters:

  • --random-range-ratio 0.8 — adds ±20% jitter to sequence lengths

  • --num-prompts — typically concurrency × 10

  • --request-rate inf — closed-loop benchmarking (no inter-request delay)

  • --ignore-eos — forces full output length generation
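
The sweep table and the `concurrency × 10` rule can be combined into a small dry-run generator. This is a sketch under stated assumptions rather than the script the nightly CI actually runs: the function names `num_prompts` and `sweep_commands` are hypothetical, the flags are copied from the command above, and `MODEL` is a placeholder you must set yourself.

```shell
#!/bin/sh
# Sketch: print (not run) one benchmark command per sweep point.
# MODEL is a placeholder; set it to the model name or path being served.
MODEL="${MODEL:-<model_name_or_path>}"

# num_prompts CONCURRENCY: the "concurrency x 10" rule of thumb.
num_prompts() {
  echo $(( $1 * 10 ))
}

# sweep_commands: emit one command line per (ISL, OSL, concurrency) point
# from the nightly configuration table.
sweep_commands() {
  for lens in "1024 1024" "8192 1024"; do
    isl=${lens% *}   # text before the space: input sequence length
    osl=${lens#* }   # text after the space: output sequence length
    for conc in 1 2 4 8 16 32 64 128 256; do
      echo "python -m atom.benchmarks.benchmark_serving" \
        "--model $MODEL" \
        "--backend vllm --base-url http://localhost:8000" \
        "--dataset-name random" \
        "--random-input-len $isl --random-output-len $osl" \
        "--max-concurrency $conc --num-prompts $(num_prompts "$conc")" \
        "--random-range-ratio 0.8" \
        "--request-rate inf --ignore-eos"
    done
  done
}
```

Calling `sweep_commands` prints 18 command lines (2 length pairs × 9 concurrency levels). Printing instead of executing keeps the sketch safe to run without a server; once `MODEL` is set, the printed lines can be reviewed and then run one at a time.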

Live Dashboard

Nightly benchmark results are published to the ATOM Benchmark Dashboard.

A competitive comparison (MI355X vs. B200/B300) is available on the AI Frameworks Dashboard.