Supported Models

ATOM supports a wide range of LLM architectures optimized for AMD GPUs.

Llama Models

Meta’s Llama family:

Llama 2 (7B, 13B, 70B)
Llama 3 (8B, 70B)
CodeLlama
Llama-2-Chat

Example:

from atom import LLM

llm = LLM(model="meta-llama/Llama-2-7b-hf")

GPT Models

GPT-style architectures:

GPT-2
GPT-J
GPT-NeoX

Example:

llm = LLM(model="EleutherAI/gpt-j-6b")

Mixtral

Mixture of Experts models:

Mixtral 8x7B
Mixtral 8x22B

Example:

llm = LLM(
    model="mistralai/Mixtral-8x7B-v0.1",
    tensor_parallel_size=4
)

Other Architectures

Mistral: Mistral-7B
Falcon: Falcon-7B, Falcon-40B
MPT: MPT-7B, MPT-30B
BLOOM: BLOOM-7B1

Model Configuration

Custom model configurations:

from atom import LLM

llm = LLM(
    model="/path/to/custom/model",
    trust_remote_code=True,  # For custom architectures
    dtype="bfloat16",
    max_model_len=8192
)

Performance by Model Size

Model Size	Recommended GPU	Tensor Parallel	Batch Size
7B	1x MI250X	1	32-64
13B	1x MI250X	1	16-32
30B	2x MI250X	2	8-16
70B	4x MI300X	4	4-8

Quantization

ATOM supports quantized models for reduced memory:

llm = LLM(
    model="TheBloke/Llama-2-7B-GPTQ",
    quantization="gptq"
)

Supported quantization formats:

GPTQ
AWQ
SqueezeLLM