Supported Models

ATOM runs a wide range of LLM architectures, with optimizations for AMD GPUs.

Llama Models

Meta’s Llama family:

  • Llama 2 (7B, 13B, 70B)

  • Llama 3 (8B, 70B)

  • CodeLlama

  • Llama-2-Chat

Example:

from atom import LLM

llm = LLM(model="meta-llama/Llama-2-7b-hf")

GPT Models

GPT-style architectures:

  • GPT-2

  • GPT-J

  • GPT-NeoX

Example:

from atom import LLM
llm = LLM(model="EleutherAI/gpt-j-6b")

Mixtral

Mixture-of-Experts (MoE) models:

  • Mixtral 8x7B

  • Mixtral 8x22B

Example:

from atom import LLM

llm = LLM(
    model="mistralai/Mixtral-8x7B-v0.1",
    tensor_parallel_size=4,  # Shard the model across 4 GPUs
)

Other Architectures

  • Mistral: Mistral-7B

  • Falcon: Falcon-7B, Falcon-40B

  • MPT: MPT-7B, MPT-30B

  • BLOOM: BLOOM-7B1

Model Configuration

To load a custom model, pass its path along with any configuration overrides:

from atom import LLM

llm = LLM(
    model="/path/to/custom/model",
    trust_remote_code=True,  # For custom architectures
    dtype="bfloat16",
    max_model_len=8192
)
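When the same options are reused across several models, it can help to keep them in one place. The following is a minimal sketch, not part of ATOM: `ModelOptions` is a hypothetical helper, and its defaults for `dtype` and `max_model_len` are copied from the example above rather than taken from ATOM's actual defaults. It only builds the keyword arguments, so it runs without ATOM installed.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelOptions:
    """Hypothetical container for the LLM() arguments shown above."""

    model: str
    trust_remote_code: bool = False  # assumption: opt in explicitly
    dtype: str = "bfloat16"          # value taken from the example above
    max_model_len: int = 8192        # value taken from the example above

    def to_kwargs(self) -> dict:
        """Return the options as a plain dict of keyword arguments."""
        return asdict(self)


opts = ModelOptions(model="/path/to/custom/model", trust_remote_code=True)
kwargs = opts.to_kwargs()

# With ATOM installed, the dict could then be unpacked into the constructor:
# from atom import LLM
# llm = LLM(**kwargs)
```

Freezing the dataclass keeps a shared set of options from being modified by accident after it has been handed to several loaders.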

Performance by Model Size

Model Size    Recommended GPU    Tensor Parallel    Batch Size
7B            1x MI250X          1                  32-64
13B           1x MI250X          1                  16-32
30B           2x MI250X          2                  8-16
70B           4x MI300X          4                  4-8
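The recommendations above can be encoded as a small lookup. This is an illustrative sketch only: `recommend` is a hypothetical helper, not an ATOM function, and it assumes that a model whose size falls between two rows should use the next larger row. The values themselves are copied from the table.

```python
# Rows copied from the table above:
# (max size in billions of parameters, GPU, tensor parallel, batch size range)
RECOMMENDATIONS = [
    (7, "1x MI250X", 1, (32, 64)),
    (13, "1x MI250X", 1, (16, 32)),
    (30, "2x MI250X", 2, (8, 16)),
    (70, "4x MI300X", 4, (4, 8)),
]


def recommend(params_billions: float) -> dict:
    """Return the first table row large enough for the given model size."""
    for max_size, gpu, tensor_parallel, batch_range in RECOMMENDATIONS:
        if params_billions <= max_size:
            return {
                "gpu": gpu,
                "tensor_parallel_size": tensor_parallel,
                "batch_size": batch_range,
            }
    # Assumption: sizes beyond the largest row are out of scope for the table.
    raise ValueError(f"No recommendation for {params_billions}B parameters")


settings = recommend(13)
print(settings["gpu"], settings["tensor_parallel_size"], settings["batch_size"])
```

The returned `tensor_parallel_size` matches the argument name used in the Mixtral example, so it can be passed straight to `LLM(...)`.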

Quantization

ATOM supports quantized models, which reduce memory usage:

from atom import LLM

llm = LLM(
    model="TheBloke/Llama-2-7B-GPTQ",
    quantization="gptq",  # Must match the checkpoint's quantization format
)

Supported quantization formats:

  • GPTQ

  • AWQ

  • SqueezeLLM
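To give a rough sense of the memory reduction, the weight footprint can be estimated from the parameter count and the bits stored per weight. This is back-of-the-envelope arithmetic under stated assumptions, not a measurement of ATOM: it uses a nominal 7 billion parameters, assumes 4-bit weights (a common setting for GPTQ and AWQ checkpoints; the bit width is not stated here), and ignores quantization metadata, activations, and the KV cache.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NOMINAL_7B = 7e9  # assumption: nominal parameter count for a "7B" model

fp16_gb = weight_memory_gb(NOMINAL_7B, 16)  # 16-bit weights
int4_gb = weight_memory_gb(NOMINAL_7B, 4)   # assumed 4-bit quantized weights

print(f"16-bit: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
# prints "16-bit: 14.0 GB, 4-bit: 3.5 GB"
```

Real quantized checkpoints are somewhat larger than this estimate because they also store per-group scales and other metadata.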