Supported Models
================

ATOM supports a wide range of LLM architectures optimized for AMD GPUs.

Llama Models
------------

Meta's Llama family:

* Llama 2 (7B, 13B, 70B)
* Llama 3 (8B, 70B)
* CodeLlama
* Llama-2-Chat

**Example:**

.. code-block:: python

   from atom import LLM

   llm = LLM(model="meta-llama/Llama-2-7b-hf")

GPT Models
----------

GPT-style architectures:

* GPT-2
* GPT-J
* GPT-NeoX

**Example:**

.. code-block:: python

   llm = LLM(model="EleutherAI/gpt-j-6b")

Mixtral
-------

Mixture of Experts models:

* Mixtral 8x7B
* Mixtral 8x22B

**Example:**

.. code-block:: python

   llm = LLM(
       model="mistralai/Mixtral-8x7B-v0.1",
       tensor_parallel_size=4
   )

Other Architectures
-------------------

* **Mistral**: Mistral-7B
* **Falcon**: Falcon-7B, Falcon-40B
* **MPT**: MPT-7B, MPT-30B
* **BLOOM**: BLOOM-7B1

Model Configuration
-------------------

Custom model configurations:

.. code-block:: python

   from atom import LLM

   llm = LLM(
       model="/path/to/custom/model",
       trust_remote_code=True,  # For custom architectures
       dtype="bfloat16",
       max_model_len=8192
   )

Performance by Model Size
-------------------------

.. list-table::
   :header-rows: 1
   :widths: 25 25 25 25

   * - Model Size
     - Recommended GPU
     - Tensor Parallel
     - Batch Size
   * - 7B
     - 1x MI250X
     - 1
     - 32-64
   * - 13B
     - 1x MI250X
     - 1
     - 16-32
   * - 30B
     - 2x MI250X
     - 2
     - 8-16
   * - 70B
     - 4x MI300X
     - 4
     - 4-8

Quantization
------------

ATOM supports quantized models for reduced memory:

.. code-block:: python

   llm = LLM(
       model="TheBloke/Llama-2-7B-GPTQ",
       quantization="gptq"
   )

Supported quantization formats:

* GPTQ
* AWQ
* SqueezeLLM