.. AITER documentation master file

AITER Documentation
===================

**AITER** (AI Tensor Engine for ROCm) is AMD's high-performance AI operator
library, providing optimized kernels for both inference and training workloads
on the ROCm platform.

.. image:: https://img.shields.io/badge/ROCm-Compatible-red
   :target: https://rocm.docs.amd.com/
   :alt: ROCm Compatible

.. image:: https://img.shields.io/github/license/ROCm/aiter
   :target: https://github.com/ROCm/aiter/blob/main/LICENSE
   :alt: License

Why AITER?
----------

* **High Performance**: Optimized kernels written in Triton, Composable Kernel (CK), and hand-tuned assembly
* **Comprehensive**: Supports both inference and training workloads
* **Flexible**: C++ and Python APIs for easy integration
* **AMD Optimized**: Built specifically for AMD GPUs and the ROCm platform

Quick Start
-----------

Installation
^^^^^^^^^^^^

.. code-block:: bash

   pip install aiter  # Coming soon!

   # For now, install from source:
   git clone --recursive https://github.com/ROCm/aiter.git
   cd aiter
   python3 setup.py develop

Quick Example
^^^^^^^^^^^^^

A minimal sketch of a causal flash-attention call. The tensor shapes are
illustrative, and ``flash_attn_func`` is assumed here to follow the standard
flash-attention signature; see :doc:`api/attention` for the exact interface.

.. code-block:: python

   import torch
   import aiter

   # Flash Attention on [batch, seqlen, num_heads, head_dim] BF16 tensors
   q = torch.randn(2, 1024, 16, 128, dtype=torch.bfloat16, device="cuda")
   k = torch.randn(2, 1024, 16, 128, dtype=torch.bfloat16, device="cuda")
   v = torch.randn(2, 1024, 16, 128, dtype=torch.bfloat16, device="cuda")
   out = aiter.flash_attn_func(q, k, v, causal=True)  # same shape as q

Core Features
-------------

Attention Kernels
^^^^^^^^^^^^^^^^^

* **Multi-Head Attention (MHA)**: Standard attention with optimized implementations
* **Multi-Latent Attention (MLA)**: DeepSeek-style latent attention
* **Paged Attention**: Efficient KV-cache management for serving

GEMM Operations
^^^^^^^^^^^^^^^

* **Mixed-Precision GEMM**: FP16, BF16, FP8, and INT4 support
* **Tuned GEMM**: Pre-tuned configurations for common shapes
* **Fused Operations**: GEMM with activation fusion

Mixture of Experts (MoE)
^^^^^^^^^^^^^^^^^^^^^^^^

* **Fused MoE**: Optimized expert routing and computation
* **Multiple Routing**: Support for various routing strategies
* **Quantized Experts**: FP8 and INT4 expert weights

Normalization
^^^^^^^^^^^^^

* **RMSNorm**: Root mean square normalization
* **LayerNorm**: Standard layer normalization
* **Fused Variants**: Normalization combined with other operations

Other Operators
^^^^^^^^^^^^^^^

* **RoPE**: Rotary position embeddings
* **Quantization**: BF16/FP16 → FP8/INT4 conversion
* **Element-wise**: Optimized basic operations
* **Communication**: AllReduce and collective operations via Triton/Iris

GPU Support
-----------

AITER supports AMD GPUs with the following architectures:

.. list-table::
   :header-rows: 1
   :widths: 20 20 30 30

   * - Architecture
     - gfx Target
     - Example GPUs
     - ROCm Version
   * - CDNA 2
     - gfx90a
     - MI210, MI250, MI250X
     - ROCm 5.0+
   * - CDNA 3
     - gfx942
     - MI300A, MI300X
     - ROCm 6.0+
   * - CDNA 4
     - gfx950
     - MI350X (upcoming)
     - ROCm 6.3+

Quick Links
-----------

* 🚀 :doc:`quickstart` - Get started in 5 minutes
* 📖 :doc:`tutorials/add_new_op` - **How to add a new operator** (step-by-step)
* 🔧 :doc:`api/attention` - Flash Attention API
* 💡 :doc:`tutorials/basic_usage` - Basic usage examples

Table of Contents
-----------------

.. toctree::
   :maxdepth: 2
   :caption: Getting Started

   installation
   quickstart
   tutorials/index

.. toctree::
   :maxdepth: 2
   :caption: API Reference

   api/attention
   api/gemm
   api/moe
   api/normalization
   api/operators

.. toctree::
   :maxdepth: 2
   :caption: Advanced Topics

   performance/benchmarks
   performance/profiling
   advanced/triton_kernels
   advanced/ck_integration

.. toctree::
   :maxdepth: 1
   :caption: Development

   contributing
   changelog

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`