AITER Documentation
AITER (AI Tensor Engine for ROCm) is AMD’s high-performance AI operator library for ROCm, providing optimized kernels for inference and training workloads.
Why AITER?
High Performance: Optimized kernels using Triton, Composable Kernel (CK), and hand-written assembly
Comprehensive: Supports both inference and training workloads
Flexible: C++ and Python APIs for easy integration
AMD Optimized: Built specifically for AMD GPUs and the ROCm platform
Quick Start
Installation
# A PyPI package (pip install aiter) is planned but not yet available.
# For now, install from source:
git clone --recursive https://github.com/ROCm/aiter.git
cd aiter
python3 setup.py develop
Quick Example
import aiter
import torch
# Example: Flash Attention
# TODO: Add actual example code
Core Features
Attention Kernels
Multi-Head Attention (MHA): Standard attention with optimized implementations
Multi-head Latent Attention (MLA): DeepSeek-style latent attention
Paged Attention: Efficient KV-cache management for serving
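To illustrate the idea behind paged KV-cache management, here is a minimal pure-Python sketch of a block table that maps a sequence's logical token positions to fixed-size physical cache blocks. It is not AITER's API: `BlockTable`, `BLOCK_SIZE`, and the simple free-list allocator are hypothetical names chosen for illustration, and real kernels keep this table on the GPU.

```python
BLOCK_SIZE = 16  # tokens per physical block (illustrative value, an assumption)


class BlockTable:
    """Maps logical token positions of one sequence to physical cache blocks."""

    def __init__(self, free_blocks):
        self.free_blocks = list(free_blocks)  # pool of unused physical block ids
        self.blocks = []                      # physical block id per logical block
        self.num_tokens = 0

    def append_token(self):
        """Reserve a slot for one new token and return (physical_block, offset)."""
        if self.num_tokens % BLOCK_SIZE == 0:
            # All allocated blocks are full (or none exist yet): take a new one.
            if not self.free_blocks:
                raise RuntimeError("KV cache is out of free blocks")
            self.blocks.append(self.free_blocks.pop(0))
        position = self.num_tokens
        self.num_tokens += 1
        return self.lookup(position)

    def lookup(self, position):
        """Translate a logical token position into (physical_block, offset)."""
        if not 0 <= position < self.num_tokens:
            raise IndexError(f"position {position} is outside the sequence")
        return self.blocks[position // BLOCK_SIZE], position % BLOCK_SIZE
```

Because blocks are allocated on demand, a sequence only occupies as many blocks as it has tokens, and the physical blocks need not be contiguous.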
GEMM Operations
Mixed Precision GEMM: FP16, BF16, FP8, INT4 support
Tuned GEMM: Pre-tuned configurations for common shapes
Fused Operations: GEMM with activation fusion
Mixture of Experts (MoE)
Fused MoE: Optimized expert routing and computation
Multiple Routing: Support for various routing strategies
Quantized Experts: FP8 and INT4 expert weights
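As a rough illustration of what expert routing computes, the sketch below implements one common strategy in pure Python: pick the top-k experts by router logit, then normalize with a softmax over the selected logits only. The function name `top_k_route` is hypothetical, and AITER's fused kernels may use different routing variants and perform this step on the GPU together with the expert computation.

```python
import math


def top_k_route(logits, k):
    """Return [(expert_index, weight), ...] for one token's router logits."""
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest logits; ties keep the lower index first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits, shifted by the max for stability.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]
```

Each token's output is then the weighted sum of the selected experts' outputs, using the returned weights.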
Normalization
RMSNorm: Root mean square normalization
LayerNorm: Standard layer normalization
Fused Variants: Combined with other operations
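For reference, RMSNorm scales each element by the inverse root mean square of the vector and then by a learned per-element weight. The pure-Python sketch below shows only that math; `rms_norm` and its `eps` default are illustrative assumptions, not AITER's function signature.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight."""
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

Unlike LayerNorm, this does not subtract the mean or add a bias, which is one reason it is cheap to fuse with neighboring operations.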
Other Operators
RoPE: Rotary position embeddings
Quantization: BF16/FP16 → FP8/INT4 conversion
Element-wise: Optimized basic operations
Communication: AllReduce and collective operations via Triton/Iris
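To make the quantization entry concrete, here is a minimal sketch of symmetric per-tensor INT4 quantization in pure Python. It is an assumption-laden illustration rather than AITER's implementation: the function names are hypothetical, and production kernels typically use per-group or per-channel scales and pack two 4-bit values into each byte.

```python
INT4_MIN, INT4_MAX = -8, 7  # range of a signed 4-bit integer


def quantize_int4(values):
    """Return (quantized_ints, scale) using one symmetric scale for all values."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 7; fall back to 1.0 for all-zero input.
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    quantized = [min(INT4_MAX, max(INT4_MIN, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int4(quantized, scale):
    """Recover approximate floating-point values from the INT4 codes."""
    return [q * scale for q in quantized]
```

The round-trip error of each value is bounded by half the scale, so a tensor with a few large outliers loses precision on its small entries, which is the usual motivation for finer-grained scales.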
GPU Support
AITER supports AMD GPUs with the following architectures:
| Architecture | gfx Target | Example GPUs | ROCm Version |
|---|---|---|---|
| CDNA 2 | gfx90a | MI210, MI250, MI250X | ROCm 5.0+ |
| CDNA 3 | gfx942 | MI300A, MI300X | ROCm 6.0+ |
| CDNA 3.5 | gfx950 | MI350X (upcoming) | ROCm 6.3+ |
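The table above can be encoded as a small lookup to check whether a device's gfx target is listed. This is a hedged sketch: `SUPPORTED_TARGETS` and `describe_target` are hypothetical helpers, not part of AITER, and the input is assumed to be an architecture string such as `gfx942:sramecc+:xnack-`, which ROCm tools commonly report with feature flags appended.

```python
# Rows copied from the GPU support table: target -> (architecture, ROCm version).
SUPPORTED_TARGETS = {
    "gfx90a": ("CDNA 2", "ROCm 5.0+"),
    "gfx942": ("CDNA 3", "ROCm 6.0+"),
    "gfx950": ("CDNA 3.5", "ROCm 6.3+"),
}


def describe_target(arch_name):
    """Return (architecture, ROCm version) for a gfx string, or None if unlisted."""
    # Drop feature flags after the first colon, e.g. ":sramecc+:xnack-".
    base = arch_name.split(":", 1)[0].strip().lower()
    return SUPPORTED_TARGETS.get(base)
```

A result of `None` means only that the target is absent from this table, not that the hardware cannot run ROCm.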
Quick Links
🚀 Quickstart - Get started in 5 minutes
📖 How to Add a New Operator - Step-by-step guide to adding an operator
🔧 Attention Operations - Flash Attention API
💡 Basic Usage - Basic usage examples
Table of Contents
Getting Started
API Reference