AITER Documentation

AITER (AMD Inference and Training Enhanced Repository) is AMD’s high-performance AI operator library for ROCm, providing optimized kernels for inference and training workloads.

Why AITER?

High Performance: Optimized kernels using Triton, Composable Kernel (CK), and hand-written assembly
Comprehensive: Supports both inference and training workloads
Flexible: C++ and Python APIs for easy integration
AMD Optimized: Built specifically for AMD GPUs and the ROCm platform

Quick Start

Installation

pip install aiter  # Coming soon!

# For now, install from source:
git clone --recursive https://github.com/ROCm/aiter.git
cd aiter
python3 setup.py develop

Quick Example

import aiter
import torch

# Example: Flash Attention
# TODO: Add actual example code

Core Features

Attention Kernels

Multi-Head Attention (MHA): Standard attention with optimized implementations
Multi-Latent Attention (MLA): DeepSeek-style latent attention
Paged Attention: Efficient KV-cache management for serving

GEMM Operations

Mixed Precision GEMM: FP16, BF16, FP8, INT4 support
Tuned GEMM: Pre-tuned configurations for common shapes
Fused Operations: GEMM with activation fusion

Mixture of Experts (MoE)

Fused MoE: Optimized expert routing and computation
Multiple Routing: Support for various routing strategies
Quantized Experts: FP8 and INT4 expert weights

Normalization

RMSNorm: Root mean square normalization
LayerNorm: Standard layer normalization
Fused Variants: Combined with other operations

Other Operators

RoPE: Rotary position embeddings
Quantization: BF16/FP16 → FP8/INT4 conversion
Element-wise: Optimized basic operations
Communication: AllReduce and collective operations via Triton/Iris

GPU Support

AITER supports AMD GPUs with the following architectures:

Architecture	gfx Target	Example GPUs	ROCm Version
CDNA 2	gfx90a	MI210, MI250, MI250X	ROCm 5.0+
CDNA 3	gfx942	MI300A, MI300X	ROCm 6.0+
CDNA 3.5	gfx950	MI350X (upcoming)	ROCm 6.3+

Quick Links

🚀 Quickstart - Get started in 5 minutes
📖 How to Add a New Operator - How to add a new operator (step-by-step)
🔧 Attention Operations - Flash Attention API
💡 Basic Usage - Basic usage examples