AITER

Getting Started

  • Installation
    • Requirements
      • System Requirements
      • Software Dependencies
    • Installation Methods
      • Method 1: From PyPI (Recommended)
      • Method 2: From Source
        • Basic Installation
        • Development Mode (JIT)
        • Precompiled Installation
      • Environment Variables
        • Example Configurations
      • Method 3: Docker
    • Verifying Installation
    • Optional: Triton Communication Support
    • Troubleshooting
      • ROCm Not Found
      • Compilation Errors
      • Import Errors
    • Next Steps
  • Quickstart
    • Installation
    • Verify Installation
    • First Example: Flash Attention
    • Variable-Length Sequences
    • Mixture of Experts (MoE)
    • RMSNorm
    • Performance Tips
    • Next Steps
    • Common Issues
    • Get Help
  • Tutorials
    • Basic Usage
      • Installation Check
      • Hello World: Flash Attention
        • Understanding the Parameters
        • Comparing with PyTorch
      • Working with Different Precisions
      • RMSNorm Example
      • Batched Operations
      • Error Handling
      • Memory Management
      • Next Steps
      • Common Gotchas
    • How to Add a New Operator
      • Overview
      • Step 1: Define the Operator Interface
      • Step 2: Implement the ROCm Kernel
      • Step 3: Create Python Bindings
      • Step 4: Update Build Configuration
      • Step 5: Add Tests
      • Step 6: Build and Install
      • Step 7: Register in Main Module
      • Advanced: Optimizations
        • Use CK (Composable Kernel) for Better Performance
        • Use Triton for Easier Kernel Development
      • Common Patterns
        • Pattern 1: Fused Operations
        • Pattern 2: In-Place Operations
        • Pattern 3: Autograd Support
      • Best Practices
      • Debugging Tips
        • Print Kernel Launches
        • Check for Memory Errors
        • Profile Your Operator
      • Example: Complete RMSNorm Implementation
      • Next Steps
      • Contributing
    • Tutorial Overview
      • Basic Tutorials
      • Advanced Topics
      • Integration Guides
    • Prerequisites
    • Example Data
    • Jupyter Notebooks
    • Running Examples
    • Community Examples
    • Contributing Tutorials

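The "Verifying Installation" and "Installation Check" steps in the outline above come down to confirming the package is importable. A minimal sketch (the module name `aiter` is assumed from the project name; this is not AITER's own verification script):

```python
import importlib.util

# Probe for the aiter package without raising ImportError if it is absent.
spec = importlib.util.find_spec("aiter")
if spec is not None:
    print("aiter is importable from", spec.origin)
else:
    print("aiter not found -- see the Troubleshooting section")
```

If the probe fails, the Troubleshooting entries above (ROCm Not Found, Compilation Errors, Import Errors) are the places to look.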
API Reference

  • Attention Operations
    • Flash Attention
    • Flash Attention with KV Cache
    • Grouped Query Attention (GQA)
    • Multi-Query Attention (MQA)
    • Variable Sequence Attention
    • Supported Architectures
    • Performance Characteristics
    • See Also
  • GEMM Operations
    • Grouped GEMM
    • Batched GEMM
    • Fused GEMM Operations
      • GEMM + Bias
      • GEMM + GELU
      • GEMM + ReLU
    • CUTLASS-style GEMM
    • Sparse GEMM
    • INT8 Quantized GEMM
    • Performance Characteristics
    • Optimization Tips
    • Example: Optimal MoE Forward Pass
    • See Also
  • Core Operators
    • RMSNorm
    • LayerNorm
    • SoftMax
    • GELU
    • SwiGLU
    • Rotary Position Embedding (RoPE)
    • Sampling Operations
      • Top-K Sampling
      • Top-P (Nucleus) Sampling
    • Performance Notes
    • Supported Data Types
    • See Also


© Copyright 2026, AMD.
