AITER
Getting Started
Installation
Requirements
System Requirements
Software Dependencies
Installation Methods
Method 1: From PyPI (Recommended)
Method 2: From Source
Basic Installation
Development Mode (JIT)
Precompiled Installation
Environment Variables
Example Configurations
Method 3: Docker
Verifying Installation
Optional: Triton Communication Support
Troubleshooting
ROCm Not Found
Compilation Errors
Import Errors
Next Steps
Quickstart
Installation
Verify Installation
First Example: Flash Attention
Variable-Length Sequences
Mixture of Experts (MoE)
RMSNorm
Performance Tips
Next Steps
Common Issues
Get Help
Tutorials
Basic Usage
Installation Check
Hello World: Flash Attention
Understanding the Parameters
Comparing with PyTorch
Working with Different Precisions
RMSNorm Example
Batched Operations
Error Handling
Memory Management
Next Steps
Common Gotchas
How to Add a New Operator
Overview
Step 1: Define the Operator Interface
Step 2: Implement the ROCm Kernel
Step 3: Create Python Bindings
Step 4: Update Build Configuration
Step 5: Add Tests
Step 6: Build and Install
Step 7: Register in Main Module
Advanced: Optimizations
Use CK (Composable Kernel) for Better Performance
Use Triton for Easier Kernel Development
Common Patterns
Pattern 1: Fused Operations
Pattern 2: In-Place Operations
Pattern 3: Autograd Support
Best Practices
Debugging Tips
Print Kernel Launches
Check for Memory Errors
Profile Your Operator
Example: Complete RMSNorm Implementation
Next Steps
Contributing
Tutorial Overview
Basic Tutorials
Advanced Topics
Integration Guides
Prerequisites
Example Data
Jupyter Notebooks
Running Examples
Community Examples
Contributing Tutorials
API Reference
Attention Operations
Flash Attention
Flash Attention with KV Cache
Grouped Query Attention (GQA)
Multi-Query Attention (MQA)
Variable Sequence Attention
Supported Architectures
Performance Characteristics
See Also
GEMM Operations
Grouped GEMM
Batched GEMM
Fused GEMM Operations
GEMM + Bias
GEMM + GELU
GEMM + ReLU
CUTLASS-style GEMM
Sparse GEMM
INT8 Quantized GEMM
Performance Characteristics
Optimization Tips
Example: Optimal MoE Forward Pass
See Also
Core Operators
RMSNorm
LayerNorm
SoftMax
GELU
SwiGLU
Rotary Position Embedding (RoPE)
Sampling Operations
Top-K Sampling
Top-P (Nucleus) Sampling
Performance Notes
Supported Data Types
See Also