FlyDSL

Getting Started

  • Installation
    • Prerequisites
    • Step 1: Build LLVM/MLIR
    • Step 2: Build FlyDSL
    • Step 3: Install FlyDSL
    • Step 4: Verify Installation
    • Troubleshooting
  • Quick Start
    • A Minimal Vector Add Kernel
    • Key Concepts
    • Compilation Pipeline
    • AOT Pre-compilation
    • Next Steps

Guides

  • Architecture & Compilation Pipeline Guide
    • Quick Reference
    • 1. Project Structure
    • 2. Architecture
    • 3. Compilation Pipeline
      • 3.1 High-Level Flow
      • 3.2 Pipeline Stages in Detail
      • 3.3 JIT Compilation Flow
    • 4. Key Abstractions
      • 4.1 @flyc.jit — Host Launcher
      • 4.2 @flyc.kernel — GPU Kernel
      • 4.3 KernelLauncher
      • 4.4 JITCFunction
      • 4.5 DslType / JitArgument Protocols
      • 4.6 ASTRewriter
    • 5. Environment Variables
      • 5.1 Compilation Options (FLYDSL_COMPILE_*)
      • 5.2 Debug Options (FLYDSL_DEBUG_*)
      • 5.3 Runtime Options (FLYDSL_RUNTIME_*)
      • 5.4 Architecture Detection Priority
    • 6. Target Hardware
    • 7. IR Dump Workflow
    • 8. Source Files
  • Layout Algebra Guide
    • Quick Reference
    • 1. Core Types
      • IntTuple Patterns
    • 2. Construction
      • Python API (via flydsl.expr)
    • 3. Coordinate Mapping
      • crd2idx — Coordinate to Index
      • idx2crd — Index to Coordinate (inverse)
      • Pure-Arith Helpers (kernels/layout_utils.py)
      • Example
    • 4. Query Operations
    • 5. Layout Algebra
      • 5.1 Composition: composition(A, B)
      • 5.2 Complement: complement(tiler, target_size)
      • 5.3 Coalesce: coalesce(layout)
      • 5.4 Right Inverse: right_inverse(layout)
      • 5.5 Recast Layout: recast_layout(layout, old_bits, new_bits)
    • 6. Product Operations
    • 7. Divide Operations
    • 8. Structural Operations
      • select(int_tuple, indices)
      • group(int_tuple, begin, end)
      • append(base, elem) / prepend(base, elem)
      • zip(lhs, rhs)
      • slice(src, coord)
    • 9. MemRef / View / Copy Operations
      • MemRef Operations
      • View and Offset
      • Copy and GEMM Atoms
    • 10. Nested / Hierarchical Layouts
    • 11. IntTuple Arithmetic
    • 12. Printf Debugging
    • 13. Decision Tree
    • 14. Source Files
  • Kernel Authoring Guide
    • Quick Reference
    • 1. Basic Kernel Pattern
      • 1.1 @flyc.kernel + @flyc.jit
      • 1.2 How It Works
    • 2. Parameter Types
      • 2.1 fx.Tensor
      • 2.2 fx.Constexpr[T]
      • 2.3 fx.Int32
      • 2.4 fx.Stream
      • 2.5 Custom Argument Types
    • 3. Thread / Block Hierarchy
    • 4. Expression API (flydsl.expr)
      • 4.1 Arithmetic (fx.arith)
      • 4.2 Vector Operations (fx.vector)
      • 4.3 Buffer Operations (fx.buffer_ops)
      • 4.4 ROCm Intrinsics (fx.rocdl)
      • 4.5 GPU Operations (fx.gpu)
    • 5. Control Flow
      • 5.1 Python Loops → MLIR SCF
      • 5.2 const_expr()
    • 6. Shared Memory (LDS)
      • 6.1 SmemAllocator
      • 6.2 Finalizing LDS Allocation
      • 6.3 LDS Capacity
    • 7. Launch Configuration
      • 7.1 KernelLauncher.launch()
      • 7.2 Dynamic Grid/Block Dimensions
    • 8. Synchronization
    • 9. Compilation & Caching
      • 9.1 Automatic Caching
      • 9.2 Cache Invalidation
      • 9.3 Disabling Cache
      • 9.4 Compile-Only Mode
    • 10. Debugging
      • 10.1 Dumping IR
      • 10.2 Printing IR
      • 10.3 AST Diff
    • 11. Complete Example: Preshuffle GEMM
    • 12. Decision Tree
    • 13. Source Files
  • Pre-built Kernel Library Guide
    • Quick Reference
    • 1. Normalization Kernels
      • 1.1 LayerNorm (kernels/layernorm_kernel.py)
      • 1.2 RMSNorm (kernels/rmsnorm_kernel.py)
    • 2. Softmax Kernel
      • 2.1 Softmax (kernels/softmax_kernel.py)
    • 3. GEMM Kernel
      • 3.1 Preshuffle GEMM (kernels/preshuffle_gemm.py)
    • 4. Shared Utilities
      • 4.1 Reduction Helpers (kernels/reduce.py)
      • 4.2 MFMA Epilogues (kernels/mfma_epilogues.py)
      • 4.3 Preshuffle Pipeline (kernels/mfma_preshuffle_pipeline.py)
      • 4.4 Layout Utilities (kernels/layout_utils.py)
    • 5. Kernel API Comparison
      • New API (GEMM)
    • 6. Kernel Decision Tree
    • 7. Source Files
    • 8. Test Files
  • Testing & Benchmarking Guide
    • Quick Reference
    • 1. Test Categories
      • 1.1 MLIR Lit Tests (tests/mlir/)
      • 1.2 Python IR Tests (tests/pyir/)
      • 1.3 GPU Kernel Tests (tests/kernels/)
      • 1.4 AOT Examples (tests/python/examples/)
    • 2. Test Runner Scripts
      • 2.1 scripts/run_tests.sh
      • 2.2 scripts/run_benchmark.sh
    • 3. Pytest Configuration
      • 3.1 tests/conftest.py
    • 4. Performance Measurement
      • 4.1 tests/test_common.py
      • 4.2 tests/kernels/benchmark_common.py
    • 5. Compilation Utilities (tests/utils.py)
      • compile_to_hsaco()
      • Weight Utilities
    • 6. Writing New Tests
      • 6.1 PyIR Test Pattern (No GPU)
      • 6.2 GPU Kernel Test Pattern (New API)
      • 6.3 Benchmark Test Pattern
    • 7. GEMM Test CLI Arguments
    • 8. Test Configuration via Environment Variables
    • 9. IR Dump Workflow
      • Via MlirCompiler
      • Dedicated IR Dump Script
    • 10. Source Files
  • CuTe Layout Algebra Reference for FlyDSL
    • 1. Overview
      • 1.1 What is the CuTe Layout Algebra?
      • 1.2 FlyDSL as an AMD Implementation
    • 2. Layout Algebra Fundamentals
      • 2.1 Core Types
      • 2.2 Query Operations
      • 2.3 Coordinate Mapping
      • 2.4 Layout Algebra Operations
        • Composition
        • Complement
        • Coalesce
        • Products
        • Divides
        • Partitioning Utilities
    • 3. FlyDSL Kernel Development
    • 4. Thread and Block Hierarchy
    • 5. Tensor Creation and Memory
      • 5.1 Tensor Construction
      • 5.2 Memory Hierarchy
      • 5.3 Swizzling (Bank Conflict Avoidance)
    • 6. Data Movement
      • 6.1 Copy Atoms and Tiled Copies
      • 6.2 Buffer Loads (AMD-specific)
    • 7. Compute Operations (MFMA)
    • 8. Synchronization
    • 9. Compilation and Execution
      • 9.1 Compilation Pipeline
      • 9.2 Environment Variables
    • 10. Complete Example: GEMM with Layout Algebra
    • 11. References
      • CuTe Layout Algebra (BSD-3-Clause)
      • FlyDSL Source Files

API Reference

  • FlyDSL Python DSL
    • Core Module
    • Expression API (flydsl.expr)
    • Compiler API (flydsl.compiler)
  • Compiler & Pipeline
    • @flyc.kernel and @flyc.jit
    • Compilation Flow
    • Tensor Arguments
    • Buffer Operations
    • ROCDL Operations
    • fly-opt CLI
  • Pre-built Kernels
    • GEMM Kernels
    • MoE (Mixture-of-Experts) Kernels
    • Paged Attention
    • Normalization
    • Softmax
    • Reduction
    • Utilities

Tutorials

  • Tutorials
    • Basic Usage
      • Setting Up the Environment
      • Understanding Layouts
      • Defining a Kernel
      • Launching Kernels
      • Next Steps
    • Kernel Development
      • Tiled Copies
      • MFMA Instructions
      • Shared Memory (LDS)
      • Performance Optimization
      • Reference Implementations

© Copyright 2024-2026, Advanced Micro Devices, Inc.
