Introduction
This documentation provides a detailed breakdown of all facets of Omniperf. In addition to a full deployment guide with installation instructions, it explains the design of the tool and each of its components. If you are new to Omniperf, follow these chapters in order: they introduce the tool gradually and build up to its more advanced features.
This project is proudly open source, and we welcome all feedback! For more details on how to contribute, please see our Contribution Guide.
Browse Omniperf source code on GitHub
What is Omniperf
Omniperf is a kernel-level profiling tool for machine learning and HPC workloads running on AMD Instinct™ MI accelerators. AMD Instinct™ MI accelerators are data center GPUs designed for compute, with some graphics functions disabled or removed. Omniperf is currently built on top of rocProf, which it uses to monitor hardware performance counters. The tool primarily targets accelerators in the MI100, MI200, and MI300 families. Development is in progress to support Radeon™ RDNA™ GPUs.
Features
The Omniperf tool performs profiling based on all available hardware counters for the target accelerator. It provides high-level performance analysis features, including System Speed-of-Light, hardware block-level Speed-of-Light, Memory Chart Analysis, Roofline Analysis, and Baseline Comparisons, among others.
Both command line analysis and GUI analysis are supported.
Detailed Feature List:
MI100 support
MI200 support
Standalone GUI Analyzer
Grafana/MongoDB GUI Analyzer
Dispatch Filtering
Kernel Filtering
GPU ID Filtering
Baseline Comparison
Multi-Normalizations
System Info Panel
System Speed-of-Light Panel
Kernel Statistic Panel
Memory Chart Analysis Panel
Roofline Analysis Panel (supported on MI200 only, on Ubuntu 20.04, SLES 15 SP3, or RHEL8)
Command Processor (CP) Panel
Workgroup Manager (SPI) Panel
Wavefront Launch Panel
Compute Unit - Instruction Mix Panel
Compute Unit - Pipeline Panel
Local Data Share (LDS) Panel
Instruction Cache Panel
Scalar L1D Cache Panel
L1 Address Processing Unit, a.k.a. Texture Addresser (TA) / L1 Backend Data Processing Unit, a.k.a. Texture Data (TD) Panel(s)
Vector L1D Cache Panel
L2 Cache Panel
L2 Cache (per-Channel) Panel
Compatible SoCs
| Platform | Status |
|---|---|
| Vega 20 (MI50/60) | No support |
| MI100 | Supported |
| MI200 | Supported |
| MI300 | Supported |
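
For scripts that need to check a platform before launching a profiling run, the compatibility table above can be mirrored as a simple lookup. The following is a minimal sketch, not part of Omniperf: the names `SUPPORT_STATUS` and `is_supported` are hypothetical, and the platform strings are assumed to match the table entries exactly.

```python
# Hypothetical helper, not part of Omniperf: mirrors the compatibility table.
SUPPORT_STATUS = {
    "Vega 20 (MI50/60)": False,  # listed as "No support"
    "MI100": True,
    "MI200": True,
    "MI300": True,
}


def is_supported(platform: str) -> bool:
    """Return True if the platform is listed as supported.

    Platforms that do not appear in the table are treated as unsupported.
    """
    return SUPPORT_STATUS.get(platform, False)


if __name__ == "__main__":
    for name in ("MI200", "Vega 20 (MI50/60)"):
        status = "Supported" if is_supported(name) else "No support"
        print(f"{name}: {status}")
```

Treating unknown platforms as unsupported is a conservative choice for this sketch; a real script might raise an error instead so that typos in the platform name are not silently ignored.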