Features
Overview
OmniTrace is designed to be highly extensible. Internally, it uses the timemory performance analysis toolkit to manage extensions, resources, data, and more.
Data Collection Modes
Dynamic instrumentation
  Runtime instrumentation
    Instrument executable and shared libraries at runtime
  Binary rewriting
    Generate a new executable and/or library with instrumentation built-in
Statistical sampling
  Periodic software interrupts per-thread
Process-level sampling
  Background thread records process-, system- and device-level metrics while the application executes
Causal profiling
  Quantifies the potential impact of optimizations in parallel codes
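The process-level sampling mode above can be pictured with a small, generic sketch: a background thread polls a metric at a fixed interval while the main thread runs the workload. This is not OmniTrace's implementation or API. The `BackgroundSampler` class, the `busy_work` stand-in, and the choice of `time.process_time` as the sampled metric are all assumptions made for illustration.

```python
import threading
import time


class BackgroundSampler:
    """Hypothetical helper: polls a metric function on a background thread."""

    def __init__(self, read_metric, interval_s=0.01):
        self._read_metric = read_metric
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.samples = []  # list of (timestamp, metric value) pairs

    def _run(self):
        # Event.wait acts as an interruptible sleep: it returns False on
        # timeout (take a sample) and True once stop has been requested.
        while not self._stop.wait(self._interval_s):
            self.samples.append((time.monotonic(), self._read_metric()))

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False


def busy_work(duration_s):
    """Stand-in for the application: spin for roughly duration_s seconds."""
    deadline = time.monotonic() + duration_s
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count


# Sample the process CPU time while the "application" executes.
with BackgroundSampler(time.process_time, interval_s=0.01) as sampler:
    busy_work(0.2)

print(f"collected {len(sampler.samples)} samples")
```

The application code is untouched in this pattern, which is why it suits metrics that are not tied to a particular function call. The number of samples collected depends on the interval and on thread scheduling.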
Data Analysis
High-level summary profiles with mean/min/max/stddev statistics
  Low overhead, memory efficient
  Ideal for running at scale
Comprehensive traces
  Every individual event/measurement
Application speedup predictions resulting from potential optimizations in functions and lines of code (causal profiling)
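To illustrate why a summary profile can be more memory efficient than a comprehensive trace, the sketch below keeps mean/min/max/stddev in constant memory using Welford's online algorithm, without storing individual measurements. This is an illustrative technique, not necessarily how OmniTrace or timemory computes its statistics. The `RunningStats` class, the sample data, and the use of the sample (n - 1) standard deviation are assumptions.

```python
import math


class RunningStats:
    """Hypothetical accumulator: mean/min/max/stddev in constant memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def add(self, value):
        # Welford's online update: one pass, no stored measurements.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        self.min = min(self.min, value)
        self.max = max(self.max, value)

    @property
    def stddev(self):
        # Sample standard deviation; 0.0 until there are two measurements.
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


# Made-up durations (in milliseconds) for repeated calls to one function.
durations_ms = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

stats = RunningStats()
for duration in durations_ms:
    stats.add(duration)

print(stats.count, stats.mean, stats.min, stats.max, round(stats.stddev, 3))
```

A trace, by contrast, would retain every element of `durations_ms` along with its timestamp, so its size grows with the number of events rather than the number of instrumented functions.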
Parallelism API Support
HIP
HSA
Pthreads
MPI
Kokkos-Tools (KokkosP)
OpenMP-Tools (OMPT)
GPU Metrics
GPU hardware counters
HIP API tracing
HIP kernel tracing
HSA API tracing
HSA operation tracing
System-level sampling (via rocm-smi)
  Memory usage
  Power usage
  Temperature
  Utilization
CPU Metrics
CPU hardware counter sampling and profiles
CPU frequency sampling
Various timing metrics
  Wall time
  CPU time (process and/or thread)
  CPU utilization (process and/or thread)
  User CPU time
  Kernel CPU time
Various memory metrics
  High-water mark (sampling and profiles)
  Memory page allocation
  Virtual memory usage
Network statistics
I/O metrics
… many more
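For readers unfamiliar with how the timing and memory metrics above relate to each other, here is a minimal sketch that measures several of them around a single function call using only the Python standard library. It does not use OmniTrace. The `measure` and `sum_of_squares` names are hypothetical, and the `resource` module is assumed to be available (it is Unix-only).

```python
import resource  # Unix-only standard library module
import time


def measure(fn, *args):
    """Hypothetical helper: run fn(*args) and return (result, metrics)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + kernel CPU time of the process
    usage_start = resource.getrusage(resource.RUSAGE_SELF)

    result = fn(*args)

    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    usage_end = resource.getrusage(resource.RUSAGE_SELF)

    metrics = {
        "wall_s": wall,
        "cpu_s": cpu,
        "user_cpu_s": usage_end.ru_utime - usage_start.ru_utime,
        "kernel_cpu_s": usage_end.ru_stime - usage_start.ru_stime,
        # Utilization here is CPU time divided by wall time.
        "cpu_utilization": cpu / wall if wall > 0 else 0.0,
        # High-water mark of resident memory for the whole process so far
        # (kilobytes on Linux, bytes on macOS).
        "peak_rss": usage_end.ru_maxrss,
    }
    return result, metrics


def sum_of_squares(n):
    """Stand-in CPU-bound workload."""
    return sum(i * i for i in range(n))


result, metrics = measure(sum_of_squares, 200_000)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

For a single-threaded, CPU-bound workload such as this one, the utilization is typically close to 1. A workload that sleeps or waits on I/O accumulates wall time without CPU time, which lowers the ratio.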
Third-Party API Support
TAU
LIKWID
Caliper
CrayPAT
VTune
NVTX
ROCTX