LLNL

This section provides instructions for running user-mode Omnistat on LLNL’s Tuolumne supercomputer with pre-installed versions from AMD Research.

Prerequisites:

  • User account on Tuolumne

  • Familiarity with Flux job submission

  • Membership in the omnistat-dev Unix group (the shared install is currently limited to members of this group); see the quick check below
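
Group membership can be confirmed from a Tuolumne login node with standard Linux tooling, for example:

# List your group memberships and check for omnistat-dev
id -nG | tr ' ' '\n' | grep -x omnistat-dev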

Running jobs on Tuolumne

Omnistat is preinstalled on Tuolumne, so users only need to set up their module environment appropriately and add several commands to their Flux job scripts. There are two primary setup options, which we highlight below with example Flux job scripts.

Standard usage

In the standard case, users load the omnistat module and include relevant commands in their job script to initiate data collection at job startup and then tear down the process after application execution has completed. A simple example job demonstrating this approach is as follows:

Listing 18 Example Tuolumne Flux job using standard Omnistat module to sample at 1s intervals
#!/bin/bash
#flux: -N 2
#flux: -n 16
#flux: -t 1h
#flux: -q pdebug

jobid=`flux getattr jobid`

# Beginning of job - load Omnistat environment and enable data collection
ml use /usr/workspace/omnistat-dev/modulefiles/
ml omnistat
${OMNISTAT_DIR}/omnistat-usermode --start --interval 1.0

# Run your snazzy application
flux run -N 2 sleep 120

# Tear down job and generate summary
${OMNISTAT_DIR}/omnistat-usermode --stopexporters
${OMNISTAT_DIR}/omnistat-query --job ${jobid} --interval 1.0 --pdf omnistat.${jobid}.pdf
${OMNISTAT_DIR}/omnistat-usermode --stopserver
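
Assuming the job script above is saved to a file, it can be submitted and monitored with the usual Flux commands (the file name is illustrative):

# Submit the batch script; Flux prints the assigned job id
flux batch ./omnistat_job.sh

# Check job status
flux jobs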

If all runs well, the omnistat-query command above will embed a short summary directly within the Flux job output. Representative output from an example job is as follows:

Listing 19 Example telemetry summary report card in user-mode from Tuolumne.
----------------------------------------------------------------------
Omnistat Report Card for Job Id: f2cZ3xxxxxx
----------------------------------------------------------------------

Job Overview (Num Nodes = 2, Machine = LLNL Tuolumne)
--> Start time = 2025-12-11 13:32:48
--> End   time = 2025-12-11 13:35:49
--> Duration   = 181 secs

GPU Statistics:

           | Utilization (%)  |  Memory Use (%)  | Temperature (C)  |    Power (W)     | Energy (kWh) |
     GPU # |    Max     Mean  |    Max     Mean  |    Max     Mean  |    Max     Mean  |    Total     |
    ---------------------------------------------------------------------------------------------------
         0 |  100.00   26.48  |   63.42   20.03  |   63.00   51.78  |  549.00  227.39  |   2.27e-02   |
         1 |  100.00   27.00  |   63.41   20.03  |   67.00   54.95  |  549.00  229.38  |   2.29e-02   |
         2 |  100.00   25.46  |   63.41   19.94  |   60.00   49.92  |  549.00  212.08  |   2.12e-02   |
         3 |  100.00   25.54  |   63.41   19.95  |   64.00   51.66  |  549.00  222.37  |   2.22e-02   |

Approximate Total GPU Energy Consumed = 8.90e-02 kWh

Host Statistics:

   # CPUs | CPU Utilization (%)  |    Memory Use (%)    |    I/O Data Totals   |
          |    Max       Mean    |    Max       Mean    |    Read      Write   |
   -----------------------------------------------------------------------------
      192 |   26.21      5.71    |   69.44     26.56    |  143.3 GB   138.2 GB |

--
Query interval = 1.0 secs
Query execution time = 1.0 secs
Version = 1.9.0

Wrapper usage

The omnistat module used in the previous example relies on the system-provided cray-python module and extends that Python environment to resolve necessary dependencies. Teams maintaining and deploying their own custom Python stack for application execution may wish to avoid any environment pollution; for this case, the local install also includes a convenience wrapper named omnistat-wrapper. This wrapper variant can be used as an alternative that runs the desired Omnistat commands in a sandbox without changing the rest of the execution environment.

The following Flux job script example highlights use of the wrapper utility before and after executing a GPU application:

Listing 20 Example Tuolumne Flux job using omnistat-wrapper, highlighting changes needed to run Omnistat
#!/bin/bash
#flux: -N 2
#flux: -n 16
#flux: -t 1h
#flux: -q pdebug

jobid=`flux getattr jobid`

# Setup and launch Omnistat (wrapper version)
ml use /usr/workspace/omnistat-dev/modulefiles/
ml omnistat-wrapper
${OMNISTAT_WRAPPER} usermode --start --interval 1.0

# Your fine GPU application here
flux run -N 2 ./your_application

# Tear down Omnistat
${OMNISTAT_WRAPPER} usermode --stopexporters
${OMNISTAT_WRAPPER} query --job ${jobid} --interval 1.0 --pdf omnistat.${jobid}.pdf
${OMNISTAT_WRAPPER} usermode --stopserver

Storage

By default, Omnistat databases are stored in the user’s Lustre file system under an omnistat directory, organized by Flux id as follows: /p/lustre5/${USER}/omnistat/${FLUX_ENCLOSING_ID}

It is possible to override the default path by setting the OMNISTAT_VICTORIA_DATADIR environment variable prior to starting Omnistat. If saving to a node-local storage location, be sure to move the data to a shared file system after Omnistat teardown to avoid losing it when the job is released.
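
A minimal sketch of this pattern, assuming the standard module environment and jobid variable from Listing 18 (the node-local and destination paths are illustrative):

# Store the Omnistat database on node-local storage for the duration of the job
export OMNISTAT_VICTORIA_DATADIR=/tmp/${USER}/omnistat/${jobid}
${OMNISTAT_DIR}/omnistat-usermode --start --interval 1.0

# ... run application, stop exporters, query, and stop server as in Listing 18 ...

# Copy the database back to Lustre before job release so the data is not lost
mkdir -p /p/lustre5/${USER}/omnistat
cp -r /tmp/${USER}/omnistat/${jobid} /p/lustre5/${USER}/omnistat/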

Note

Omnistat databases require flock support, which is available in Lustre and local filesystems like /tmp.

Data Analysis

After job completion, transfer the archived Omnistat data to your local machine for analysis using the Docker environment described in the user-mode guide.
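
A typical transfer is a recursive copy of the per-job database directory from Lustre to your workstation, for example with rsync (the login host, user name, and job id below are placeholders):

# Run from your local machine
rsync -av <login-host>:/p/lustre5/<username>/omnistat/<flux-jobid> ./omnistat-data/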