Iris Class#
Warning
The Gluon API is experimental and may undergo breaking changes in future releases.
Requirements#
The Gluon backend requires:
- ROCm 7.0 or later
- Triton commit aafec417bded34db6308f5b3d6023daefae43905 or later
Factory Function#
Prefer using the convenience factory over calling the constructor directly:
- iris(heap_size=1073741824)[source]#
Create and return a Gluon-based Iris instance with the specified heap size.
- Parameters:
heap_size (int) – Size of the heap in bytes. Defaults to 1GB.
- Returns:
An initialized Gluon-based Iris instance
- Return type:
IrisGluon
Example
>>> import iris.iris_gluon as iris_gl
>>> ctx = iris_gl.iris(2**30)  # 1GB heap
>>> backend = ctx.get_backend()
>>> tensor = ctx.zeros(1024, 1024)
Core Methods#
- IrisGluon.get_device_context()[source]#
Get the device context tensor for Gluon kernels.
Returns a tensor encoding: [cur_rank, num_ranks, heap_base_0, heap_base_1, …]
- Returns:
Encoded context data as int64 tensor on device
- Return type:
torch.Tensor
Example
>>> ctx = iris_gluon.iris()
>>> context_tensor = ctx.get_device_context()
>>>
>>> @gluon.jit
... def kernel(IrisDeviceCtx: gl.constexpr, context_tensor):
...     ctx = IrisDeviceCtx.initialize(context_tensor)
...     data = ctx.load(buffer, 1)
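The flat encoding described above can be illustrated with a small host-side sketch. This is plain Python for clarity only; the real context is an int64 torch.Tensor on device, and the helper names here (pack_device_context, unpack_device_context) are hypothetical, not part of the Iris API:

```python
# Sketch of the context-tensor layout:
# [cur_rank, num_ranks, heap_base_0, heap_base_1, ...]
# Plain Python ints stand in for device-side int64 values.

def pack_device_context(cur_rank, heap_bases):
    """Encode the current rank, world size, and per-rank heap bases."""
    return [cur_rank, len(heap_bases), *heap_bases]

def unpack_device_context(ctx):
    """Decode the flat layout back into its components."""
    cur_rank, num_ranks = ctx[0], ctx[1]
    heap_bases = ctx[2:2 + num_ranks]
    return cur_rank, num_ranks, heap_bases

ctx = pack_device_context(cur_rank=1, heap_bases=[0x7F0000000000, 0x7F8000000000])
print(unpack_device_context(ctx))
```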
- IrisGluon.get_backend()[source]#
Legacy method retained for backward compatibility. Prefer get_device_context() for Gluon kernels.
- Returns:
Device context tensor
- Return type:
torch.Tensor
- IrisGluon.get_heap_bases()[source]#
Return the tensor of symmetric heap base addresses for all ranks.
- Returns:
A 1D tensor of uint64 heap base addresses
- Return type:
torch.Tensor
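Because the heap is symmetric, an object lives at the same offset in every rank's heap; only the base address differs. Translating a local pointer to the matching address on a peer rank is therefore simple base arithmetic. A minimal sketch with a hypothetical helper (not the Iris API) and toy addresses:

```python
def translate(local_ptr, heap_bases, cur_rank, target_rank):
    """Map a pointer in this rank's symmetric heap to the peer's heap.

    The offset from the local heap base is preserved; only the base
    address changes. Helper name and signature are illustrative.
    """
    offset = local_ptr - heap_bases[cur_rank]
    return heap_bases[target_rank] + offset

bases = [0x1000, 0x9000]  # toy heap bases for ranks 0 and 1
remote = translate(0x1040, bases, cur_rank=0, target_rank=1)
print(hex(remote))  # offset 0x40 applied to rank 1's base -> 0x9040
```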
- IrisGluon.get_device()[source]#
Get the underlying device where the Iris symmetric heap resides.
- Returns:
The CUDA device of Iris-managed memory
- Return type:
torch.device
- IrisGluon.get_cu_count()[source]#
Get the number of compute units (CUs) for the current GPU.
- Returns:
Number of compute units on this rank’s GPU
- Return type:
int
Logging Helpers#
Use Iris-aware logging that automatically annotates each message with the current rank and world size. This is helpful when debugging multi-rank programs.
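The behavior described above can be sketched with the standard logging module. The helper name and message format below are assumptions for illustration, not the Iris logging API:

```python
import logging

def make_rank_logger(rank, world_size):
    """Return a logger whose messages are prefixed with [rank/world_size].

    Sketch only: the helper name and format are assumptions,
    not the Iris logging implementation.
    """
    logger = logging.getLogger(f"iris.rank{rank}")
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter(f"[{rank}/{world_size}] %(levelname)s: %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

log = make_rank_logger(rank=0, world_size=4)
log.info("heap allocated")  # logs "[0/4] INFO: heap allocated"
```

Annotating every message with the rank makes interleaved output from multiple processes attributable when debugging multi-rank programs.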
Broadcast Helper#
Broadcast data from a source rank to all ranks. This method automatically detects whether the value is a tensor/array or a scalar and uses the appropriate broadcast mechanism.