Device Functions#
Warning
The Gluon API is experimental and may undergo breaking changes in future releases.
Device-side functions provided by Iris Gluon for remote memory operations and atomics. These methods are part of the IrisDeviceCtx
aggregate used within Gluon kernels.
Iris Gluon: Gluon-based Multi-GPU Communication Framework
This module provides a Gluon-based implementation of Iris that uses the @aggregate decorator with Gluon’s @gluon.jit to encapsulate the Iris backend struct, eliminating the need to pass heap_bases around manually.
Key Features:
- Uses Gluon’s @gluon.jit decorator for device-side methods
- Encapsulates heap_bases and rank info in IrisDeviceCtx aggregate
- Provides same functionality as original Iris with improved ergonomics
Example
>>> import iris.iris_gluon as iris_gl
>>> from triton.experimental import gluon
>>> from triton.experimental.gluon import language as gl
>>> ctx = iris_gl.iris(heap_size=2**30)  # 1GB heap
>>> context_tensor = ctx.get_device_context()  # Get context tensor
>>>
>>> @gluon.jit
... def kernel(IrisDeviceCtx: gl.constexpr, context_tensor, buffer):
...     ctx = IrisDeviceCtx.initialize(context_tensor)
...     data = ctx.load(buffer, 1)  # Read buffer from rank 1
- class IrisDeviceCtx(cur_rank, num_ranks, heap_bases)[source]
Gluon device-side context that decodes the tensor from Iris.get_device_context().
This aggregate encapsulates the heap_bases pointer and provides device-side methods for memory operations and atomics using Gluon.
- Parameters:
cur_rank (tensor)
num_ranks (tensor)
heap_bases (tensor)
- cur_rank
Current rank ID
- Type:
tensor
- num_ranks
Total number of ranks
- Type:
tensor
- heap_bases
Pointer to array of heap base addresses for all ranks
- Type:
tensor
- __init__(cur_rank, num_ranks, heap_bases)[source]
- initialize(context_tensor)[source]
Initialize IrisDeviceCtx from the encoded tensor.
The context tensor has the format: [cur_rank, num_ranks, heap_base_0, heap_base_1, …]
- Parameters:
context_tensor – Pointer to encoded context data
- Returns:
Initialized device context
- Return type:
IrisDeviceCtx
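The encoded layout described above can be illustrated on the host with plain Python. This is a minimal sketch, not the Gluon implementation: `DecodedCtx` and `decode_context` are hypothetical names, the heap base addresses are made up, and an ordinary list stands in for the context tensor.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DecodedCtx:
    """Hypothetical host-side mirror of the fields held by IrisDeviceCtx."""
    cur_rank: int
    num_ranks: int
    heap_bases: List[int]


def decode_context(encoded: Sequence[int]) -> DecodedCtx:
    """Decode [cur_rank, num_ranks, heap_base_0, heap_base_1, ...]."""
    if len(encoded) < 2:
        raise ValueError("context needs at least cur_rank and num_ranks")
    cur_rank, num_ranks = int(encoded[0]), int(encoded[1])
    # One heap base address follows the header for every rank.
    heap_bases = [int(b) for b in encoded[2:2 + num_ranks]]
    if len(heap_bases) != num_ranks:
        raise ValueError("expected one heap base per rank")
    return DecodedCtx(cur_rank, num_ranks, heap_bases)


# Rank 1 of 2, with made-up heap base addresses.
example = decode_context([1, 2, 0x7F0000000000, 0x7F1000000000])
```

The first two entries act as a fixed header, so the number of heap bases to read is known before any of them is touched.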
- load(pointer, from_rank, mask=None)[source]
Loads a value from the specified rank’s memory location to the current rank.
- Parameters:
pointer – Pointer in the from_rank’s address space
from_rank – The rank ID from which to read the data
mask – Optional mask for conditional loading
- Returns:
The loaded value from the target memory location
Example
>>> # Load from rank 1 to current rank
>>> data = ctx.load(buffer + offsets, 1, mask=mask)
- store(pointer, value, to_rank, mask=None)[source]
Writes data from the current rank to the specified rank’s memory location.
- Parameters:
pointer – Pointer in the current rank’s address space
value – The value to store
to_rank – The rank ID to which the data will be written
mask – Optional mask for conditional storing
Example
>>> # Store from current rank to rank 1
>>> ctx.store(buffer + offsets, values, 1, mask=mask)
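To make the effect of `mask` on load and store concrete, the following host-side sketch models each rank's memory as a Python list. Everything here is an assumption for illustration: `heaps`, `sim_load`, and `sim_store` are hypothetical names, offsets replace real pointers, and `None` is only a placeholder for masked-out lanes, since the source does not say what the device API returns for them.

```python
from typing import Dict, List, Optional, Sequence

# Simulated per-rank memory: rank ID -> list of values (illustration only).
heaps: Dict[int, List[int]] = {0: [0, 0, 0, 0], 1: [10, 11, 12, 13]}


def sim_load(offsets: Sequence[int], from_rank: int,
             mask: Optional[Sequence[bool]] = None) -> List[Optional[int]]:
    """Read one value per active lane from from_rank; inactive lanes give None."""
    if mask is None:
        mask = [True] * len(offsets)
    return [heaps[from_rank][o] if m else None for o, m in zip(offsets, mask)]


def sim_store(offsets: Sequence[int], values: Sequence[int], to_rank: int,
              mask: Optional[Sequence[bool]] = None) -> None:
    """Write one value per active lane into to_rank; inactive lanes are skipped."""
    if mask is None:
        mask = [True] * len(offsets)
    for o, v, m in zip(offsets, values, mask):
        if m:
            heaps[to_rank][o] = v


loaded = sim_load([0, 1, 2, 3], 1, mask=[True, True, False, True])
sim_store([0, 1, 2, 3], [7, 8, 9, 6], 1, mask=[True, False, False, True])
```

After the store, only offsets 0 and 3 of rank 1 have changed; the masked-out lanes keep their previous contents.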
- get(from_ptr, to_ptr, from_rank, mask=None)[source]
Copies data from the specified rank’s memory to the current rank’s local memory.
- Parameters:
from_ptr – Pointer to remote memory in from_rank’s address space
to_ptr – Pointer to local memory in current rank
from_rank – The rank ID from which to read the data
mask – Optional mask for conditional operations
Example
>>> # Copy from rank 1 to current rank's local memory
>>> ctx.get(remote_ptr + offsets, local_ptr + offsets, 1, mask=mask)
- put(from_ptr, to_ptr, to_rank, mask=None)[source]
Copies data from the current rank’s local memory to the specified rank’s memory.
- Parameters:
from_ptr – Pointer to local memory in current rank
to_ptr – Pointer to remote memory in to_rank’s address space
to_rank – The rank ID to which the data will be written
mask – Optional mask for conditional operations
Example
>>> # Copy from current rank's local memory to rank 1
>>> ctx.put(local_ptr + offsets, remote_ptr + offsets, 1, mask=mask)
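The direction of `get` and `put` is easy to mix up, so here is a small host-side model of both. It is a sketch under stated assumptions: `rank_memory`, `CUR_RANK`, `sim_get`, and `sim_put` are hypothetical, list indices stand in for pointers, and no real GPU memory is involved.

```python
from typing import Dict, List, Optional, Sequence

CUR_RANK = 0  # Assumed ID of the calling rank in this simulation.
rank_memory: Dict[int, List[int]] = {0: [1, 2, 3, 4], 1: [0, 0, 0, 0]}


def _active(n: int, mask: Optional[Sequence[bool]]) -> Sequence[bool]:
    """Treat a missing mask as all lanes active."""
    return [True] * n if mask is None else mask


def sim_put(from_offsets: Sequence[int], to_offsets: Sequence[int],
            to_rank: int, mask: Optional[Sequence[bool]] = None) -> None:
    """Copy local values (current rank) into to_rank's memory."""
    for src, dst, m in zip(from_offsets, to_offsets, _active(len(from_offsets), mask)):
        if m:
            rank_memory[to_rank][dst] = rank_memory[CUR_RANK][src]


def sim_get(from_offsets: Sequence[int], to_offsets: Sequence[int],
            from_rank: int, mask: Optional[Sequence[bool]] = None) -> None:
    """Copy values from from_rank's memory into local memory (current rank)."""
    for src, dst, m in zip(from_offsets, to_offsets, _active(len(from_offsets), mask)):
        if m:
            rank_memory[CUR_RANK][dst] = rank_memory[from_rank][src]


# Push the first two local values to rank 1, then pull them back into slots 2 and 3.
sim_put([0, 1, 2, 3], [0, 1, 2, 3], to_rank=1, mask=[True, True, False, False])
sim_get([0, 1], [2, 3], from_rank=1)
```

In both calls the first offset list is the source and the second is the destination; only the side that is remote changes.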
- copy(src_ptr, dst_ptr, from_rank, to_rank, mask=None)[source]
Copies data from the specified rank’s memory into the destination rank’s memory.
This function performs the transfer by translating src_ptr from the from_rank’s address space to the to_rank’s address space, performing a masked load from the translated source, and storing the loaded data to dst_ptr in the to_rank memory location. If from_rank and to_rank are the same, this function performs a local copy operation. It is undefined behaviour if neither from_rank nor to_rank is the cur_rank.
- Parameters:
src_ptr – Pointer in the from_rank’s local memory from which to read data
dst_ptr – Pointer in the to_rank’s local memory where the data will be written
from_rank – The rank ID that owns src_ptr (source rank)
to_rank – The rank ID that will receive the data (destination rank)
mask – Optional mask for conditional operations
Example
>>> # Copy from rank 1 to rank 0 (current rank must be either 1 or 0)
>>> ctx.copy(remote_ptr + offsets, local_ptr + offsets, 1, 0, mask=mask)
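The pointer translation mentioned in the description can be sketched with integer arithmetic. This assumes a symmetric heap in which an allocation sits at the same offset from the heap base on every rank; `translate` is a hypothetical helper and the base addresses are made up, so treat it as an illustration rather than the library's code.

```python
from typing import Sequence


def translate(ptr: int, from_rank: int, to_rank: int,
              heap_bases: Sequence[int]) -> int:
    """Map an address in from_rank's heap to the matching address in to_rank's heap."""
    offset = ptr - heap_bases[from_rank]  # Position inside the symmetric heap.
    return heap_bases[to_rank] + offset   # Same position, other rank's base.


heap_bases = [0x1000, 0x9000]  # Made-up base addresses for ranks 0 and 1.
remote_addr = translate(0x1040, from_rank=0, to_rank=1, heap_bases=heap_bases)
round_trip = translate(remote_addr, from_rank=1, to_rank=0, heap_bases=heap_bases)
same_rank = translate(0x1040, from_rank=0, to_rank=0, heap_bases=heap_bases)
```

When `from_rank` equals `to_rank` the translation is the identity, which matches the local-copy case described above.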
- atomic_add(pointer, val, to_rank, mask=None, sem=None, scope=None)[source]
Performs an atomic add at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to add
to_rank – The rank ID on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Atomically add to rank 1's memory
>>> old_val = ctx.atomic_add(buffer, 5, 1)
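Because `atomic_add` returns the value held before the addition, concurrent callers each observe a distinct result. The sketch below imitates that fetch-and-add behaviour on the host with a lock; `SimAtomicInt` is a hypothetical stand-in for a memory location, and CPU threads stand in for ranks.

```python
import threading
from typing import List


class SimAtomicInt:
    """Lock-based stand-in for one integer in a rank's memory (illustration only)."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, val: int) -> int:
        """Add val and return the value held before the addition."""
        with self._lock:
            old = self._value
            self._value = old + val
            return old

    def read(self) -> int:
        with self._lock:
            return self._value


counter = SimAtomicInt(0)
tickets: List[int] = []
tickets_lock = threading.Lock()


def worker() -> None:
    ticket = counter.fetch_add(1)  # Each caller sees a different old value.
    with tickets_lock:
        tickets.append(ticket)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The returned old values form the set 0 through 7 with no repeats, which is what makes this pattern usable for handing out unique slots or work items.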
- atomic_sub(pointer, val, to_rank, mask=None, sem=None, scope=None)[source]
Atomically subtracts data from the specified rank’s memory location.
- Parameters:
pointer – Pointer in the current rank’s address space
val – The value to subtract
to_rank – The rank ID on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Atomically subtract from rank 1's memory
>>> old_val = ctx.atomic_sub(buffer, 3, 1)
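One common use of a fetch-and-subtract is a countdown in which the caller that observes an old value of 1 knows it finished last. The following host-side sketch shows that pattern under stated assumptions: `SimCounter` and `finish` are hypothetical, and threads play the role of ranks.

```python
import threading
from typing import List


class SimCounter:
    """Lock-based stand-in for a shared countdown value (illustration only)."""

    def __init__(self, value: int) -> None:
        self._value = value
        self._lock = threading.Lock()

    def fetch_sub(self, val: int) -> int:
        """Subtract val and return the value held before the subtraction."""
        with self._lock:
            old = self._value
            self._value = old - val
            return old

    def read(self) -> int:
        with self._lock:
            return self._value


NUM_WORKERS = 6
remaining = SimCounter(NUM_WORKERS)
last_finishers: List[int] = []
guard = threading.Lock()


def finish(worker_id: int) -> None:
    old = remaining.fetch_sub(1)
    if old == 1:  # Only the final decrement sees 1 as the old value.
        with guard:
            last_finishers.append(worker_id)


threads = [threading.Thread(target=finish, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Which worker ends up last depends on scheduling, but exactly one of them does.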
- atomic_cas(pointer, cmp, val, to_rank, sem=None, scope=None)[source]
Atomically compares and exchanges the specified rank’s memory location.
- Parameters:
pointer – Pointer in the current rank’s address space
cmp – The expected value to compare
val – The new value to write if comparison succeeds
to_rank – The rank ID on which the atomic operation will be performed
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Compare-and-swap on rank 1's memory
>>> old_val = ctx.atomic_cas(flag + pid, 0, 1, 1, sem="release", scope="sys")
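Compare-and-swap writes the new value only when the location still holds the expected one, and always returns what was there before. A minimal host-side sketch of that contract follows; `SimFlag` and `try_claim` are hypothetical names, and a lock replaces the hardware atomic.

```python
import threading
from typing import List


class SimFlag:
    """Lock-based stand-in for a flag word in a rank's memory (illustration only)."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, cmp: int, val: int) -> int:
        """Write val if the current value equals cmp; return the previous value."""
        with self._lock:
            old = self._value
            if old == cmp:
                self._value = val
            return old

    def read(self) -> int:
        with self._lock:
            return self._value


flag = SimFlag(0)
winners: List[int] = []
winners_lock = threading.Lock()


def try_claim(worker_id: int) -> None:
    # The claim succeeded only if the flag was still 0 when we swapped.
    if flag.compare_and_swap(0, 1) == 0:
        with winners_lock:
            winners.append(worker_id)


threads = [threading.Thread(target=try_claim, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Comparing the returned value against `cmp` is how a caller tells whether its swap took effect.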
- atomic_xchg(pointer, val, to_rank, mask=None, sem=None, scope=None)[source]
Performs an atomic exchange at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to exchange
to_rank – The rank ID on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Exchange value with rank 1's memory
>>> old_val = ctx.atomic_xchg(buffer, 99, 1)
- atomic_xor(pointer, val, to_rank, mask=None, sem=None, scope=None)[source]
Performs an atomic XOR at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to XOR with the stored value
to_rank – The rank ID on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Atomically XOR with rank 1's memory
>>> old_val = ctx.atomic_xor(buffer, 0xFF, 1)
- atomic_and(pointer, val, to_rank, mask=None, sem=None, scope=None)[source]
Performs an atomic AND at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to AND with the stored value
to_rank – The rank ID on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Atomically AND with rank 1's memory
>>> old_val = ctx.atomic_and(buffer, 0x0F, 1)
- atomic_or(pointer, val, to_rank, mask=None, sem=None, scope=None)[source]
Performs an atomic OR at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to OR with the stored value
to_rank – The rank ID on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Atomically OR with rank 1's memory
>>> old_val = ctx.atomic_or(buffer, 0xF0, 1)
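The three bitwise atomics (XOR, AND, OR) share one shape: combine `val` with the stored word and return the word as it was beforehand. The sketch below traces that on a single integer. `SimWord` is a hypothetical, single-threaded stand-in, so it shows the arithmetic only, not the atomicity.

```python
class SimWord:
    """Single integer cell with fetch-style bitwise updates (illustration only)."""

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def fetch_or(self, val: int) -> int:
        old = self.value
        self.value = old | val  # Set every bit that is set in val.
        return old

    def fetch_and(self, val: int) -> int:
        old = self.value
        self.value = old & val  # Keep only the bits that are set in val.
        return old

    def fetch_xor(self, val: int) -> int:
        old = self.value
        self.value = old ^ val  # Toggle every bit that is set in val.
        return old


flags = SimWord(0b0000)
old_before_or = flags.fetch_or(0b0101)    # Stored word becomes 0b0101.
old_before_and = flags.fetch_and(0b0100)  # Stored word becomes 0b0100.
old_before_xor = flags.fetch_xor(0b0110)  # Stored word becomes 0b0010.
```

OR is the usual choice for setting flag bits, AND with an inverted mask for clearing them, and XOR for toggling.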
- atomic_min(pointer, val, to_rank, mask=None, sem=None, scope=None)[source]
Performs an atomic min at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to compare and potentially store
to_rank – The rank ID on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Atomically compute minimum with rank 1's memory
>>> old_val = ctx.atomic_min(buffer, 10, 1)
- atomic_max(pointer, val, to_rank, mask=None, sem=None, scope=None)[source]
Performs an atomic max at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to compare and potentially store
to_rank – The rank ID on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Atomically compute maximum with rank 1's memory
>>> old_val = ctx.atomic_max(buffer, 100, 1)
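Atomic min and max are typically used to fold many local values into one shared extremum. The host-side sketch below illustrates that reduction; `SimCell`, `contribute`, and the sample values are hypothetical, and threads stand in for ranks.

```python
import threading


class SimCell:
    """Lock-based stand-in for one shared integer (illustration only)."""

    def __init__(self, value: int) -> None:
        self._value = value
        self._lock = threading.Lock()

    def fetch_min(self, val: int) -> int:
        """Store min(current, val) and return the previous value."""
        with self._lock:
            old = self._value
            self._value = min(old, val)
            return old

    def fetch_max(self, val: int) -> int:
        """Store max(current, val) and return the previous value."""
        with self._lock:
            old = self._value
            self._value = max(old, val)
            return old

    def read(self) -> int:
        with self._lock:
            return self._value


local_values = [37, 12, 58, 5, 41]  # One made-up value per simulated rank.
global_min = SimCell(2**31 - 1)     # Start above every candidate.
global_max = SimCell(-(2**31))      # Start below every candidate.


def contribute(value: int) -> None:
    global_min.fetch_min(value)
    global_max.fetch_max(value)


threads = [threading.Thread(target=contribute, args=(v,)) for v in local_values]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The final result does not depend on the order in which contributions arrive, which is why the initial values are chosen outside the range of any candidate.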
- class IrisGluon(heap_size=1073741824)[source]
Gluon-based Iris class for multi-GPU communication and memory management.
This class provides the same functionality as the original Iris class but uses Gluon’s @aggregate decorator to encapsulate the backend state.
- Parameters:
heap_size (int) – Size of the symmetric heap in bytes. Default: 1GB (2^30)
Example
>>> ctx = iris_gluon.iris(heap_size=2**31)  # 2GB heap
>>> backend = ctx.get_backend()  # Legacy accessor; returns the device context tensor
>>> tensor = ctx.zeros(1000, 1000, dtype=torch.float32)
- __init__(heap_size=1073741824)[source]
- debug(message)[source]
Log a debug message with rank information.
- info(message)[source]
Log an info message with rank information.
- warning(message)[source]
Log a warning message with rank information.
- error(message)[source]
Log an error message with rank information.
- get_device_context()[source]
Get the device context tensor for Gluon kernels.
Returns a tensor encoding: [cur_rank, num_ranks, heap_base_0, heap_base_1, …]
- Returns:
Encoded context data as int64 tensor on device
- Return type:
torch.Tensor
Example
>>> ctx = iris_gluon.iris()
>>> context_tensor = ctx.get_device_context()
>>>
>>> @gluon.jit
... def kernel(IrisDeviceCtx: gl.constexpr, context_tensor, buffer):
...     ctx = IrisDeviceCtx.initialize(context_tensor)
...     data = ctx.load(buffer, 1)
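The encoding returned by this method can be mirrored on the host in a few lines. This is a sketch of the documented layout only: `encode_context` is a hypothetical helper, the heap base addresses are made up, and a plain list is used where the real method returns an int64 tensor on the device.

```python
from typing import List, Sequence


def encode_context(cur_rank: int, heap_bases: Sequence[int]) -> List[int]:
    """Build [cur_rank, num_ranks, heap_base_0, heap_base_1, ...]."""
    num_ranks = len(heap_bases)
    if not 0 <= cur_rank < num_ranks:
        raise ValueError("cur_rank must index one of the heap bases")
    # Header first, then one base address per rank in rank order.
    return [cur_rank, num_ranks, *heap_bases]


encoded = encode_context(0, [0x1000, 0x2000, 0x3000])
```

The total length is always `num_ranks + 2`, so a consumer can check the encoding for consistency before using it.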
- get_backend()[source]
Legacy method for backward compatibility. Use get_device_context() for Gluon kernels.
- Returns:
Device context tensor
- Return type:
torch.Tensor
- get_heap_bases()[source]
Return the tensor of symmetric heap base addresses for all ranks.
- Returns:
A 1D tensor of uint64 heap base addresses
- Return type:
torch.Tensor
- barrier()[source]
Synchronize all ranks using a distributed barrier.
- get_device()[source]
Get the underlying device where the Iris symmetric heap resides.
- Returns:
The CUDA device of Iris-managed memory
- Return type:
torch.device
- get_cu_count()[source]
Get the number of compute units (CUs) for the current GPU.
- Returns:
Number of compute units on this rank’s GPU
- Return type:
int
- get_num_ranks()[source]
Get the total number of ranks.
- Returns:
The total number of ranks in the distributed system
- Return type:
int
- broadcast(data, src_rank=0)[source]
Broadcast data from source rank to all ranks.
- Parameters:
data – Data to broadcast (scalar or tensor)
src_rank – Source rank for broadcast (default: 0)
- Returns:
The broadcasted data
- zeros(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)[source]
Create a tensor filled with zeros on the symmetric heap.
- Parameters:
size – Shape of the tensor
dtype – Data type (default: torch.float32)
device – Device (must match Iris device)
layout – Layout (default: torch.strided)
requires_grad – Whether to track gradients
- Returns:
Zero-initialized tensor on the symmetric heap
- Return type:
torch.Tensor
- ones(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)[source]
Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument size. The tensor is allocated on the Iris symmetric heap.
- Parameters:
*size (int...) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
- Keyword Arguments:
out (Tensor, optional) – the output tensor.
dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_dtype()).
layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided. Note: Iris tensors always use torch.strided regardless of this parameter.
device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type.
requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.
Example
>>> ctx = iris_gluon.iris(1 << 20)
>>> tensor = ctx.ones(2, 3)
>>> print(tensor.shape)  # torch.Size([2, 3])
>>> print(tensor[0])  # tensor([1., 1., 1.], device='cuda:0')
- full(size, fill_value, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)[source]
Creates a tensor of size size filled with fill_value. The tensor’s dtype is inferred from fill_value. The tensor is allocated on the Iris symmetric heap.
- Parameters:
size (int...) – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
fill_value (Scalar) – the value to fill the output tensor with.
- Keyword Arguments:
out (Tensor, optional) – the output tensor.
dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_dtype()).
layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided. Note: Iris tensors always use torch.strided regardless of this parameter.
device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type.
requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.
Example
>>> ctx = iris_gluon.iris(1 << 20)
>>> tensor = ctx.full((2, 3), 3.14)
>>> print(tensor.shape)  # torch.Size([2, 3])
>>> print(tensor[0])  # tensor([3.1400, 3.1400, 3.1400], device='cuda:0')
- zeros_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)[source]
Returns a tensor filled with the scalar value 0, with the same size as input, allocated on the Iris symmetric heap.
- Parameters:
input (Tensor) – the size of input will determine size of the output tensor.
- Keyword Arguments:
dtype (torch.dtype, optional) – the desired data type of returned Tensor. Default: if None, defaults to the dtype of input.
layout (torch.layout, optional) – the desired layout of returned tensor. Default: if None, defaults to the layout of input. Note: Iris tensors are always contiguous (strided).
device (torch.device, optional) – the desired device of returned tensor. Default: if None, defaults to the device of input. Must be compatible with this Iris instance.
requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.
memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.
Example
>>> ctx = iris_gluon.iris(1 << 20)
>>> input_tensor = ctx.ones(2, 3)
>>> zeros_tensor = ctx.zeros_like(input_tensor)
>>> print(zeros_tensor.shape)  # torch.Size([2, 3])
- iris(heap_size=1073741824)[source]
Create and return a Gluon-based Iris instance with the specified heap size.
- Parameters:
heap_size (int) – Size of the heap in bytes. Defaults to 1GB.
- Returns:
An initialized Gluon-based Iris instance
- Return type:
IrisGluon
Example
>>> import iris.iris_gluon as iris_gl
>>> ctx = iris_gl.iris(2**30)  # 1GB heap
>>> backend = ctx.get_backend()
>>> tensor = ctx.zeros(1024, 1024)