Device Functions
Warning
The Gluon API is experimental and may undergo breaking changes in future releases.
This page documents the device-side functions that Iris Gluon provides for remote memory operations and atomics. They are methods of the IrisDeviceCtx aggregate and are called from within Gluon kernels.
Initialization
initialize
- static IrisDeviceCtx.initialize(context_tensor)
Initialize IrisDeviceCtx from the encoded tensor.
The context tensor has the format: [cur_rank, num_ranks, heap_base_0, heap_base_1, …]
- Parameters:
context_tensor – Pointer to encoded context data
- Returns:
Initialized device context
- Return type:
IrisDeviceCtx
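The layout of the context tensor can be illustrated with a small host-side sketch. It assumes the context is a flat sequence of integers in the order given above. `DecodedContext` and `decode_context` are hypothetical names used only for illustration; they are not part of Iris, and the real `initialize` runs on the device inside a Gluon kernel.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DecodedContext:
    """Hypothetical host-side mirror of the fields packed in the context tensor."""
    cur_rank: int
    num_ranks: int
    heap_bases: List[int]


def decode_context(context: Sequence[int]) -> DecodedContext:
    """Unpack [cur_rank, num_ranks, heap_base_0, heap_base_1, ...]."""
    cur_rank, num_ranks = context[0], context[1]
    heap_bases = list(context[2:2 + num_ranks])
    if len(heap_bases) != num_ranks:
        raise ValueError("context is shorter than num_ranks implies")
    return DecodedContext(cur_rank, num_ranks, heap_bases)


# Rank 1 of 2, with made-up heap base addresses.
ctx = decode_context([1, 2, 0x1000, 0x8000])
```

The point of the sketch is only that one heap base is stored per rank, after the two leading rank fields.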
Memory transfer operations
load
- IrisDeviceCtx.load(pointer, from_rank, mask=None)
Loads a value from the specified rank’s memory location to the current rank.
- Parameters:
pointer – Pointer in the from_rank’s address space
from_rank – The rank ID from which to read the data
mask – Optional mask for conditional loading
- Returns:
The loaded value from the target memory location
Example
>>> # Load from rank 1 to current rank
>>> data = ctx.load(buffer + offsets, 1, mask=mask)
store
- IrisDeviceCtx.store(pointer, value, to_rank, mask=None)
Writes data from the current rank to the specified rank’s memory location.
- Parameters:
pointer – Pointer in the current rank’s address space
value – The value to store
to_rank – The rank ID to which the data will be written
mask – Optional mask for conditional storing
Example
>>> # Store from current rank to rank 1
>>> ctx.store(buffer + offsets, values, 1, mask=mask)
copy
- IrisDeviceCtx.copy(src_ptr, dst_ptr, from_rank, to_rank, mask=None)
Copies data from the specified rank’s memory into the destination rank’s memory.
This function translates src_ptr from the from_rank’s address space to the to_rank’s address space, performs a masked load from the translated source, and stores the loaded data to dst_ptr in the to_rank’s memory. If from_rank and to_rank are the same, the copy is local. The behaviour is undefined if neither from_rank nor to_rank is cur_rank.
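The address translation mentioned above can be pictured as rebasing an offset from one heap onto another. The sketch below assumes a symmetric heap in which an object sits at the same offset from every rank's heap base, and it models addresses as plain integers. `translate` and the base addresses are hypothetical; this is not the Iris implementation.

```python
from typing import Sequence


def translate(ptr: int, from_rank: int, to_rank: int,
              heap_bases: Sequence[int]) -> int:
    """Rebase ptr from from_rank's heap onto to_rank's heap (same offset)."""
    offset = ptr - heap_bases[from_rank]
    return heap_bases[to_rank] + offset


# Hypothetical heap base addresses for two ranks.
heap_bases = [0x1000, 0x8000]

# An address 0x10 bytes into rank 0's heap maps to 0x10 bytes into rank 1's heap.
remote = translate(0x1010, from_rank=0, to_rank=1, heap_bases=heap_bases)
# Translating to the same rank leaves the address unchanged (the local case).
local = translate(0x1010, from_rank=0, to_rank=0, heap_bases=heap_bases)
```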
- Parameters:
src_ptr – Pointer in the from_rank’s local memory from which to read data
dst_ptr – Pointer in the to_rank’s local memory where the data will be written
from_rank – The rank ID that owns src_ptr (source rank)
to_rank – The rank ID that will receive the data (destination rank)
mask – Optional mask for conditional operations
Example
>>> # Copy from rank 1 to rank 0 (current rank must be either 1 or 0)
>>> ctx.copy(remote_ptr + offsets, local_ptr + offsets, 1, 0, mask=mask)
get
- IrisDeviceCtx.get(from_ptr, to_ptr, from_rank, mask=None)
Copies data from the specified rank’s memory to the current rank’s local memory.
- Parameters:
from_ptr – Pointer to remote memory in from_rank’s address space
to_ptr – Pointer to local memory in current rank
from_rank – The rank ID from which to read the data
mask – Optional mask for conditional operations
Example
>>> # Copy from rank 1 to current rank's local memory
>>> ctx.get(remote_ptr + offsets, local_ptr + offsets, 1, mask=mask)
put
- IrisDeviceCtx.put(from_ptr, to_ptr, to_rank, mask=None)
Copies data from the current rank’s local memory to the specified rank’s memory.
- Parameters:
from_ptr – Pointer to local memory in current rank
to_ptr – Pointer to remote memory in to_rank’s address space
to_rank – The rank ID to which the data will be written
mask – Optional mask for conditional operations
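The difference in direction between get and put, and the effect of a mask, can be modelled on the host with ordinary Python lists. This is a sketch only: `heaps`, `CUR_RANK`, `sim_get`, and `sim_put` are hypothetical names, offsets stand in for pointers, and the sketch assumes the usual masked-store convention that masked-off elements are left untouched at the destination.

```python
from typing import Dict, List, Optional, Sequence

CUR_RANK = 0  # hypothetical: the rank this code "runs" on
# heaps[rank] stands in for that rank's memory.
heaps: Dict[int, List[int]] = {0: [0, 0, 0, 0], 1: [10, 11, 12, 13]}


def _masked_copy(src: List[int], src_offs: Sequence[int],
                 dst: List[int], dst_offs: Sequence[int],
                 mask: Optional[Sequence[bool]]) -> None:
    if mask is None:
        mask = [True] * len(src_offs)
    for s, d, m in zip(src_offs, dst_offs, mask):
        if m:  # assumption: masked-off elements are skipped entirely
            dst[d] = src[s]


def sim_get(from_offs, to_offs, from_rank, mask=None):
    """Model of get: remote (from_rank) -> current rank."""
    _masked_copy(heaps[from_rank], from_offs, heaps[CUR_RANK], to_offs, mask)


def sim_put(from_offs, to_offs, to_rank, mask=None):
    """Model of put: current rank -> remote (to_rank)."""
    _masked_copy(heaps[CUR_RANK], from_offs, heaps[to_rank], to_offs, mask)


offs = [0, 1, 2, 3]
sim_get(offs, offs, from_rank=1, mask=[True, True, False, True])
after_get = list(heaps[0])  # element 2 keeps its old value

heaps[0][2] = 99
sim_put([2], [2], to_rank=1)
after_put = list(heaps[1])
```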
Example
>>> # Copy from current rank's local memory to rank 1
>>> ctx.put(local_ptr + offsets, remote_ptr + offsets, 1, mask=mask)
Atomic operations
atomic_add
- IrisDeviceCtx.atomic_add(pointer, val, to_rank, mask=None, sem=None, scope=None)
Performs an atomic add at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to add
to_rank – The ID of the rank on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
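Because the call returns the value from before the add, concurrent callers each observe a distinct previous value, which is what makes fetch-and-add useful for handing out tickets or slots. The host-side sketch below illustrates that property with threads. `SimCell` is a hypothetical stand-in for one remote memory location, and the lock models the atomicity that the hardware provides; none of this is Iris code.

```python
import threading
from typing import List


class SimCell:
    """Hypothetical stand-in for one remote memory location."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self._lock = threading.Lock()  # models the atomicity of the update

    def atomic_add(self, val: int) -> int:
        with self._lock:
            old = self.value
            self.value = old + val
            return old  # value before the add


cell = SimCell(0)
seen: List[int] = []
seen_lock = threading.Lock()


def worker() -> None:
    for _ in range(1000):
        old = cell.atomic_add(1)
        with seen_lock:
            seen.append(old)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After all four workers finish, no increment is lost and every returned value is unique.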
Example
>>> # Atomically add to rank 1's memory
>>> old_val = ctx.atomic_add(buffer, 5, 1)
atomic_sub
- IrisDeviceCtx.atomic_sub(pointer, val, to_rank, mask=None, sem=None, scope=None)
Performs an atomic subtract at the specified rank’s memory location.
- Parameters:
pointer – Pointer in the current rank’s address space
val – The value to subtract
to_rank – The ID of the rank on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Atomically subtract from rank 1's memory
>>> old_val = ctx.atomic_sub(buffer, 3, 1)
atomic_cas
- IrisDeviceCtx.atomic_cas(pointer, cmp, val, to_rank, sem=None, scope=None)
Atomically compares the value at the specified rank’s memory location with cmp and, if they are equal, replaces it with val.
- Parameters:
pointer – Pointer in the current rank’s address space
cmp – The expected value to compare
val – The new value to write if comparison succeeds
to_rank – The ID of the rank on which the atomic operation will be performed
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
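Since the return value is the previous content of the location, a caller can tell whether its swap took effect by comparing the result with cmp. The sketch below models that on the host. `sim_cas` and `flags` are hypothetical, and the function is not itself atomic; a real compare-and-swap performs the compare and the write as one indivisible step.

```python
from typing import List


def sim_cas(mem: List[int], index: int, cmp: int, val: int) -> int:
    """Model of compare-and-swap: write val only if the old value equals cmp."""
    old = mem[index]
    if old == cmp:
        mem[index] = val
    return old  # value before the operation, whether or not the swap happened


flags = [0, 0]
first = sim_cas(flags, 0, cmp=0, val=1)   # flag was 0, so the swap happens
second = sim_cas(flags, 0, cmp=0, val=1)  # flag is already 1, so nothing changes

# A caller "wins" the flag exactly when the returned value equals cmp.
won_first = first == 0
won_second = second == 0
```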
Example
>>> # Compare-and-swap on rank 1's memory
>>> old_val = ctx.atomic_cas(flag + pid, 0, 1, 1, sem="release", scope="sys")
atomic_xchg
- IrisDeviceCtx.atomic_xchg(pointer, val, to_rank, mask=None, sem=None, scope=None)
Performs an atomic exchange at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to exchange
to_rank – The ID of the rank on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Exchange value with rank 1's memory
>>> old_val = ctx.atomic_xchg(buffer, 99, 1)
atomic_xor
- IrisDeviceCtx.atomic_xor(pointer, val, to_rank, mask=None, sem=None, scope=None)
Performs an atomic XOR at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to XOR with the stored value
to_rank – The ID of the rank on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Atomically XOR with rank 1's memory
>>> old_val = ctx.atomic_xor(buffer, 0xFF, 1)
atomic_and
- IrisDeviceCtx.atomic_and(pointer, val, to_rank, mask=None, sem=None, scope=None)
Performs an atomic AND at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to AND with the stored value
to_rank – The ID of the rank on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Atomically AND with rank 1's memory
>>> old_val = ctx.atomic_and(buffer, 0x0F, 1)
atomic_or
- IrisDeviceCtx.atomic_or(pointer, val, to_rank, mask=None, sem=None, scope=None)
Performs an atomic OR at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to OR with the stored value
to_rank – The ID of the rank on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Atomically OR with rank 1's memory
>>> old_val = ctx.atomic_or(buffer, 0xF0, 1)
atomic_min
- IrisDeviceCtx.atomic_min(pointer, val, to_rank, mask=None, sem=None, scope=None)
Performs an atomic min at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to compare and potentially store
to_rank – The ID of the rank on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
Example
>>> # Atomically compute minimum with rank 1's memory
>>> old_val = ctx.atomic_min(buffer, 10, 1)
atomic_max
- IrisDeviceCtx.atomic_max(pointer, val, to_rank, mask=None, sem=None, scope=None)
Performs an atomic max at the specified rank’s memory location.
- Parameters:
pointer – The memory location in the current rank’s address space
val – The value to compare and potentially store
to_rank – The ID of the rank on which the atomic operation will be performed
mask – Optional mask for conditional operations
sem – Memory semantics (acquire, release, acq_rel, relaxed)
scope – Scope of synchronization (gpu, cta, sys)
- Returns:
The value at the memory location before the atomic operation
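The read-modify-write atomics on this page (atomic_add, atomic_sub, atomic_xchg, atomic_xor, atomic_and, atomic_or, atomic_min, atomic_max) are all documented with the same shape: combine the stored value with val, write the result back, and return the value from before the operation. The host-side sketch below models that shared shape. `RMW_OPS` and `sim_rmw` are hypothetical names, the model is sequential rather than atomic, and it ignores mask, sem, and scope.

```python
import operator
from typing import Callable, Dict, List

# Hypothetical table mapping each method name to its combining function.
RMW_OPS: Dict[str, Callable[[int, int], int]] = {
    "atomic_add": operator.add,
    "atomic_sub": operator.sub,
    "atomic_xchg": lambda old, val: val,
    "atomic_xor": operator.xor,
    "atomic_and": operator.and_,
    "atomic_or": operator.or_,
    "atomic_min": min,
    "atomic_max": max,
}


def sim_rmw(mem: List[int], index: int, name: str, val: int) -> int:
    """Apply the named operation in place and return the previous value."""
    old = mem[index]
    mem[index] = RMW_OPS[name](old, val)
    return old


mem = [0x0F]
olds = [
    sim_rmw(mem, 0, "atomic_xor", 0xFF),  # stored value becomes 0xF0
    sim_rmw(mem, 0, "atomic_and", 0x3C),  # stored value becomes 0x30
    sim_rmw(mem, 0, "atomic_or", 0x03),   # stored value becomes 0x33
    sim_rmw(mem, 0, "atomic_min", 10),    # stored value becomes 10
    sim_rmw(mem, 0, "atomic_max", 100),   # stored value becomes 100
]
```

Each entry in `olds` is the stored value immediately before the corresponding call.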
Example
>>> # Atomically compute maximum with rank 1's memory
>>> old_val = ctx.atomic_max(buffer, 100, 1)