genome_entropy.encode3di.multi_gpu

Multi-GPU asynchronous encoding for protein to 3Di conversion.

Classes

MultiGPUEncoder(model_name, encoder_class[, ...])

Manages multi-GPU encoding of amino acid sequences to 3Di tokens.

class genome_entropy.encode3di.multi_gpu.MultiGPUEncoder(model_name, encoder_class, gpu_ids=None)[source]

Manages multi-GPU encoding of amino acid sequences to 3Di tokens.

This class distributes encoding batches across multiple GPUs using asyncio for parallel processing. It handles GPU allocation, load balancing, and error recovery.

Parameters:

model_name (str)
encoder_class (type)
gpu_ids (List[int] | None)

__init__(model_name, encoder_class, gpu_ids=None)[source]

Initialize multi-GPU encoder.

Parameters:

model_name (str) – HuggingFace model identifier
encoder_class (type) – Encoder class to instantiate (e.g., ProstT5ThreeDiEncoder)
gpu_ids (List[int] | None) – List of GPU IDs to use. If None, available GPUs are auto-discovered. If the list is empty, or discovery finds no GPUs, the encoder falls back to a single GPU.
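The gpu_ids rules above can be sketched as a small pure function. This is an illustration only, not the library's code: resolve_gpu_ids_sketch and its discovered argument are hypothetical names, and modeling the single-GPU fallback as device 0 is an assumption.

```python
from typing import List, Optional


def resolve_gpu_ids_sketch(gpu_ids: Optional[List[int]], discovered: List[int]) -> List[int]:
    """Hypothetical helper mirroring the documented gpu_ids behaviour.

    `discovered` stands in for whatever auto-discovery reports.
    """
    # None means "auto-discover"; an explicit list is taken as given.
    chosen = discovered if gpu_ids is None else gpu_ids
    # Assumption: the single-GPU fallback is represented here as device 0.
    return list(chosen) if chosen else [0]


auto = resolve_gpu_ids_sketch(None, [0, 1, 2])
explicit = resolve_gpu_ids_sketch([1, 3], [0, 1, 2, 3])
empty_list = resolve_gpu_ids_sketch([], [0, 1])
nothing_found = resolve_gpu_ids_sketch(None, [])
```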

property num_gpus: int

Number of GPUs being used.

is_multi_gpu()[source]

Check if using multiple GPUs.

Return type:

bool

async encode_batch_async(encoder_idx, batch)[source]

Encode a single batch on a specific GPU asynchronously.

Parameters:

encoder_idx (int) – Index of encoder/GPU to use
batch (List[IndexedSeq]) – List of IndexedSeq objects to encode

Returns:

Tuple of (original_indices, encoded_3di_sequences)

Return type:

Tuple[List[int], List[str]]
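A minimal sketch of the pattern this method describes: encode one batch without blocking the event loop, and return the original indices alongside the results. The IndexedSeq fields (idx, seq), fake_encode, and the use of asyncio.to_thread are assumptions made for illustration; the library's actual implementation may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IndexedSeq:
    """Hypothetical stand-in: a sequence plus its position in the original input."""
    idx: int
    seq: str


def fake_encode(seqs: List[str]) -> List[str]:
    """Placeholder for a blocking model call; lowercases instead of predicting 3Di."""
    return [s.lower() for s in seqs]


async def encode_batch_sketch(batch: List[IndexedSeq]) -> Tuple[List[int], List[str]]:
    indices = [item.idx for item in batch]
    # Run the blocking call in a worker thread so the event loop stays free
    # to schedule batches for other devices.
    encoded = await asyncio.to_thread(fake_encode, [item.seq for item in batch])
    return indices, encoded


batch_result = asyncio.run(
    encode_batch_sketch([IndexedSeq(3, "MKV"), IndexedSeq(0, "ACD")])
)
```

Returning the indices with the encoded strings lets the caller restore input order even when batches complete out of order.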

async encode_all_batches_async(batches, total_sequences)[source]

Encode all batches across multiple GPUs asynchronously.

Parameters:

batches (List[List[IndexedSeq]]) – List of batches to encode
total_sequences (int) – Total number of sequences

Returns:

List of encoded 3Di sequences in original input order

Raises:

EncodingError – If encoding fails

Return type:

List[str]
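One way to picture how results come back in original input order: run every batch concurrently, then write each encoded sequence into the slot named by its index. The sketch below assumes round-robin assignment of batches to workers and uses (index, sequence) tuples in place of IndexedSeq; both are illustrative choices, not confirmed details of the library. RuntimeError stands in for EncodingError.

```python
import asyncio
from typing import List, Optional, Tuple

Batch = List[Tuple[int, str]]  # hypothetical stand-in for List[IndexedSeq]


async def encode_on_worker(worker: int, batch: Batch) -> Tuple[List[int], List[str]]:
    """Placeholder worker: yields control once, then lowercases each sequence."""
    await asyncio.sleep(0)
    return [i for i, _ in batch], [s.lower() for _, s in batch]


async def encode_all_sketch(batches: List[Batch], total_sequences: int, num_workers: int) -> List[str]:
    # Assumption: batches are assigned to workers round-robin.
    tasks = [encode_on_worker(n % num_workers, b) for n, b in enumerate(batches)]
    results = await asyncio.gather(*tasks)

    # Scatter results back into their original positions.
    ordered: List[Optional[str]] = [None] * total_sequences
    for indices, encoded in results:
        for i, s in zip(indices, encoded):
            ordered[i] = s

    if any(s is None for s in ordered):
        raise RuntimeError("some sequences were not encoded")
    return [s for s in ordered if s is not None]


restored = asyncio.run(
    encode_all_sketch([[(2, "GHI"), (0, "ACD")], [(1, "EFG")]], total_sequences=3, num_workers=2)
)
```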

encode_multi_gpu(aa_sequences, token_budget_batches_fn, encoding_size, skip_model_loading=False)[source]

Encode sequences using multiple GPUs.

This is a synchronous wrapper around the async encoding method.

Parameters:

aa_sequences (List[str]) – List of preprocessed amino acid sequences
token_budget_batches_fn (Callable[[List[str], int], Iterator[Any]]) – Function to create batches under token budget
encoding_size (int) – Maximum batch size, measured approximately in amino acids
skip_model_loading (bool) – If True, skip model loading (assumes models already loaded). This is useful when the encoder is being reused across multiple calls.

Returns:

List of 3Di token sequences (one per input sequence)

Return type:

List[str]
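The sketch below illustrates the two ideas in this entry: a batching function with the shape Callable[[List[str], int], Iterator[Any]], and a synchronous wrapper that drives an async routine with asyncio.run. All names here are hypothetical, the greedy batching strategy is an assumption, and lowercasing stands in for real 3Di prediction.

```python
import asyncio
from typing import Callable, Iterator, List, Tuple

IndexedBatch = List[Tuple[int, str]]  # hypothetical stand-in for a batch of IndexedSeq


def token_budget_batches_sketch(seqs: List[str], budget: int) -> Iterator[IndexedBatch]:
    """Greedily group sequences so each batch holds at most `budget` residues.

    A single sequence longer than the budget still gets its own batch.
    """
    batch: IndexedBatch = []
    used = 0
    for idx, seq in enumerate(seqs):
        if batch and used + len(seq) > budget:
            yield batch
            batch, used = [], 0
        batch.append((idx, seq))
        used += len(seq)
    if batch:
        yield batch


async def _encode_all(batches: List[IndexedBatch], total: int) -> List[str]:
    out = [""] * total
    for batch in batches:
        await asyncio.sleep(0)  # placeholder for awaiting real encoder work
        for idx, seq in batch:
            out[idx] = seq.lower()
    return out


def encode_sync_sketch(
    aa_sequences: List[str],
    batches_fn: Callable[[List[str], int], Iterator[IndexedBatch]],
    encoding_size: int,
) -> List[str]:
    batches = list(batches_fn(aa_sequences, encoding_size))
    # asyncio.run starts an event loop, runs the coroutine, and closes the loop.
    return asyncio.run(_encode_all(batches, len(aa_sequences)))


demo_batches = list(token_budget_batches_sketch(["MKVL", "AC", "DEFGH", "W"], 6))
demo_encoded = encode_sync_sketch(["MKVL", "AC", "DEFGH", "W"], token_budget_batches_sketch, 6)
```

Note that asyncio.run raises RuntimeError when called from a thread that already has a running event loop, so a wrapper built this way is meant for synchronous callers.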