genome_entropy.encode3di.multi_gpu

Asynchronous multi-GPU encoding for protein-to-3Di conversion.

Classes

MultiGPUEncoder(model_name, encoder_class[, ...])

Manages multi-GPU encoding of amino acid sequences to 3Di tokens.

class genome_entropy.encode3di.multi_gpu.MultiGPUEncoder(model_name, encoder_class, gpu_ids=None)[source]

Manages multi-GPU encoding of amino acid sequences to 3Di tokens.

This class distributes encoding batches across multiple GPUs using asyncio for parallel processing. It handles GPU allocation, load balancing, and error recovery.
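The load-balancing idea can be illustrated with a shared work queue, where each device pulls the next batch as soon as it is free. This is a minimal sketch under assumptions, not the class's actual scheduling code: `worker`, `run_workers`, and the sleep that stands in for GPU work are all hypothetical.

```python
import asyncio
from typing import Dict, List


async def worker(gpu_id: int, queue: asyncio.Queue, done: Dict[int, int]) -> None:
    """Pull batches until the queue is empty and count batches per device."""
    while True:
        try:
            batch = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        # Stand-in for real device work: longer batches take longer.
        await asyncio.sleep(0.001 * len(batch))
        done[gpu_id] = done.get(gpu_id, 0) + 1


async def run_workers(batches: List[List[str]], gpu_ids: List[int]) -> Dict[int, int]:
    queue: asyncio.Queue = asyncio.Queue()
    for batch in batches:
        queue.put_nowait(batch)
    done: Dict[int, int] = {}
    # One worker coroutine per device; idle devices pick up remaining batches.
    await asyncio.gather(*(worker(g, queue, done) for g in gpu_ids))
    return done


batches = [["MKT"] * n for n in (5, 1, 1, 1, 4)]
done = asyncio.run(run_workers(batches, gpu_ids=[0, 1]))
print(done)  # batches handled per (hypothetical) device
```

With a pull-based queue, a device that receives a slow batch does not hold up the others, which is one common way to get load balancing without estimating batch cost up front.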

__init__(model_name, encoder_class, gpu_ids=None)[source]

Initialize multi-GPU encoder.

Parameters:
  • model_name (str) – HuggingFace model identifier

  • encoder_class (type) – Encoder class to instantiate (e.g., ProstT5ThreeDiEncoder)

  • gpu_ids (List[int] | None) – List of GPU IDs to use. If None, available GPUs are auto-discovered. If the list is empty, or discovery finds no GPUs, the encoder falls back to a single GPU.
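The `gpu_ids` rules above can be sketched as a small resolution function. The helper `resolve_gpu_ids` and its `discover` callback are hypothetical, and representing the single-GPU fallback as device `0` is an assumption; the real class may discover devices differently (with PyTorch, discovery could be `list(range(torch.cuda.device_count()))`).

```python
from typing import Callable, List, Optional


def resolve_gpu_ids(
    gpu_ids: Optional[List[int]],
    discover: Callable[[], List[int]],
) -> List[int]:
    """Hypothetical helper mirroring the documented gpu_ids behaviour."""
    if gpu_ids is None:
        # None means "auto-discover whatever is available".
        gpu_ids = discover()
    if not gpu_ids:
        # Assumption: the single-device fallback is modelled as device 0.
        return [0]
    return list(gpu_ids)


print(resolve_gpu_ids(None, lambda: [0, 1, 2]))  # [0, 1, 2]
print(resolve_gpu_ids(None, lambda: []))         # [0]
print(resolve_gpu_ids([], lambda: [0, 1]))       # [0]
print(resolve_gpu_ids([1, 3], lambda: []))       # [1, 3]
```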

property num_gpus: int

Number of GPUs being used.

is_multi_gpu()[source]

Check if using multiple GPUs.

Return type:

bool

async encode_batch_async(encoder_idx, batch)[source]

Encode a single batch on a specific GPU asynchronously.

Parameters:
  • encoder_idx (int) – Index of encoder/GPU to use

  • batch (List[IndexedSeq]) – List of IndexedSeq objects to encode

Returns:

Tuple of (original_indices, encoded_3di_sequences)

Return type:

Tuple[List[int], List[str]]
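A rough sketch of how a single batch could be encoded without blocking the event loop is shown here. Everything in it is an assumption for illustration: the fields of `IndexedSeq` (`idx`, `seq`), the `FakeEncoder` stand-in, and the use of `asyncio.to_thread` (Python 3.9+) are not confirmed by this reference.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IndexedSeq:
    # Assumed shape; the real IndexedSeq may carry different fields.
    idx: int
    seq: str


class FakeEncoder:
    """Stand-in for a real encoder; lowercases instead of predicting 3Di."""

    def encode(self, seqs: List[str]) -> List[str]:
        return [s.lower() for s in seqs]


async def encode_batch_async(
    encoders: List[FakeEncoder], encoder_idx: int, batch: List[IndexedSeq]
) -> Tuple[List[int], List[str]]:
    encoder = encoders[encoder_idx]
    seqs = [item.seq for item in batch]
    # Run the blocking encode call in a worker thread so other batches can proceed.
    encoded = await asyncio.to_thread(encoder.encode, seqs)
    return [item.idx for item in batch], encoded


indices, encoded = asyncio.run(
    encode_batch_async([FakeEncoder()], 0, [IndexedSeq(4, "MKT"), IndexedSeq(1, "GAV")])
)
print(indices, encoded)  # [4, 1] ['mkt', 'gav']
```

Returning the original indices alongside the encoded strings is what lets a caller restore input order after batches finish out of order.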

async encode_all_batches_async(batches, total_sequences)[source]

Encode all batches across multiple GPUs asynchronously.

Parameters:
  • batches (List[List[IndexedSeq]]) – List of batches to encode

  • total_sequences (int) – Total number of sequences across all batches

Returns:

List of encoded 3Di sequences in original input order

Raises:

EncodingError – If encoding fails

Return type:

List[str]
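Restoring the original input order from per-batch results can be sketched as follows. This is an illustration under assumptions, not the method's implementation: batches are modelled as lists of `(original_index, sequence)` tuples in place of `IndexedSeq`, batches are assigned round-robin, and `encode_batch` is a placeholder that lowercases its input.

```python
import asyncio
from typing import List, Tuple

# (original_index, sequence); assumed stand-in for IndexedSeq
Batch = List[Tuple[int, str]]


async def encode_batch(encoder_idx: int, batch: Batch) -> Tuple[List[int], List[str]]:
    await asyncio.sleep(0)  # placeholder for real device work
    return [i for i, _ in batch], [s.lower() for _, s in batch]


async def encode_all(batches: List[Batch], total_sequences: int, num_encoders: int) -> List[str]:
    # Round-robin assignment of batches to encoders (an assumption).
    tasks = [encode_batch(n % num_encoders, b) for n, b in enumerate(batches)]
    results: List[str] = [""] * total_sequences
    for indices, encoded in await asyncio.gather(*tasks):
        # Write each result back to the slot of its original position.
        for i, s in zip(indices, encoded):
            results[i] = s
    return results


batches = [[(2, "CCC"), (0, "AAA")], [(1, "GGG")]]
ordered = asyncio.run(encode_all(batches, total_sequences=3, num_encoders=2))
print(ordered)  # ['aaa', 'ggg', 'ccc']
```

Pre-allocating the result list from `total_sequences` is one plausible reason the method takes that argument separately from the batches.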

encode_multi_gpu(aa_sequences, token_budget_batches_fn, encoding_size, skip_model_loading=False)[source]

Encode sequences using multiple GPUs.

This is a synchronous wrapper around the async encoding method.

Parameters:
  • aa_sequences (List[str]) – List of preprocessed amino acid sequences

  • token_budget_batches_fn (Callable[[List[str], int], Iterator[Any]]) – Function to create batches under token budget

  • encoding_size (int) – Maximum batch size, measured approximately in amino acids

  • skip_model_loading (bool) – If True, skip model loading (assumes models already loaded). This is useful when the encoder is being reused across multiple calls.

Returns:

List of 3Di token sequences (one per input sequence)

Return type:

List[str]
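A function matching the `token_budget_batches_fn` signature, together with a synchronous wrapper, might look like the sketch below. The greedy batching strategy, the `(index, sequence)` tuples it yields, and the names `token_budget_batches`, `_encode_all`, and `encode_sync` are all assumptions for illustration; the library's own batching function may behave differently.

```python
import asyncio
from typing import Callable, Iterator, List, Tuple

Batch = List[Tuple[int, str]]


def token_budget_batches(seqs: List[str], budget: int) -> Iterator[Batch]:
    """Greedy batching: group (index, seq) pairs so total length stays within budget."""
    batch: Batch = []
    used = 0
    for idx, seq in enumerate(seqs):
        if batch and used + len(seq) > budget:
            yield batch
            batch, used = [], 0
        # A sequence longer than the budget still gets a batch of its own.
        batch.append((idx, seq))
        used += len(seq)
    if batch:
        yield batch


async def _encode_all(batches: List[Batch], total: int) -> List[str]:
    out = [""] * total
    for batch in batches:
        for idx, seq in batch:
            out[idx] = seq.lower()  # placeholder for real 3Di encoding
    return out


def encode_sync(
    seqs: List[str], batches_fn: Callable[[List[str], int], Iterator[Batch]], encoding_size: int
) -> List[str]:
    batches = list(batches_fn(seqs, encoding_size))
    # Synchronous entry point: drive the coroutine to completion.
    return asyncio.run(_encode_all(batches, len(seqs)))


groups = list(token_budget_batches(["MKTA", "GA", "VLLPQ", "W"], 6))
print(groups)  # [[(0, 'MKTA'), (1, 'GA')], [(2, 'VLLPQ'), (3, 'W')]]
print(encode_sync(["MKTA", "GA", "VLLPQ", "W"], token_budget_batches, 6))
# ['mkta', 'ga', 'vllpq', 'w']
```

Note that `asyncio.run` raises `RuntimeError` when called from a thread that already has a running event loop, so a wrapper built this way would need to be called from synchronous code.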