genome_entropy.encode3di.multi_gpu
Multi-GPU asynchronous encoding for protein-to-3Di conversion.
Classes
MultiGPUEncoder – Manages multi-GPU encoding of amino acid sequences to 3Di tokens.
- class genome_entropy.encode3di.multi_gpu.MultiGPUEncoder(model_name, encoder_class, gpu_ids=None)
Manages multi-GPU encoding of amino acid sequences to 3Di tokens.
This class distributes encoding batches across multiple GPUs, using asyncio to run them concurrently. It handles GPU allocation, load balancing, and error recovery.
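The load balancing described above can be illustrated with a minimal sketch, assuming each GPU is represented by a blocking encoder callable. The `dispatch_batches` function and the `encoders` list are hypothetical, not part of this module. Idle workers pull the next batch from a shared queue, and `asyncio.to_thread` (Python 3.9+) keeps the blocking model calls off the event loop.

```python
import asyncio
from typing import Callable, List

Encoder = Callable[[List[str]], List[str]]  # hypothetical: one model loaded on one GPU


async def dispatch_batches(batches: List[List[str]], encoders: List[Encoder]) -> List[List[str]]:
    """Hypothetical sketch: run each batch on whichever encoder (GPU) is free next."""
    queue: asyncio.Queue = asyncio.Queue()
    for i, batch in enumerate(batches):
        queue.put_nowait((i, batch))
    results: List[List[str]] = [[] for _ in batches]

    async def worker(encode: Encoder) -> None:
        # Keep pulling batches until the queue is empty, so a faster GPU takes more work.
        while True:
            try:
                i, batch = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            # Run the blocking encoder in a thread; store the result at the batch's slot.
            results[i] = await asyncio.to_thread(encode, batch)

    # One worker coroutine per GPU, all driven by the same event loop.
    await asyncio.gather(*(worker(enc) for enc in encoders))
    return results
```

Pulling from a shared queue, rather than pre-assigning batches, keeps every GPU busy even when batch costs vary.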
- async encode_batch_async(encoder_idx, batch)
Encode a single batch on a specific GPU asynchronously.
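Here is a hedged sketch of what encoding "on a specific GPU" might involve. A per-GPU `asyncio.Lock` keeps two coroutines from running the same device at once, and failures are wrapped in an error that names the GPU. `GPUPool` and the local `EncodingError` class are hypothetical stand-ins. The sketch assumes Python 3.10 or later, where an `asyncio.Lock` can be created outside a running event loop.

```python
import asyncio
from typing import Callable, List, Sequence


class EncodingError(RuntimeError):
    """Hypothetical stand-in for the module's EncodingError."""


class GPUPool:
    """Hypothetical sketch: one blocking encoder and one lock per GPU."""

    def __init__(self, encoders: Sequence[Callable[[List[str]], List[str]]]):
        self.encoders = list(encoders)
        # One lock per GPU so two coroutines never run a model on the same device at once.
        self.locks = [asyncio.Lock() for _ in self.encoders]

    async def encode_batch_async(self, encoder_idx: int, batch: List[str]) -> List[str]:
        async with self.locks[encoder_idx]:
            try:
                # Run the blocking forward pass in a worker thread, off the event loop.
                return await asyncio.to_thread(self.encoders[encoder_idx], batch)
            except Exception as exc:
                raise EncodingError(
                    f"GPU {encoder_idx} failed on a batch of {len(batch)} sequences"
                ) from exc
```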
- async encode_all_batches_async(batches, total_sequences)
Encode all batches across multiple GPUs asynchronously.
- Parameters:
batches (List[List[IndexedSeq]]) – List of batches to encode
total_sequences (int) – Total number of input sequences across all batches
- Returns:
List of encoded 3Di sequences in original input order
- Raises:
EncodingError – If encoding fails
- Return type:
List[str]
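Batches may not follow input order, so results have to be mapped back to input positions to honor the "original input order" guarantee. The sketch below assumes `IndexedSeq` carries the sequence's original index (`idx`) and assigns batches to GPUs round-robin. Both the `IndexedSeq` fields and the scheduling are assumptions, and lowercasing stands in for the real 3Di encoder.

```python
import asyncio
from typing import List, NamedTuple, Optional


class IndexedSeq(NamedTuple):
    """Hypothetical stand-in: an amino acid sequence tagged with its input position."""
    idx: int
    seq: str


async def encode_batch_async(encoder_idx: int, batch: List[IndexedSeq]) -> List[str]:
    """Placeholder for the GPU call; lowercasing stands in for 3Di encoding."""
    await asyncio.sleep(0)
    return [item.seq.lower() for item in batch]


async def encode_all_batches_async(
    batches: List[List[IndexedSeq]], total_sequences: int, num_gpus: int = 2
) -> List[str]:
    # Round-robin: batch k goes to GPU k % num_gpus; all batches are awaited together.
    tasks = [encode_batch_async(k % num_gpus, batch) for k, batch in enumerate(batches)]
    batch_results = await asyncio.gather(*tasks)

    # Scatter each encoded sequence back to its original input position.
    output: List[Optional[str]] = [None] * total_sequences
    for batch, encoded in zip(batches, batch_results):
        for item, tokens in zip(batch, encoded):
            output[item.idx] = tokens

    missing = [i for i, tokens in enumerate(output) if tokens is None]
    if missing:
        # The documented method raises EncodingError; RuntimeError stands in here.
        raise RuntimeError(f"no 3Di output for input positions {missing}")
    return [tokens for tokens in output if tokens is not None]
```

Checking for unfilled positions turns a silently short or misaligned result into an explicit error.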
- encode_multi_gpu(aa_sequences, token_budget_batches_fn, encoding_size, skip_model_loading=False)
Encode sequences using multiple GPUs.
This is a synchronous wrapper around encode_all_batches_async(): it builds batches with token_budget_batches_fn and blocks until every batch has been encoded.
- Parameters:
aa_sequences (List[str]) – List of preprocessed amino acid sequences
token_budget_batches_fn (Callable[[List[str], int], Iterator[Any]]) – Function that splits the sequences into batches under a token budget; it is called with the sequence list and encoding_size
encoding_size (int) – Approximate maximum batch size, measured in amino acids
skip_model_loading (bool) – If True, skip model loading and assume the models are already loaded. This is useful when the same encoder is reused across multiple calls.
- Returns:
List of 3Di token sequences (one per input sequence)
- Return type:
List[str]
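One way to write a `token_budget_batches_fn` that matches the documented `Callable[[List[str], int], Iterator[Any]]` signature is sketched below, together with a synchronous wrapper built on `asyncio.run`. `token_budget_batches`, `encode_sync`, and the `(index, sequence)` batch format are hypothetical. The real batching function and the batch item type used by `encode_multi_gpu` may differ.

```python
import asyncio
from typing import Any, Callable, Coroutine, Iterator, List, Tuple

Indexed = Tuple[int, str]  # (original position, amino acid sequence)


def token_budget_batches(seqs: List[str], budget: int) -> Iterator[List[Indexed]]:
    """Hypothetical batcher: keep each batch's total residue count <= budget.

    A sequence longer than the budget is still emitted, alone in its own batch.
    """
    batch: List[Indexed] = []
    used = 0
    # Sort by length so similarly sized sequences share a batch (less padding).
    for idx, seq in sorted(enumerate(seqs), key=lambda pair: len(pair[1])):
        if batch and used + len(seq) > budget:
            yield batch
            batch, used = [], 0
        batch.append((idx, seq))
        used += len(seq)
    if batch:
        yield batch


def encode_sync(
    aa_sequences: List[str],
    batches_fn: Callable[[List[str], int], Iterator[List[Indexed]]],
    encoding_size: int,
    encode_all_async: Callable[[List[List[Indexed]], int], Coroutine[Any, Any, List[str]]],
) -> List[str]:
    """Hypothetical synchronous wrapper around an async multi-batch encoder."""
    batches = list(batches_fn(aa_sequences, encoding_size))
    # asyncio.run creates and closes a fresh event loop; it raises RuntimeError
    # if another loop is already running in this thread (e.g. a Jupyter cell).
    return asyncio.run(encode_all_async(batches, len(aa_sequences)))
```

Because `asyncio.run` refuses to start while another event loop is running, a wrapper of this shape cannot be called directly from inside a coroutine or a notebook cell with an active loop.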