genome_entropy.encode3di.encoding
Core encoding functions for amino acid to 3Di conversion.
Functions
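| encode(aa_sequences, encode_batch_fn, token_budget_batches_fn, encoding_size) | Encode amino acid sequences to 3Di tokens. |
| format_seconds(seconds) | Format seconds as H:MM:SS (or M:SS for < 1 hour). |
| get_memory_info() | Get current CUDA memory allocation and reservation in GB. |
| preprocess_sequences(aa_sequences) | Preprocess amino acid sequences for ProstT5 encoding. |
| process_batches(batches_iter, encode_batch_fn, total_sequences, total_batches) | Process batches of sequences and return results in original order. |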
- genome_entropy.encode3di.encoding.preprocess_sequences(aa_sequences)
Preprocess amino acid sequences for ProstT5 encoding.
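The exact preprocessing steps are not listed here. As a minimal sketch, assuming the function follows the convention in the ProstT5 model card (rare residues U, Z, O, B mapped to X, residues separated by spaces, and an `<AA2fold>` prefix for amino acid to 3Di translation), it might look like the following. The name `preprocess_for_prostt5` is hypothetical and is not part of `genome_entropy`.

```python
import re
from typing import List


def preprocess_for_prostt5(aa_sequences: List[str]) -> List[str]:
    """Hypothetical stand-in for preprocess_sequences (not the library's code)."""
    processed = []
    for seq in aa_sequences:
        # Assumption: rare or ambiguous residues (U, Z, O, B) map to the unknown residue X.
        cleaned = re.sub(r"[UZOB]", "X", seq.upper())
        # Assumption: residues are space-separated and prefixed with the
        # translation direction token used for amino acid -> 3Di.
        processed.append("<AA2fold> " + " ".join(cleaned))
    return processed


print(preprocess_for_prostt5(["MKUB"]))
```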
- genome_entropy.encode3di.encoding.format_seconds(seconds)
Format seconds as H:MM:SS (or M:SS for < 1 hour).
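The described format can be sketched as below. This is an illustration, not the library's code: the name `format_seconds_sketch` is hypothetical, and truncating fractional seconds is an assumption.

```python
def format_seconds_sketch(seconds: float) -> str:
    """Hypothetical re-implementation of the documented H:MM:SS / M:SS format."""
    total = int(seconds)  # assumption: fractional seconds are truncated
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours > 0:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    # Under one hour: drop the hours field entirely.
    return f"{minutes}:{secs:02d}"


print(format_seconds_sketch(75))    # 1:15
print(format_seconds_sketch(3661))  # 1:01:01
```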
- genome_entropy.encode3di.encoding.get_memory_info()
Get current CUDA memory allocation and reservation in GB.
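The return shape and the unit convention are not specified here. The sketch below assumes a pair of floats `(allocated, reserved)`, assumes "GB" means 1024³ bytes, and assumes zeros are returned when PyTorch or CUDA is unavailable. It relies on PyTorch's `torch.cuda.memory_allocated` and `torch.cuda.memory_reserved`; the name `get_memory_info_sketch` is hypothetical.

```python
from typing import Tuple


def get_memory_info_sketch() -> Tuple[float, float]:
    """Hypothetical stand-in: (allocated, reserved) CUDA memory in GB."""
    try:
        import torch
    except ImportError:
        # Assumption: report zeros when PyTorch is not installed.
        return 0.0, 0.0
    if not torch.cuda.is_available():
        # Assumption: report zeros on CPU-only machines.
        return 0.0, 0.0
    gib = 1024 ** 3  # assumption: "GB" is interpreted as 2**30 bytes
    allocated = torch.cuda.memory_allocated() / gib
    reserved = torch.cuda.memory_reserved() / gib
    return allocated, reserved


allocated, reserved = get_memory_info_sketch()
print(f"allocated={allocated:.2f} GB, reserved={reserved:.2f} GB")
```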
- genome_entropy.encode3di.encoding.process_batches(batches_iter, encode_batch_fn, total_sequences, total_batches)
Process batches of sequences and return results in original order.
- Parameters:
- Returns:
List of encoded 3Di sequences in original input order
- Raises:
EncodingError – If encoding fails
RuntimeError – If some sequences were not encoded
- Return type:
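One way to restore the original order and produce the two documented errors is sketched below. The batch layout is not documented in this section, so the sketch assumes each batch is a list of `(original_index, sequence)` pairs. It also omits `total_batches`, which is presumably used for progress reporting. `EncodingError` is redefined locally as a stand-in, and `process_batches_sketch` is a hypothetical name.

```python
from typing import Callable, Iterable, List, Optional, Tuple


class EncodingError(Exception):
    """Local stand-in for the library's EncodingError."""


# Assumption: a batch carries each sequence with its original position.
Batch = List[Tuple[int, str]]


def process_batches_sketch(
    batches_iter: Iterable[Batch],
    encode_batch_fn: Callable[[List[str]], List[str]],
    total_sequences: int,
) -> List[str]:
    results: List[Optional[str]] = [None] * total_sequences
    for batch in batches_iter:
        indices = [idx for idx, _ in batch]
        sequences = [seq for _, seq in batch]
        try:
            encoded = encode_batch_fn(sequences)
        except Exception as exc:
            raise EncodingError(f"Failed to encode batch: {exc}") from exc
        if len(encoded) != len(sequences):
            raise EncodingError("Encoder returned the wrong number of sequences")
        # Write each result back to the slot of its original input position.
        for idx, tokens in zip(indices, encoded):
            results[idx] = tokens
    missing = [i for i, r in enumerate(results) if r is None]
    if missing:
        raise RuntimeError(f"{len(missing)} sequences were not encoded")
    return [r for r in results if r is not None]


batches = [[(2, "CC"), (0, "A")], [(1, "BBB")]]
print(process_batches_sketch(batches, lambda seqs: [s.lower() for s in seqs], 3))
```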
- genome_entropy.encode3di.encoding.encode(aa_sequences, encode_batch_fn, token_budget_batches_fn, encoding_size)
Encode amino acid sequences to 3Di tokens.
This is a standalone encoding function that orchestrates the encoding pipeline.
- Parameters:
aa_sequences (List[str]) – List of amino acid sequences (uppercase, using the 20 standard amino acids)
encode_batch_fn (Callable[[List[str]], List[str]]) – Function that encodes a batch of preprocessed sequences
token_budget_batches_fn (Callable[[List[str], int], Iterator[Any]]) – Function that groups sequences into batches under a token budget
encoding_size (int) – Approximate maximum number of amino acids to encode per batch
- Returns:
List of 3Di token sequences (one per input sequence)
- Raises:
EncodingError – If encoding fails
- Return type:
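The two callables passed to `encode` could be sketched as follows, matching only the documented type signatures. The batch element type is `Any` in the signature, so the `(original_index, sequence)` layout here is an assumption, as is the longest-first greedy packing. `token_budget_batches_sketch` and `fake_encode_batch` are hypothetical names, and the fake encoder returns placeholder characters rather than real 3Di tokens.

```python
from typing import Iterator, List, Tuple


def token_budget_batches_sketch(
    aa_sequences: List[str], budget: int
) -> Iterator[List[Tuple[int, str]]]:
    """Greedy batching: fill each batch until the residue budget would be exceeded."""
    # Assumption: process longest sequences first so similar lengths share a batch.
    order = sorted(
        range(len(aa_sequences)), key=lambda i: len(aa_sequences[i]), reverse=True
    )
    batch: List[Tuple[int, str]] = []
    used = 0
    for i in order:
        length = len(aa_sequences[i])
        if batch and used + length > budget:
            yield batch
            batch, used = [], 0
        # A sequence longer than the budget still gets a batch of its own.
        batch.append((i, aa_sequences[i]))
        used += length
    if batch:
        yield batch


def fake_encode_batch(batch: List[str]) -> List[str]:
    """Placeholder encoder: one output string per input, same length."""
    return ["d" * len(seq) for seq in batch]


seqs = ["AAAA", "CC", "GGGGGG", "T"]
for b in token_budget_batches_sketch(seqs, 6):
    print(b)
```

With callables of this shape, a call would take the form `encode(seqs, fake_encode_batch, token_budget_batches_sketch, 6)`. Whether the library expects this exact batch layout is not stated in this section.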