genome_entropy.entropy.shannon

Shannon entropy calculation for sequences.

Functions

  • calculate_entropies_for_sequences(sequences) – Calculate entropy for multiple sequences.

  • calculate_sequence_entropy(sequence[, ...]) – Calculate entropy for a biological sequence.

  • shannon_entropy(sequence[, alphabet, normalize]) – Calculate Shannon entropy of a sequence.

Classes

  • EntropyReport(dna_entropy_global, ...) – Report containing entropy values at different representation levels.

class genome_entropy.entropy.shannon.EntropyReport(dna_entropy_global, orf_nt_entropy, protein_aa_entropy, three_di_entropy, alphabet_sizes)

Report containing entropy values at different representation levels.

Parameters:
  • dna_entropy_global (float) – Entropy of the entire input DNA sequence

  • orf_nt_entropy (Dict[str, float]) – Dictionary mapping ORF IDs to their nucleotide entropy

  • protein_aa_entropy (Dict[str, float]) – Dictionary mapping ORF IDs to their amino acid entropy

  • three_di_entropy (Dict[str, float]) – Dictionary mapping ORF IDs to their 3Di token entropy

  • alphabet_sizes (Dict[str, int]) – Dictionary with alphabet sizes for each representation

dna_entropy_global: float
orf_nt_entropy: Dict[str, float]
protein_aa_entropy: Dict[str, float]
three_di_entropy: Dict[str, float]
alphabet_sizes: Dict[str, int]
__init__(dna_entropy_global, orf_nt_entropy, protein_aa_entropy, three_di_entropy, alphabet_sizes)
Return type:

None
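
The sketch below shows how the five fields fit together. It assumes EntropyReport behaves like a plain dataclass, as the generated __init__ returning None suggests. To stay self-contained, it defines a hypothetical local stand-in, EntropyReportSketch, and fills it from toy sequences. The ORF ID, the 3Di string and the alphabet_sizes keys are illustrative assumptions, not values produced by the package.

```python
import json
import math
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class EntropyReportSketch:
    """Hypothetical stand-in with the same fields as EntropyReport."""
    dna_entropy_global: float
    orf_nt_entropy: Dict[str, float]
    protein_aa_entropy: Dict[str, float]
    three_di_entropy: Dict[str, float]
    alphabet_sizes: Dict[str, int]


def _entropy_bits(sequence: str) -> float:
    # Shannon entropy in bits: sum of p_i * log2(1/p_i) over observed symbols.
    if not sequence:
        return 0.0
    n = len(sequence)
    return sum((c / n) * math.log2(n / c) for c in Counter(sequence).values())


# Toy inputs: ATG GCC AAA TAG translates to M A K (stop).
# The ORF ID "orf_1", the 3Di string and the alphabet_sizes keys are hypothetical.
dna = "ATGGCCAAATAG"
orfs = {"orf_1": "ATGGCCAAATAG"}
proteins = {"orf_1": "MAK"}
three_di = {"orf_1": "DVVL"}

report = EntropyReportSketch(
    dna_entropy_global=_entropy_bits(dna),
    orf_nt_entropy={k: _entropy_bits(v) for k, v in orfs.items()},
    protein_aa_entropy={k: _entropy_bits(v) for k, v in proteins.items()},
    three_di_entropy={k: _entropy_bits(v) for k, v in three_di.items()},
    alphabet_sizes={"dna": 4, "protein": 20, "three_di": 20},
)

# The per-ORF dictionaries share ORF IDs as keys, so levels can be compared per ORF.
print(json.dumps(asdict(report), indent=2))
```

Keying every per-ORF dictionary by the same ORF ID lets a caller line up the nucleotide, amino acid and 3Di entropies of one ORF with simple dictionary lookups.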

genome_entropy.entropy.shannon.shannon_entropy(sequence, alphabet=None, normalize=False)

Calculate Shannon entropy of a sequence.

Shannon entropy: H = -Σ(p_i × log₂(p_i)), where p_i is the relative frequency of symbol i in the sequence (its count divided by the sequence length).
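
Here is the formula applied by hand to AACG, an illustrative input. A occurs twice and C and G once each, so p_A = 1/2 and p_C = p_G = 1/4. Normalizing over the four-letter DNA alphabet divides by log₂4 = 2.

```latex
H(\mathtt{AACG})
  = -\left( \tfrac{1}{2}\log_2\tfrac{1}{2}
          + \tfrac{1}{4}\log_2\tfrac{1}{4}
          + \tfrac{1}{4}\log_2\tfrac{1}{4} \right)
  = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2}
  = 1.5\ \text{bits}

H_{\text{norm}} = \frac{H}{\log_2 \lvert\{A, C, G, T\}\rvert} = \frac{1.5}{2} = 0.75
```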

Parameters:
  • sequence (str) – String to calculate entropy for

  • alphabet (Set[str] | None) – Optional set of symbols in the alphabet for normalization

  • normalize (bool) – If True, normalize entropy by max possible entropy (log₂|alphabet|)

Returns:

Shannon entropy value (bits)

  • Returns 0.0 for empty sequences

  • Returns normalized entropy in [0, 1] if normalize=True

Return type:

float

Examples

>>> shannon_entropy("AAAA")
0.0
>>> shannon_entropy("ACGT")
2.0
>>> shannon_entropy("ACGT", normalize=True, alphabet=set("ACGT"))
1.0
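
The documented formula and return rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation. The name shannon_entropy_sketch is hypothetical, and counting via collections.Counter is an assumed approach. The page does not say what happens when normalize=True and no alphabet is given. The sketch's fallback to the set of observed symbols is an assumption, as is returning 0.0 when the alphabet has only one symbol.

```python
import math
from collections import Counter
from typing import Optional, Set


def shannon_entropy_sketch(
    sequence: str,
    alphabet: Optional[Set[str]] = None,
    normalize: bool = False,
) -> float:
    """Illustrative Shannon entropy in bits (hypothetical helper)."""
    if not sequence:
        return 0.0  # documented: empty sequences give 0.0

    n = len(sequence)
    counts = Counter(sequence)
    # -sum(p_i * log2(p_i)) written as sum(p_i * log2(1/p_i));
    # symbols that never occur have p_i = 0 and contribute nothing.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())

    if normalize:
        # ASSUMPTION: without an alphabet, use the observed symbols.
        size = len(alphabet) if alphabet else len(counts)
        # ASSUMPTION: a one-symbol alphabet has zero maximum entropy, so return 0.0.
        if size <= 1:
            return 0.0
        # Divide by the maximum possible entropy, log2|alphabet|.
        return entropy / math.log2(size)
    return entropy


print(shannon_entropy_sketch("ACGT"))  # 2.0
print(shannon_entropy_sketch("AACG", alphabet=set("ACGT"), normalize=True))  # 0.75
```

Writing each term as p_i × log₂(1/p_i) is algebraically identical to the documented formula. It also avoids returning -0.0 for single-symbol inputs such as "AAAA".
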

genome_entropy.entropy.shannon.calculate_sequence_entropy(sequence, alphabet=None, normalize=False)

Calculate entropy for a biological sequence.

Convenience wrapper around shannon_entropy that handles common preprocessing (e.g., converting to uppercase).

Parameters:
  • sequence (str) – Biological sequence (DNA, protein, 3Di tokens)

  • alphabet (Set[str] | None) – Optional alphabet for normalization

  • normalize (bool) – Whether to normalize by the maximum possible entropy (log₂|alphabet|)

Returns:

Shannon entropy in bits, or a value in [0, 1] if normalize=True

Return type:

float
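
The wrapper's role can be sketched by putting the documented preprocessing step, uppercasing, in front of the entropy computation. This is a hypothetical illustration, not the package's code. calculate_sequence_entropy_sketch and _entropy_bits are invented names. Uppercasing is the only preprocessing the page names, so the sketch does nothing else (for example, it does not strip whitespace or gap characters). The normalization fallback to observed symbols is an assumption.

```python
import math
from collections import Counter
from typing import Optional, Set


def _entropy_bits(sequence: str) -> float:
    # Shannon entropy in bits, written as sum(p_i * log2(1/p_i)).
    if not sequence:
        return 0.0
    n = len(sequence)
    return sum((c / n) * math.log2(n / c) for c in Counter(sequence).values())


def calculate_sequence_entropy_sketch(
    sequence: str,
    alphabet: Optional[Set[str]] = None,
    normalize: bool = False,
) -> float:
    # Documented preprocessing: uppercase so "a" and "A" count as one symbol.
    seq = sequence.upper()
    entropy = _entropy_bits(seq)
    if normalize:
        # ASSUMPTION: fall back to the observed symbols when no alphabet is given.
        size = len(alphabet) if alphabet else len(set(seq))
        return entropy / math.log2(size) if size > 1 else 0.0
    return entropy


print(calculate_sequence_entropy_sketch("acgt"))  # 2.0, same as "ACGT"
```

Without the uppercase step, "AaAa" would be scored as a two-symbol sequence with 1 bit of entropy instead of 0.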

genome_entropy.entropy.shannon.calculate_entropies_for_sequences(sequences, alphabet=None, normalize=False)

Calculate entropy for multiple sequences.

Parameters:
  • sequences (Dict[str, str]) – Dictionary mapping IDs to sequences

  • alphabet (Set[str] | None) – Optional alphabet for normalization

  • normalize (bool) – Whether to normalize by the maximum possible entropy (log₂|alphabet|)

Returns:

Dictionary mapping IDs to entropy values

Return type:

Dict[str, float]
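
The batch function maps each sequence ID to an entropy value computed with the same alphabet and normalize settings. A minimal sketch follows, assuming each sequence goes through the same per-sequence processing as calculate_sequence_entropy. The helper names and the ORF IDs in the usage line are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Optional, Set


def _sequence_entropy(seq: str, alphabet: Optional[Set[str]], normalize: bool) -> float:
    # Per-sequence step: uppercase, then Shannon entropy in bits.
    seq = seq.upper()
    if not seq:
        return 0.0
    n = len(seq)
    h = sum((c / n) * math.log2(n / c) for c in Counter(seq).values())
    if normalize:
        # ASSUMPTION: fall back to the observed symbols when no alphabet is given.
        size = len(alphabet) if alphabet else len(set(seq))
        return h / math.log2(size) if size > 1 else 0.0
    return h


def calculate_entropies_for_sequences_sketch(
    sequences: Dict[str, str],
    alphabet: Optional[Set[str]] = None,
    normalize: bool = False,
) -> Dict[str, float]:
    # One entropy per ID, all computed with the same settings.
    return {
        seq_id: _sequence_entropy(seq, alphabet, normalize)
        for seq_id, seq in sequences.items()
    }


proteins = {"orf_1": "MKMK", "orf_2": "MMMM"}  # hypothetical ORF IDs
print(calculate_entropies_for_sequences_sketch(proteins))
# {'orf_1': 1.0, 'orf_2': 0.0}
```

Returning a dictionary keyed by the input IDs produces the shape used by the per-ORF fields of EntropyReport.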