genome_entropy Documentation

Python 3.10+ · MIT License

Welcome to the documentation for genome_entropy, a bioinformatics pipeline that converts DNA sequences → ORFs → proteins → 3Di structural tokens and computes Shannon entropy at each representation level.

Overview

genome_entropy enables researchers to:

  • Extract Open Reading Frames (ORFs) from DNA sequences

  • Translate ORFs to protein sequences using customizable genetic codes

  • Predict structural alphabet tokens (3Di) directly from protein sequences using ProstT5

  • Calculate and compare Shannon entropy at DNA, ORF, protein, and 3Di levels

  • Process data efficiently with GPU acceleration (CUDA or MPS), falling back to CPU when no GPU is available

Key Features

🧬 ORF Finding

Extract Open Reading Frames from DNA sequences using customizable genetic codes
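
To make the idea concrete, here is a minimal sketch of six-frame ORF extraction. It is not the genome_entropy implementation: the function names `find_orfs` and `reverse_complement`, the ATG-to-stop definition of an ORF, the `min_len` parameter, and the hard-coded standard stop codons are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Stop codons of the standard genetic code (NCBI table 1); an assumption for this sketch.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 6) -> List[Tuple[str, int, str]]:
    """Return (strand, frame, orf) for every ATG...stop ORF in all six frames.

    The reported frame is the offset (0, 1, or 2) on the scanned strand.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None  # index of the currently open start codon, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = s[start:i + 3]  # include the stop codon
                    if len(orf) >= min_len:
                        orfs.append((strand, frame, orf))
                    start = None
    return orfs


print(find_orfs("CCATGAAATAGCC"))
```

ORFs that run off the end of the sequence without a stop codon are dropped here; a real tool may choose to keep such partial ORFs.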

🔄 Translation

Convert ORFs to protein sequences with support for all NCBI genetic code tables
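
As an illustration of table-driven translation, the sketch below builds codon lookups from NCBI-style amino-acid strings (codons ordered with bases T, C, A, G). The `translate` function and the `NCBI_TABLES` dictionary are hypothetical names, not the package's API, and only tables 1 (standard) and 4 (Mycoplasma/Spiroplasma, where TGA encodes tryptophan) are included.

```python
from itertools import product

# NCBI lists amino acids in TCAG codon order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# Hypothetical subset of the NCBI genetic code tables, keyed by table ID.
NCBI_TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    4: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
}


def translate(orf: str, table_id: int = 1) -> str:
    """Translate an in-frame DNA sequence, stopping at the first stop codon."""
    table = dict(zip(CODONS, NCBI_TABLES[table_id]))
    protein = []
    for i in range(0, len(orf) - 2, 3):
        # Codons containing ambiguous bases are reported as 'X'.
        aa = table.get(orf[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGTGATAA", table_id=1))  # TGA is a stop in table 1
print(translate("ATGTGATAA", table_id=4))  # TGA encodes W in table 4
```

Keeping each table as a 64-character string makes it straightforward to add further NCBI tables without changing the translation logic.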

🏗️ 3Di Encoding

Predict structural alphabet tokens directly from protein sequences using ProstT5

📊 Entropy Analysis

Calculate Shannon entropy at DNA, ORF, protein, and 3Di levels
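
For reference, the Shannon entropy of a sequence with symbol frequencies p_i is H = -Σ p_i log2(p_i), measured in bits per symbol. The sketch below applies that formula to any string, so the same function covers DNA, protein, and 3Di alphabets. The name `shannon_entropy` and the choice of base-2 logarithms without normalization are assumptions; the package may define or scale the quantity differently.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy of the symbol distribution in seq, in bits per symbol."""
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Four equally frequent symbols give the maximum for a 4-letter alphabet: 2 bits.
print(shannon_entropy("ACGT"))
# A single repeated symbol carries no information: 0 bits.
print(shannon_entropy("AAAA"))
```

Because the maximum possible entropy depends on alphabet size (2 bits for DNA, about 4.32 bits for 20 amino acids or 20 3Di states), comparisons across levels are easier to interpret if values are considered relative to that maximum.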

GPU Acceleration

Auto-detect and use CUDA, MPS (Apple Silicon), or CPU
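
Device auto-detection of this kind is commonly written with PyTorch's availability checks. The sketch below is an assumption about how such a fallback chain might look, not the package's own code; `pick_device` is a hypothetical helper, and it returns "cpu" if PyTorch is not installed.

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu', preferring accelerators when available."""
    try:
        import torch  # optional dependency in this sketch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend only exists in newer PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```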

🔧 Modular CLI

Run the complete pipeline or individual steps

📝 Comprehensive Logging

Configurable log levels and output to file or STDOUT
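
A configurable level with a choice between file and STDOUT output can be set up with the standard `logging` module, as in the sketch below. The helper `configure_logging`, its parameters, and the logger name `genome_entropy_example` are hypothetical and do not describe the package's actual options.

```python
import logging
import sys
from typing import Optional


def configure_logging(level: str = "INFO", log_file: Optional[str] = None) -> logging.Logger:
    """Return a logger writing to log_file if given, otherwise to STDOUT."""
    logger = logging.getLogger("genome_entropy_example")  # hypothetical logger name
    logger.setLevel(getattr(logging, level.upper()))
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    if log_file:
        handler: logging.Handler = logging.FileHandler(log_file)
    else:
        handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger


log = configure_logging("debug")
log.debug("logging configured")
```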

Getting Started

Reference

Development

Citation

If you use this software, please cite:

License

MIT License. See the LICENSE file for details.
