genome_entropy
  • Installation
    • System Requirements
    • Python Dependencies
    • Installation Methods
      • From Source (Recommended)
    • Installing External Dependencies
      • get_orfs Binary
    • Verifying Installation
    • GPU Support
      • CUDA (NVIDIA GPUs)
      • MPS (Apple Silicon)
      • CPU-Only
    • Downloading Models
    • Troubleshooting
      • ModuleNotFoundError
      • get_orfs Not Found
      • CUDA Out of Memory
      • Model Download Fails
      • Permission Errors
    • Next Steps
  • Quick Start Guide
    • Prerequisites
    • Basic Usage
      • Complete Pipeline
      • Step-by-Step Pipeline
    • Example Output
    • Common Use Cases
      • Use GPU for Faster Processing
      • Use Different Genetic Code
      • Filter Short ORFs
      • Enable Debug Logging
      • Log to File
      • Pre-download Models
      • Estimate Optimal Token Size
    • Input File Format
    • Tips for Large Datasets
    • Example Workflow
    • Performance Benchmarks
    • Next Steps
  • User Guide
    • Pipeline Overview
    • Understanding ORFs
      • What is an ORF?
      • Reading Frames
      • ORF Properties
    • Genetic Code Tables
      • Common Tables
      • Key Differences
    • Understanding 3Di
      • What is 3Di?
      • Why 3Di?
      • 3Di Alphabet
    • Shannon Entropy
      • What is Entropy?
      • Interpretation
      • Normalized Entropy
      • Entropy in Biology
    • Data Flow
      • Step 1: Input (FASTA)
      • Step 2: ORF Finding
      • Step 3: Translation
      • Step 4: 3Di Encoding
      • Step 5: Entropy Calculation
    • Performance Considerations
      • GPU vs CPU
      • Memory Management
      • Token Size Estimation
    • Best Practices
      • Choosing Parameters
      • Logging
      • Large Datasets
      • Quality Control
    • Common Patterns
      • Entropy Comparisons
      • Structural Predictions
    • Troubleshooting
      • Common Issues
      • Performance Issues
    • Next Steps
  • CLI Commands Reference
    • Global Options
    • Commands
      • run
      • orf
      • translate
      • encode3di
      • entropy
      • download
      • estimate-tokens
    • Common Workflows
      • Standard Analysis
      • Step-by-Step Analysis
      • Optimizing Performance
    • Exit Codes
    • Genetic Code Tables
    • Environment Variables
    • Next Steps
  • API Reference
    • Core Modules
      • genome_entropy.orf
        • genome_entropy.orf.finder
        • genome_entropy.orf.types
      • genome_entropy.translate
        • genome_entropy.translate.translator
      • genome_entropy.encode3di
        • ProstT5ThreeDiEncoder
        • ModernProstThreeDiEncoder
        • ThreeDiRecord
        • IndexedSeq
        • estimate_token_size()
        • generate_random_protein()
        • generate_combined_proteins()
        • genome_entropy.encode3di.encoder
        • genome_entropy.encode3di.encoding
        • genome_entropy.encode3di.gpu_utils
        • genome_entropy.encode3di.modernprost
        • genome_entropy.encode3di.multi_gpu
        • genome_entropy.encode3di.prostt5
        • genome_entropy.encode3di.token_estimator
        • genome_entropy.encode3di.types
      • genome_entropy.entropy
        • genome_entropy.entropy.shannon
      • genome_entropy.pipeline
        • PipelineResult
        • run_pipeline()
        • calculate_pipeline_entropy()
        • UnifiedPipelineResult
        • UnifiedFeature
        • FeatureLocation
        • FeatureDNA
        • FeatureProtein
        • FeatureThreeDi
        • FeatureMetadata
        • FeatureEntropy
        • genome_entropy.pipeline.runner
        • genome_entropy.pipeline.types
      • genome_entropy.io
        • genome_entropy.io.fasta
        • genome_entropy.io.genbank
        • genome_entropy.io.jsonio
    • ORF Finding
      • Types
        • OrfRecord
      • Finder
        • find_orfs()
        • reverse_complement()
    • Translation
      • Translator
        • ProteinRecord
        • translate_orf()
        • translate_orfs()
    • 3Di Encoding
      • ProstT5ThreeDiEncoder
        • ProstT5ThreeDiEncoder.__init__()
        • ProstT5ThreeDiEncoder.token_budget_batches()
        • ProstT5ThreeDiEncoder.encode()
        • ProstT5ThreeDiEncoder.encode_proteins()
      • ModernProstThreeDiEncoder
        • ModernProstThreeDiEncoder.__init__()
        • ModernProstThreeDiEncoder.token_budget_batches()
        • ModernProstThreeDiEncoder.encode()
        • ModernProstThreeDiEncoder.encode_proteins()
      • ThreeDiRecord
        • ThreeDiRecord.protein
        • ThreeDiRecord.three_di
        • ThreeDiRecord.method
        • ThreeDiRecord.model_name
        • ThreeDiRecord.inference_device
        • ThreeDiRecord.__init__()
      • IndexedSeq
        • IndexedSeq.idx
        • IndexedSeq.seq
        • IndexedSeq.__init__()
      • estimate_token_size()
      • generate_random_protein()
      • generate_combined_proteins()
      • Types
        • ThreeDiRecord
        • IndexedSeq
      • Encoder
        • ProstT5ThreeDiEncoder
      • Encoding Functions
        • preprocess_sequences()
        • format_seconds()
        • get_memory_info()
        • process_batches()
        • encode()
      • Token Estimator
        • generate_random_protein()
        • generate_combined_proteins()
        • estimate_token_size()
    • Entropy Calculation
      • Shannon Entropy
        • EntropyReport
        • shannon_entropy()
        • calculate_sequence_entropy()
        • calculate_entropies_for_sequences()
    • Pipeline
      • PipelineResult
        • PipelineResult.input_id
        • PipelineResult.input_dna_length
        • PipelineResult.orfs
        • PipelineResult.proteins
        • PipelineResult.three_dis
        • PipelineResult.entropy
        • PipelineResult.__init__()
      • run_pipeline()
      • calculate_pipeline_entropy()
      • UnifiedPipelineResult
        • UnifiedPipelineResult.schema_version
        • UnifiedPipelineResult.input_id
        • UnifiedPipelineResult.input_dna_length
        • UnifiedPipelineResult.dna_entropy_global
        • UnifiedPipelineResult.alphabet_sizes
        • UnifiedPipelineResult.features
        • UnifiedPipelineResult.__init__()
      • UnifiedFeature
        • UnifiedFeature.orf_id
        • UnifiedFeature.location
        • UnifiedFeature.dna
        • UnifiedFeature.protein
        • UnifiedFeature.three_di
        • UnifiedFeature.metadata
        • UnifiedFeature.entropy
        • UnifiedFeature.__init__()
      • FeatureLocation
        • FeatureLocation.start
        • FeatureLocation.end
        • FeatureLocation.strand
        • FeatureLocation.frame
        • FeatureLocation.__init__()
      • FeatureDNA
        • FeatureDNA.nt_sequence
        • FeatureDNA.length
        • FeatureDNA.__init__()
      • FeatureProtein
        • FeatureProtein.aa_sequence
        • FeatureProtein.length
        • FeatureProtein.__init__()
      • FeatureThreeDi
        • FeatureThreeDi.encoding
        • FeatureThreeDi.length
        • FeatureThreeDi.method
        • FeatureThreeDi.model_name
        • FeatureThreeDi.inference_device
        • FeatureThreeDi.__init__()
      • FeatureMetadata
        • FeatureMetadata.parent_id
        • FeatureMetadata.table_id
        • FeatureMetadata.has_start_codon
        • FeatureMetadata.has_stop_codon
        • FeatureMetadata.in_genbank
        • FeatureMetadata.__init__()
      • FeatureEntropy
        • FeatureEntropy.dna_entropy
        • FeatureEntropy.protein_entropy
        • FeatureEntropy.three_di_entropy
        • FeatureEntropy.__init__()
      • Runner
        • PipelineResult
        • run_pipeline()
        • calculate_pipeline_entropy()
    • I/O
      • FASTA I/O
        • read_fasta()
        • read_fasta_iter()
        • write_fasta()
      • JSON I/O
        • to_json_dict()
        • convert_pipeline_result_to_unified()
        • write_json()
        • read_json()
    • Configuration
    • Errors
      • OrfEntropyError
      • ConfigurationError
      • InputError
      • OrfFinderError
      • TranslationError
      • EncodingError
      • ModelError
      • DeviceError
      • PipelineError
    • Logging
      • configure_logging()
      • get_logger()
      • is_configured()
      • get_log_file()
      • get_log_level()
      • set_log_level()
    • Usage Examples
      • ORF Finding
      • Translation
      • 3Di Encoding
      • Token Estimation
      • Shannon Entropy
      • Complete Pipeline
      • I/O Operations
      • Error Handling
      • Custom Logging
      • Advanced: Custom Batching
    • Data Classes
      • OrfRecord
      • ThreeDiRecord
      • EntropyReport
    • Type Hints
    • Next Steps
  • Token Size Estimation for 3Di Encoding
    • Overview
    • Module Structure
      • New Module Organization
      • Key Components
    • Token Size Estimation
      • Purpose
      • Usage
        • Via CLI
        • Via Python API
      • How It Works
      • Output
    • Backward Compatibility
    • Testing
    • Examples
  • Development Guide
    • Setting Up Development Environment
      • Clone and Install
      • Install External Tools
    • Project Structure
    • Code Style and Standards
      • Type Hints
      • Docstrings
      • Code Formatting
      • Linting
      • Type Checking
    • Testing
      • Test Organization
      • Running Tests
      • Writing Tests
      • Integration Tests
    • Git Workflow
      • Branching
      • Commit Messages
      • Pre-commit Checks
    • Adding New Features
      • 1. Design the API
      • 2. Implement Core Logic
      • 3. Add Tests
      • 4. Update Documentation
      • 5. Add CLI Command (if needed)
    • Debugging
      • Using Logging
      • Interactive Debugging
      • Profiling
      • Memory Profiling
    • CI/CD
      • GitHub Actions
      • Local CI Emulation
    • Release Process
      • 1. Update Version
      • 2. Update Changelog
      • 3. Create Release
    • Common Tasks
      • Adding a New Encoder
      • Adding a New Genetic Code
      • Optimizing Performance
    • Resources
    • Getting Help
    • Next Steps
  • Changelog
    • [0.1.0] - 2026-01-19
      • Added
      • Features
      • Known Limitations
    • [Unreleased]

© Copyright 2026, Rob Edwards.
