Installation

System Requirements

Python: 3.10 or higher
Operating System: Linux, macOS, or Windows with WSL
Optional: NVIDIA GPU with CUDA support, or an Apple Silicon Mac (MPS), for accelerated inference
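To confirm that an interpreter meets the Python 3.10 requirement before you create the virtual environment, you can run a minimal check like the one below with the same `python3`. `meets_python_requirement` is a hypothetical helper name and is not part of genome_entropy.

```python
import sys

# Minimum Python version from the system requirements.
MIN_PYTHON = (3, 10)


def meets_python_requirement(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version is at least `minimum`."""
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_python_requirement():
        print(f"Python {current}: OK")
    else:
        print(f"Python {current}: too old, need >= 3.10")
```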

Python Dependencies

The following packages are automatically installed with genome_entropy:

PyTorch >= 2.0.0 (GPU support optional)
Transformers >= 4.30.0 (HuggingFace)
pygenetic-code >= 0.20.0
typer >= 0.9.0
tqdm >= 4.65.0
protobuf >= 6.33.1
sentencepiece >= 0.2.1
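A short script built on the standard library's `importlib.metadata` can show which of these dependencies are installed and whether each meets its minimum version. This is a sketch, not part of genome_entropy. The distribution names (for example `torch` for PyTorch) are assumptions. The version parser is also deliberately simplified: it drops pre-release tags and local suffixes such as `+cu121`. For full PEP 440 comparisons, `packaging.version.Version` is the standard tool.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the dependency list. Keys are the assumed PyPI
# distribution names (PyTorch is distributed as "torch").
MINIMUM_VERSIONS = {
    "torch": "2.0.0",
    "transformers": "4.30.0",
    "pygenetic-code": "0.20.0",
    "typer": "0.9.0",
    "tqdm": "4.65.0",
    "protobuf": "6.33.1",
    "sentencepiece": "0.2.1",
}


def parse_release(text):
    """Simplified parse: '2.1.0+cu121' -> (2, 1, 0); stops at the first non-numeric part."""
    numbers = []
    for part in text.split("."):
        match = re.match(r"\d+", part)
        if not match:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """Compare two release strings after padding them to the same length."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so "2.0" compares equal to "2.0.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(requirements=MINIMUM_VERSIONS):
    """Map each package to (installed version or None, meets minimum?)."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_dependencies().items():
        print(f"{name:15} {installed or 'not installed':15} {'OK' if ok else 'CHECK'}")
```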

Installation Methods

From Source (Recommended)

Clone the repository:

```
git clone https://github.com/linsalrob/genome_entropy.git
cd genome_entropy
```

Create and activate a virtual environment:

```
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
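Before running `pip install`, you can check from Python that the virtual environment is active, which ensures `pip` installs into it rather than into the system Python. `in_virtualenv` is a hypothetical helper name. The check relies on the standard distinction between `sys.prefix` and `sys.base_prefix` that `venv` sets up.

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv created with `python3 -m venv`."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the base interpreter installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active" if in_virtualenv() else "no virtual environment active")
```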

Install the package:
```
pip install -e .
```
(Optional) Install development dependencies:
```
pip install -e ".[dev]"
```

Installing External Dependencies

get_orfs Binary

The ORF-finding step requires the get_orfs binary from https://github.com/linsalrob/get_orfs.

Build Instructions:

```
# Clone the repository
git clone https://github.com/linsalrob/get_orfs.git /tmp/get_orfs
cd /tmp/get_orfs

# Build using CMake
mkdir build && cd build
cmake ..
make
cmake --install . --prefix ..

# Add to PATH
export PATH="/tmp/get_orfs/bin:$PATH"

# Or set environment variable
export GET_ORFS_PATH=/tmp/get_orfs/bin/get_orfs
```
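The build steps offer two ways to expose the binary: `PATH` or `GET_ORFS_PATH`. The sketch below shows how a script could resolve the binary from either one. Giving `GET_ORFS_PATH` precedence over `PATH` is an assumption, not documented genome_entropy behavior, and `find_get_orfs` is a hypothetical name.

```python
import os
import shutil


def find_get_orfs(environ=None):
    """Return the path to an executable get_orfs binary, or None if not found."""
    environ = os.environ if environ is None else environ
    # Assumption: an explicit GET_ORFS_PATH takes precedence over a PATH lookup.
    explicit = environ.get("GET_ORFS_PATH")
    if explicit and os.path.isfile(explicit) and os.access(explicit, os.X_OK):
        return explicit
    # Otherwise search the directories listed in PATH.
    return shutil.which("get_orfs", path=environ.get("PATH"))


if __name__ == "__main__":
    location = find_get_orfs()
    print(location or "get_orfs not found: check PATH or GET_ORFS_PATH")
```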

Requirements for building get_orfs:

C++ compiler (g++ or clang++)
CMake >= 3.10 (the cmake --install step requires CMake 3.15 or later)
Make
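Before building, you can quickly check that these tools are on `PATH`. This is a minimal sketch with a hypothetical helper, `missing_build_tools`. It only checks that each tool is present; you still need to confirm the CMake version yourself with `cmake --version`.

```python
import shutil

# Build requirements for get_orfs; either C++ compiler is sufficient.
COMPILERS = ("g++", "clang++")
OTHER_TOOLS = ("cmake", "make")


def missing_build_tools(which=shutil.which):
    """Return a list describing build tools that cannot be found on PATH."""
    missing = []
    if not any(which(name) for name in COMPILERS):
        missing.append("C++ compiler (g++ or clang++)")
    missing.extend(name for name in OTHER_TOOLS if which(name) is None)
    return missing


if __name__ == "__main__":
    problems = missing_build_tools()
    print("all build tools found" if not problems else "missing: " + ", ".join(problems))
```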

Verifying Installation

Check that the installation was successful:

```
# Check CLI is available
genome_entropy --version

# Check get_orfs is available
which get_orfs
# Or check environment variable
echo $GET_ORFS_PATH

# Confirm that a subcommand loads
genome_entropy download --help
```
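The same checks can be gathered into a single summary from inside Python, which is handy when you run them from a notebook or CI job. In this sketch, `installation_report` is a hypothetical helper. The package name `genome_entropy` matches the module name in the ModuleNotFoundError message under Troubleshooting. This version checks only `PATH` for get_orfs, not `GET_ORFS_PATH`.

```python
import importlib.util
import shutil


def installation_report():
    """Summarize whether the genome_entropy CLI, package and get_orfs are reachable."""
    return {
        # Console script installed by `pip install -e .`, looked up on PATH.
        "genome_entropy CLI": shutil.which("genome_entropy") is not None,
        # Package importable by the current interpreter.
        "genome_entropy package": importlib.util.find_spec("genome_entropy") is not None,
        # get_orfs on PATH (GET_ORFS_PATH is not consulted here).
        "get_orfs on PATH": shutil.which("get_orfs") is not None,
    }


if __name__ == "__main__":
    for item, found in installation_report().items():
        print(f"{item:25} {'found' if found else 'MISSING'}")
```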

GPU Support

CUDA (NVIDIA GPUs)

If you have an NVIDIA GPU with CUDA support, install PyTorch with CUDA:

```
# For CUDA 11.8
pip install torch --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

Verify CUDA is available:

```
import torch
print(torch.cuda.is_available())  # Should print True
```

MPS (Apple Silicon)

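On Apple Silicon Macs (M1, M2, M3), MPS acceleration is detected automatically. The PyTorch >= 2.0.0 installed as a dependency includes MPS support, which requires macOS 12.3 or later. Verify that MPS is available: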

```
import torch
print(torch.backends.mps.is_available())  # Should print True
```

CPU-Only

CPU-only systems need no special configuration: when no GPU is available, the pipeline runs on the CPU automatically.
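The three cases above can be combined into one device-selection helper. This is a sketch that assumes a CUDA, then MPS, then CPU order of preference. genome_entropy's own detection logic is not shown here and may differ, and `pick_device` is a hypothetical name. It uses only the `torch.cuda.is_available()` and `torch.backends.mps.is_available()` calls shown in this section.

```python
def pick_device():
    """Return 'cuda', 'mps' or 'cpu', preferring accelerators when available."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no accelerator support to probe.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Guard with getattr in case this PyTorch build lacks the MPS backend module.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

You can pass the returned string to the CLI's `--device` option, for example `--device cpu`, to override the automatic choice.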

Downloading Models

Pre-download the ProstT5 model to avoid delays during first use:

```
genome_entropy download --model Rostlab/ProstT5_fp16
```

This downloads the model to your HuggingFace cache directory (typically ~/.cache/huggingface/).
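To find out exactly where the model landed, the sketch below follows huggingface_hub's standard cache layout. In that layout, `HF_HUB_CACHE` overrides the location. Otherwise the cache is `$HF_HOME/hub`, where `HF_HOME` defaults to `$XDG_CACHE_HOME/huggingface` or `~/.cache/huggingface`, and each model gets its own `models--<org>--<name>` folder. It assumes genome_entropy does not pass a custom cache directory. `hub_cache_dir` and `model_cache_dir` are hypothetical helpers.

```python
import os
from pathlib import Path


def hub_cache_dir(environ=None):
    """Directory where huggingface_hub stores downloaded repositories."""
    environ = os.environ if environ is None else environ
    # HF_HUB_CACHE overrides everything; otherwise use $HF_HOME/hub.
    if environ.get("HF_HUB_CACHE"):
        return Path(environ["HF_HUB_CACHE"]).expanduser()
    xdg = environ.get("XDG_CACHE_HOME", "~/.cache")
    hf_home = environ.get("HF_HOME") or os.path.join(xdg, "huggingface")
    return Path(hf_home).expanduser() / "hub"


def model_cache_dir(repo_id, environ=None):
    """Cache folder for a model repo, e.g. models--Rostlab--ProstT5_fp16."""
    return hub_cache_dir(environ) / ("models--" + repo_id.replace("/", "--"))


if __name__ == "__main__":
    path = model_cache_dir("Rostlab/ProstT5_fp16")
    print(path, "(present)" if path.exists() else "(not downloaded yet)")
```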

Troubleshooting

ModuleNotFoundError

Error: ModuleNotFoundError: No module named 'genome_entropy'

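Solution: Make sure the package is installed in the Python environment you are running. If you created a virtual environment, activate it first:

```
cd /path/to/genome_entropy
pip install -e .
```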

get_orfs Not Found

Error: get_orfs binary not found

Solution: Install get_orfs, then either add its bin directory to PATH or set the GET_ORFS_PATH environment variable to the full path of the binary:

```
export GET_ORFS_PATH=/path/to/get_orfs/bin/get_orfs
```

CUDA Out of Memory

Error: RuntimeError: CUDA out of memory

Solution: Run on the CPU instead, or reduce the batch size:

```
# Use CPU
genome_entropy run --input data.fasta --output results.json --device cpu

# Or reduce batch size
genome_entropy encode3di --input proteins.json --output 3di.json --batch-size 1
```
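The `--batch-size` option is how you lower memory use from the CLI. If you call a model from your own scripts, a common alternative is to halve the batch size and retry after an out-of-memory error. The sketch below is generic and hypothetical: `run_with_backoff`, `process_batch` and `is_oom` are illustrative names, not genome_entropy APIs.

```python
def run_with_backoff(process_batch, items, batch_size, is_oom):
    """Process `items` in batches, halving the batch size after an out-of-memory error."""
    results = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(process_batch(batch))
        except Exception as error:
            # Re-raise anything that is not an OOM, or an OOM already at batch size 1.
            if not is_oom(error) or batch_size == 1:
                raise
            batch_size = max(1, batch_size // 2)
            continue  # retry the same starting position with a smaller batch
        start += len(batch)
    return results
```

With PyTorch, `is_oom` could be `lambda e: isinstance(e, torch.cuda.OutOfMemoryError)` (available since PyTorch 1.13). Calling `torch.cuda.empty_cache()` before retrying releases cached GPU memory.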

Model Download Fails

Error: Connection errors when downloading models

Solution:

Check your internet connection.
Verify the HuggingFace cache permissions: ls -la ~/.cache/huggingface/

Try downloading manually:

```
python -c "from transformers import AutoModel; AutoModel.from_pretrained('Rostlab/ProstT5_fp16')"
```
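Besides `ls -la`, a short Python check can tell you whether the current user can write to the cache directory, or to the nearest existing parent where it would be created. The default path assumes the standard `~/.cache/huggingface` location. `cache_is_writable` and `nearest_existing` are hypothetical helpers.

```python
import os
from pathlib import Path


def nearest_existing(path):
    """Walk up from `path` to the first ancestor (or itself) that exists."""
    path = Path(path).expanduser()
    while not path.exists():
        path = path.parent
    return path


def cache_is_writable(cache_dir="~/.cache/huggingface"):
    """True if the cache directory, or the parent it would be created in, is writable."""
    target = nearest_existing(cache_dir)
    # A regular file where the directory should be also counts as a problem.
    return target.is_dir() and os.access(target, os.W_OK)


if __name__ == "__main__":
    print("cache writable" if cache_is_writable() else "cache NOT writable: check permissions")
```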

Permission Errors

Error: Permission denied when installing packages

Solution: Use a virtual environment (recommended). If you are not using one, install with the --user flag instead (pip rejects --user inside a virtual environment):

```
pip install --user -e .
```

Next Steps

Continue to Quick Start Guide for basic usage examples
See CLI Commands Reference for complete command reference
Read User Guide for detailed pipeline documentation