genome_entropy.io.fasta

FASTA file reading and writing utilities.

Functions

read_fasta(fasta_path)

Read a FASTA file and return a dictionary of sequence_id -> sequence.

read_fasta_iter(fasta_path)

Read a FASTA file and yield (sequence_id, sequence) tuples.

write_fasta(sequences, output_path[, line_width])

Write sequences to a FASTA file.

genome_entropy.io.fasta.read_fasta(fasta_path)

Read a FASTA file and return a dictionary of sequence_id -> sequence.

Automatically detects and handles gzipped files (ending in .gz).

Parameters:

fasta_path (str | Path) – Path to FASTA file (plain text or gzipped)

Returns:

Dictionary mapping sequence IDs to sequences

Raises:
Return type:

Dict[str, str]
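
The behavior described above (suffix-based gzip detection, one dictionary entry per record) can be illustrated with a minimal standard-library sketch. It is not the library's implementation: the name read_fasta_sketch is hypothetical, and taking the first whitespace-delimited token of the header line as the sequence ID is an assumption, since the documentation does not say how IDs are derived.

```python
import gzip
from pathlib import Path
from typing import Dict, List, Union


def read_fasta_sketch(fasta_path: Union[str, Path]) -> Dict[str, str]:
    """Hypothetical stand-in for read_fasta; not the library's code."""
    path = Path(fasta_path)
    # Choose the opener from the filename suffix, as the docs describe.
    opener = gzip.open if path.suffix == ".gz" else open
    chunks: Dict[str, List[str]] = {}
    current_id = None
    with opener(path, "rt") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Assumption: the ID is the first token after ">".
                parts = line[1:].split()
                current_id = parts[0] if parts else ""
                chunks[current_id] = []
            elif current_id is not None:
                # Sequence data may span several lines; collect the pieces.
                chunks[current_id].append(line)
    # Join the pieces once per record instead of concatenating repeatedly.
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}
```

In this sketch a repeated ID overwrites the earlier record, which is a consequence of returning a dictionary; whether the real function behaves the same way is not stated in the documentation.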

genome_entropy.io.fasta.read_fasta_iter(fasta_path)

Read a FASTA file and yield (sequence_id, sequence) tuples.

Memory-efficient iterator for large FASTA files: records are yielded one at a time rather than loaded into a single dictionary. Automatically detects and handles gzipped files (ending in .gz).

Parameters:

fasta_path (str | Path) – Path to FASTA file (plain text or gzipped)

Yields:

Tuples of (sequence_id, sequence)

Raises:
Return type:

Iterator[Tuple[str, str]]
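
To show why an iterator keeps memory use low, here is a minimal generator sketch that holds only one record in memory at a time. It is an illustration under assumptions, not the library's code: read_fasta_iter_sketch is a hypothetical name, and using the first whitespace-delimited header token as the ID is assumed.

```python
import gzip
from pathlib import Path
from typing import Iterator, List, Optional, Tuple, Union


def read_fasta_iter_sketch(
    fasta_path: Union[str, Path],
) -> Iterator[Tuple[str, str]]:
    """Hypothetical stand-in for read_fasta_iter; not the library's code."""
    path = Path(fasta_path)
    # Choose the opener from the filename suffix, as the docs describe.
    opener = gzip.open if path.suffix == ".gz" else open
    seq_id: Optional[str] = None
    chunks: List[str] = []
    with opener(path, "rt") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, so emit it now.
                if seq_id is not None:
                    yield seq_id, "".join(chunks)
                # Assumption: the ID is the first token after ">".
                parts = line[1:].split()
                seq_id = parts[0] if parts else ""
                chunks = []
            elif seq_id is not None:
                chunks.append(line)
    # Emit the final record, which has no following header to trigger it.
    if seq_id is not None:
        yield seq_id, "".join(chunks)
```

A caller can then stream over a file, for example `for seq_id, seq in read_fasta_iter_sketch(path): print(seq_id, len(seq))`, without ever building the full dictionary.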

genome_entropy.io.fasta.write_fasta(sequences, output_path, line_width=80)

Write sequences to a FASTA file.

Automatically compresses output if filename ends with .gz.

Parameters:
  • sequences (Dict[str, str]) – Dictionary mapping sequence IDs to sequences

  • output_path (str | Path) – Path to output FASTA file (plain text or .gz for compressed)

  • line_width (int) – Maximum line width for sequence lines (default: 80)

Return type:

None
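
The line_width parameter and the .gz handling can be illustrated with a minimal sketch of a writer that wraps each sequence at a fixed width. The name write_fasta_sketch is hypothetical, and rejecting a non-positive line_width is an assumption of this sketch; the documentation does not say how the real function treats that case.

```python
import gzip
from pathlib import Path
from typing import Dict, Union


def write_fasta_sketch(
    sequences: Dict[str, str],
    output_path: Union[str, Path],
    line_width: int = 80,
) -> None:
    """Hypothetical stand-in for write_fasta; not the library's code."""
    if line_width <= 0:
        # Assumption: a non-positive width is treated as an error here.
        raise ValueError("line_width must be positive")
    path = Path(output_path)
    # Compress when the filename ends in .gz, as the docs describe.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "wt") as handle:
        for seq_id, sequence in sequences.items():
            handle.write(f">{seq_id}\n")
            # Emit the sequence in slices of at most line_width characters.
            for start in range(0, len(sequence), line_width):
                handle.write(sequence[start:start + line_width] + "\n")
```

With line_width=4, a 10-character sequence is written as three lines of 4, 4, and 2 characters beneath its header line.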