
Rob's manual for the computational genomics and bioinformatics class.


Kraken2 uses k-mers (short DNA subsequences of length k) to identify the taxonomy of the microbes in your sample. In essence, the Kraken2 developers took all complete genomes and identified the k-mers that are unique to each taxonomic level. Through some nifty computing and special data structures, they figured out how to search this very efficiently.

There is a wide range of pre-built Kraken2 databases that you can download, so you do not need to go to the effort of building them yourself.

When installing Kraken2, I recommend setting the KRAKEN2_DB_PATH and KRAKEN2_DEFAULT_DB environment variables so that you do not need to specify the database on the command line.
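As a minimal sketch, the two variables could be set in your shell startup file (for example ~/.bashrc) like this. The directory and the database name are hypothetical placeholders, so swap in wherever you actually unpacked your database. KRAKEN2_DB_PATH is a colon-separated list of directories to search, and KRAKEN2_DEFAULT_DB is the database to use when none is given on the command line.

```shell
# Hypothetical location: the directory that holds your Kraken2 databases.
export KRAKEN2_DB_PATH="$HOME/kraken2_databases"

# Hypothetical name: the database inside that directory to use by default.
export KRAKEN2_DEFAULT_DB="standard"
```

With these set, kraken2 should find the database without a --db option.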

To run Kraken2, use this incantation:

kraken2 --paired --threads 4 --report kraken_taxonomy.txt --output kraken_output.txt \
	fastq/reads_1.fastq fastq/reads_2.fastq

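This will output two files: kraken_taxonomy.txt, the summary report of how many reads were assigned to each taxon, and kraken_output.txt, the classification of each individual read.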
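For a quick look at what is in the sample, you can filter the report file. This is only a sketch, assuming the standard tab-separated Kraken2 report layout (percentage of reads, reads in clade, reads assigned directly, rank code, NCBI taxonomy ID, name). The helper name top_species and the 1% default cutoff are my own choices for illustration, not part of Kraken2.

```shell
# top_species REPORT [MIN_PCT]
# Print species-level rows (rank code "S") whose percentage of reads is at
# least MIN_PCT (default 1.0), sorted from most to least abundant.
top_species() {
    local report="$1"
    local min_pct="${2:-1.0}"
    local tab
    tab="$(printf '\t')"

    # Column 1 is the percentage, column 4 is the rank code.
    awk -F'\t' -v min="$min_pct" '$4 == "S" && ($1 + 0) >= (min + 0)' "$report" \
        | sort -t "$tab" -k1,1nr
}

# Example usage (uncomment once kraken_taxonomy.txt exists):
# top_species kraken_taxonomy.txt 1.0
```

Changing "S" to "G" in the awk filter would give genus-level rows instead.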

For more information about Kraken2, see the wiki page.