Prepared databases

Modern databases

We provide a number of pre-built collections and indexed databases that you can use with sourmash. As of August 2025, we provide databases in zip and RocksDB formats; older databases are available in a variety of legacy formats.

GTDB RS220 – Bacterial and Archaeal genomes from GTDB RS220.

GTDB RS226 – Bacterial and Archaeal genomes from GTDB RS226.

NCBI Viruses (Jan 2025) – All viruses from NCBI (NCBI:txid10239) as of January 2025.

NCBI Eukaryotes (Jan 2025) – All eukaryotic reference genomes from NCBI (NCBI:txid2759) as of January 2025.

Database formats and sourmash versions

Zip format databases can be used with sourmash v4.1.0 and later (May 2021), while RocksDB databases can be used with sourmash v4.9.0 and later (May 2025). These versions of sourmash also read all of the older database formats, and we always recommend using the latest available version.
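The version requirements above can be expressed as a small lookup table. This is a minimal sketch, not part of sourmash itself. The names `MIN_VERSION`, `parse_version` and `supports_format` are hypothetical. It assumes plain dotted release strings such as "4.9.0" and covers only the two modern formats listed on this page.

```python
# Minimum sourmash versions stated on this page for each modern database format.
# MIN_VERSION, parse_version and supports_format are hypothetical names used
# for illustration only; they are not part of the sourmash API.
MIN_VERSION = {
    "zip": (4, 1, 0),      # sourmash v4.1.0, May 2021
    "rocksdb": (4, 9, 0),  # sourmash v4.9.0, May 2025
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a release string like '4.9.0' into a comparable tuple (4, 9, 0).

    Only plain dotted integers are handled; pre-release suffixes such as
    '4.9.0rc1' are not supported by this sketch.
    """
    return tuple(int(part) for part in version.split("."))


def supports_format(sourmash_version: str, db_format: str) -> bool:
    """Return True if the given sourmash release can read db_format."""
    try:
        minimum = MIN_VERSION[db_format]
    except KeyError:
        raise ValueError(f"unknown database format: {db_format!r}") from None
    # Tuples compare element by element, so (4, 10, 0) >= (4, 9, 0) is True.
    return parse_version(sourmash_version) >= minimum


if __name__ == "__main__":
    for version in ["4.0.0", "4.1.0", "4.8.11", "4.9.0"]:
        usable = [fmt for fmt in MIN_VERSION if supports_format(version, fmt)]
        print(version, usable)
```

Comparing integer tuples rather than raw strings avoids the classic mistake where "4.10.0" sorts before "4.9.0". To check an installed copy, you could pass the result of the standard library's `importlib.metadata.version("sourmash")` as the first argument.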

Legacy database information (2024 and before)

Legacy databases are available here.