Below are publications from the sourmash team.

sourmash fundamentals

Lightweight compositional analysis of metagenomes with FracMinHash and minimum metagenome covers, Irber et al., 2022. This is the core technical paper describing both FracMinHash and sourmash gather.

Large-scale sequence comparisons with sourmash, Pierce et al., 2019. This is the original sourmash use case paper.

Evaluation and benchmarking

Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets, Portik et al., 2022. This paper finds that sourmash is highly sensitive and specific for taxonomic profiling of long-read metagenomes.

Advanced uses of sourmash

Single-cell transcriptomics for the 99.9% of species without reference genomes, Botvinnik et al., 2021. This paper uses sourmash (and many other techniques!) to analyze single-cell data from the Chinese horseshoe bat.

Meta-analysis of metagenomes via machine learning and assembly graphs reveals strain switches in Crohn’s disease, Reiter et al., 2022. This paper uses sourmash and spacegraphcats to detect and analyze strain-specific signals in fecal microbiomes from the Integrative Human Microbiome Project (iHMP).

Protein k-mers enable assembly-free microbial metapangenomics, Reiter et al., 2022. This paper develops a technique that uses protein k-mers to build and analyze metapangenome graphs directly from metagenomes.

Additional works

Dr. Luiz Irber’s PhD thesis, Decentralizing Indices for Genomic Data, describes several additional features of the sourmash ecosystem, including wort, which monitors the Sequence Read Archive (SRA) for new datasets and sketches them automatically.