Below are publications from the sourmash team.
Lightweight compositional analysis of metagenomes with FracMinHash and minimum metagenome covers, Irber et al., 2022. This is the core technical paper describing both FracMinHash and minimum metagenome covers.
Large-scale sequence comparisons with sourmash, Pierce et al., 2019. This is the original sourmash use case paper.
Evaluation and benchmarking
Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets, Portik et al., 2022. This paper shows that sourmash is extremely sensitive and very specific for taxonomic profiling.
Biogeographic Distribution of Five Antarctic Cyanobacteria Using Large-Scale k-mer Searching with sourmash branchwater, Lumian et al., 2022. This paper uses sourmash and branchwater to search ~500,000 public metagenomes for 5 query genomes, validates the results using mapping, and discusses the biogeography of the query species.
Sourmash Branchwater Enables Lightweight Petabyte-Scale Sequence Search, Irber et al., 2022. This paper describes the technical underpinnings of the first version of sourmash branchwater, for petabase-scale search.
Advanced uses of sourmash
Single-cell transcriptomics for the 99.9% of species without reference genomes, Botvinnik et al., 2021. This paper uses sourmash (and many other techniques!) to analyze single cell data from the Chinese horseshoe bat.
Meta-analysis of metagenomes via machine learning and assembly graphs reveals strain switches in Crohn’s disease, Reiter et al., 2022. This paper uses sourmash and spacegraphcats to detect and analyze strain-specific signals in fecal microbiomes from the iHMP.
Protein k-mers enable assembly-free microbial metapangenomics, Reiter et al., 2022. This paper develops a technique to use protein k-mers to analyze metapangenome graphs from metagenomes.
Dr. Luiz Irber’s PhD thesis, Decentralizing Indices for Genomic Data, describes several additional features of the sourmash ecosystem, including wort, which monitors the SRA for new data sets and sketches them automatically.