Full table of contents
- sourmash Python API examples
- A first example: two k-mers
- Introduction: k-mers, molecule types, and hashing.
- Set operations on hashes
- Creating MinHash sketches programmatically, from genome files
- Plotting dendrograms and matrices
- Saving and loading signature files
- Going from signatures back to MinHash objects and their hashes -
- Advanced features of sourmash MinHash objects - scaled and num
- Working with indexed collections of signatures
- sourmash Python API
- Classifying signatures: search, gather, and lca methods.
- Searching for similar samples with search.
- Analyzing metagenomic samples with gather
- Taxonomic profiling with sourmash
- Abundance weighting
- What commands should I use?
- Appendix A: how sourmash gather works.
- Appendix B: sourmash gather and signatures with abundance information
- Appendix C: sourmash gather output examples
- Appendix D: Gather CSV output columns
- Appendix E: Prefetch CSV output columns
- Using sourmash from the command line
- sourmash databases - advanced usage information.
- Prepared databases
- Types of databases
- Taxonomic Information (for non-LCA databases)
- Downloading and using the databases
- GTDB R08-RS214 - DNA databases
- Genbank genomes from March 2022
- GTDB R07-RS207 - DNA databases
- GTDB R06-RS202 - DNA databases
- Appendix: database use and construction details
- Appendix: Memory and time requirements
- Appendix: legacy databases
- sourmash plugins via Python entry points
- Developer information
- Frequently Asked Questions (FAQ)
- How is sourmash different from mash?
- What are the drawbacks to FracMinHash and sourmash?
- How can I better understand FracMinHash and sourmash intuitively?
- What papers should I read to better understand the FracMinHash approach used by sourmash?
- What k-mer size(s) should I use with sourmash?
- What scaled values should I use with sourmash?
- What threshold-bp value should I use with sourmash prefetch and sourmash gather?
- How do k-mer-based analyses compare with read mapping?
- Can I use sourmash to determine the best reference genome for mapping my reads?
- How do I get the sequences for a particular reference genome from a metagenome, using sourmash?
- How does memory usage for sourmash change with k-mer size?
- Can sourmash run with multiple threads?
- Funding
- Welcome to sourmash!
- An introduction to k-mers for genome comparison and analysis
- Legacy Databases
- Additional information on sourmash
- Using sourmash output with R and other languages
- Building plots from sourmash compare output
- Publications
- Releasing a new version of sourmash
- Computational requirements
- sourmash: working with private collections of signatures
- Some sourmash command line examples!
- A guide to the internal design and structure of sourmash
- Signatures and sketches
- K-mer sizes
- Molecule types - DNA, protein, Dayhoff, and hydrophobic-polar
- Manifests
- Index implementations
- Speeding up gather and search
- Taxonomy and assigning lineages
- Picklists
- Online and streaming; and adding to collections of sketches.
- Formats natively understood by sourmash
- sourmash sketch documentation
- Storing SBTs
- Support, Versioning, and Migration
- Installing sourmash
- The first sourmash tutorial - making signatures, comparing, and searching
- Analyzing the genomic and taxonomic composition of an environmental genome using GTDB and sample-specific MAGs with sourmash
- Install sourmash
- Create a working subdirectory
- Download a database and a taxonomy spreadsheet.
- Download and prepare sample reads
- Find matching genomes with sourmash gather
- Build a taxonomic summary of the metagenome
- Interlude: why reference-based analyses are problematic for environmental metagenomes
- Update gather with information from MAGs
- Classify the taxonomy of the MAGs; update metagenome classification
- Interlude: where we are and what we’ve done so far
- Summary and concluding thoughts
- Analyzing Metagenome Composition using the LIN taxonomic framework
- Quick Insights from Sequencing Data with sourmash
- Objectives
- Introduction to k-mers
- Why k-mers, though? Why not just work with the full read sequences?
- Long k-mers are species specific
- Using k-mers to compare samples
- Installing sourmash
- Creating signatures
- Compare many RNA-seq samples quickly
- Detect Eukaryotic Contamination in Raw RNA Sequencing data
- Compare reads to assemblies
- Make and search a database quickly.
- What’s in my metagenome?
- Final thoughts on sourmash
- Using sourmash LCA to do taxonomic classification
- sourmash tutorials and notebooks
- Using the LCA_Database API
- Using sourmash: a practical guide