sourmash tutorials and notebooks
The first three tutorials!
These command-line tutorials should work on Mac OS X and Linux. They require about 5 GB of disk space and 5 GB of RAM.
Background and details
These next three tutorials are all notebooks that you can view, run yourself, or run interactively online via the binder service.
Advanced tutorials and more information
For more information on analyzing sequencing data with sourmash, check out our longer tutorial.
Read using sourmash taxonomy with the Life Identification Number (LIN) taxonomic framework for some of our newer taxonomic features.
If you are a Python programmer, you might also be interested in our API examples, as well as a short guide to Using the LCA_Database API.
If you prefer R, we have a short guide to using sourmash output with R.
Customizing matrix and dendrogram plots in Python
If you’re interested in customizing the output of sourmash plot, which produces comparison matrices and dendrograms, please see Building plots from sourmash compare output.
Contents:
- The first sourmash tutorial - making signatures, comparing, and searching
- Using sourmash LCA to do taxonomic classification
- An introduction to k-mers for genome comparison and analysis
- Some sourmash command line examples!
- sourmash: working with private collections of signatures
- Quick Insights from Sequencing Data with sourmash
  - Objectives
  - Introduction to k-mers
  - Why k-mers, though? Why not just work with the full read sequences?
  - Long k-mers are species specific
  - Using k-mers to compare samples
  - Installing sourmash
  - Creating signatures
  - Compare many RNA-seq samples quickly
  - Detect Eukaryotic Contamination in Raw RNA Sequencing data
  - Compare reads to assemblies
  - Make and search a database quickly.
  - What’s in my metagenome?
  - Final thoughts on sourmash
- Analyzing the genomic and taxonomic composition of an environmental genome using GTDB and sample-specific MAGs with sourmash
  - Install sourmash
  - Create a working subdirectory
  - Download a database and a taxonomy spreadsheet.
  - Download and prepare sample reads
  - Find matching genomes with sourmash gather
  - Build a taxonomic summary of the metagenome
  - Interlude: why reference-based analyses are problematic for environmental metagenomes
  - Update gather with information from MAGs
  - Classify the taxonomy of the MAGs; update metagenome classification
  - Interlude: where we are and what we’ve done so far
  - Summary and concluding thoughts
- sourmash Python API examples
  - A first example: two k-mers
  - Introduction: k-mers, molecule types, and hashing.
  - Set operations on hashes
  - Creating MinHash sketches programmatically, from genome files
  - Plotting dendrograms and matrices
  - Saving and loading signature files
  - Going from signatures back to MinHash objects and their hashes
  - Advanced features of sourmash MinHash objects - scaled and num
  - Working with indexed collections of signatures
- Using the LCA_Database API
- Using sourmash output with R and other languages