Welcome to sourmash!
sourmash is a command-line tool and Python library for computing MinHash sketches from DNA sequences, comparing them to each other, and plotting the results. This allows you to estimate sequence similarity quickly and accurately, even between very large data sets.
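The core idea behind sketch-based similarity can be illustrated with a small, self-contained Python sketch. This is not sourmash's implementation or API. It assumes a bottom-`num` MinHash over canonical k-mers (each k-mer or its reverse complement, whichever sorts first), and it uses BLAKE2b as a stand-in 64-bit hash rather than the hash function sourmash actually uses. The helper names `canonical_kmers`, `hash64`, `minhash`, and `jaccard_estimate` are hypothetical.

```python
import hashlib
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield each k-mer or its reverse complement, whichever sorts first."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield min(kmer, kmer.translate(COMPLEMENT)[::-1])


def hash64(kmer):
    """Stand-in 64-bit hash (BLAKE2b); an assumption, not sourmash's hash."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "little")


def minhash(seq, k=21, num=500):
    """Bottom-num MinHash: keep the num smallest distinct k-mer hashes."""
    return sorted({hash64(km) for km in canonical_kmers(seq, k)})[:num]


def jaccard_estimate(a, b, num=500):
    """Estimate Jaccard similarity from two bottom-num sketches.

    Take the num smallest hashes of the union of both sketches and count
    the fraction of them that appear in both.
    """
    union_bottom = sorted(set(a) | set(b))[:num]
    if not union_bottom:
        return 0.0
    shared = set(a) & set(b)
    return sum(h in shared for h in union_bottom) / len(union_bottom)


if __name__ == "__main__":
    random.seed(1)
    genome = "".join(random.choice("ACGT") for _ in range(5000))
    sketch_full = minhash(genome)
    sketch_half = minhash(genome[:2500])
    # The first half contains roughly half of the full genome's k-mers,
    # so the estimate should land near 0.5.
    print(round(jaccard_estimate(sketch_full, sketch_half), 2))
```

Because only `num` hashes are stored per data set, the comparison cost does not grow with genome size; the price is sampling error in the estimate, which shrinks as `num` increases.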
To use sourmash, you must be comfortable with the UNIX command line; programmers may find the Python library and API useful as well.
sourmash provides command-line utilities for creating, comparing, and searching MinHash sketches, as well as for plotting and clustering sketches by distance (see the command-line docs).
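Comparing and clustering by distance can be sketched in plain Python, assuming each sketch is a list of its `num` smallest hash values (as in a bottom-`num` MinHash). The functions `distance_matrix` and `single_linkage` are hypothetical illustrations of the idea, not sourmash's commands or implementation, and the tiny integer "sketches" in the example are made up for readability.

```python
def jaccard_estimate(a, b, num):
    """Estimate Jaccard similarity from two bottom-num sketches."""
    union_bottom = sorted(set(a) | set(b))[:num]
    if not union_bottom:
        return 0.0
    shared = set(a) & set(b)
    return sum(h in shared for h in union_bottom) / len(union_bottom)


def distance_matrix(sketches, num):
    """Return (names, matrix) where matrix[i][j] = 1 - estimated Jaccard."""
    names = sorted(sketches)
    matrix = [[1.0 - jaccard_estimate(sketches[x], sketches[y], num) for y in names]
              for x in names]
    return names, matrix


def single_linkage(names, matrix, threshold):
    """Group sketches, joining any pair whose distance is below threshold."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if matrix[i][j] < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


if __name__ == "__main__":
    # Made-up toy sketches with num=8; A and B share 7 of their 8 hashes.
    toy = {
        "A": [1, 2, 3, 4, 5, 6, 7, 8],
        "B": [1, 2, 3, 4, 5, 6, 7, 9],
        "C": [10, 11, 12, 13, 14, 15, 16, 17],
    }
    names, dist = distance_matrix(toy, num=8)
    for name, row in zip(names, dist):
        print(name, [round(d, 3) for d in row])
    print(single_linkage(names, dist, threshold=0.5))  # prints [['A', 'B'], ['C']]
```

Cutting single-linkage clustering at a fixed distance is equivalent to finding connected components among pairs closer than the threshold, which is why a simple union-find suffices here; a plotting tool would typically draw the full hierarchy as a dendrogram instead.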
sourmash supports saving, loading, and exchanging MinHash sketches via JSON, a mostly human-readable and editable format.
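As a rough illustration of why JSON suits this purpose, the sketch below writes a bottom-`num` sketch to a JSON string and reads it back with Python's standard `json` module. The field names (`name`, `ksize`, `num`, `mins`) and the functions `save_sketch` and `load_sketch` are a hypothetical, simplified layout chosen for this example; sourmash's real signature files have their own schema, so consult the sourmash documentation before reading or writing them by hand.

```python
import json


def save_sketch(name, ksize, mins):
    """Serialize a sketch to an indented JSON string (hypothetical layout)."""
    record = {"name": name, "ksize": ksize, "num": len(mins), "mins": sorted(mins)}
    return json.dumps(record, indent=2)


def load_sketch(text):
    """Parse JSON written by save_sketch and check that it is self-consistent."""
    record = json.loads(text)
    mins = record["mins"]
    if record["num"] != len(mins):
        raise ValueError("'num' does not match the number of stored hashes")
    if mins != sorted(mins):
        raise ValueError("'mins' must be sorted in ascending order")
    return record


if __name__ == "__main__":
    text = save_sketch("example", ksize=21, mins=[42, 7, 1_000_003])
    print(text)  # plain text: easy to inspect or edit by hand
    print(load_sketch(text)["mins"])  # prints [7, 42, 1000003]
```

One design caveat: 64-bit hash values can exceed 2^53, the largest integer range that JavaScript-based JSON parsers represent exactly, whereas Python's `json` module reads and writes such integers without loss.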
sourmash also has a simple Python API for interacting with sketches, including support for online updating and querying of sketches (see the API docs).
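Online updating means a sketch can absorb new sequences one at a time without keeping the sequences themselves. A minimal way to do that for a bottom-`num` MinHash is to hold the current `num` smallest hashes in a bounded max-heap, so each new hash either replaces the largest kept value or is discarded. The `StreamingMinHash` class below is a hypothetical illustration, not sourmash's API, and it uses BLAKE2b as a stand-in hash function.

```python
import hashlib
import heapq
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


class StreamingMinHash:
    """Bottom-num MinHash that can be updated incrementally (hypothetical class)."""

    def __init__(self, ksize=21, num=500):
        self.ksize = ksize
        self.num = num
        self._heap = []        # negated hashes, so -heap[0] is the largest kept hash
        self._members = set()  # the same hashes, for O(1) duplicate checks

    @staticmethod
    def hash_kmer(kmer):
        """Stand-in 64-bit hash (BLAKE2b); not the hash sourmash uses."""
        return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "little")

    def add_hash(self, h):
        """Keep h only if it is among the num smallest distinct hashes seen so far."""
        if h in self._members:
            return
        if len(self._heap) < self.num:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def add_sequence(self, seq):
        """Hash every canonical k-mer of seq into the sketch."""
        seq = seq.upper()
        for i in range(len(seq) - self.ksize + 1):
            kmer = seq[i:i + self.ksize]
            self.add_hash(self.hash_kmer(min(kmer, kmer.translate(COMPLEMENT)[::-1])))

    def mins(self):
        """Return the kept hashes in ascending order."""
        return sorted(self._members)

    def jaccard(self, other):
        """Estimate Jaccard similarity against another sketch with the same num."""
        mine, theirs = set(self.mins()), set(other.mins())
        union_bottom = sorted(mine | theirs)[: self.num]
        if not union_bottom:
            return 0.0
        return sum(h in mine and h in theirs for h in union_bottom) / len(union_bottom)


if __name__ == "__main__":
    random.seed(7)
    reads = ["".join(random.choice("ACGT") for _ in range(300)) for _ in range(20)]
    forward, backward = StreamingMinHash(num=100), StreamingMinHash(num=100)
    for read in reads:
        forward.add_sequence(read)
    for read in reversed(reads):
        backward.add_sequence(read)
    # Insertion order does not matter: both sketches keep the same hashes.
    print(forward.mins() == backward.mins(), forward.jaccard(backward))  # prints: True 1.0
```

Each update costs O(log num) for the heap operation, and the sketch can be queried at any point during streaming because the kept hashes are always the bottom `num` of everything seen so far.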
sourmash isn’t terribly slow, and relies on an underlying CPython module.
sourmash is developed on GitHub and is freely and openly available under the BSD 3-clause license. Please see the README for more information on development, support, and contributing.
- Using sourmash from the command line
- Computational requirements
- Additional information on sourmash