Additional information on sourmash

Other MinHash implementations for DNA

In addition to mash, also see:

If you are interested in exactly how these MinHash approaches calculate the hashes of DNA sequences, see the short Python script utils/compute-dna-mh-another-way.py in the sourmash repository.
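As a rough illustration of the general idea (not a copy of that script), the sketch below walks through the usual steps: break the sequence into k-mers, pick the lexicographically smaller of each k-mer and its reverse complement so that both strands hash identically, hash each canonical k-mer to a 64-bit integer, and keep the smallest values. All function names here are hypothetical. The hash is hashlib.blake2b, used only as a standard-library stand-in; real tools typically use a fast non-cryptographic hash such as MurmurHash3, so the values produced here will not match those from mash or sourmash.

```python
import hashlib

# Translation table for complementing DNA bases (other characters pass through).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(seq, ksize):
    """Yield the canonical form of every k-mer in seq.

    The canonical form is the lexicographically smaller of the k-mer
    and its reverse complement.
    """
    seq = seq.upper()
    for i in range(len(seq) - ksize + 1):
        kmer = seq[i:i + ksize]
        yield min(kmer, reverse_complement(kmer))


def hash_kmer(kmer):
    """Hash a k-mer to an unsigned 64-bit integer.

    ASSUMPTION: blake2b is a stand-in hash for illustration only.
    """
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, ksize=21, num=500):
    """Return the num smallest distinct k-mer hashes, in sorted order."""
    hashes = {hash_kmer(kmer) for kmer in canonical_kmers(seq, ksize)}
    return sorted(hashes)[:num]
```

Because only canonical k-mers are hashed, minhash_sketch(seq) and minhash_sketch(reverse_complement(seq)) return the same list, which is the property that makes sketches independent of strand orientation.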

Blog posts

We have a number of blog posts on sourmash and MinHash more generally:

JSON format for the signature

The JSON format is not necessarily final; finalizing it is a TODO item for future releases. In particular, we’d like to update it to store more metadata for samples.

Interoperability with mash

The default sketches computed by sourmash and mash are comparable, but we are still working on ways to convert between the two file formats.