Additional information on sourmash
==================================

Other MinHash implementations for DNA
-------------------------------------

In addition to `mash `__, also see:

* `RKMH: Read Classification by Kmers `__.
* `mashtree `__ for building trees using Mash distances.
* `Finch: a Mash implementation in Rust `__. Quote, "Fast sketches,
  count histograms, better filtering."

If you are interested in exactly how these MinHash approaches
calculate the hashes of DNA sequences, please see some simple Python
code in sourmash, `utils/compute-dna-mh-another-way.py `__.

Blog posts
----------

We have a number of blog posts on sourmash and MinHash more generally:

* `Applying MinHash to cluster RNAseq samples `__

* `MinHash signatures as ways to find samples, and collaborators? `__

* `Efficiently searching MinHash Sketch collections `__ -
  indexing and searching 42,000 bacterial genomes with Sequence Bloom
  Trees.

* `Quickly searching all the microbial genomes, mark 2 - now with archaea, phage, fungi, and protists! `__ -
  indexing and searching 50,000 microbial genomes, round 2.

* `What metadata should we put in MinHash Sketch signatures? `__ -
  crowdsourcing ideas for what metadata belongs in a signature file.

* `Minhashing all the things (part 1): microbial genomes `__ -
  on approaches to computing MinHashes for large collections of public
  data.

JSON format for the signature
-----------------------------

The JSON format is not necessarily final; this is a TODO item for
future releases. In particular, we'd like to update it to store more
metadata for samples.

Interoperability with mash
--------------------------

The default sketches computed by sourmash and mash are comparable, but
we are still `working on ways to convert the file formats `__.

Developing sourmash
-------------------

Please see:

.. toctree::
   :maxdepth: 2

   developer
   release