{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# sourmash: working with private collections of signatures\n", "\n", "### Running this notebook\n", "\n", "You can run this notebook interactively via mybinder; click on this button:\n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dib-lab/sourmash/latest?labpath=doc%2Fsourmash-collections.ipynb)\n", "\n", "A rendered version of this notebook is available at [sourmash.readthedocs.io](https://sourmash.readthedocs.io) under \"Tutorials and notebooks\".\n", "\n", "You can also get this notebook from the [doc/ subdirectory of the sourmash GitHub repository](https://github.com/dib-lab/sourmash/tree/latest/doc). See [binder/environment.yml](https://github.com/dib-lab/sourmash/blob/latest/binder/environment.yml) for installation dependencies.\n", "\n", "### What is this?\n", "\n", "This is a Jupyter Notebook using Python 3. If you are running this via [binder](https://mybinder.org), you can use Shift-ENTER to run cells, and double-click on code cells to edit them.\n", "\n", "Contact: C. Titus Brown, ctbrown@ucdavis.edu. Please [file issues on GitHub](https://github.com/dib-lab/sourmash/issues/) if you have any questions or comments!"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## download a bunch of genomes" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/t/dev/sourmash/doc/big_genomes\n", " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 459 100 459 0 0 1017 0 --:--:-- --:--:-- --:--:-- 1017\n", "100 61.1M 100 61.1M 0 0 2932k 0 0:00:21 0:00:21 --:--:-- 3468k\n" ] } ], "source": [ "!mkdir -p big_genomes\n", "!curl -L https://osf.io/8uxj9/?action=download | (cd big_genomes && tar xzf -)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## compute signatures for each file" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/t/dev/sourmash/doc/big_genomes\n", "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kcomputing signatures for files: 0.fa, 1.fa, 10.fa, 11.fa, 12.fa, 13.fa, 14.fa, 15.fa, 16.fa, 17.fa, 18.fa, 19.fa, 2.fa, 20.fa, 21.fa, 22.fa, 23.fa, 24.fa, 25.fa, 26.fa, 27.fa, 28.fa, 29.fa, 3.fa, 30.fa, 31.fa, 32.fa, 33.fa, 34.fa, 35.fa, 36.fa, 37.fa, 38.fa, 39.fa, 4.fa, 40.fa, 41.fa, 42.fa, 43.fa, 44.fa, 45.fa, 46.fa, 47.fa, 48.fa, 49.fa, 5.fa, 50.fa, 51.fa, 52.fa, 53.fa, 54.fa, 55.fa, 56.fa, 57.fa, 58.fa, 59.fa, 6.fa, 60.fa, 61.fa, 62.fa, 63.fa, 7.fa, 8.fa, 9.fa\n", "\u001b[KComputing a total of 1 signature(s).\n", "\u001b[K... reading sequences from 0.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 0.fa\n", "\u001b[Ksaved signature(s) to 0.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 1.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 1.fa\n", "\u001b[Ksaved signature(s) to 1.fa.sig. Note: signature license is CC0.\n", "\u001b[K... 
reading sequences from 10.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 10.fa\n", "\u001b[Ksaved signature(s) to 10.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 11.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 11.fa\n", "\u001b[Ksaved signature(s) to 11.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 12.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 12.fa\n", "\u001b[Ksaved signature(s) to 12.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 13.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 13.fa\n", "\u001b[Ksaved signature(s) to 13.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 14.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 14.fa\n", "\u001b[Ksaved signature(s) to 14.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 15.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 15.fa\n", "\u001b[Ksaved signature(s) to 15.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 16.fa\n", "\u001b[Kcalculated 1 signatures for 4 sequences in 16.fa\n", "\u001b[Ksaved signature(s) to 16.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 17.fa\n", "\u001b[Kcalculated 1 signatures for 2 sequences in 17.fa\n", "\u001b[Ksaved signature(s) to 17.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 18.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 18.fa\n", "\u001b[Ksaved signature(s) to 18.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 19.fa\n", "\u001b[Kcalculated 1 signatures for 9 sequences in 19.fa\n", "\u001b[Ksaved signature(s) to 19.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 2.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 2.fa\n", "\u001b[Ksaved signature(s) to 2.fa.sig. 
Note: signature license is CC0.\n", "\u001b[K... reading sequences from 20.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 20.fa\n", "\u001b[Ksaved signature(s) to 20.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 21.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 21.fa\n", "\u001b[Ksaved signature(s) to 21.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 22.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 22.fa\n", "\u001b[Ksaved signature(s) to 22.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 23.fa\n", "\u001b[Kcalculated 1 signatures for 5 sequences in 23.fa\n", "\u001b[Ksaved signature(s) to 23.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 24.fa\n", "\u001b[Kcalculated 1 signatures for 3 sequences in 24.fa\n", "\u001b[Ksaved signature(s) to 24.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 25.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 25.fa\n", "\u001b[Ksaved signature(s) to 25.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 26.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 26.fa\n", "\u001b[Ksaved signature(s) to 26.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 27.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 27.fa\n", "\u001b[Ksaved signature(s) to 27.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 28.fa\n", "\u001b[Kcalculated 1 signatures for 3 sequences in 28.fa\n", "\u001b[Ksaved signature(s) to 28.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 29.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 29.fa\n", "\u001b[Ksaved signature(s) to 29.fa.sig. Note: signature license is CC0.\n", "\u001b[K... 
reading sequences from 3.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 3.fa\n", "\u001b[Ksaved signature(s) to 3.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 30.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 30.fa\n", "\u001b[Ksaved signature(s) to 30.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 31.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 31.fa\n", "\u001b[Ksaved signature(s) to 31.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 32.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 32.fa\n", "\u001b[Ksaved signature(s) to 32.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 33.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 33.fa\n", "\u001b[Ksaved signature(s) to 33.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 34.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 34.fa\n", "\u001b[Ksaved signature(s) to 34.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 35.fa\n", "\u001b[Kcalculated 1 signatures for 7 sequences in 35.fa\n", "\u001b[Ksaved signature(s) to 35.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 36.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 36.fa\n", "\u001b[Ksaved signature(s) to 36.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 37.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 37.fa\n", "\u001b[Ksaved signature(s) to 37.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 38.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 38.fa\n", "\u001b[Ksaved signature(s) to 38.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 39.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 39.fa\n", "\u001b[Ksaved signature(s) to 39.fa.sig. 
Note: signature license is CC0.\n", "\u001b[K... reading sequences from 4.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 4.fa\n", "\u001b[Ksaved signature(s) to 4.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 40.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 40.fa\n", "\u001b[Ksaved signature(s) to 40.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 41.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 41.fa\n", "\u001b[Ksaved signature(s) to 41.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 42.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 42.fa\n", "\u001b[Ksaved signature(s) to 42.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 43.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 43.fa\n", "\u001b[Ksaved signature(s) to 43.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 44.fa\n", "\u001b[Kcalculated 1 signatures for 2 sequences in 44.fa\n", "\u001b[Ksaved signature(s) to 44.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 45.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 45.fa\n", "\u001b[Ksaved signature(s) to 45.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 46.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 46.fa\n", "\u001b[Ksaved signature(s) to 46.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 47.fa\n", "\u001b[Kcalculated 1 signatures for 2 sequences in 47.fa\n", "\u001b[Ksaved signature(s) to 47.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 48.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 48.fa\n", "\u001b[Ksaved signature(s) to 48.fa.sig. Note: signature license is CC0.\n", "\u001b[K... 
reading sequences from 49.fa\n", "\u001b[Kcalculated 1 signatures for 228 sequences in 49.fa\n", "\u001b[Ksaved signature(s) to 49.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 5.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 5.fa\n", "\u001b[Ksaved signature(s) to 5.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 50.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 50.fa\n", "\u001b[Ksaved signature(s) to 50.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 51.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 51.fa\n", "\u001b[Ksaved signature(s) to 51.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 52.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 52.fa\n", "\u001b[Ksaved signature(s) to 52.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 53.fa\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[Kcalculated 1 signatures for 1 sequences in 53.fa\n", "\u001b[Ksaved signature(s) to 53.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 54.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 54.fa\n", "\u001b[Ksaved signature(s) to 54.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 55.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 55.fa\n", "\u001b[Ksaved signature(s) to 55.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 56.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 56.fa\n", "\u001b[Ksaved signature(s) to 56.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 57.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 57.fa\n", "\u001b[Ksaved signature(s) to 57.fa.sig. Note: signature license is CC0.\n", "\u001b[K... 
reading sequences from 58.fa\n", "\u001b[Kcalculated 1 signatures for 30 sequences in 58.fa\n", "\u001b[Ksaved signature(s) to 58.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 59.fa\n", "\u001b[Kcalculated 1 signatures for 5 sequences in 59.fa\n", "\u001b[Ksaved signature(s) to 59.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 6.fa\n", "\u001b[Kcalculated 1 signatures for 76 sequences in 6.fa\n", "\u001b[Ksaved signature(s) to 6.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 60.fa\n", "\u001b[Kcalculated 1 signatures for 11 sequences in 60.fa\n", "\u001b[Ksaved signature(s) to 60.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 61.fa\n", "\u001b[Kcalculated 1 signatures for 47 sequences in 61.fa\n", "\u001b[Ksaved signature(s) to 61.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 62.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 62.fa\n", "\u001b[Ksaved signature(s) to 62.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 63.fa\n", "\u001b[Kcalculated 1 signatures for 4 sequences in 63.fa\n", "\u001b[Ksaved signature(s) to 63.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 7.fa\n", "\u001b[Kcalculated 1 signatures for 3 sequences in 7.fa\n", "\u001b[Ksaved signature(s) to 7.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 8.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in 8.fa\n", "\u001b[Ksaved signature(s) to 8.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from 9.fa\n", "\u001b[Kcalculated 1 signatures for 3 sequences in 9.fa\n", "\u001b[Ksaved signature(s) to 9.fa.sig. 
Note: signature license is CC0.\n" ] } ], "source": [ "!cd big_genomes/ && sourmash sketch dna -p k=31,scaled=1000 --name-from-first *.fa" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compare them all" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kloaded 1 sigs from 'big_genomes/0.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/1.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/10.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/11.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/12.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/13.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/14.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/15.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/16.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/17.fa.sig'10 sigs total\n", "\u001b[Kloaded 1 sigs from 'big_genomes/18.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/19.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/2.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/20.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/21.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/22.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/23.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/24.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/25.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/26.fa.sig'20 sigs total\n", "\u001b[Kloaded 1 sigs from 'big_genomes/27.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/28.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/29.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/3.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/30.fa.sig'g'\n", "\u001b[Kloaded 
1 sigs from 'big_genomes/31.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/32.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/33.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/34.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/35.fa.sig'30 sigs total\n", "\u001b[Kloaded 1 sigs from 'big_genomes/36.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/37.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/38.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/39.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/4.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/40.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/41.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/42.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/43.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/44.fa.sig'40 sigs total\n", "\u001b[Kloaded 1 sigs from 'big_genomes/45.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/46.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/47.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/48.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/49.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/5.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/50.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/51.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/52.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/53.fa.sig'50 sigs total\n", "\u001b[Kloaded 1 sigs from 'big_genomes/54.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/55.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/56.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/57.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/58.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/59.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/6.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/60.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/61.fa.sig'g'\n", 
"\u001b[Kloaded 1 sigs from 'big_genomes/62.fa.sig'60 sigs total\n", "\u001b[Kloaded 1 sigs from 'big_genomes/63.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/7.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/8.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/9.fa.sig'g'\n", "\u001b[Kloaded 64 signatures total. \n", "\u001b[K\n", "min similarity in matrix: 0.000\n", "\u001b[Ksaving labels to: compare_all.mat.labels.txt\n", "\u001b[Ksaving comparison matrix to: compare_all.mat\n", "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kloading comparison matrix from compare_all.mat...\n", "\u001b[K...got 64 x 64 matrix.\n", "\u001b[Kloading labels from compare_all.mat.labels.txt\n", "\u001b[Ksaving histogram of matrix values => compare_all.mat.hist.png\n", "\u001b[Kwrote dendrogram to: compare_all.mat.dendro.png\n", "\u001b[Kwrote numpy distance matrix to: compare_all.mat.matrix.png\n" ] } ], "source": [ "!sourmash compare big_genomes/*.sig -o compare_all.mat\n", "!sourmash plot compare_all.mat" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import Image\n", "Image(filename='compare_all.mat.matrix.png') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## make a fast(er) search database for all of them" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. 
==\n", "\n", "\u001b[Kloading 64 files into SBT\n", "\u001b[Kloaded 1 sigs from 'big_genomes/0.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/1.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/10.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/11.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/12.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/13.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/14.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/15.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/16.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/17.fa.sig'10 sigs total\n", "\u001b[Kloaded 1 sigs from 'big_genomes/18.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/19.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/2.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/20.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/21.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/22.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/23.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/24.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/25.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/26.fa.sig'20 sigs total\n", "\u001b[Kloaded 1 sigs from 'big_genomes/27.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/28.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/29.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/3.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/30.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/31.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/32.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/33.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/34.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/35.fa.sig'30 sigs total\n", "\u001b[Kloaded 1 sigs from 'big_genomes/36.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/37.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/38.fa.sig'g'\n", 
"\u001b[Kloaded 1 sigs from 'big_genomes/39.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/4.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/40.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/41.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/42.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/43.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/44.fa.sig'40 sigs total\n", "\u001b[Kloaded 1 sigs from 'big_genomes/45.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/46.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/47.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/48.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/49.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/5.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/50.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/51.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/52.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/53.fa.sig'50 sigs total\n", "\u001b[Kloaded 1 sigs from 'big_genomes/54.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/55.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/56.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/57.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/58.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/59.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/6.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/60.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/61.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/62.fa.sig'60 sigs total\n", "\u001b[Kloaded 1 sigs from 'big_genomes/63.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/7.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/8.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'big_genomes/9.fa.sig'g'\n", "\u001b[K\n", "\u001b[Kloaded 64 sigs; saving SBT under \"all-genomes\"\n", "\u001b[KFinished saving nodes, now saving SBT index file.\n", "\u001b[KFinished saving SBT index, 
available at /Users/t/dev/sourmash/doc/all-genomes.sbt.zip\n", "\n" ] } ], "source": [ "!sourmash index -k 31 all-genomes big_genomes/*.sig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can now use this database with `sourmash search` and `sourmash gather`." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r", "\u001b[K\r\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\r\n", "\r", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\r\n", "\r\n", "\r", "\u001b[KCannot open file 'shew_os185.fa.sig'\r\n" ] } ], "source": [ "!sourmash search shew_os185.fa.sig all-genomes --threshold=0.001" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kcomputing signatures for files: fake-metagenome.fa\n", "\u001b[KComputing a total of 1 signature(s).\n", "\u001b[K... reading sequences from fake-metagenome.fa\n", "\u001b[Kcalculated 1 signatures for 3 sequences in fake-metagenome.fa\n", "\u001b[Ksaved signature(s) to fake-metagenome.fa.sig. Note: signature license is CC0.\n" ] } ], "source": [ "# (make fake metagenome again, just in case)\n", "!cat genomes/*.fa > fake-metagenome.fa\n", "!rm -f fake-metagenome.fa.sig\n", "!sourmash sketch dna -p k=31,scaled=1000 fake-metagenome.fa" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kselect query k=31 automatically.\n", "\u001b[Kloaded query: fake-metagenome.fa... (k=31, DNA)\n", "\u001b[Kloaded 1 databases. 
\n", "\n", "\n", "overlap p_query p_match\n", "--------- ------- -------\n", "0.5 Mbp 42.2% 10.5% NC_011663.1 Shewanella baltica OS223,...\n", "499.0 kbp 38.4% 18.5% CP001071.1 Akkermansia muciniphila AT...\n", "0.5 Mbp 19.4% 4.9% NC_009665.1 Shewanella baltica OS185,...\n", "\n", "found 3 matches total;\n", "the recovered matches hit 100.0% of the query\n", "\n" ] } ], "source": [ "!sourmash gather fake-metagenome.fa.sig all-genomes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## build a database with taxonomic information\n", "\n", "For this, we need to provide a metadata file that maps each accession to its taxonomic lineage." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>accession</th><th>taxid</th><th>superkingdom</th><th>phylum</th><th>class</th><th>order</th><th>family</th><th>genus</th><th>species</th><th>strain</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>AE000782</td><td>224325</td><td>Archaea</td><td>Euryarchaeota</td><td>Archaeoglobi</td><td>Archaeoglobales</td><td>Archaeoglobaceae</td><td>Archaeoglobus</td><td>Archaeoglobus fulgidus</td><td>Archaeoglobus fulgidus DSM 4304</td></tr>\n",
"    <tr><th>1</th><td>NC_000909</td><td>243232</td><td>Archaea</td><td>Euryarchaeota</td><td>Methanococci</td><td>Methanococcales</td><td>Methanocaldococcaceae</td><td>Methanocaldococcus</td><td>Methanocaldococcus jannaschii</td><td>Methanocaldococcus jannaschii DSM 2661</td></tr>\n",
"    <tr><th>2</th><td>NC_003272</td><td>103690</td><td>Bacteria</td><td>Cyanobacteria</td><td>NaN</td><td>Nostocales</td><td>Nostocaceae</td><td>Nostoc</td><td>Nostoc sp. PCC 7120</td><td>NaN</td></tr>\n",
"    <tr><th>3</th><td>AE009441</td><td>178306</td><td>Archaea</td><td>Crenarchaeota</td><td>Thermoprotei</td><td>Thermoproteales</td><td>Thermoproteaceae</td><td>Pyrobaculum</td><td>Pyrobaculum aerophilum</td><td>Pyrobaculum aerophilum str. IM2</td></tr>\n",
"    <tr><th>4</th><td>AE009950</td><td>186497</td><td>Archaea</td><td>Euryarchaeota</td><td>Thermococci</td><td>Thermococcales</td><td>Thermococcaceae</td><td>Pyrococcus</td><td>Pyrococcus furiosus</td><td>Pyrococcus furiosus DSM 3638</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>59</th><td>NZ_ABZS01000228</td><td>432331</td><td>Bacteria</td><td>Aquificae</td><td>Aquificae</td><td>Aquificales</td><td>Hydrogenothermaceae</td><td>Sulfurihydrogenibium</td><td>Sulfurihydrogenibium yellowstonense</td><td>Sulfurihydrogenibium yellowstonense SS-5</td></tr>\n",
"    <tr><th>60</th><td>NZ_JGWU01000001</td><td>1458259</td><td>Bacteria</td><td>Proteobacteria</td><td>Betaproteobacteria</td><td>Burkholderiales</td><td>Alcaligenaceae</td><td>Bordetella</td><td>Bordetella bronchiseptica</td><td>Bordetella bronchiseptica D989</td></tr>\n",
"    <tr><th>61</th><td>NZ_FWDH01000003</td><td>31899</td><td>Bacteria</td><td>Firmicutes</td><td>Clostridia</td><td>Thermoanaerobacterales</td><td>Thermoanaerobacterales Family III. Incertae Sedis</td><td>Caldicellulosiruptor</td><td>Caldicellulosiruptor bescii</td><td>NaN</td></tr>\n",
"    <tr><th>62</th><td>NC_009972</td><td>316274</td><td>Bacteria</td><td>Chloroflexi</td><td>Chloroflexia</td><td>Herpetosiphonales</td><td>Herpetosiphonaceae</td><td>Herpetosiphon</td><td>Herpetosiphon aurantiacus</td><td>Herpetosiphon aurantiacus DSM 785</td></tr>\n",
"    <tr><th>63</th><td>NC_005213</td><td>228908</td><td>Archaea</td><td>Nanoarchaeota</td><td>NaN</td><td>Nanoarchaeales</td><td>Nanoarchaeaceae</td><td>Nanoarchaeum</td><td>Nanoarchaeum equitans</td><td>Nanoarchaeum equitans Kin4-M</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>64 rows × 10 columns</p>\n",
"</div>
" ], "text/plain": [ " accession taxid superkingdom phylum class \\\n", "0 AE000782 224325 Archaea Euryarchaeota Archaeoglobi \n", "1 NC_000909 243232 Archaea Euryarchaeota Methanococci \n", "2 NC_003272 103690 Bacteria Cyanobacteria NaN \n", "3 AE009441 178306 Archaea Crenarchaeota Thermoprotei \n", "4 AE009950 186497 Archaea Euryarchaeota Thermococci \n", ".. ... ... ... ... ... \n", "59 NZ_ABZS01000228 432331 Bacteria Aquificae Aquificae \n", "60 NZ_JGWU01000001 1458259 Bacteria Proteobacteria Betaproteobacteria \n", "61 NZ_FWDH01000003 31899 Bacteria Firmicutes Clostridia \n", "62 NC_009972 316274 Bacteria Chloroflexi Chloroflexia \n", "63 NC_005213 228908 Archaea Nanoarchaeota NaN \n", "\n", " order family \\\n", "0 Archaeoglobales Archaeoglobaceae \n", "1 Methanococcales Methanocaldococcaceae \n", "2 Nostocales Nostocaceae \n", "3 Thermoproteales Thermoproteaceae \n", "4 Thermococcales Thermococcaceae \n", ".. ... ... \n", "59 Aquificales Hydrogenothermaceae \n", "60 Burkholderiales Alcaligenaceae \n", "61 Thermoanaerobacterales Thermoanaerobacterales Family III. Incertae Sedis \n", "62 Herpetosiphonales Herpetosiphonaceae \n", "63 Nanoarchaeales Nanoarchaeaceae \n", "\n", " genus species \\\n", "0 Archaeoglobus Archaeoglobus fulgidus \n", "1 Methanocaldococcus Methanocaldococcus jannaschii \n", "2 Nostoc Nostoc sp. PCC 7120 \n", "3 Pyrobaculum Pyrobaculum aerophilum \n", "4 Pyrococcus Pyrococcus furiosus \n", ".. ... ... \n", "59 Sulfurihydrogenibium Sulfurihydrogenibium yellowstonense \n", "60 Bordetella Bordetella bronchiseptica \n", "61 Caldicellulosiruptor Caldicellulosiruptor bescii \n", "62 Herpetosiphon Herpetosiphon aurantiacus \n", "63 Nanoarchaeum Nanoarchaeum equitans \n", "\n", " strain \n", "0 Archaeoglobus fulgidus DSM 4304 \n", "1 Methanocaldococcus jannaschii DSM 2661 \n", "2 NaN \n", "3 Pyrobaculum aerophilum str. IM2 \n", "4 Pyrococcus furiosus DSM 3638 \n", ".. ... 
\n", "59 Sulfurihydrogenibium yellowstonense SS-5 \n", "60 Bordetella bronchiseptica D989 \n", "61 NaN \n", "62 Herpetosiphon aurantiacus DSM 785 \n", "63 Nanoarchaeum equitans Kin4-M \n", "\n", "[64 rows x 10 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas\n", "df = pandas.read_csv('podar-lineage.csv')\n", "df" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[KBuilding LCA database with ksize=31 scaled=10000 moltype=DNA.\n", "\u001b[Kexamining spreadsheet headers...\n", "\u001b[K** assuming column 'accession' is identifiers in spreadsheet\n", "\u001b[K64 distinct identities in spreadsheet out of 64 rows.\n", "\u001b[K64 distinct lineages in spreadsheet out of 64 rows.\n", "\u001b[K... loaded 64 signatures.H01000003.1 Caldicellulo (64 of 64); skipped 0 so far\n", "\u001b[Kloaded 19993 hashes at ksize=31 scaled=10000\n", "\u001b[K64 assigned lineages out of 64 distinct lineages in spreadsheet.\n", "\u001b[K64 identifiers used out of 64 distinct identifiers in spreadsheet.\n", "\u001b[Ksaving to LCA DB: taxdb.lca.json\n" ] } ], "source": [ "!sourmash lca index podar-lineage.csv taxdb big_genomes/*.sig -C 3 --split-identifiers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This database 'taxdb.lca.json' can be used for search and gather as above:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r", "\u001b[K\r\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\r\n", "\r", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. 
==\r\n", "\r\n", "\r", "\u001b[Kselect query k=31 automatically.\r\n", "\r", "\u001b[Kloaded query: fake-metagenome.fa... (k=31, DNA)\r\n", "\r", "\u001b[Kloading from taxdb.lca.json...\r", "\r", "\u001b[Kloaded LCA taxdb.lca.json\r", "\r", "\u001b[K \r", "\r", "\u001b[Kloaded 1 databases.\r\n", "\r\n", "\r\n", "overlap p_query p_match\r\n", "--------- ------- -------\r\n", "0.6 Mbp 46.7% 11.6% NC_011663.1 Shewanella baltica OS223,...\r\n", "0.5 Mbp 38.7% 19.3% CP001071.1 Akkermansia muciniphila AT...\r\n", "0.5 Mbp 14.6% 3.9% NC_009665.1 Shewanella baltica OS185,...\r\n", "\r\n", "found 3 matches total;\r\n", "the recovered matches hit 100.0% of the query\r\n", "\r\n" ] } ], "source": [ "!sourmash gather fake-metagenome.fa.sig taxdb.lca.json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...but can also be used for taxonomic summarization:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r", "\u001b[K\r\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\r\n", "\r", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\r\n", "\r\n", "\r", "\u001b[K\r", "\u001b[K\r", "\u001b[K... loading database taxdb.lca.json\r", "\r", "\u001b[K\r", "\u001b[K\r", "\u001b[Kloaded 1 LCA databases. ksize=31, scaled=10000 moltype=DNA\r\n", "\r", "\u001b[Kfinding query signatures...\r\n", "\r", "\u001b[K\r", "\u001b[K\r", "\u001b[K... 
loading fake-metagenome.fa (file 1 of 1)\r", "38.7% 53 Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835 fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "38.7% 53 Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "38.7% 53 Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "38.7% 53 Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "38.7% 53 Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "38.7% 53 Bacteria;Verrucomicrobia;Verrucomicrobiae fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "38.7% 53 Bacteria;Verrucomicrobia fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "100.0% 137 Bacteria fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "61.3% 84 Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Shewanellaceae;Shewanella;Shewanella baltica fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "61.3% 84 Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Shewanellaceae;Shewanella fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "61.3% 84 Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Shewanellaceae fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "61.3% 84 Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "61.3% 84 Bacteria;Proteobacteria;Gammaproteobacteria fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "61.3% 84 Bacteria;Proteobacteria fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "22.6% 31 
Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Shewanellaceae;Shewanella;Shewanella baltica;Shewanella baltica OS223 fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "14.6% 20 Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Shewanellaceae;Shewanella;Shewanella baltica;Shewanella baltica OS185 fake-metagenome.fa.sig:4e1ac0cf fake-metagenome.fa\r\n", "\r", "\u001b[K\r", "\u001b[K\r", "\u001b[Kloaded 1 signatures from 1 files total.\r\n" ] } ], "source": [ "!sourmash lca summarize --query fake-metagenome.fa.sig --db taxdb.lca.json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other pointers\n", "\n", "[Sourmash: a practical guide](https://sourmash.readthedocs.io/en/latest/using-sourmash-a-guide.html)\n", "\n", "[Classifying signatures taxonomically](https://sourmash.readthedocs.io/en/latest/classifying-signatures.html)\n", "\n", "[Pre-built search databases](https://sourmash.readthedocs.io/en/latest/databases.html)\n", "\n", "## A full list of notebooks\n", "\n", "[An introduction to k-mers for genome comparison and analysis](kmers-and-minhash.ipynb)\n", "\n", "[Some sourmash command line examples!](sourmash-examples.ipynb)\n", "\n", "[Working with private collections of signatures.](sourmash-collections.ipynb)\n", "\n", "[Using the LCA_Database API.](using-LCA-database-API.ipynb)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python (myenv)", "language": "python", "name": "myenv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }