{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Some sourmash command line examples!\n", "\n", "[sourmash](https://sourmash.readthedocs.io/en/latest/) is research software from the Lab for Data Intensive Biology at UC Davis. It implements MinHash and modulo hash.\n", "\n", "Below are some examples of using sourmash. They are computed in a Jupyter Notebook so you can run them yourself if you like!\n", "\n", "Sourmash works on *signature files*, which are just saved collections of hashes.\n", "\n", "Let's try it out!\n", "\n", "### Running this notebook.\n", "\n", "You can run this notebook interactively via mybinder; click on this button:\n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dib-lab/sourmash/latest?labpath=doc%2Fsourmash-examples.ipynb)\n", "\n", "A rendered version of this notebook is available at [sourmash.readthedocs.io](https://sourmash.readthedocs.io) under \"Tutorials and notebooks\".\n", "\n", "You can also get this notebook from the [doc/ subdirectory of the sourmash github repository](https://github.com/dib-lab/sourmash/tree/latest/doc). See [binder/environment.yaml](https://github.com/dib-lab/sourmash/blob/latest/binder/environment.yml) for installation dependencies.\n", "\n", "### What is this?\n", "\n", "This is a Jupyter Notebook using Python 3. If you are running this via [binder](https://mybinder.org), you can use Shift-ENTER to run cells, and double click on code cells to edit them.\n", "\n", "Contact: C. Titus Brown, ctbrown@ucdavis.edu. Please [file issues on GitHub](https://github.com/dib-lab/sourmash/issues/) if you have any questions or comments!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compute scaled signatures" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kcomputing signatures for files: genomes/akkermansia.fa, genomes/shew_os185.fa, genomes/shew_os223.fa\n", "\u001b[KComputing a total of 1 signature(s).\n", "\u001b[K... reading sequences from genomes/akkermansia.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in genomes/akkermansia.fa\n", "\u001b[Ksaved signature(s) to akkermansia.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from genomes/shew_os185.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in genomes/shew_os185.fa\n", "\u001b[Ksaved signature(s) to shew_os185.fa.sig. Note: signature license is CC0.\n", "\u001b[K... reading sequences from genomes/shew_os223.fa\n", "\u001b[Kcalculated 1 signatures for 1 sequences in genomes/shew_os223.fa\n", "\u001b[Ksaved signature(s) to shew_os223.fa.sig. Note: signature license is CC0.\n" ] } ], "source": [ "!rm -f *.sig\n", "!sourmash sketch dna -p k=21,k=31,k=51,scaled=1000 genomes/*.fa --name-from-first -f" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This outputs three signature files, each containing three signatures (one calculated at k=21, one at k=31, and one at k=51)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "akkermansia.fa.sig shew_os185.fa.sig shew_os223.fa.sig\r\n" ] } ], "source": [ "ls *.sig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now use these signature files for various comparisons." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Search multiple signatures with a query\n", "\n", "The below command queries all of the signature files in the directory with the `shew_os223` signature and finds the best Jaccard similarity:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r", "\u001b[K\r\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\r\n", "\r", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\r\n", "\r\n", "\r", "\u001b[Kselecting specified query k=31\r\n", "\r", "\u001b[Kloaded query: NC_011663.1 Shewanella baltica... (k=31, DNA)\r\n", "\r", "\u001b[Kloading from akkermansia.fa.sig...\r", "\r", "\u001b[Kloaded 1 signatures from akkermansia.fa.sig\r", "\r", "\u001b[Kloading from shew_os185.fa.sig...\r", "\r", "\u001b[Kloaded 1 signatures from shew_os185.fa.sig\r", "\r", "\u001b[Kloading from shew_os223.fa.sig...\r", "\r", "\u001b[Kloaded 1 signatures from shew_os223.fa.sig\r", "\r", "\u001b[K \r", "\r", "\u001b[Kloaded 3 signatures.\r\n", "\r\n", "2 matches:\r\n", "similarity match\r\n", "---------- -----\r\n", "100.0% NC_011663.1 Shewanella baltica OS223, complete genome\r\n", " 22.8% NC_009665.1 Shewanella baltica OS185, complete genome\r\n" ] } ], "source": [ "!sourmash search -k 31 shew_os223.fa.sig *.sig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The below command uses Jaccard containment instead of Jaccard similarity:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kselecting specified query k=31\n", "\u001b[Kloaded query: NC_011663.1 Shewanella baltica... (k=31, DNA)\n", "\u001b[Kloaded 3 signatures. \n", "\n", "2 matches:\n", "similarity match\n", "---------- -----\n", "100.0% NC_011663.1 Shewanella baltica OS223, complete genome\n", " 37.3% NC_009665.1 Shewanella baltica OS185, complete genome\n" ] } ], "source": [ "!sourmash search -k 31 shew_os223.fa.sig *.sig --containment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Performing all-by-all queries\n", "\n", "We can also compare all three signatures:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kloaded 1 sigs from 'akkermansia.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'shew_os185.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'shew_os223.fa.sig'g'\n", "\u001b[Kloaded 3 signatures total. \n", "\u001b[K\n", "0-CP001071.1 Akke...\t[1. 0. 0.]\n", "1-NC_009665.1 She...\t[0. 1. 0.228]\n", "2-NC_011663.1 She...\t[0. 0.228 1. ]\n", "min similarity in matrix: 0.000\n" ] } ], "source": [ "!sourmash compare -k 31 *.sig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...and produce a similarity matrix that we can use for plotting:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kloaded 1 sigs from 'akkermansia.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'shew_os185.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'shew_os223.fa.sig'g'\n", "\u001b[Kloaded 3 signatures total. \n", "\u001b[K\n", "0-CP001071.1 Akke...\t[1. 0. 0.]\n", "1-NC_009665.1 She...\t[0. 1. 0.228]\n", "2-NC_011663.1 She...\t[0. 0.228 1. ]\n", "min similarity in matrix: 0.000\n", "\u001b[Ksaving labels to: genome_compare.mat.labels.txt\n", "\u001b[Ksaving comparison matrix to: genome_compare.mat\n" ] } ], "source": [ "!sourmash compare -k 31 *.sig -o genome_compare.mat" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kloading comparison matrix from genome_compare.mat...\n", "\u001b[K...got 3 x 3 matrix.\n", "\u001b[Kloading labels from genome_compare.mat.labels.txt\n", "\u001b[Ksaving histogram of matrix values => genome_compare.mat.hist.png\n", "\u001b[Kwrote dendrogram to: genome_compare.mat.dendro.png\n", "\u001b[Kwrote numpy distance matrix to: genome_compare.mat.matrix.png\n", "0\tCP001071.1 Akkermansia muciniphila ATCC BAA-835, complete genome\n", "1\tNC_009665.1 Shewanella baltica OS185, complete genome\n", "2\tNC_011663.1 Shewanella baltica OS223, complete genome\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABEwAAAMgCAYAAAA5kPcVAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAwJUlEQVR4nO3df5BX5X0v8M/zXYTV4JKqcQXdIFYdydAkc5emgmVSbVyDjjdJ00rrHUkUOiEYLVJNg6SK1Ia20zCYRDCOWuqUWK4aWzOzJW6vEzXVdCLBTlqdaW9jXKJLGDDjgj9Yge/9A9jcfXYV2HPg7H55vZjzh2e/+3yfL3KY5X3e53lSvV6vBwAAAAD9alVPAAAAAGCkEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAwCj35JNPxuWXXx6TJk2KlFL8wz/8w0G/54knnoj29vZobm6Os846K+66664jP1EYRQQmAAAAo9zrr78eH/rQh+Ib3/jGIb3+xRdfjEsvvTRmzZoVmzZtiptvvjmuv/76ePjhh4/wTGH0SPV6vV71JAAAAChHSikeeeSR+OQnP/mOr/mTP/mTePTRR+OFF17oP7dgwYL4t3/7t3jmmWeOwixh5BtT9QTYZ+/evfHKK6/EiSeeGCmlqqcDDa1er8eOHTti0qRJUasd+aKd6xsARpfD/Vnhrbfeir6+viMyj/xnh3HjxsW4ceMKj/3MM89ER0fHgHOXXHJJ3HvvvfH222/HcccdV/g9YLQTmIwQr7zySrS1tVU9DTimbN68Oc4444wj/j6ubwAYnQ7lZ4W33norjn/Pr0Tsfav09x8/fnzs3LlzwLlbb701li1bVnjsLVu2RGtr64Bzra2tsXv37ti2bVtMnDix8HvAaCcwGSFOPPHEiNj3l3JLS0vFs4HG1tvbG21tbf3X3ZH2y+v7iWhpGX9U3hMO5rnt/7fqKcAAHbP+d9VTgH71vW9HX8+jh/SzQl9fX8Tet2LcxP8ZUSuxlbH37djZ8+igfx+U0S45IG+vHFitQSMW9hGYjBAH/lJqaWkRmMBRcrR+GPjl9T1eYMKIMb7vhKqnAAOkMv+hCSU5nJ8VUtO4Uv8c19O+R4GO1L8PTjvttNiyZcuAc1u3bo0xY8bEySefXPr7wWhklxwAAIBjzIwZM6Krq2vAucceeyymT59u/RLYT2ACAABQUIoUKWolHofXhN25c2c899xz8dxzz0XEvm2Dn3vuueju7o6IiCVLlsTcuXP7X79gwYJ46aWXYvHixfHCCy/EfffdF/fee2/ceOONpf2ewGjnkRwAAIBR7tlnn40LL7yw/78XL14cERGf+cxnYu3atdHT09MfnkRETJkyJTo7O+OGG26IO++8MyZNmhRf+9rX4tOf/vRRnzuMVAITAACAglKqRUolFvgPc6zf+q3f6l+0dShr164ddO6jH/1o/OhHPzrcmcExwyM5AAAAABkNEwAAgIKqbpgA5XMVAgAAAGQ0TAAAAApKKUVKh7ezzUEGLG8sYFg0TAAAAAAyGiYAAACF1aLc+9HubUPVXIUAAAAAGQ0TAACAguySA41HYAIAAFCQwAQaj6sQAAAAIKNhAgAAUFCKWiSLvkJDcRUCAAAAZDRMAAAACrKGCTQeVyEAAABARsMEAACgoBQlN0zc24bKuQoBAAAAMhomAAAARaVUasOknlJpYwHDo2ECAAAAkNEwAQAAKCjt/1XmeEC1NEwAAAAAMhomAAAABaVU7i455e64AwyHqxAAAAAgo2ECAABQkIYJNB6BCQAAQEECE2g8rkIAAACAjIYJAABAYbUo9360e9tQNVchAAAAQEbDBAAAoCBrmEDjcRUCAAAAZDRMAAAACtIwgcbjKgQAAADIaJgAAAAUlKIWqcT70WWOBQyPqxAAAAAgo2ECAABQUEqp5DVMUmljAcOjYQIAAACQ0TABAAAoaF/DpLxWiIYJVE/DBAAAACCjYQIAAFBQSrWS1zBxbxuqJjABAAAoyLbC0HhchQAAAAAZDRMAAICCPJIDjcdVCAAAAJDRMAEAAChIwwQaj6sQAAAAIKNhAgAAUJBdcqDxuAoBAAAAMhomAAAARaXavqPM8YBKuQoBAAAAMhomAAAABdklBxqPqxAAAAAgo2ECAABQUEopUkqljgdUS8MEAAAAIKNhAgAAUFCKFKnE+9EpNEygagITAACAokpe9NW2wlA9VyEAAABARsMEAACgqJT2HWWOB1RKwwQAAAAgo2ECAABQVIpyb0fvLXEsYFg0TAAAAAAyGiYAAABFWcMEGo6GCQAAAEBGwwQAAKAoDRNoOBomAAAAABkNEwAAgKJqUe7taLe2oXIuQwAAAICMhgkAAEBRKUXdGibQUDRMAAAAADICE4CDePzxx+Oaa66J8847L97znvfE6aefHp/4xCdi48aNVU8NABgp0hE4gEoJTAAOYs2aNfHTn/40/uiP/ig6OzvjjjvuiK1bt8b5558fjz/+eNXTAwBGgloq/wAqZQ0TgIO4884749RTTx1w7uMf/3icffbZ8ZWvfCUuuuiiimYGAAAcKQITgIPIw5KIiPHjx8cHPvCB2Lx5cwUzAgBGnJTKXajVoq9QOYEJlajX6/Hm23uqngbHqDf6dhce47XXXosf/ehH2iUAANCgBCYcdfV6PX73rmdi40u/qHoqHKP27nqj8BjXXnttvP7667F06dISZgQAjHplL9SqYAKVE5hw1L359h5hCaPan/7pn8a6devi61//erS3t1c9HQAA4AgQmFCpZ7/8sThhbFPV0+AY09vbGxNXDe97b7vttrj99tvjz//8z+MLX/hCqfMCAEaxsne2sUsOVE5gQqVOGNsUJ4z1x5Cja/cw/8zddtttsWzZsli2bFncfPPNJc8KAAAYSfxLFeAQ/Nmf/VksW7YsvvzlL8ett95a9XQAgJHGLjnQcAQmAAfx1a9+NW655Zb4+Mc/Hpdddln84Ac/GPD1888/v6KZAQAAR4rABOAgvvOd70RExIYNG2LDhg2Dvl6v14/2lACAkcYuOdBwBCYAB/G9732v6ikAAABHmcAEAACgKLvkQMOpVT0BAAAAgJFGwwQAAKAoa5hAwxGYAAAAFFSPFPUStwKuS0ygch7JAQAAAMgITAAAAIo6sOhrmcdhWr16dUyZMiWam5ujvb09nnrqqXd9/bp16+JDH/pQnHDCCTFx4sS4+uqrY/v27cP9HYCGIzABAAAY5davXx+LFi2KpUuXxqZNm2LWrFkxe/bs6O7uHvL13//+92Pu3Lkxb968+I//+I948MEH44c//GHMnz//KM8cRi6BCQAAQFHpCByHYeXKlTFv3ryYP39+TJ06NVatWhVtbW2xZs2aIV//gx/8IM4888y4/vrrY8qUKfGbv/mb8bnPfS6effbZw/zg0LgEJgAAACNUb2/vgGPXrl2DXtPX1xcbN26Mjo6OAec7Ojri6aefHnLcmTNnxs9+9rPo7OyMer0eP//5z+Ohhx6Kyy677Ih8DhiNBCYAAABFpVT+ERFtbW0xYcKE/mPFihWD3nrbtm2xZ8+eaG1tHXC+tbU1tmzZMuR0Z86cGevWrYs5c+bE2LFj47TTTov3vve98fWvf7383xsYpQQmAAAAI9TmzZvjtdde6z+WLFnyjq9N2bbG9Xp90LkDnn/++bj++uvjlltuiY0bN8aGDRvixRdfjAULFpQ6fxjNxlQ9AQAAgFFvmDvbvOt4EdHS0hItLS3v+tJTTjklmpqaBrVJtm7dOqh1csCKFSviggsuiJtuuikiIj74wQ/Ge97znpg1a1bcfvvtMXHixBI+BIxuGiYAAACj2NixY6O9vT26uroGnO/q6oqZM2cO+T1vvPFG1GoD/znY1NQUEfuaKYCGCQAAQHHD2NnmoOMdhsWLF8dVV10V06dPjxkzZsTdd98d3d3d/Y/YLFmyJF5++eW4//77IyLi8ssvjz/8wz+MNWvWxCWXXBI9PT2xaNGi+MhHPhKTJk0q8YPA6CUwAQAAGOXmzJkT27dvj+XLl0dPT09MmzYtOjs7Y/LkyRER0dPTE93d3f2v/+xnPxs7duyIb3zjG/HHf/zH8d73vjcuuuii+Mu//MuqPgKMOAITAACAov6/nW1KG+8wLVy4MBYuXDjk19auXTvo3HXXXRfXXXfdYb8PHCusYQIAAACQ0TABAAAoagQ0TIByCUwAAACKqkW5/X3PAkDlXIYAAAAAGQ0TAACAolKU/EhOeUMBw6NhAgAAAJDRMAEAACgqRbmtEA0TqJyGCQAAAEBGwwQAAKCgei1FvVZeLaTMsYDh0TABAAAAyGiYAAAAFJVSybvkaJhA1TRMAAAAADIaJgAAAEXZJQcajoYJAAAAQEbDBAAAoKiUIsrc2cYaJlA5DRMAAACAjIYJAABAUXbJgYYjMAEAACjKoq/QcDySAwAAAJDRMAEAACiqVvKir2WOBQyLhgkAAABARsMEAACgKA0TaDgaJgAAAAAZDRMAAICC6mnfUeZ4QLU0TAAAAAAyGiYAAABFWcMEGo6GCQAAAEBGwwQAAKColPYdZY4HVErDBAAAACCjYQIAAFCUNUyg4WiYAAAAAGQ0TAAAAIqqRbm3o93ahsoJTAAAAIqy6Cs0HLklAAAAQEbDBAAAoCiLvkLD0TABAAAAyGiYAAAAFFSPFPUS1x2ph4YJVE3DBAAAACCjYQIAAFCUbYWh4bgMAQAAADIaJgAAAEXZJQcajoYJAAAAQEbDBAAAoKiU9h1ljgdUSsMEAAAAIKNhAgAAUJQ1TKDhaJgAAAAAZDRMAAAAikr7jzLHAyolMAEAACioXktRL/ExmjLHAobHIzkAAAAAGQ0TAACAoiz6Cg1HwwQAAAAgo2ECAABQVEr7jjLHAyqlYQIAAACQ0TABAAAoqhbl3o52axsq5zIEAAAAyGiYjDL1ej3efHtP1dMo5I2+3VVPAQAAypWi5DVMyhsKGB6ByShSr9fjd+96Jja+9IuqpwIAAAANTWAyirz59h5hCQCluOB/rKt6CjDAv/zof1U9Bei3c8cb8dGzHj68b0opomaXHGgkApNR6tkvfyxOGNtU9TSG5Y2+3TH99v9T9TQAAADgHQlMRqkTxjbFCWP97wMAgBGhVnLDpMyxgGGxSw4AAABARkUBAACgoHpKUS9x3ZEyxwKGR2ACAABQVC3K7e97FgAq5zIEAAAAyGiYAAAAFJVSuVsBeyQHKqdhAgAAAJDRMAEAACjKtsLQcDRMAAAAADIaJgAAAEVpmEDD0TABAAAAyGiYAAAAFJX2H2WOB1RKwwQAAAAgo2ECAABQUL2Wol7iuiNljgUMj4YJAAAAQEbDBAAAoKiU9h1ljgdUSsMEAAAAIKNhAgAAUFQt7TvKHA+olMAEAACgKNsKQ8PxSA4AAABARsMEAACgoFqKqJV5O1rDBCqnYQIAAACQ0TABAAAoyK7C0Hg0TAAAAAAyGiYAAAAFaZhA49EwAQAAaACrV6+OKVOmRHNzc7S3t8dTTz31rq/ftWtXLF26NCZPnhzjxo2LX/3VX4377rvvKM0WRj4NEwAAgIJSSpFKrIUc7ljr16+PRYsWxerVq+OCCy6Ib37zmzF79ux4/vnn4/3vf/+Q33PFFVfEz3/+87j33nvj7LPPjq1bt8bu3bvLmD40BIEJAADAKLdy5cqYN29ezJ8/PyIiVq1aFd/97ndjzZo1sWLFikGv37BhQzzxxBPxk5/8JE466aSIiDjzzDOP5pRhxPNIDgAAQEEH1jAp8zhUfX19sXHjxujo6BhwvqOjI55++ukhv+fRRx+N6dOnx1/91V/F6aefHueee27ceOON8eabbxb5bYCGomECAAAwQvX29g7473HjxsW4ceMGnNu2bVvs2bMnWltbB5xvbW2NLVu2DDnuT37yk/j+978fzc3N8cgjj8S2bdti4cKF8eqrr1rHBPbTMAEAACjoSDVM2traYsKECf3HUI/X/HIOA2sp9Xr9HddC2bt3b6SUYt26dfGRj3wkLr300li5cmWsXbtWywT20zABAAAYoTZv3hwtLS39/523SyIiTjnllGhqahrUJtm6deug1skBEydOjNNPPz0mTJjQf27q1KlRr9fjZz/7WZxzzjklfQIYvTRMAAAAiqpFpBKPA/9Sa2lpGXAMFZiMHTs22tvbo6ura8D5rq6umDlz5pDTveCCC+KVV16JnTt39p/7z//8z6jVanHGGWeU9tsCo5nABAAAoKAqF32NiFi8eHHcc889cd9998ULL7wQN9xwQ3R3d8eCBQsiImLJkiUxd+7c/tdfeeWVcfLJJ8fVV18dzz//fDz55JNx0003xTXXXBPHH398mb81MGp5JAcAAGCUmzNnTmzfvj2WL18ePT09MW3atOjs7IzJkydHRERPT090d3f3v378+PHR1dUV1113XUyfPj1OPvnkuOKKK+L222+v6iPAiCMwAQAAKKiW9h1lqQ9jrIULF8bChQuH/NratWsHnTvvvPMGPcYD/JJHcgAAAAAyGiYAAAAFDWfdkYONB1RLwwQAAAAgo2ECAABQkIYJNB4NEwAAAICMhgkAAEBBKaVIJdZCyhwLGB4NEwAAAICMhgkAAEBBqbbvKHM8oFouQwAAAICMhgkAAEBBdsmBxqNhAgAAAJDRMAEAAChIwwQaj8AEAACgoBQlByblDQUMk0dyAAAAADIaJgAAAAXV0r6jLHUVE6ichgkAAABARsMEAACgIIu+QuPRMAEAAADIaJgAAAAUpGECjUfDBAAAACCjYQIAAFBQqqVIJW6TU+ZYwPBomAAAAABkNEwAAAAKsoYJNB4NEwAAAICMhgkAAEBBGibQeDRMAAAAADIaJgAAAEWV3DAJDROonMAEAACgoFrad5Q5HlAtj+QAAAAAZDRMAAAACrLoKzQeDRMAAACAjIYJAABAQam27yhzPKBaLkMAAACAjIYJAABAQdYwgcajYQIAAACQ0TABAAAoKKUUqcRaSJljAcOjYQJwCHbs2BFf/OIXo6OjI973vvdFSimWLVtW9bQAAIAjRGACcAi2b98ed999d+zatSs++clPVj0dAGCEObCGSZkHUC2P5AAcgsmTJ8cvfvGLSCnFtm3b4p577ql6SgAAwBEkMAE4BJ4jBgDejV1yoPEITKjUG317qp4Cx6A3+nZXPQUAAGCEE5hQqem3/3PVU+AYtHfXG1VPAQBoMBom0Hgs+spRd/xxTTF98q9UPQ0AAChNLZV/ANXSMOGoSynFgwtmxJtvexyHavT29sbEVVXPAgAAGMkEJlQipRQnjPXHj2rs9mcPAChZLcpthXgUAKrnOgQAAADIuM0KcIj+6Z/+KV5//fXYsWNHREQ8//zz8dBDD0VExKWXXhonnHBCldMDACpUS/WopXqp4wHVEpgAHKLPf/7z8dJLL/X/94MPPhgPPvhgRES8+OKLceaZZ1Y0MwAAoGwCE4BD9NOf/rTqKQAAI1TZO9vYJQeqZw0TAAAAgIyGCQAAQEEpyr0brWAC1dMwAQAAAMhomAAAABRklxxoPBomAAAAABkNEwAAgILskgONR8MEAAAAIKNhAgAAUFAtyr0b7c42VE9gAgAAUJBHcqDxCC4BAAAAMhomAAAABaVUj1TiVsBljgUMj4YJAAAAQEbDBAAAoCBrmEDj0TABAAAAyGiYAAAAFGRbYWg8rkMAAACAjIYJAABAQbVUj1qJO9uUORYwPBomAAAAABkNEwAAgILskgONR8MEAAAAIKNhAgAAUFCKcu9GK5hA9TRMAAAAADIaJgAAAAVZwwQaj8AEAACgINsKQ+PxSA4AAABARsMEAACgII/kQOPRMAEAAADIaJgAAAAUVIty70a7sw3Vcx0CAAAAZDRMAAAACrJLDjQeDRMAAACAjIYJAABAQXbJgcajYQIAAACQ0TABAAAoSMMEGo+GCQAAAEBGwwQAAKCgWpR7N9qdbaie6xAAAAAgIzABAAAoKKV61Eo8Uqof9hxWr14dU6ZMiebm5mhvb4+nnnrqkL7vX/7lX2LMmDHx4Q9/+LDfExqZwAQAAKCgA4u+lnkcjvXr18eiRYti6dKlsWnTppg1a1bMnj07uru73/X7XnvttZg7d2789m//doFPD41JYAIAADDKrVy5MubNmxfz58+PqVOnxqpVq6KtrS3WrFnzrt/3uc99Lq688sqYMWPGUZopjB4CEwAAgIJqR+CIiOjt7R1w7Nq1a9B79/X1xcaNG6Ojo2PA+Y6Ojnj66affcc5/8zd/E//93/8dt95663A/NjQ0gQkAAMAI1dbWFhMmTOg/VqxYMeg127Ztiz179kRra+uA862trbFly5Yhx/2v//qv+NKXvhTr1q2LMWNsngpDcWUAAAAUVIvDX3fkYONFRGzevDlaWlr6z48bN+4dvyelgROo1+uDzkVE7NmzJ6688sq47bbb4txzzy1lvtCIBCYAAAAjVEtLy4DAZCinnHJKNDU1DWqTbN26dVDrJCJix44d8eyzz8amTZviC1/4QkRE7N27N+r1eowZMyYee+yxuOiii8r7EDBKCUwAAAAKSsPcCvjdxjtUY8eOjfb29ujq6opPfepT/ee7urriE5/4xKDXt7S0xI9//OMB51avXh2PP/54PPTQQzFlypThTxwaiMAEAABglFu8eHFcddVVMX369JgxY0bcfffd0d3dHQsWLIiIiCVLlsTLL78c999/f9RqtZg2bdqA7z/11FOjubl50Hk4lglMAAAACqqlktcwOcyx5syZE9u3b4/ly5dHT09PTJs2LTo7O2Py5MkREdHT0xPd3d3lTRCOAQITAACABrBw4cJYuHDhkF9bu3btu37vsmXLYtmyZeVPCkYxgQkAAEBBtfjlzjZljQdUy3UIAAAAkNEwAQAAKKiW6lErcZecMscChkfDBAAAACCjYQIAAFBQ1bvkAOUTmAAAABSUSg5MksAEKueRHAAAAICMhgkAAEBBTfuPMscDqqVhAgAAAJDRMAEAACjItsLQeDRMAAAAADIaJgAAAAXZVhgaj4YJAAAAQEbDBAAAoCANE2g8GiYAAAAAGQ0TAACAgprSvqPM8YBqaZgAAAAAZDRMAAAACrKGCTQeDRMAAACAjIYJAABAQbVUj1qqlzoeUC2BCQAAQEGp5EdykkdyoHIeyQEAAADIaJgAAAAU1LT/KHM8oFoaJgAAAAAZDRMAAICCbCsMjUdgAnCUnPqBr0SqHVf1NCAiIt7svq3qKQCMWL1jd1Y9BWAEEJgAAAAUZFthaDzWMAEAAADIaJgAAAAU1JT2HWWOB1RLwwQAAAAgo2ECAABQkF1yoPFomAAAAABkNEwAAAAK0jCBxqNhAgAAAJDRMAEAACioFiU3TMobChgmgQkAAEBBtVSPplQvdTygWoJLAAAAgIyGCQAAQEG1KPdutDvbUD3XIQAAAEBGwwQAAKAg2wpD49EwAQAAAMhomAAAABSkYQKNR8MEAAAAIKNhAgAAUFBTimhK9VLHA6qlYQIAAACQ0TABAAAoyBom0Hg0TAAAAAAyGiYAAAAFaZhA49EwAQAAAMhomAAAABSkYQKNR2ACAABQUC2VuxWwwASq55EcAAAAgIyGCQAAQEG1VI9aqpc6HlAtDRMAAACAjIYJAABAQbUo9260O9tQPdchAAAAQEbDBAAAoCDbCkPj0TABAAAAyGiYAAAAFNSU9h1ljgdUS8MEAAAAIKNhAgAAUFAt1aOW6qWOB1RLwwQAAAAgo2ECAABQkF1yoPFomAAAAABkNEwAAAAK0jCBxiMwAQAAKKgW5db3PQoA1XMdAgAAAGQ0TAAAAIpKEanMx2g8kgOV0zABAAAAyGiYAAAAFJSi3FKIgglUT8MEAAAAIKNhAgAAUFAqeQ2TUtdDAYZFwwQAAAAgo2ECAABQUC3KvRvtzjZUz3UIAAAAkNEwAQAAKCileqRUL3U8oFoaJgAAAAAZDRMAAICC0v6jzPGAammYAAAAAGQ0TAAAAApKEZFKrIVomED1BCYAAAAFeSQHGo9HcgAAAAAyGiYAAAAF1dK+o8zxgGppmAAAAABkNEwAAAAKsoYJNB4NEwAAgAawevXqmDJlSjQ3N0d7e3s89dRT7/jab3/723HxxRfH+973vmhpaYkZM2bEd7/73aM4Wxj5BCYAAAAFpVT+cTjWr18fixYtiqVLl8amTZti1qxZMXv27Oju7h7y9U8++WRcfPHF0dnZGRs3bowLL7wwLr/88ti0aVMJvxvQGAQmAAAAo9zKlStj3rx5MX/+/Jg6dWqsWrUq2traYs2aNUO+ftWqVfHFL34xfv3Xfz3OOeec+MpXvhLnnHNOfOc73znKM4eRS2ACAABQUDoCR0REb2/vgGPXrl2D3ruvry82btwYHR0dA853dHTE008/fUjz37t3b+zYsSNOOumkw/nY0NAEJgAAACNUW1tbTJgwof9YsWLFoNds27Yt9uzZE62trQPOt7a2xpYtWw7pfb761a/G66+/HldccUUp84ZGYJccAACAgo7ULjmbN2+OlpaW/vPjxo175+/JFj6p1+uDzg3lgQceiGXLlsU//uM/xqmnnjqs+UIjEpgAAACMUC0tLQMCk6Gccsop0dTUNKhNsnXr1kGtk9z69etj3rx58eCDD8bHPvaxwvOFRuKRHAAAgIJqqfzjUI0dOzba29ujq6trwPmurq6YOXPmO37fAw88EJ/97GfjW9/6Vlx22WXD/ejQsDRMAAAARrnFixfHVVddFdOnT48ZM2bE3XffHd3d3bFgwYKIiFiyZEm8/PLLcf/990fEvrBk7ty5cccdd8T555/f3045/vjjY8KECZV9DhhJBCYAAAAFHak1TA7VnDlzYvv27bF8+fLo6emJadOmRWdnZ0yePDkiInp6eqK7u7v/9d/85jdj9+7dce2118a1117bf/4zn/lMrF27toRPAKOfwAQAAKCoVI+U6qWOd7gWLlwYCxcuHPJreQjyve99bxiTgmOLNUwAAAAAMhomAAAABVX9SA5QPg0TAAAAgIzABOAQ7Ny5MxYtWhSTJk2K5ubm+PCHPxx///d/X/W0AIARIqXyD6BaHskBOAS/8zu/Ez/84Q/jL/7iL+Lcc8+Nb33rW/EHf/AHsXfv3rjyyiurnh4AAFAygQnAQXR2dkZXV1d/SBIRceGFF8ZLL70UN910U8yZMyeampoqniUAUKValFvf9ygAVM91CHAQjzzySIwfPz5+7/d+b8D5q6++Ol555ZX413/914pmBgAAHCkaJqPUG317qp4CjFpv9O0+rNf/+7//e0ydOjXGjBn4V+YHP/jB/q/PnDmztPkBAKNP2euOWMMEqicwGaWm3/7PVU8BRq29u944rNdv3749zjrrrEHnTzrppP6vAwAAjcUjOaPI8cc1xfTJv1L1NOCYlN7lNs+7fQ0AODakI3AA1dIwGUVSSvHgghnx5tsex4Eient7Y+KqQ3/9ySefPGSL5NVXX42IXzZNAACAxiEwGWVSSnHCWP/boIjdh3kN/dqv/Vo88MADsXv37gHrmPz4xz+OiIhp06aVOj8AYPSxhgk0Ho/kABzEpz71qdi5c2c8/PDDA87/7d/+bUyaNCl+4zd+o6KZAQAAR4qqAsBBzJ49Oy6++OL4/Oc/H729vXH22WfHAw88EBs2bIi/+7u/i6ampqqnCABUrOx1RxRMoHoCE4BD8O1vfzuWLl0at9xyS7z66qtx3nnnxQMPPBC///u/X/XUAIARoJb2HWWOB1RLYAJwCMaPHx933HFH3HHHHVVPBQAAOAoEJgAAAAV5JAcaj0VfAQAAADIaJgAAAAWlVI+U6qWOB1RLwwQAAAAgo2ECAABQkDVMoPFomAAAAABkNEwAAAAKSmnfUeZ4QLU0TAAAAAAyGiYAAAAFWcMEGo+GCQAAAEBGwwQAAKCgWpR7N9qdbaie6xAAAAAgo2ECAABQVMm75FjEBKonMAEAACjMsq/QaDySAwAAAJDRMAEAACgo7f9V5nhAtTRMAAAAADIaJgAAAAWlVIuUyrsfXeZYwPC4CgEAAAAyGiYAAACF2SUHGo2GCQAAAEBGwwQAAKCgff2SMnfJAaqmYQIAAACQ0TABAAAozBom0Gg0TAAAAAAyGiYAAAAFpVSLlMq7H13mWMDwuAoBAAAAMhomAAAAhVnDBBqNwAQAAKCgtP9XmeMB1fJIDgAAAEBGwwQAAKAgDRNoPBomAAAAABkNEwAAgMJqUe79aPe2oWquQgAAAICMhgkAAEBBKaVIqcQ1TEocCxgeDRMAAACAjIYJAABAYWn/UeZ4QJU0TAAAAAAyGiYAAAAFpf2/yhwPqJaGCQAAAEBGwwQAAKCwWpR7P9q9baiaqxAAAAAgo2ECAABQkDVMoPEITAAAAApKKUVKJQYmJY4FDI9HcgAAAAAyGiYAAACFpf1HmeMBVdIwAQAAAMhomAAAABS0b8nX8u5HW/QVqqdhAgAAAJDRMAEAACjMGibQaDRMAAAAADIaJgAAAAWllCKl8lohZY4FDI+GCQAAAEBGwwQAAKAwa5hAo9EwAQAAAMhomAAAABSUohapxPvRZY4FDI+rEAAAACCjYQIAAFCYNUyg0QhMAAAACkr7f5U5HlAtj+QAAAAAZDRMAAAACkopRUolNkxKHAsYHg0TAAAAgIyGCQAAQGG1KPd+tHvbUDVXIQAAAEBGwwQAAKAgu+RA49EwAQAAAMhomAAAABSW9h9ljgdUScMEAAAAIKNhAgAAUFBKKVIqcQ2TEscChkfDBAAAoAGsXr06pkyZEs3NzdHe3h5PPfXUu77+iSeeiPb29mhubo6zzjor7rrrrqM0UxgdBCYAAACF1Y7AcejWr18fixYtiqVLl8amTZti1qxZMXv27Oju7h7y9S+++GJceumlMWvWrNi0aVPcfPPNcf3118fDDz98uB8cGpbABAAAYJRbuXJlzJs3L+bPnx9Tp06NVatWRVtbW6xZs2bI1991113x/ve/P1atWhVTp06N+fPnxzXXXBN//dd/fZRnDiOXNUxGiHq9HhERvb29Fc8EGt+B6+zAdXekHXif+t63j8r7waHo7d1Z9RQARqwDf0cezs8KO3pfj1TizjY7el/fP5eB/z4YN25cjBs3bsC5vr6+2LhxY3zpS18acL6joyOefvrpIcd/5plnoqOjY8C5Sy65JO699954++2347jjjiv6EWDUE5iMEDt27IiIiLa2topnAseOHTt2xIQJE47K+0RE9PU8esTfCw7VhAkq1wAHcyg/K4wdOzZOO+20aGv7aOnvP378+EH/Prj11ltj2bJlA85t27Yt9uzZE62trQPOt7a2xpYtW4Yce8uWLUO+fvfu3bFt27aYOHFi8Q8Ao5zAZISYNGlSbN68OU488UQrYsMRVq/XY8eOHTFp0qSj8n6ubwAYXQ7nZ4Xm5uZ48cUXo6+v74jMI//ZIW+X/P/y1w71/Qd7/VDn4VglMBkharVanHHGGVVPA44ZR6NZcoDrGwBGn8P5WaG5uTmam5uP4Gze3SmnnBJNTU2D2iRbt24d1CI54LTTThvy9WPGjImTTz75iM0VRhOLvgIAAIxiY8eOjfb29ujq6hpwvqurK2bOnDnk98yYMWPQ6x977LGYPn269UtgP4EJAADAKLd48eK455574r777osXXnghbrjhhuju7o4FCxZERMSSJUti7ty5/a9fsGBBvPTSS7F48eJ44YUX4r777ot77703brzxxqo+Aow4HskBAAAY5ebMmRPbt2+P5cuXR09PT0ybNi06Oztj8uTJERHR09MT3d3d/a+fMmVKdHZ2xg033BB33nlnTJo0Kb72ta/Fpz/96ao+Aow4qX609tUEAAAAGCU8kgMAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQEZgAAAAAZAQmAAAAABmBCQAAAEBGYAIAAACQ+X8xMtYei/4/cgAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "!sourmash plot genome_compare.mat\n", "\n", "from IPython.display import Image\n", "Image(filename='genome_compare.mat.matrix.png') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and for the R aficionados, you can output a CSV version of the matrix:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kloaded 1 sigs from 'akkermansia.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'shew_os185.fa.sig'g'\n", "\u001b[Kloaded 1 sigs from 'shew_os223.fa.sig'g'\n", "\u001b[Kloaded 3 signatures total. \n", "\u001b[K\n", "0-CP001071.1 Akke...\t[1. 0. 0.]\n", "1-NC_009665.1 She...\t[0. 1. 0.228]\n", "2-NC_011663.1 She...\t[0. 0.228 1. ]\n", "min similarity in matrix: 0.000\n" ] } ], "source": [ "!sourmash compare -k 31 *.sig --csv genome_compare.csv" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\"CP001071.1 Akkermansia muciniphila ATCC BAA-835, complete genome\",\"NC_009665.1 Shewanella baltica OS185, complete genome\",\"NC_011663.1 Shewanella baltica OS223, complete genome\"\r", "\r\n", "1.0,0.0,0.0\r", "\r\n", "0.0,1.0,0.22846441947565543\r", "\r\n", "0.0,0.22846441947565543,1.0\r", "\r\n" ] } ], "source": [ "!cat genome_compare.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is now a file that you can load into R and examine - see [our documentation](https://sourmash.readthedocs.io/en/latest/other-languages.html) on that." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## working with metagenomes\n", "\n", "Let's make a fake metagenome:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kcomputing signatures for files: fake-metagenome.fa\n", "\u001b[KComputing a total of 1 signature(s).\n", "\u001b[K... reading sequences from fake-metagenome.fa\n", "\u001b[Kcalculated 1 signatures for 3 sequences in fake-metagenome.fa\n", "\u001b[Ksaved signature(s) to fake-metagenome.fa.sig. Note: signature license is CC0.\n" ] } ], "source": [ "!rm -f fake-metagenome.fa*\n", "!cat genomes/*.fa > fake-metagenome.fa\n", "!sourmash sketch dna -p k=31,scaled=1000 fake-metagenome.fa" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the `sourmash gather` command to see what's in it:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r", "\u001b[K\r\n", "== This is sourmash version 4.0.0a4.dev12+g31c5eda2. ==\r\n", "\r", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\r\n", "\r\n", "\r", "\u001b[Kselect query k=31 automatically.\r\n", "\r", "\u001b[Kloaded query: fake-metagenome.fa... (k=31, DNA)\r\n", "\r", "\u001b[Kloading from shew_os185.fa.sig...\r", "\r", "\u001b[Kloaded 1 signatures from shew_os185.fa.sig\r", "\r", "\u001b[Kloading from shew_os223.fa.sig...\r", "\r", "\u001b[Kloaded 1 signatures from shew_os223.fa.sig\r", "\r", "\u001b[Kloading from akkermansia.fa.sig...\r", "\r", "\u001b[Kloaded 1 signatures from akkermansia.fa.sig\r", "\r", "\u001b[K \r", "\r", "\u001b[Kloaded 3 signatures.\r\n", "\r\n", "\r\n", "overlap p_query p_match\r\n", "--------- ------- -------\r\n", "499.0 kbp 38.4% 100.0% CP001071.1 Akkermansia muciniphila AT...\r\n", "494.0 kbp 38.0% 100.0% NC_009665.1 Shewanella baltica OS185,...\r\n", "490.0 kbp 23.6% 62.7% NC_011663.1 Shewanella baltica OS223,...\r\n", "\r\n", "found 3 matches total;\r\n", "the recovered matches hit 100.0% of the query\r\n", "\r\n" ] } ], "source": [ "!sourmash gather fake-metagenome.fa.sig shew*.sig akker*.sig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other pointers\n", "\n", "[Sourmash: a practical guide](https://sourmash.readthedocs.io/en/latest/using-sourmash-a-guide.html)\n", "\n", "[Classifying signatures taxonomically](https://sourmash.readthedocs.io/en/latest/classifying-signatures.html)\n", "\n", "[Pre-built search databases](https://sourmash.readthedocs.io/en/latest/databases.html)\n", "\n", "## A full list of notebooks\n", "\n", "[An introduction to k-mers for genome comparison and analysis](kmers-and-minhash.ipynb)\n", "\n", "[Some sourmash command line examples!](sourmash-examples.ipynb)\n", "\n", "[Working with private collections of signatures.](sourmash-collections.ipynb)\n", "\n", "[Using the LCA_Database API.](using-LCA-database-API.ipynb)\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "smash-notebooks", "language": "python", "name": "smash-notebooks" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }