{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# An introduction to k-mers for genome comparison and analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "k-mers provide sensitive and specific methods for comparing and analyzing genomes.\n", "\n", "This notebook provides pure Python implementations of some of the basic k-mer comparison techniques implemented in sourmash, including hash-based subsampling techniques.\n", "\n", "### Running this notebook.\n", "\n", "You can run this notebook interactively via mybinder; click on this button:\n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dib-lab/sourmash/latest?labpath=doc%2Fkmers-and-minhash.ipynb)\n", "\n", "A rendered version of this notebook is available at [sourmash.readthedocs.io](https://sourmash.readthedocs.io) under \"Tutorials and notebooks\".\n", "\n", "You can also get this notebook from the [doc/ subdirectory of the sourmash github repository](https://github.com/dib-lab/sourmash/tree/latest/doc). See [binder/environment.yaml](https://github.com/dib-lab/sourmash/blob/latest/binder/environment.yml) for installation dependencies.\n", "\n", "### What is this?\n", "\n", "This is a Jupyter Notebook using Python 3. If you are running this via [binder](https://mybinder.org), you can use Shift-ENTER to run cells, and double click on code cells to edit them.\n", "\n", "Contact: C. Titus Brown, ctbrown@ucdavis.edu. Please [file issues on GitHub](https://github.com/dib-lab/sourmash/issues/) if you have any questions or comments!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculating Jaccard similarity and containment\n", "\n", "Given any two collections of k-mers, we can calculate similarity and containment using the union and intersection functionality in Python." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "def jaccard_similarity(a, b):\n", " a = set(a)\n", " b = set(b)\n", " \n", " intersection = len(a.intersection(b))\n", " union = len(a.union(b))\n", " \n", " return intersection / union" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def jaccard_containment(a, b):\n", " a = set(a)\n", " b = set(b)\n", " \n", " intersection = len(a.intersection(b))\n", " \n", " return intersection / len(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's try these functions out on some simple examples!" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "a = ['ATGG', 'AACC']\n", "b = ['ATGG', 'CACA']\n", "c = ['ATGC', 'CACA']" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jaccard_similarity(a, a)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jaccard_containment(a, a)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.3333333333333333" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jaccard_similarity(b, a)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jaccard_similarity(a, c)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jaccard_containment(b, a)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAATAAAADqCAYAAAAlKRkOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAbpklEQVR4nO3daZCcx33f8W/vXHsCC2APXAuCBEmRIA6SEklRokQSdFmW5ENJKnZV7DLfKK7ykVRiVzmlOMpk5SiRnFRsS3ES20lsp2LZFTmqWC7JpGyKEiURAA/xAkgQIEAQN7CLxQK7Ozt358WzMCFoZx4sMPt098zvUzU1ukrz5zP7/Kaffv7dj7HWIiISoi7XBYiIXC8FmIgESwEmIsFSgIlIsBRgIhIsBZiIBEsBJiLBUoCJSLAUYCISLAWYiARLASYiwVKAiUiwFGAiEiwFmIgESwEmIsFSgIlIsBRgIhIsBZiIBEsBJiLBUoCJSLAUYCISLAWYiARLASYiwVKAiUiwFGAiEqy06wIkhjE5oA/oveI9B2SveqWBKlCJea8CRWAauEjgj2Y348YAK4BVRMclQ3Qs4t5rQAkoX/EqAQVgbuG9YPO2mOA/jiyRCfzvt30YY4hOwuErXqtY3h+ZGlGQXbjiNQXM+BZsC0E1QHRMVgGrF94HgdQyfnSN6LicAyYWXhds3q/j06kUYK4YMwCM8G5YDRGNDnxQ5d1gmwDewdqZJAsw42YA2ER0bFYTBZUvVwxVYJIrQs3m7SW3JXUmBVhSjOkGbgZuIgqubrcFLdkF4B3gGHC21SO0hRHWCNHxuYlodBWSIlGYHQXe1qVnMhRgy8mYLLAZ2AJsoH1umhSJguwYcBxrK9fzf2LGTQYYIxppbSK8UG+kDpwC3gKO2rwtO66nbSnAWs2YNFFo3UJ0ci7n/IwP6sBpotHZW9jmIw8zbrqBW4kCaz3tE+qN1IATwGHgHZu/vrCXxSnAWsGYFNEJuWXh3Ze5mqTVgLeB/Vh79sr/woybEeAuomBv91BvpEo0aj0MHLN5W3NcT/AUYDciGm1tBXYQtTfIuybLWfYN/TrMpNlGdJNC3lUAXgVet3lbdV1MqBRg18OYDNFoYgftM2/TMqUctcN3Ujy2hdylHObJQUp/sZrcpXTHjryaKRIF2X5dXi6dAmwpokn57cA2oqZJuUI5S+3ATorHb6bHpn5wbqsC9adXMv9Hw3TPphRkiygB+4B9Nm9LrosJhQLsWkQtEDuILhezjqvxTjVF/dA2im/fTq4eM8oqGWp/tYrin6+hp9TV9hP416MMvA68qlaMeAqwZqLgupsouDp1Yr6husG+/R7mD91Ftppd2vGZ7aL65dWU/3I1PTWDWa4aA1YF3gBeUpA1pgBrxJg7gAfQpeKizq2j+Mr9dJV6b2xEeiFF5YtrqT7fT0+ramszJWCvzdsDrgvxkQLsasasAj4ErHVdio9KOWqvPEDp3IbW3nV9oY/C764lO53WSLeBM8B3bN5ecF2ITxRgl0W9XPcCO2n/5srr8vZtFA7cTa62THcTi4ba/xqm9Fer1JLSQJ3ojuWL6iGLKMAAjBkCHiW89XeJmOun8uJD1C6tSqZl5EiO4ufWkzqd9WZxu28uAE/bvJ10XYhrnR1gxnQB9yy8NOpaxLFbKOx7H931VLLHp2yo/8EIxScHNRproA68RDTJX3ddjCudG2DGDBKNuoYdV+Klaor6Sx+gdHaj28n1Pf0U/uM6utVy0dAk0WisI+fGOjPAjNkEPIY/+295ZXoVpRc+jCne4B3GVjmfovybG7GHu3VHuIEK8JTN22OuC0la5wWYMduAB0G9R4t5+3YKr99Dt/VsxFOF+h8PU/zL1bqkbMACe2zevua6kCR1ToBF810fIGpKlatYsK/eT+H4Fvpc19LM36xg7otr6bVqfm3kDeB7nTIv1hkBFq1h/BFgo+tSfFRNUX/uYUpTo2E0k+7vYT6/kZzmxRo6CfxNJ2yk2P4BZswK4MeI9lSXq8z3UN39GLXCQFjzS6cylH5jjNRkRo2vDUwDT7T7Xv3tHWDGrAV+FG15s6iLg5T37KKrkgszBGa7qP6rMWqa3G+oCHzD5u0Z14Usl/YNMGNuAz5M5+7+2dT0akq7d5GuZcI+PkVD7V+OUT3UoxBroA48Y/P2oOtClkN7Bli0EPvDrsvw1YU1lPbsIr1cS4KSVjLUfmOM6psKsWaeaccF4e03CWrMZqLF2LKIqaGFkVebhBdAzpL67HHSdxbQRoCNfciMm5tdF9Fq7RVgxqwnalDVLfZFnB+muOdR0nGbDoYoZ0n92xNkthfQ3lmLM8AuM242uC6kldonwIwZBj6C5rwWNb2K0t5HybZjeF2WtXT9mxNkb5vXSKyBFPCjZty0zfK59giwaF3jR9HSoEUVeqns2UUq6QXZLmQtXZ85QWq4gh6QsbgM8FEzbgZdF9IK4f9BG9MPfAy1SiyqkqH27I9gl7rlc8j666R/6xi2v4b2zFpcN/BxM276XRdyo8IOsGjP+o8BwX8Ry6FusLt3USn2+bEoO0lDVbKfPU4lU6cjltRchz7gYwtPSg9WuAEWPZvxo6jDvqHnP8z8pdWdOzK9pUT3p09qUr+JQaLLyWCnXsINMHgE7eXV0IEdzE2s184N9xTofXyCOdd1eGyYaF+8IIUZYMbcCbRdT0urTIxSfGurwuuyvz9F7z1zGok1sdmMmyB3aQkvwKI7jg+6LsNXpW6qLz5EGm0383e6wPz6KdKrq1Rd1+Kx94d4ZzKsAIv29NqFHjK7KAt278NUO+mO47Xqr5P+1yeoGksbrp1riTTwmBk3QWVCUMUC9wNDrovw1f57KXTypH2cLSW6f+EcBdd1eGwN0TkWjHACzJgNwA7XZfhqcoTi0ds17xXnx6fp26n5sGZ2mHETzMafYQRY1O8V7J2S5bbwBKGU5r2uza+dJpVTf1gzj4TSHxZGgEVb42h00cC++yiWerSM6lqtqpH55bMahTXRCzzsuohr4X+ARS0Tm12X4aupIUonNoexl71PHr1E713auaKZm0JorfA7wIzpQy0TDdUN9qUHMbp0vD6/epqulO5KNvN+39dL+h1g8ABqmWjo4HYK8/2dt86xVUaqZH9uUnclm0jj+V1JfwPMmBHgVtdl+KrUTfXIHbp0vFE/eYGeQTW4NnOrGTejrotoxN8Aix5CKw3sv4dyJ+zvtdyylq5fOEfbPz/xBnk7jePnCWDMrcCI6zJ8NbOC8qlNGn21ygdn6LmppBBrYsSMGy+vhvwLsGi50H2uy/DZa/dRo0sT963SBeaXzmjzwxj3+7jMyLuCgDuAAddF+GpilOLUiEZfrba1SI92rGiqn+jc9IpfAWZMCrjHdRk+e/1e1xW0r398TqPaGPeacePVQ2H8CjDYSrTVrSxicoTSzKAWay+XsTK5HXN6olETvcBdrou4kj8BZkwauNt1GT47uF3r95bbz07qGMe424wbb3oz/QmwqOdLczsNzKygrLmv5be1qDuSMbqB21wXcZlPAebdBKFP3tyhZsuk/NykjnUMb85VPwLMmNWo76uh+R6qZzZo9JWU+2bpGaooxJoYNuNmjesiwJcA8yjRfXTkDkrq+0pOCszfm9Jkfgwvzln3ARa1TnhzTe2jk5u1YDtpD8/omMe41YeWCvcBFj0eLee6CF9NrKVY7tZmhUlbWSPz3lk1tjaRw4NHG/oQYF4MRX119DbtV+XKx6d17GM4P3fdBpgxK4D1TmvwWDVN/dw6jU5duXuOXG9NaySbWG/GzQqXBbgegb3H8ed77fgtlKy2zHEmA12PXdJkfgynozB3J4cxBgVYUyc3u65AHr2oH5AYt7vcpcLll7MOPWmooWqa+vQqrXt07ZYSuW49gq2ZXqJz2QmXAbbB4Wd77+wG9X75IAXm/TO6jIzh7Fx2GWCavG/i9JjugPnig7P6LmI4O5fdBFi088Swk88OxOSoGil9sb1A1ujxa80MmXHjpFfR1QhsrcPP9t7UEOVqVo+T80VfnfQd89qhookuonPayQe7oMvHJs5uoOK6BvlB75/V4u4YTs5pBZiHpoY1OvXN1nl9JzE6JMCMyaL5r6YuDWr+yzebS/pOYgyZcZP4MXLxq7IW1B7QyOwA5VoG56v85Qd1W1Jj2qm1GYODfjAXAabLxybOj2rtna+2FTQPFkMB1unOD+t2va+2z7uuwHuJn9vJBli0/nF1op8ZmOk1unz01ZaivpsYq824SXR6KOkRWK+DzwxKsVebF/pquKLvJkYXCT/XNekwGUj484JS7KFa1/Y53spAlx72ESvRc1wB5pHZFZrA991YWQEWQwHWqWZWKsB8N1bSTZYY/Ul+WNIBlug/XGhmV+jk8N1NZe0NFkMjsE41O6D5L9+tK+s7iqEA61SlHq1Q8N1gTQEWo00DLOoB0yVkE5WMTg7f9SvA4vQl2QuW5JehHrAYVa2B9F5PXd9RjER7wZIMFF0+xqilFfC+67Z0aXfWWImd60meMHpAaxPlLHWM5sBCMFDTncgYiZ3rSW5b7M3o4r3w+BuwvRdmJmHcdT0AlSw1PDpGN+qfPcHjxy+xPZdi5kv/wI9j3CoDNWqX0h5cSv4+jzPBdjLM8C+8OsaJ/R0necK4/8IX/Cw8+3vwBdd1XKmSba9f9Uc28+wvvs+vY9wq/XVPLiF38Cwf9/IYt2WAeTO6+FU4NAZzruu4Ut2bo9Man7iDQ0O9fh3jVkn7Mgf2IIdY6eUxVoB1Gms8OSkklv6QYyV2taUA84TV9H0wuvRTE6ct+8Daao6n1YxOimDU9WMTJ7G/ZgWYJ4xVC0Uo9IccK7FDlGQbhTff+3b45BG4vQj9/fD5n4Gv/g/4nsuaurw5Oq3xK1/nk2dmub1co/8ffpnPf2gTX/2nD7g9xq1S9+WpWv+FT3KB26nSz2f5PNv4Kj/lxTFObFuojgyw1+C/u67haulKe80R/ueP+XeMW2U25UmA/ZK3xzixcz3Jk0ab9TWRLbVXgLWziyl/eho91ZYBpoeCNpEp04Uv/UXS1IyeWxAnsXM9yS9iJsHPCo4Bk9IaO++VDDWrNatxEjvXkwywOTyaB/NRuqzLbN8VuvQ3HMOS4CqX5ALM2kT/wUKUqegS0ncKsFhzNm/bcg4MdBnZVLaok8N3F1P6jmIkeo4nHWCzCX9eUPpmdXL47kROo+QYbR1gGoE1MXBRd7d8dyLrugLvJTpIUYB5ZOCi7m757lhWPzIxNALrVP0XE10ZIdfhWE5NrDEUYJ2qZ560US+YtypQn8iQcV2H59o6wNQLFqNnnorrGmRxU2mqrmvwXOKtUskGWNQLNp3oZwZmxZSaWX11uFvfTYwLSfaAgZtdUk85+MxgrDnnugJpZF+vWihinE76AxVgnhk6q4l8X+3r0fxXjMTPbRcBdpoEt5wNzcAlsl1VXar4pmSoHc3pxyVGB4zArC0B5xP/3ICsuKith3xzLEtFu1A0dd7mbTHpD3XVlKfLyCZWTehOrW/e6NGoOIaTc1oB5qHRk2qW9M1z/fpOYnRUgJ1B82ANrTlHLlXRL74viobaq73kXNfhMYuD+S9wFWDWloFJJ58dAANmzTnNg/liXy8lzX81dd7mrZO/V5cLU3UZ2cS64xqh+uLZftcVeM/ZuewywE46/GzvrT1BjrpCzLU62N0DunyM0ZEBdgpI/LZrKDIVUmqncO+dHOVZPUatmSJwwtWHuwswa+vAQWefH4ANRzWR79q3B/QdxDiU9PrHK7nenO2A48/32tgRcqaunjBXqlB/clCXjzGcnsNuA8zaaaKWCllEtkxqzVlKruvoVK/1UtTlY1Nnbd5ecFmA6xEYaBTW1OZDrivoXH89qNaJGM7PXR8C7AgJPoo8NKMn6c6UNA+TtNkuqnv66XZdh8cqwGHXRbgPMGureHAgfGXArD+mu7VJ++6AmldjHLZ563yHWvcBFnE+FPXZLW+QxaonLCl1sF9ZjR6g1pwX56wfAWbtBNpip6G+OTLDpzUKS8rLvRRPZ7V5YRNTNm+92DvYjwCLvOm6AJ+95zWvvqu29r+HdKxjeDH6Ar8C7CCoZaCRwSlyg+eZd11Hu3srR/FQj3q/mijjUQO6PwEW7VDxqusyfHb7a5pUXm5fGtJcY4xXXe08sRh/AizyGlof2dDIabp7ZzRKXS5nMpSe76fHdR0eKxKdo97wK8CiloqXXZfhsztf1ghhufzRsI5tjJdt3nr14GW/AiyyHyi4LsJX607QvfK8RqmtdjhH8dkBNa42UQBed13E1fwLMGtrwAuuy/DZ9ucx6gtrrd8b1fxijBd9aFy9mn8BFnkT9YU1NHiB3OhJ3ZFslef7mNedx6am8Kh14kp+Bpi1Ftjtugyf3fUiGW21c+OqUP+vo3pgbYxnbd56OeL3M8AArD0FHHVdhq96C2RuekujsBv1jUHmJzLqum/iHZu33j6/wt8Ai+wBjTIaueNlenLzeHVXKCQXUlT+57DaJpqoE52D3vI7wKy9BDzvugxfpWt07dyrrXau1xfXUi11eX4OuPWCzduLrotoxv8vz9pX0CPYGho5Tffa42o7Wao9/RTUtNrUKeAV10XE8T/AIk+jDv2Gdu4llynh3S1uX812Uf3ttbrr2EQJeNrXifsrhRFg1s4Bz7guw1eZCqmdezUXdq2+sJZKQXvdN/OMzds510VcizACDMDao8Abrsvw1dqT9Gx8W5eScZ5eQWH3gC4dmzhg8/Zt10Vcq3ACLLIbmHZdhK927KWn/6IWezdyMkPpd9cqvJq4CDzruoilCCvAosXe30StFYvqspj7v0VXqqo7k1crGmr5jXTVtM99I3XgKR+XCzUTVoABWDsJPOe6DF/1FsjcvVtPebra76yjfFbbRDfzvM3bSddFLFV4AQZg7avASddl+GrdCXpuOkQQk7BJeGIlc9/TvFczp2zeet8ysZgwAyzyFOD0qcA+2/YCvasmtNToQDfz/22UXtd1eGwa+FvXRVyvcAPM2iLwdWDWdSk+MmAeeJpcJ+/geipD6dNj5DTv1dAs8DWbt8H2WIYbYHC5P+xrqMl1UekaXR/8W1LZYuf1iF1MUfnUJlJFLRVqpAh8PZR+r0bC/3KtvUg0Euu4k/Ra5IqkH3wK20l3JouG2qfGsFNpbZPTQAX4a5u3064LuVHhBxhcvjP5JHTOSboUA5fI3vdtKp2wf1gV6p/ZSOV4Tk/WbqAGPGnzdsJ1Ia3QHgEGl/cPewq01fJihs7R/d7vUmrnEKtC/XPrKb3Wq73tG7BEvV5tszlC+wQYXF5upDWTDaw9Sc/7vkPJ1NovxCpQ/3cbKO1Vu0Qz37F5e9R1Ea3UXgEGYO2beL4Jm0ujp+i5/xnK7RRiFaj/5kbK2h6nqeds3nq5r/2NMNb/HTOujzHbgAdBt9AXMzFK8fmHydZTYf+IVaA+vpHyK326bGzAAnts3nr1QNpWad8AAzBmE/AYaAnJYs4PU3zuETK1dJhbyxQNtc9spKI5r4YqRHNex1wXslzaO8AAjFkN/BjQ77oUH80OUN69C0q9Yd21u5Ci8qkx7EndbWxkFnjC5u2U60KWU/sHGIAxPcBHgBHXpfionKW2ZxeVS6vCGMkcyVH89EYylwIdOSbgHPANm7dtvz9cZwQYgDEp4BFgi+NKvFQ32BcfYv7sRr/XDe7pp/C59fRoeVBDR4BvhbYtzvXqnAC7zJj3Afe6LsNXB3Yw99ZWevEsIOpgv7Kawp8M0+e6Fo+9ZPO2o57i1XkBBmDMrcDDoEuQxUyMUvz+B0lXcn4sxZnpovof1lN9SXcaG6kD37Z5e8h1IUnrzAADMGYEeBRY6boUH5Vy1F58iPLUiNveqte7mf/sBrKa72roEvBNm7fnXBfiQucGGIAxaeB+YJvrUnx1aCuFg9vptgnv6lAD+6dDzH95jd9zco7tB/Z2ynzXYjo7wC4zZj3RBL9aLRYxvYrS9x+CQn8yz1I8m6b8+fXYQz16dmMDs0SXjB2/K7EC7DJjssADwJ2uS/FR3WAPbaPw1p302GXq3q9A/f+uYf7P19Cru4wNHSDqrNdzD1CA/TBjRoEPAatdl+KjuT4qL32A6vRQa+fGDnYz/1vrSOvBGw1NAd+1eXvGdSE+UYAtxpguYDvwXvDjTpxvjt9MYf+9ZKvZGzs+c11U/3CE8lMrNdfVQBX4PvCqzdu2WYDfKgqwZowZIAqxW2nHnTtuUDVF/eB2ikdvI1df4l3CkqH2tUGKfzZEj7Z9XlQdeAt40ebtjOtifKUAuxZRkN0NvAcF2Q8p5agd2EnxxM30xN2trEL9myuZ/5MhutUasag68CbwsoIrngJsKYzpA3YSTfTr5LvKfA/V/fdSPjNGz9Wd/HWwe/uZ/8MRMhMZzXMtokY0Qf+KzVs9aesaKcCuhzG9wA5gK5oj+yGFXioHd1A5uYnucgqeWUHxz9aQ0QT9oqrA60RzXG2/+LrVFGA3wphuosn+u0DbulxlZnaAfXf8EziZZRsw4Logz5R5N7j0WMDrpABrhaiH7Baiyf51dPYusMeJOsSPs/DHZcaNAcaIgn7MYW2uWeA0cBg4rF6uG6cAa7Vo77FbiLbtWeu4mqRMAseAg1h7qdn/0IybFcBtwE3AUAK1+eAMUWi9rcvE1lKALado0n8LUaC102aKNeAk8A5wbOEJ6Utmxk0fsIkozNbTXvOJE0ShdUST8stHAZaUqBVjC7AZWEN4dzELRKOsd4CT2NYuIDbjJg1sIAqzTRBcY2sNOA8cJQqtpiNRaQ0FmAtRp/8aYHjhNQIM4tfc2SzR8pUJolFWok9yNuNmiCjMhoFV+HUTwALTRMfm3ML7eXXKJ08B5gtjMkRzQpcDbZhkTto54AJRWF34u5e1lQQ++5qZcZMhCrLLr9UL70ns0DrDD4bVpM37dXw6lQLMZ9F+ZX1El1OX33uB3MIre8UrTfQYrWqT98v/usS7QRX0nTAzbrK8G2o5okfopRdemSbvVaJWhitfRaJL5QJRsBeAuU7eb8t3CjARCZbW9YlIsBRgIhIsBZiIBEsBJiLBUoCJSLAUYCISLAWYiARLASYiwVKAiUiwFGABM8Z8yxhzwRijJ1gvgTHmqDFm3hgzu3D8vmaM6eSNFoOlAAuUMWYz0QN4LfCTbqsJ0k9Ya/uJdtA9C3zRcT1yHRRg4fp5YA/wx8DjbksJl7W2CPwF0QNaJDDttANmp/l54D8Be4E9xphRa+1ZxzUFx0RPmPoZoh8DCYwCLEDGmIeINvv7P9baSWPMYeAfAb/ttrKg/D9jTJVom6IJ4COO65HroEvIMD0OfMNaO7nw77+ELiOX6hPW2kGgG/gV4NvGmE55CEvbUIAFxkRPPfpp4GFjzBljzBngnwM7jTE73VYXHmttzVr7FaI97R9yXY8sjQIsPJ8gOtm2AncvvO4EvkM0LyZLYCI/RbSj6xuu65Gl0Y6sgTHGPAHst9b+2lX/+U8DXwA22hY/MajdGGOOAqNEPwSW6ElL/95a+6cu65KlU4CJSLB0CSkiwVKAiUiwFGAiEiwFmIgESwEmIsFSgIlIsBRgIhIsBZiIBEsBJiLB+v+ofyJ2V7xsKQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "from matplotlib_venn import venn2, venn3\n", "\n", "venn2([set(a), set(b)])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAADKCAYAAAAGnJP4AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAgr0lEQVR4nO3de3Cc13nf8e+zu9hdLK4ELwBJUCJFkeLFtESZFKXaimRbdeOZWtaMZWsa1VYnoz+axkldN51MU8cM67E96SXOJLXbjidp0o6dScZWHLuxVSu2LFuWROpiXUnxIoqUCBIgQRAEsPfL6R/vrklC4O4LcPc95919PjMYjCgO3ocHu78973nPRYwxKKWUCkbEdgFKKdVJNHSVUipAGrpKKRUgDV2llAqQhq5SSgVIQ1cppQKkoauUUgHS0FVKqQBp6CqlVIA0dJVSKkAaukopFSANXaWUCpCGrlJKBUhDVymlAqShq5RSAdLQVUqpAGnoKqVUgDR0lVIqQBq6SikVIA1dpZQKkIauUkoFSENXKaUCpKGrlFIB0tBVSqkAxWwX0IlEiAH9QA8Qxfvwq30JUJn3lQdmjCFjpWBLZJ8I0Av0AV147VNrLwEMV7ZTCZgDZs1eU7ZRszUiSbx26ubKdqp1rCpAmUttlQVmMCYffLGdTYwxtmtoOyIIXqD2470RLv/eDySX+KPLwCwws9B3YyheW+XBk32SYOF26sML3KXcjRkgw6W2qbXTLDBj9prwfXiJRPHaZKF26gPiS/zJBd75WrrUXsZUrq1wNZ+GbpOIMAiMAmuBNXg9s6CdB8aAU8C4MZQs1FBXNWTXXvbVb6GMLF47jQGnzF6TtlBDfSIRYCVeG40Cqwh+OLACjHOprc6hgXHNNHSXSIRuLr0h1uL1bF1SBibwAngMmDSGwH/Zsk8iwAiX2mkF3tCAS6a5FCynzV5TsFKFyCCXPozWsPTea6vkgdPUPtiNmbFcTyhp6C6CCMuAm/ACZMhyOYtVe8O8CRw3hpbdNso+iQM3AuvxAjdMzw4qwDngbeBwy3vBImvx2mot3nBKmMziBfAxjDltu5iw0NBtQIQIsAHYBqy2XE6zZIHXgUPGMNesHyr7ZAjYDmwiXEF7NRXgLeA1s9eMNe2nisTxPry3AQNN+7l2XQAOAkcxlu4UQkJD9ypE6AG2AluAlOVyWsVQCxXDqaX8gOrwwQ14YTvcxNpcM40XKkeWPPwgsgKvnTbSHh9KCykCx4DXMGbKdjEu0tCdR4RRvB7I9bg39thKM3ihctgYGk4jkn3Si9dON+FNU+oUJbxQOWj2msmGf9ubdbARr61WtbY054zjvaaO6yyISzR0q0S4AdgFDFouxbYycAh4wRhy8/+n7JN+4Da8IZdO+lBayDhwwOw14+/4P17Yvgu4maVPEWwXWeBFvN5vx4dvx4euCCPA7XReL6SRAt4b5VVjKMk+SQLvwRty0ZWMVzoB7Dd7zUUARDbjfYCH7cFYq80Az2LMG7YLsaljQ1eEfrywXW+5FLdJZZaP/+Extv3edtybwuSSykcnefWvv85wIt/WY9vNcA54GrPAHUIH6LjQFSEK7MS77YtaLsdtayay7H45SiofJ34ix+o/ipB8U4N3nqESpd8ap7ArTSpapLzlZfIbjrTtw9dmOgI8gzHvGMZqZx0VuiKsA96LnVVQ4dGdLXH7SwVGJucFR9kw8KMMq77eTSTX8UMMYjD3T5F94DyJhLnyA7z3IrlbnkYGL5CwVV9I5IFngUOdstqtI0K3usHMe/GetKt61p/KsPvlBLHK1e8CoheLrPlihdShjg2UlUWK+05RWVeoE6oGs/EQma0vObda0UVngB9hQrgvxiK1feiKMAD8Y8K3gixYUjHseTnLhlM+b4vLhhX/J8vyb3fcbfTuObK/c5p4yvgbnlp2juzunxKPF3Q4q4EM8ON2X93W1qErwgbgLvQBUH3d2RLv319iYG7xU5tSL2RY++VkJww3iME8fJbMP50mFVnkdLl4juLun1JZdl6HGxoweDMcXrRdSKu0ZehWl+7eBrzbdi3OW302x3uf76KrvPReWGyywOgfQOJk23649Zco7x2jsDm39IUgUqGy5SVyG1/Xh2w+nAQeb8clxW0XuiKkgHvwNlpR9dxyMM2W4ymkGYscihWG/3uOwcfaLlC2Zsj//hjRvkpzlu6uPE3mPU+SjJV1vnMDM8A/YHys/AuRtgpdEdYAH6SzlqUuXiJf5leeLbBiuvnt1PdEhtVf6UbKbbFa7ePnyTw4SXe0yavvkmkKe34CfTM69NVAGXgKYw7ZLqRZ2iZ0RdgIfABdmlpfd67Eh35WIZVv3Zs9cTTHdf8+TiQf6p7cZ86Q+eBM64YCIiXKtz9OaWhSx3l9eBFjDtguohlC/aao0cD1KYjABchvSvLWlwtUEqFdZ9/qwAWoxIg+835iUysabzCkuAWR22wX0QyhD10NXJ+CCtyaEAdvEIFbo8G7KG0RvKEOXQ1cn4IO3JoQBm+QgVujwbsooQ/e0IauBq5PtgK3JkTBayNwazR4FyXUwRvK0NXA9cl24NaEIHhtBm6NBu+ihDZ4Qxe6Grg+uRK4NQ4HrwuBW6PBuyihDN5Qha4Iq4D3o4Fbn1QMH3i67Ezg1uQ3JTnzO05t4/fx8+4Ebk0lRvTAXURz3ZRs1xICtyCyxXYRixGa0BUhibfSLDQ1W3P7i1n6027O/Zy7PcXUR53YSWp7htyDk24upCnFie2/m1JFaI+J9K313uqhn6EQigATb5nqB9DjTxq74a0M60871XN7h3P/Ikl2k9Xb5/4S5f8wRqzZK82aaXaQ5Cu7ceIDynFR4J7q0fbOC0XoArcCo7aLcN7ATIFdr4TgEMRYhLHPRyj3lm1cXQzmD05RbNZeCq309kZ6xq4ja7uOEOgH7rZdhB/Oh271SPRbbdfhvFipwt0HIGqc/50CUB7sYuxzVnaQevgsmU358JzQ+9Ie4uleirbrCIH1iNxsu4hGnH6DitCDzlTw533P50jlQnF79UvZ7d2c+2Q6yEvumSV773S4TnKoxIjufz+VUhTnZn446DZEVtsuoh5nQ7e6J+49EJ4eiTVbj2VYfc7tcdyrmbo/RXpnIDMaVhYpfvZMOHf1yvSSePEOnJr54SgBPoiIs+8HZ0MXbxNyPcq6kaHpPO9+3ckn8P5EhNO/G6M02NLpUWIw+05R8XvEjovG15E6caM+WPMhhXeH7CQnQ1eEFcAO23W4zxj+0QsQCfnwS6UnxsRvtnR89/4psnUPkQyJQztJ5BNYeQAZMmtcnb/rXOhWp4fdiY7jNrb1jSx9mdAHCeDN383saMnt81CJ0gNtcjZZOUb0ld26Ws2n2xBx7vfuXOgCW4GVtotwXjJf4l1HnHtBXZMz/zqCiTZ9McBvjVNIhHhYYb7xdaTOr9TxXR+SwB7bRcznVOhWV53ttl1HKOx+uUCs0jZBAkBpOM75TzR1zHJnmtyutFvLfJvhpdsRg65W82ELIqtsF3E5p0IXeA+0x21gSy2/kGd0ou2CBICpjyUp9TdlzFIM5l9NtOcwVaaXxInNumjCpztsF3A5Z0JXhAG8oQXVyK5X2reHYxJRzv16U8Ysf3Wa7EixfT/ED+8grnN3fRlGZIPtImqcCV28KWIu1eOm0TNZhmbae+7yzN3dFFZf0wqsrgqVB8/T1aySXFSKEzuyQ3u7Pt2GiBP54kQRIgwDznwSucsYbn2tvcZxFxQVJv7lNc3bfeA8uYFye4cuwInNdOeTugWkD87cSTsRuujeCv6sH8vSE7KlvkuVubWb/Lolzd3tqlD5yIX2HVa4XCVK5Mi7dAqZTztd6O1aL0CEfmCd7TpCYcvxDujlXmbqY0vqwd1zkVyYV54t1qn1JMsRHdv1IQWst12E9dAFttkuIBQGZgosm+mI3tsvzb4vsZTjfT4y3TmBC1DuInpqg/Z2fbKeN1ZDV4QosNlmDaGx7Y3OG7cziSgXP7SoRQCbsuTbYbnvYr15U3tOjWuBNYgM2izAdk93I7qLWGOxUoXRMx0XJABcuHdRvdb7pzrzNntugOT0Mu3t+mS1t2s7dK139UPhxpO5tlt95ldxJEFmu6/ebqpMefdc5/Vya45t041wfNqMiLVTQ6yFbnUnMaeW5zlr0wnnj5VpqamP+VoM8pEL5LrsdySsmVhLstilwetDHNhk6+I2X6DbLV47PFZN5ujNdsY0satJ70z4WRr84en2n5dbj4kSObFJhxh8snaXbSV0RYjjjeeqRra+0ZFjlFeKRbjw0bpDDLvmyC0vh/NUiGY6ucn9wzYdsRwRK4ck2Orprgd9cTQUKVcYmdQHjQAzd9XtxX7oYmc+QJsvlyI+PaS9XZ9utHFRW6Grx6n7MXy+QCQkp/u2Wmk4Xu9In3dltJdbMz6qy4J9Wmvjorbe0Fb+saGzdlx7b5eb273gsuB1eQp9Fb1zqjk30lmLQ67BICK9QV808NAVYQgI8UGKARqZ1DfP5dK3LfjHt81pz+5yM8uI67Jg3wLvANro6erQgh+JfLltzj9rlsz2BYcQdqd1NdblTITI5AgtPeizjWjoqqo1Z/VNM1+lLzZ/57GowdyY0/Hc+SbW6nxdn9o7dKt7LYwEec3QWjPRvqdDXIu5PVcMJWzJkm+nQyebZXJYx7h96kZkeZAXDLqnO4xOFfNn1VRHT/S/qvSuK4YSbp/THt1CMn0k8gltG58C7e0GHbo6tOBH31yRZEFDdyG5G+OXH9N+S1p7uVdzdo2O6/oUaC4FHbqrA75eOK0+d03ng7U1k4iSu7EA3gkRnbiNo1/nRvSIdp9GEAnsYWzQoTsQ8PXCaWBW3yz15K8rA4wUKUXRmQtXk+7r3M1/FikG9AR1scB+KSJ0oXvn+tOb0SCppzBqAEYLOmZZTy6lz08WoS+oCwX5SRjYPyr0ejI6TllPcbUArCno7XM9+QRRg7aRT20Zuv0BXivcuvP6EK2e4qoowBod+a4vgmR6dLWeT4Hlk4aua7qKZWJlHYurp7QiArC6oOO5jaT7dQjGp7YMXR1e8KMvrT2TRsp9MUzUrCzpdLFG0n26B4NPOrzQsfrn9E3SUEQojBSHSvqgqJG5fh3T9Ul7uh2rL62h60NfdrQQNzolqpFMrw7B+NQd1GGVGrqu6U3rm8SH4fQ6/XDyIdOjdwOLEEhGBfILqZ6J5tD423segkM7IDULk/tsV3OFZN6d0H30Mw8x8/YOoolZPvZNd9rplb/afvzw3338AYjtWMWTn/sVHrVd0mce5aG3Z9iRiDL7zY/hTFsV4271dN8DDx2CHSmYncSddqrqBi60+iJB9XQduw188Cn46p/YrmJBUYc6cOvvfopdv+FWO5ULwuHv/LP1773nD//8XvYemmT3k2/ZX15+93qe+o1duNVWQCXiVug+CE99FffaqSqQnAoqDB3q5QJ89iisS9uuYkERh557bLnvKKkVbrXTG49tIN53bvnojZM9ccpbV/DsT05ws+2y7tvC0RUp3GorwIhboftZOLoO99qpKpCc6tCersOk4tSbxDlz44Mk+qaiFW9kbKibCxfzLLNclbOMvpoWo616uhq6frnU03VYxOjzIT+MY8MLjmur0FWqOXpHpsnPDtX6cFNZlg0kWv/wI7T0M9w5QYWuQ0+HHKejC/XdcM8JCrOrzp8+uiJdIHpokt13recl22W5SozG7iIEklNiTOt/JyKkgH/e8gv5tuNhOL4Zcr3QPQsPfBf+7Oe2qwLgQ09mWT7txhH13//0w8yNb6Zc6CWWmOW6O7/Lnt+2306vfONd0cOPfDxBIbp9JT///F38wHZJn/4+D4/PsblQpjcRY/bO6/jub+/BeltFi5Q//C13HmTvgIePw+Yc9HbD7APw3T/DfjtVPYoxb7X6IkGFbhL4VMsv1A7u+XmWlRfcCF2H3Rr7wey+0td0wU0DsQKlX/22LpDw6fsYc6rVF9HhBdfoBmO+lCO6L5AfkYoOLyxCIDkVyDvcGAqg+3r6kkvom8SH2fiUtpMP8bx2eBYhG8RFguxWzQV4rfCaC+yoplA71/OWM+OULuvO6H66izAbxEWCDN2ZAK8VXjO9On3Bh9nusa68aKA0kgokRtpCBmMCuRsPMnT11+/HbI/24BqqGOLjXRdiOmTVSK++6/wKrKW0p+saDd3GorMlpCznYjpe2UjvjC6A8imwfNKermuKXVFKUb1tric2WQY4E9fQbSQ1q9PFfNKebkfLJDV06+k6WwEY69J9BeqqYFJpdxZGOE57uh0t061jlfV0nQZgzLENul2TyFMStI18ar+erjEUgVxQ1wu12ZTtCtwWHwPgdFx7cfUkdbrYYrRlTxdgOuDrhdNFXd1aV8KbozveRays+2hdVc+sjnn7VMKYwDZWDzp0zwR8vXA6s6rLdgnOknyZ5JE4QDFC5O04edsluWqVvtv8CrSlgg7dlm8m0RbmerrIxYu2y3BS8kgBuXQewgs9egu9IINZdZqE7TJCItBcCjp0J9A9GPw5u1zbaSE9z10xnLC/V6dELaRnlmK8oGPePo0FebFAQ9cYKugQgz9jwzpWuZDe/VeE7KFu4roc+J1WTKB3Sv5kMGYqyAvaWK2iQwx+nF6VQHf9v1L0YpHEWPzyPzKCHElSsFWSq4bHtJfr0+mgL2gjdAPtyodWIR5ltkd7K5frPrhgezzXox9Ol5MyleUTxBv/TYWFTmDgoWsMU0Am6OuG0sQKDd3L9RxYcKK/juteaWCafLSiey74FHgn0NYvRnu7fpwa0VvEy/UeWLD3NpYgfjGqY5g1K8/o/FyfpoOcn1ujoeuys0NxyqJvIICu8Tyxmat+CL3araFbM3xKe/4+WXm+ZCt0T6BTxxqrRCOMr9Sl0wD9j9edofD/BvV2GiCZpjB4Qefn+nTMxkWtvFCrZ6ZZ+QeHzqGNOsRAqcLg/60bJL/oITkZ01kM649qZ8anSYw5a+PCNnsHr1m8dnicW55gNtXZS117n8vVG1qo+f5gZwdOpEzl+mPay/XpoK0LWwtdYziPt0JNNXJkQ2dP/l/2iK/e/t8PkigGdIy2i0ZOkesq6vxcH6zeadseB7P2aRMqb6xLUop0ZvB2nc6TOuSr95aJEt3f17kb4Gw8qA/QfDoc1CGUC7EdusfRPXYbK8civL2mM8Nk2fcW1XP91pD117QVfdPkBqZ1QYRPVjt7Vl+gxlAGXrdZQ2gc3Nh5vRjJlxl4bFFjlG8kSbwV77wP8g2HO3dYZZHGMOaizQJc6BUcQjeibmymL87UQGeFSd9Pc0Tyi36N/t2yzno9xQqURt+k23YdIWH9Ab710DWGWeBt23WEwus3dFSYMPTtJW3m/ng/iUwH7Tw2eoJ8xOhZaD7MASdtF2E9dKtesF1AKJxck2SuuzPmovY8l5m/o5hfxQiR7wx1xhBDpER506skbdcREr/AGOsdFydC1xjO4j1UU3WJ8ML2Dhi7K1VY9T+u6ciibw3RPd0B+zFsOEI+kddpYj5M48jzIydCt+oAHTzH0rexkSTn23xsd+DHWeIT1xS6xQiR/72ivUO3K09Je7m+HXChlwsOha4xzKDzdv15dkf7jt9JrszK/9WUIHlskNTprvadt3vTKxRjZXfeww4bx5gTtouoce0X9jw6b7exC4MJ3h5pzz2Jl/9Nnuhc026X/9tIe85kSM2Sv/6o9nJ9MMBTtou4nFOhawx5vGEG1cizOxKUou31hL5rPM/Qt5o69emVFMlnetts03yDueUZEHTGgg+HMGbSdhGXcyp0AYzhdXRPhsbyiSgv39Ret84jf2wuP169Wb463F6HV65+i+zQpG5s40MWBztxzoVu1ZPogonGDm/o5mJvewRv788zpF5rye3ydIzYN1a0x9hutEh5x3MauD7txxjnplg6GbrVHchesl2H+0R4+haohPx0ichcieGvtTRIvrOM7hNtsDx4+wvk4wWdIubDKYw5YruIhTgZulXPAeO2i3DehcEEL24NcS+ubFj7pZKf/XKvhRFk7zpi6Uh499xdc5LMdcdJ2a4jBNLAj20XcTXOhq4xVIB/wBuXUfUcvqGbU8PhfFi0/K8zpF4J5Cn8VIzYf15NqRLCoavULPmbn9HZCj54uWGMs3c1zoYugDFkgB8RwjdJ4J7aGb4lwqmXsqz4q54gL/l8L8lHhsI1myFaorzncSJ6rLov+zHG6Qfxzv8SjeE08KztOpxXjkX4yR5Cs9l59HyBNV+ysv/rX66k52AyPHdQtzxNoSfNNa3Q6xDHMeYV20U04nzoAhjDiziwO5DzZnvjHLg5BL3dYoXRfRDNWHsg9IVR4hdDsDfD+iOkV5/SbRt9uAg8YbsIP0IRulU/AWZtF+G8k2u7OT6atl1GXau+niP5ptVTDuaiRL+wlkrZ4aGr/ily25/XB2c+lIDHMMb5D1EIUehWV6s9Bu0zyb1lDrw7xcVeNx8k9P48w7IfOBEkh7tJ/OVKN4cZYgVKe54gpqvOfPkZxkzZLsKv0IQugDFM4j1YC/e81FYzEeHHd8RIJ90aaki+nmX1f3XqCfzfDpF6dACn7gyiJcq3P045kdODJn14HmOO2i5iMUIVugDGcAIN3sZyiRg/vDPiTPAmX8+y7vcSRIrOvea+OkKPK8EbLVG+40eUBqd01ZkPz2PM87aLWCzn3gB+GMObaPA25krwOhy4NS4ErwbuooQycCGkoQsavL7ZDt4QBG6NzeDVwF2U0AYuhDh0QYPXN1vBG6LArbERvBq4ixLqwIWQhy5o8PoWdPCGMHBrggxeDdxFCX3gQhuELmjw+hZU8IY4cGuCCF4N3EVpi8AFEEfOamsKEUaAe0AnlNfVVSxz57MFhqeav9Jp4Idphr+WQsptMb/0IxfI/PpZkrEmd1BSc+T3PE6kZ06X9zZQAp50dZvGpWir0AUQoRv4ILDGdi3O23E4zbajKSJNmIAv+TIjf1qg/4m2W7K6KUv+82NEBsvNCciRt8jsfJqkbmDT0EW8lWahWfjgR9uFLoAIAuwCdtquxXnD53K87/kY8dLSJ+LHJgqs+30hfqZte229ZcqfG6OwPbv0fRCkTGX7C+TWH9M7MR+OA0+EZWnvYrRl6NaIcB3wftAxs7qS+RJ37S8xNLP41WK9z2RY/V+SRPId0Wv75DnS90+RiixyeW4iS3H3E1QGL+hrsYEK8AzGvGq7kFZp69AFEKEPb5x3pe1a3GYMu17NsOmkz/1tSxVW/XmOZd/ruF7bzjS53z1NrKfib5nu8nGyu54k3lXUY3YaSONtQO70frjXqu1DF0CEKHAHsM12Lc5bdzrL7S/FiZWvHhDRC0XWfqFC99GO7bUNlSjtPUXphnyd0xwqmE2vkbnpVQLdqD2kTgE/dvnEh2bpiNCtEWEN8D5g0HIpbkvmS+x+ucDoxLxebNkw+IMMK/+iu1OGExq5d4rsJyeJJ82Vvdj+KXI7nybSN4PVLSxDIIt32kPbzE5opKNCF0CECPBu4FbQXZzqWjWZY89LEXqzcRJHc6z+SoTE2xoi8/SXKH96gvwdc6RiBUrbfkFBD5BsyACHgAMuHpPeSh0XujUi9AJ7gI22a3FapDLD/V88wrbP70AfSNZT+fB5Xv7O/2QkXmDEdjGOGweexphztguxoWNDt0aElcDtwGrbtTgmD7wAvGYMFdknCbwpeNtBHwjN8wZwwOw13skmIjcAtwH9Noty0DRez/aE5Tqs6vjQrRHhemA3MGS7FsuKwEHgF8bwjts+2Se9eO20kTZZRn4NxvDC9p09NpEI3oPbW9AVkmngF8DrGNPxS/U1dOepLiXeDmygs0LlAl7YHl0obOeTfZICtgBboaOezheAo8BBs9dcaPi3vfBdj/ea6rS7qTHgNeAkGjS/pKF7FdXlxLVQ6bVcTqtUgBN4QwhnlvIDZJ9EgOvxenVrm1eac6bwAuSY2bvEVVIiy/DaaRO07ayGAnAYOIgxF20X4yIN3QaqS4qvw+upjFoup1nSeE+OXzeGTLN+qOyTQbxQ2Ux7hEoFbznqQbPXjDftp4p04QXvNtpnOGsS707pGMaUbBfjMg3dRRChHy9QRvFWuIVpJ60M3u3em8BJY1p39LjskxjemO/1eBsPhSmAy8AE8BZwxOxt8WR9kRHgRry7hIGWXqv5pvEWNRzDmLOWawkNDd0lEiGBFyhr8ULYtSfVReAMXtCOGYOVnZpknwjeB9QoXlsN495Y+Xm88BgDxs1eSz01kV4utdNaqLPazY4sXht5bWWME4d5ho2GbpNU93ioBfAagn/DGOAcl8Jjwhj3NnWv9oJrH1ZrsXN7PceldhpreW92qUSWcymERwh+MU+JSx/cp9pti0VbNHRbpPogrg+vBzz/ew9LG5ooADPA7ALfZ10M2UaqIXy1dupjaUFTwQvWGRZoL7M3hCugRATvdXO1dlrqlpMZqq8frmyvWSCtsw6aT0PXgupS5N7qVwRvsUGk+iV4oVH7KuMtVJg1hryVgi2qTk3rxRsXjsz7MnjtU2urEl7Yps3eDnthi8TwAribS+1Te13BpddS7XsWmNWHXsHT0FVKqQC59kBDKaXamoauUkoFSENXKaUCpKGrlFIB0tBVSqkAaegqpVSANHSVUipAGroOEZFfE5HnRGRORM6IyA9E5H2263KJiJwQkWy1jS6IyN+LyDrbdblKRH5SbSc9askRGrqOEJHPAn8MfAlvU5jrgK8BH7VYlqs+YozpxdsUfAL4U8v1OElE1gN34q3cu9duNapGQ9cBIjIA/EfgN40xjxhj0saYojHme8aYf2e7PlcZY3LAt/D2pVXv9CngGeAvgIfslqJq9AhyN9yBtyvZ39ouJExEJAU8gBcs6p0+BfwRsB94RkSGjTETlmvqeBq6blgOTBrdfMSv74hICW/XrXPAP7Fcj3OqzwKuB/7GGDMpIm8AvwZ8xW5lSocX3HAeWCHeTlGqsfuMMYN4dwefBp4Q7wQGdclDwA+NMZPV//4mOsTgBA1dNzyNt33jfZbrCBVjTNkY8wjeVoU6y6NKRLqBTwB3ici4iIwD/wa4WURutlud0tB1gPFOTf088FURuU9EUiLSJSIfFpH/ZLs+V4nno8AyvIM2lec+vA+ibcAt1a+twM/wxnmVRbqfrkNE5EG8HslWvJ37nwe+aIx5ymphDhGRE3hT6sp4U6FOAl82xnzDZl0uEZFHgdeMMf923p9/AvgTYFSfH9ijoauUUgHS4QWllAqQhq5SSgVIQ1cppQKkoauUUgHS0FVKqQBp6CqlVIA0dJVSKkAaukopFSANXaWUCtD/B4MsXIt9RIW/AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "venn3([set(a), set(b), set(c)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculating k-mers from DNA sequences\n", "\n", "To extract k-mers from DNA sequences, we walk over the sequence with a sliding window:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def build_kmers(sequence, ksize):\n", " kmers = []\n", " n_kmers = len(sequence) - ksize + 1\n", " \n", " for i in range(n_kmers):\n", " kmer = sequence[i:i + ksize]\n", " kmers.append(kmer)\n", " \n", " return kmers" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['ATGGACCAGATATAGGGAGAG',\n", " 'TGGACCAGATATAGGGAGAGC',\n", " 'GGACCAGATATAGGGAGAGCC',\n", " 'GACCAGATATAGGGAGAGCCA',\n", " 'ACCAGATATAGGGAGAGCCAG',\n", " 'CCAGATATAGGGAGAGCCAGG',\n", " 'CAGATATAGGGAGAGCCAGGT',\n", " 'AGATATAGGGAGAGCCAGGTA',\n", " 'GATATAGGGAGAGCCAGGTAG',\n", " 'ATATAGGGAGAGCCAGGTAGG',\n", " 'TATAGGGAGAGCCAGGTAGGA',\n", " 'ATAGGGAGAGCCAGGTAGGAC',\n", " 'TAGGGAGAGCCAGGTAGGACA']" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "build_kmers('ATGGACCAGATATAGGGAGAGCCAGGTAGGACA', 21)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the k-mers that are output, you can see how the sequence shifts to the right - look at the pattern in the middle.\n", "\n", "So, now, you can compare two sequences!" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "seq1 = 'ATGGACCAGATATAGGGAGAGCCAGGTAGGACA'\n", "seq2 = 'ATGGACCAGATATTGGGAGAGCCGGGTAGGACA'\n", "# differences: ^ ^" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10 0.09090909090909091\n" ] } ], "source": [ "K = 10\n", "kmers1 = build_kmers(seq1, K)\n", "kmers2 = build_kmers(seq2, K)\n", "\n", "print(K, jaccard_similarity(kmers1, kmers2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading k-mers in from a file\n", "\n", "In practice, we often need to work with 100s of thousands of k-mers, and this means loading them in from sequences in files.\n", "\n", "There are three cut-down genome files in the `genomes/` directory that we will use below:\n", "\n", "```\n", "akkermansia.fa\n", "shew_os185.fa\n", "shew_os223.fa\n", "```\n", "The latter two are two strains of *Shewanella baltica*, and the first one is an unrelated genome *Akkermansia muciniphila*." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "import screed # a library for reading in FASTA/FASTQ\n", "\n", "def read_kmers_from_file(filename, ksize):\n", " all_kmers = []\n", " for record in screed.open(filename):\n", " sequence = record.sequence\n", " \n", " kmers = build_kmers(sequence, ksize)\n", " all_kmers += kmers\n", "\n", " return all_kmers" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "akker_kmers = read_kmers_from_file('genomes/akkermansia.fa', 31)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['AAATCTTATAAAATAACCACATAACTTAAAA',\n", " 'AATCTTATAAAATAACCACATAACTTAAAAA',\n", " 'ATCTTATAAAATAACCACATAACTTAAAAAG',\n", " 'TCTTATAAAATAACCACATAACTTAAAAAGA',\n", " 'CTTATAAAATAACCACATAACTTAAAAAGAA']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "akker_kmers[:5]" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "499970\n" ] } ], "source": [ "print(len(akker_kmers))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "shew1_kmers = read_kmers_from_file('genomes/shew_os185.fa', 31)\n", "shew2_kmers = read_kmers_from_file('genomes/shew_os223.fa', 31)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see the relationship between these three like so:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "akker vs shew1 0.0\n", "akker vs shew2 0.0\n", "shew1 vs shew2 0.23675152210020398\n" ] } ], "source": [ "print('akker vs shew1', jaccard_similarity(akker_kmers, shew1_kmers))\n", "print('akker vs shew2', jaccard_similarity(akker_kmers, shew2_kmers))\n", "print('shew1 vs shew2', jaccard_similarity(shew1_kmers, shew2_kmers))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "akker vs shew1 0.0\n", "akker vs shew2 0.0\n", "shew1 vs shew2 0.38397187523995907\n" ] } ], "source": [ "print('akker vs shew1', jaccard_containment(akker_kmers, shew1_kmers))\n", "print('akker vs shew2', jaccard_containment(akker_kmers, shew2_kmers))\n", "print('shew1 vs shew2', jaccard_containment(shew1_kmers, shew2_kmers))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAACpCAYAAACI/O4MAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAkU0lEQVR4nO2deXyU1bnHv89MMllJAgQS9kVEFhFREVERcUFtrdpr3WqFqtVyba1tvbValxCt3taq3XvrdalLtbdYl6ptFRVXVEBAZBcEwpKEECAh+2Rmzv3jTIAqgZlk5l3P9/OZT4YJ857nzXve3/uc5zznOaKUwmAwGAzWELDbAIPBYPATRnQNBoPBQozoGgwGg4UY0TUYDAYLMaJrMBgMFmJE12AwGCzEiK7BYDBYiBFdg8FgsBAjugaDwWAhRnQNBoPBQozoGgwGg4UY0TUYDAYLMaJrMBgMFmJE12AwGCzEiK7BYDBYiBFdg8FgsBAjugaDwWAhRnQNBoPBQozoGgwGg4UY0TUYDAYLMaJrMBgMFmJE12AwGCzEiK7BYDBYSIbdBhjchQgZ6Id1DFCAUoqYvVYZuoRICCgGCoBcIC/+s+PVca2F+LUGokAr0Aw0xX82A43ATpRqsPYk3Icopey2wdFIuQSAfKBH/FWw3/sQ+zplR8eMogUpgu6Ie/Z/qTLVaPEpJIwIeUAh+hwL4u8L0effcQMeiI5zbYi/9n/foBQt6bXccEhEgkAJWmT7xH8WpqGlVqB2v1c1SjWnoR3XYkT3c0i55AMDgQHoTpqHFtRUEUWLUS2wDdiqylRTCo+fECII+vwGxl+9SN/IpwF9rpXANiPCFiGSDQwGhqL7c6ZNluwAKoAKlNppkw2OwfeiK+WSDfRHd8oBaA/PanYDW+OvKlWmIuloRISOB8og9LmG0tFOAuwiLsBApVK022SH9xDJAkYCw4G+pNZhSAWNwCZgrV8F2JeiGw8ZDANGowXXSUTRgrQaqFBl3btAImQCRwCj0N6s02gH1gOrlMKXN2FKECkGxgKH4Z65mu3AKmADSkXtNsYqfCW68dDBaLQA5dhsTiI0ojvlGlWmWpP5oghF6JtwJPYNK5OlBv2w+Uwp0uLtewqRADACfZ372GxNd2gF1gAr/BD/9bzoSrkIejg9Jv7TacOtRIgCG4CVqkzVdPaf4nHaQcCR6DCCWwkDnwIrlaLebmMcichwYCLpmQyziwiwEvgYpdrsNiZdeFp0pVyGAMcDPe22JYXUAAtUmara/0MRhqPP1Y6YdLqIAWuBxUrheQ8oIUQGoK+zmz3bQxEGPkZ7vp4b8XhSdKVc+gKTgH5225JGNgELmK2ygMnoTASvEgGWA0t9G3YQKQRORk+A+oVmYAFKrbPbkFTiKdGNZyJMQk8ceZtIQZTq77XwydeExWNziGb4YXVhI/ChUmyw2xDLEBFgHHAc7pkgSzUVwLteifd6RnSlXEajh11ZdtuSdnZ/qYUdM0Oo3CAAraF2Fo+NsHmAGyYHU8FW4F2l8PbqJ+3dTgVK7TbFAbQB73vB63W96Eq5ZAHT0Eng3iaaH2VrWZjWUQcW18o+zbx3bLZPvN424C2lqLDbkLQgciTaifCrd9sZFcDbqOSyeZyEq0VXyqUPcAZ6Sa63aTm8jW13BIgWHTz9qzEnzFuToCHfroUPVvMJsNAz9R9EMoBT0KlghgPTCLzq1sUVrhVdKZcx6AmkoN22pJ3d5zRTc002ZCbmwUYCURaOD1Phm3BDNfCGUli+nDqliOQBZ6HrIhgOTgR4E6U22m1IsrhOdKVcMoEp+METUEFF1Q9aaJia26Xvrx/czKJxOSBuzE1OllZgnlJstduQLiFSApyJru5lSJzFKLXYbiOSwVWiK+XSAzgHKLLZlPQT6RVhy0+jhAd1b2Jwd0Erb07KpC3L+yMCXeXtA6VYYbchSSFyGHAqfhi1pYcNwDyUckWIyTWiGxfcr6DLDHqbSK8Im36liPZMzfLdxpwwc08O+kR4QQvvcruNSAiRkegMBT+MRtLJZuA1N9RwcMUst/8E94FYygQXIL8lxPT3omS1Ob5DpojJIhxltxGHRGQU2sM1gtt9BgNnxesGOxrHi66/BLcoQsX9MaK9U595kN8SYvr8iI+E9wRHC6/2cE+x2wyPMRA4M14IyLE42jgplwJ8I7gFUSrujxEpTl+qV35zFtPnRwiF/SS84+024guIDEOHFAypZzBwRnwlnyNxrOjGBfdcfCO4D0SJ9E1/bm1+cxbT3/OT8E5ylPDqurfTMCGFdDIUXYHNkThSdKVcMtD5it4XXBVUbPlZhEiJdYsZejRncfoH7UjMHbOo3WeSiANWLIrkoPu1WWWWfo5GxJFppY4UXXSsy0vlGDun+vqWbqeFdYWihmyOW+GJAiIJMk3ExpWLOs44Hb3nnsEapiLiuBKYjhNdKZex+GHhA8CeqS3sOd2+ZPgRm/MYVOmXTSKzgDNFbMuFnYK3y286kSAwHRFHLThxlOjG6+BOttsOSwj3a6f6evvrI5zwcYi8Jr9sDFkMnGR5qzpTwfvlRp1JHnCa3Ubsj2NEN14L9wwcZFPaiGXG2Do7hnLAYoWMWJBpC2IEoq5YzZMCRolYKIC6nsKJlrVnOBD9ERljtxEdOELg4vuYnYYfJs4Aqm9opb2/c+r+9mjOYtIy15bK6wInidDboramYN9W94Z9nICII6oROkJ00Tv0unkjxcRpOKnrBWzSydDKXB/FdzOAqfGNPNOHDivYnzVhgPg1t9sIcIDoSrnkoIs1e59YZozts5ybLnTcigyCEb+EGYrRO0SnBxNWcCKOCDPYLrroPc38Mfyqndl6yCLkdpIdzmT8Gr94uwATRUhXzeHj8Uu/dhfHI2JraM9W0ZVyKQFG2mmDZYRL2tn95Wy7zTgkh1fk+CibIUQ6RlkivfBL2qP7CAET7DTAbk/3BJvbt46aWe3ggr3LAirAcSv8tM35SBF6pfiYx2OW+TqZsYjYNmlvmwhIuQzDL8niLYe30XSc8ybPOqP/jhx6726z2wyLEFL58BcpxUyeOZ0gcKxdjdsiuvEUMX9MngHUzHLf5NRxy/1SlwFgoAj9U3SsSSk6jiG9jESkyI6G7fJ0BwOFNrVtLa3DwrSOdN8Gkb32ZNOrzi/eLsCR3T6CSH/8MnpzPwIcbUfDdomu7WkblrHrQvfGR0ev90v5R4AhIt0uRuOffu0NDkPE8slty0U3Xid3kNXt2kIsO0bjZOdnLHTGgO3ZZLb7RXgFvUini9+WXHQdV4N7CGJDTQw7PF3/eAN157SiQs7PWOiMoApw+CY/hRhGi3T5nhiN/dlAhuTp+oO2i1jaSaRcbHmy2MbuLzt39VmijKhw/zkkTg4wLOlv6Vq5o1JujcEKChCxdORt9ZP5MHRdU+/TPLbV0t0g0kVea4iSHX4qhjO2C98ZhClO7mYs9XatFl3/hBZ2X+C+NLHOGL3BO+dyaEpFkt61JHnv2OAkBiJi2YjOMtGNF7bpa1V7tqJE0TTBOx59353ZPtpPDZJZ3KB3nTWLIdxNBqQsT/uQWOnpDrCwLXtpGRV2RIHyVBGMBeizy08Tasn01RLAvRkqhg6GWtWQlaLrj3q5AE3Huzc3tzMGbPdTiKFfEnupDUmrJQarsGy0YjzddNA0wTtebgeltd47p84JAqUJ/l8jut4gFxFLwp+WiK6USxF+md2NZcVoG+L+rIXPU9AQ8tFCCUhkZCaSAxSl3RKDVfSzohGrPF3/hBaaj2pzRQnHZAkglO4I222GhSQyMitOuxUGK7HkelolDv4JLTRO8u4s/4Aa757bFykWOeQEWR9LLDFYhSXX0yrR9U/nbDnCe15uB73q/BTXhUN7PsbT9RYFiKQ9NJh2gZByyQDcU8C7u0R6e3fZbG6Ld8/twBQc4vf+cSb8Q9ofpFZ4ZY7Ya94SYpkxYj28K0yZ0SAZvtktGA7Wd3VJQH9MDvsLI7quor3U+7P7+U3ey0HunIN5uv7p1/4i7dfVCtE91BDNO4T7e1+QChq9/2DZx8FuQP+EzPxF2q+rEd1U0t7f+7P7PZr8FF44WN81outN0h4yMuGFVBIe6AfR9dPW4iGRTkuRmniuN0n7w9SKSZ/ERXcTPXmeKwnHPYzhvMNFzGMZA3mdy4mSRTY7uYJH6Imu8foJA3iNbxAhByHGd7mHXCL8k4ms4BwAsqjjUh6lhEae5DyqOBpQhNjDhTzGIOqZz0jmcR1Z7ARgAEu4nH8kdabtfRIXpFe+P5M9W8YRzGrgwqfLAdg4byAfP/YNVCwTkSjjLn+aEWdvYvVzI/nkyevIzNO29Tp8CaeWaduW/2Usa/9+CagAfce9xym3vQLA+/edSuWiM2hv7sM5v72RoqGNADRU5fD27KsIN/ZCqSCDJs/l+OvfT9ju/OYkRfednnDFldAY9xrPfAf+bx78eSD8+HJoy4JeO+HVR2BYK9QHYdo3YPMQEAU3/RV+9Kn+bn0QzrwM1h0BEoOrXoD7lsKMU+Cfp0JAQagVfvlnuKgKHhwKt1+hv6uAK1+Cez9Ozn56AAcq9uMaT7cVZDDcWgh16+B3P4cjHoCvRSFjEFTMhydyIbYWcs+HmTuhTwa0/xIevxQqAdZBzvkwoxr6C3AnPP4d2PAEDLwZvhGBzABEy+Hpb8OmW2D8w3C+gApA9BaYcwOst/lPkQieEN3E28ggxqn8jQlspo4s/shtrGQ1rzGDE3mGE1nHi5zIy0znCl6knQCvcDVn8Sjj2UoNeYSI0k6AJVzCNcymhEYe40JeYxrf4CXOZS49eRGAv3Ear3Au1/AUAIWs53v8rstnGstJXJCGnvo+ofw3WfrolXs/W/70hYw4+yXGfX0ly586kpV/vZARZ98PQG6f9Xzlf//dtmhYWPvCZZz041/R+4jdvHztT9j83jIGn1xF6YTPGH7Gct6958Z/+87Sh08lp7iKcx/8Pbs35jP3xrs4+qoFhPISi9UGo0mKblYMZv8NrtwMFVlw9G3wzGr40Qz4r2fgxnXwrRNh1nR49UW4bor+Xu2dsLQHnP09+P49kKngki9BUQPsvh3aBdbEvc3yhfDEO/r9T46Cmy+Ci34D51fCFXdDbgwWFMK022H2J/rfCdNZ/3XNDs9XwOmlUNUCOe0gP4Urn4IHzoOas+G862HyIzB/FpwzHLasgf95Hkp/AJddCr8EuBguOQFWPgoP1kOwBkIAs+HCa+Gl2bCyDI68Gy78Ntx/Haz5KSwLAk/BgOvh2hugzMY/Q6IEEMlCqbRV1bMivJB4Qv1A6pnAZgCKaCOPKnZSRAt9OYF1ABzFaio5BoD3GEMPtjKerQD0pYkMFDEEEJoJEQPaySafOoC9HjJAOyG0C5QaVDBxQRp1wTpyi5u+8Hm4Sd/M4cYcMvPqDnqMz14bRqjHDvodU0soL0rx6EVsems8AMNP30Lp0Tu/+CVRRNuyUTForcsimNlERlbiIhRI9s81qV4LLsCQNuhbBZ8Wwa6+8H19TZmxGj7S15QN/WDSGv1+QgPkNMOj8aIy80+Cx/6l32cqGKe9d4btt7NF437hgNLwPoGt76qD0dk94oqFIu9C0SIYdzm8B7AK8oIQOQ9qAM6CVW+j76et0O9MWAPwVaiug96LocdnkLMBRj4UP0YhRA+Hlo426uIPoN2QU4S+zwZBW3Df77NcFpNK67W1wtPtmrBvoDcNDGIcG1lEFW9wNGfyMQs5ljZ6AVBLCQC/4gbayWcQi7iUuWQRZSJP8SRlBGkjhxqu5Om9x36CC9jKCQRpYSb37/28nuHcy+1kUc/pPMORVCVndEb3+taEq/7Kh7+8gQ1zv4ZCmHr7z/f+rrl2OM9dfjuZufWMn/EMg6dU0VhdRFaPXXv/T06v3dRtHH7QNo699k1ev/k7PHPxvcQi2Yz52kMEMhJX0kCsGw/q13tD5SD4+kb4QxXcejT87GP47bHQoK8pR2yFd8ZD8yJ4rydUDYF1PWHddv37S8+HtSOheAc88Rc4tkF/fvmp8I8zIJoBf3lgX5t/HAa3zoQ9veAHjybp5ULnN6ArVh5+Gy65HZ7dFa/5eyQ0xiD4MAz5FlQ8C8fWoXfKGAJbn4djfgDrH4She6D3x9AzE2K50DARvrkVBg6Gihfhr/0h/Av46zfhhkfgawrkRdjbZ2+Cox+GrzZDwf3wW1v+AF0jrc8IKzpO8m3Uk8WzzOI45tCTVr7E46xkKr/gVsJkI+jUrBgB6hjBZTzCf/ILtjGBdxlFG0FWMZXL+Cm3cBMFbGNOPL4LMIMX+Ak3M5gFvM40AMawme9xCzdxF0cxj5e5Lmm7VbB7XvOaF6Yy4pw5XPTMzYyYPocFv54JwKCTNvOV/72F/3jqLoaeOo+Fv0/etg7WvjSWvD5buWjOTZxy612sffEyGrcnXoRbVBfPcUsWXD4LZs3RnunvHoc5U6HkVmjKhkA83e4P86F3HQy9Fa6/BAZ9BhkKWoPQ2BMmfgbb74YxG+Dqi/Yd/6m3oO42+OZzcPuX9n0+ayPsnA3/dw88eQ7UJutodHYDOt55uw3GFULD1cRHj+gnyH/DQ+VwcT+4JRdaA/HR3kPwSiPk9IXb/wdOK4EtmRALQ3A7DL4W3q6Bn2ZD+Eo4G+ABmPptmNMAN18Dc66GmR1t3Qsf74Kye+AP98H5lv8Buo7rRTc5z6KNIH9iFkNYwHSWAjCaar7Pr/kRdzORheSwA4ACdtOTTymhkXzClLKCSgazIl7V7HB2EADG8hE7OOwLbU1mIdvioYqetFIYnzCZxgpiBNlOflK2S7Lxzs+xa91kjv6mPucJVy+mZddQAPJLWskt1raNu3wFKhakblM++aV1tHV4iEDLrp5kFe4+aBvbFpzIwBOXIAHoP3EHofxaqpcmWjsWlHThHBuCMHUWnLIAfqHPj69Ww4ZfawGdtRB66mtKbgzemQM1d8HaP0BLLhy7HcY0QkYYfhb//vWLYcsBCk8/sAhWHf3Fzy+shqw2eCHZ4kudPWQcn6nyIYxYDuML4J7ZcM0mOGI8XPUd2LAFflEF/30KrOsD2wGGQesSeLwG7loMjzZB/hSoHQ+782H3LNgIcCEs3hAv+v0RTP45+j69DxZvO8AODD+EdbugeDlJ3k/2kdZr6yzRjQF/YgYFVHExr+/9vCqeARFFeJMvM5K3AZjIKhoYQCMh2gmwg5H0oYq+1NFMP6rjF3k9oymIhwrW7rdP20eMJ49qALZRsNfSjxgKCH1oTO5UI927WBnZ9ax5YSQAq58dRSi/BoCd6wpQcePWvzIUlFA4uJHhZ2wi3NCXqqW9CTcFqV09kaFTlx20jaweu6heqnc/3bW+B20NJRSPqk3YxlggyeF5FJgyAwZUwTP7rilL41kt7QJ3fBm+rK8p1SGojBcduXs0BKI6EyEIjPoEHtB/H/48Ckor9fuX9ys+PXsc9NR/N+b2huZ4H5/XC3aUwqQDxLkPeQIHwvH5yq/D843w4z3wk9nw0FBYuwweXRrPKNoNGQ/DWTPQ99M6yKmPh1O+BScPh3XDoHUi7CmE3S/orYmYC6MHou+nfKh/AEYC3AujesZjxf+APh1/uD/B4ChkjCHZ+8k20iq6VsR0E1/BtJARVHMCuWzjXm4HYCLPU0tfPouHAfqxhHPRKU7FNDOG1/k9PwEUJazgNJYDMIaXeZwfIUTJZieX8hgAr/EfvEAJgiKbnVwQz1z4kGNYz6kIUQK0czoPJf1IkmjiF+uf3/0WjdUjiYbzeeainzN4yoscedmTrHrmElY/G0CC7Rw140kA1v79GKqWnIpIFAm2M37mQ0gAMrJiHHHeX5j/8++DEvqMnc/gKfrhMv/e06hcdBaRtgLm/tcdFA5azlm/fJJjrvkH79/3TZ69/A5QwrDTntubTpYIsWQd3d+NgGUnQPE26KuvKf/5PKztC6/qa8oxS+DBeNraih5w8Q06XaygDh5/dN+xfv0szLwK7rsE8hrgscf15z+bBjNHQzAKOU3wmz/pz58fAZedoz+XGFz/9L7Jt8TPuJPPXbv68Icw/RM4SoGcCW//GNYCzIV+t8GVAqoEKl+AJzq+czf8ZRZcfQ1k9IYdf4fHAe6AJ++BS+6FQCa03wNPAjwMx8yAyUGIZkL4bnjIFTOPmrReW1FdDdEl2kC5XAj0TmsjTmHLnc00T3BN/maXqOnVwhsnuiZdKgU8qxRf9I5FpgCjrTfHkGaiKPVIOhuwIrzgliFF98nc7vg4X7dpyvX+Of47DZ183mypFQarSPt1tUJ091jQhjMIVTp+RrvbNPhKdNuUorMtiozoehNPiG5nnoL3CG31gejmuyg0120O1neN6HoTT4iufzzdzErvC9KePFcsCkgRB+u7RnS9iRFdVxGqzoSYt4ffDXne3RnjixzM0/XPCM5fpP26mvBCKpGoEGzwbpHvcEaUqAe3l++czh0GpVrx0ySxf0g8Z72LpP0GUmUqip+GYhk7XZu/eUhasr17bgfmUA5D2m9Qg+W4X3Tj1FjUjv3krPGup7uzyE+iqyC+3LxzDvV7g7uoR6nOslVShlWiu9Widuwnb6F3J9O2lfgptFCr1AGLl++PEV1vYcn1tOom2mZRO/aT+0kIPLhNeQxFdZ/Otq7xIon0WRNe8BaWXE9LRFeVqXr8MukQaA+QtTHtQxTL2dOjjYivJtEOPTrTk2kHr+pmcBOVVjRi5U3koxDDUu/Fdav6eO+cOicC8epzh6YinYYYLKMJpbzj6cbxT4ghf6H3clm3lXg3Vv1FqpRKuHTjpnQaYrAMyx6eRnTTQfanIaTVO55hJBCltqeJ5x4IpWrYb78wg2vxnuiqMtVK4kM2dyNKyFuatt1ELaemdxsq4P26EvvYfOj/8m+YEIO7iWBRPBes31xvtcXt2Uev570TYlh9mJ8m0CqViu8cnTib0mCHwTo2o5RlI1Orb6YNsN8W6F4mZ3WIzEr3e7uNOWFqihPfuNL9rOrCd7bgl+wcb7LGysYsFd34kmBLT9BWer7s/rjuuqF+WoXWTFe8Vr39in9Gcd6iHqUszayyY9i4GhfspJoSCudmI23uFd5oIMb6IX7yclcnkbXwedbggs0qDV+gKyObbmG56Koy1YAejnmfQFuAHu+7N8SwpbTVRwsiYnTHW1WqhfgW5QbXECG+KaeV2HVDrbSpXevp9ax7J9RWH+Ze25Nnk1Ldrobnn37tDdZbUeDm89giuqpMbcEvyyezKkJkr3JfHmdtUQt1hSG7zbCQFd0+glLVWJh6ZOgWMeBjOxq2c+i40Ma2raXkj0FX7SihUCwa56cVaBVKpSyH3D/92t2sQSlbdrWxTXRVmaoAquxq31KyN4bIX+geb3dbiZ+83BiwIGVH0yvUNqXseIZ0EAGW2NW43ZMkH9rcvnX0fTAE7c6f3Y5KjMVjM+02w0LWdGExxKFYiF8ydNzJCpSybTcbW0VXlakd+CVvN7M2g15/d763++mwVppz/SK6rcBHKT+qUnXApyk/riEVtGFTLLcDuz1d0EM7f6xSK/5zLsGdzq2125wV5pMj/JSXu1CptPW9hfilX7uLBXZkLOyP7aKrylQbfpl8kKhQ+nvnLpZYdFSUWND2PmER25VK4yhL5+2+n7bjG7rCVpSyfWTtiBtMlak1+KVSU/6iHAped97uyBsGNlFZkmO3GRYRAd5JeytKrcdMqjmFMFZc8wRwhOjGeROwJYXDckp/l0Noi3NWqtXnt7LwqFy7zbCQd5SyLE/8XUyYwQl8iFKOKErkGNFVZSoMvA44d/idKiQqDCwLOKLQeXswylvHB31UL3elUqy3rDUdZphvWXuGA7HFCWGFDhwjugCqTNXilw6auSOTfg/YP6n2wYSwj7IVaoAPLG9Vqc+wobCKAYAG9CjaMThKdGFvfNfyIhS20OODHAr/ZV98d+3QJraV+iWO2wq81o0qYt3lfcwSYauJAK/Gd212DI4T3TjvATvtNsISSh7MIWuD9Z1iZ2ErS8b6JY6rgDeUosk+C1QMHT5rsM0G//EmSu2y24jP40jRjRc7fxU/TKxJVBh0SyaZ26ybWKvPb2Xe5BCIX+K47yvlgI1Rtcf1KtButyk+YDFKObLUpiNFF0CVqUbgJfwgvMHmIEP+K8OS7X3q89t47aRMH9XJna+Ug0ouas/rDUzB83SyHqUW221EZzj6xlNlqgnfCG9jkCE3ZpBZnT7h3ZPXxmsnZdCe6ZcKYu85SnA7UGozMA9TnyEdbATestuIg+Fo0QUfCu/gGzPI2J76rIaG3Dbmnuw3wXVuxoBSG9DiYIQ3dVQAb8Tj545F9J56zkfKJQ84Fyi025a0EymKUHF/jEjf1JRX7BDccMgvgvuuUi7ZKFJkOHAaLnCAHM4GYJ7TBRdcJLoAUi65wDlAb7ttSTuRgihb72yn7bDuFaCpLWrh7eNDPhHcGNrDdUwifEKIDEULr5+2R0olnwJv4xIxc5XoAki5BIGTgFF225J2VFCx/bpm6qfnJf9dFGuHNbN0TK5PshSagddTuAOEtYgUA9OBfLtNcREKWIhSy+w2JBlcJ7odSLmMBE7GD95B/bRmtn83GxVKbAgaCUZ5f0LYRwsftqHzcB2VBJ80IjnAmUCp3aa4gDA6nLDZbkOSxbWiCyDl0gvdSb0f520bFGbrnRApPnicd09eG29OCvhkaa9Cb7uyRCmPTEiJBNDOhPdHcl2nHr3SrM5uQ7qCq0UXQMolE5gKDLfblrQTy46x7bY2mscf2IPdXNrMBxOyfVITtxWYpxRb7TYkLYiMAiYDfnh4JsNnwHso5ZwqfUnietHtQMrlcOAEwPtD6vppzdRcGyKWr0MrLVntLDwq4qN6uJvQE2bOq0ucSkTy0Q7FALtNcQAtwLsotcluQ7qLZ0QXQMolBBwHjAW8PXkUzY2y/TvNLJoZYNmoHJ94t3vQK8y22G2IpYiMRjsUfvV6PwPmO61wTVfxlOh2EI/1TgIG2W1LGlkPLGS2CqGHoV72hsLozQSXK+WDessHQnu9JwJDbbbESvagi49vstuQVOJJ0e1AyqU/2kMottuWFFIJLIjvpLwXEQYDxwO9bLEqPcSAlcBS12cmpAqREvR17me3KWmkBT1ButoNix2SxdOi20FcfMegvQQ3DsMj6ATwVars4KXqROgPHAkMwb0hlhZ0TeVVSuGILVYch4gXH7LtwDJgOUp5thKbL0S3AymXHHQqzmjckYRej/b0Po1vZ5QwIuSjHzSjALdsq16J3mFhk43Fxt2DiKAdibFAf3uN6RZNwGpglVfitgfDV6LbgZSLoOO9Y4CBOMv7bQe2AqtVmep2OpQIGcBh6AdN3+4eLw20ob341UpRZ7Mt7kWkCC2+hwOpqdmRfrahnYoKtyzhTQW+FN39ief59kNPRA3A+uGaAmrRQrsFqFFl6YljiZCNfsh0vOzYOUIBO9Be7Tag2reTY+lAJAMYgc5b74+zHAqAOnTK31qUqrfXFHvwveh+nngIoj9agEuBHkAqi8WE0bOyHUK7TZXZk+gtQm+0+A5CP2zSFYbYxT6RrVIK+zfk9AMimehrOwQYDGTZYIUCqtFlFzehlPdLtB4CI7oJEK9u1uMArxDakwigJ60Uegv5jlcDWmD3vuwS2EQQIYReUl0IFMRfhehzzUA/fPZ/AKn4K4w+18b4z4b9/62U2Z7GdnT8tw86k6c4/r4nqfeEG9EORS16RFPj5tVj6cCIriFpRBDP1DrwMyJB9AinEB1q+vwrg31OhUKn8EXRS7CbP/dqBHaiVIu1J+E+jOgaDAaDhTgtyG4wGAyexoiuwWAwWIgRXYPBYLAQI7oGg8FgIUZ0DQaDwUKM6HYDEdkkIi0i0igiu0XkHyLi5XKSiMjXReSj+DlXici/RORku+0yGNyCEd3u8xWlVD56KfF24Lc225M2ROSHwK+Ae4AS9CqnPwDn22iWIc2IyFtxp8KOFW2ew4huilC6OtLf0EV0PIeIFAJ3At9RSj2nlGpSSrUrpV5SSv3IbvsM6UFEhgJT0IsjzrPXGm9gRDdFiEgucAnwod22pInJ6NoMz9ttiMFSZqD79GPATHtN8QYZdhvgAV4QkQiQh15rfpbN9qSL3kCtUipityEGS5kBPAAsAD4UkRKl1HabbXI1xtPtPhcopYrQXuB3gbdFpNRek9LCTqBYdOlAgw+IT5AOAeYopRajN4j8ur1WuR8juilCKRVVSj2HLgjixdn8D9AFxy+w2Q6DdcwE5iqlauP/fhoTYug2xmtJEaJL552HLpe32mZzUo5Sql5E7gB+Hw+nzEXvcnEGME0pdZOtBhpSiojkABcDQRGpjn+cBRSJyHil1DL7rHM3RnS7z0siEkXP7lYAM5VSK222KS0ope6P34C3AU+ha+YuBu621TBDOrgAPWobB/9WdH4OOs57ow02eQJT2tFgMHwBEXkFWKmUuvFzn18M/AYYaCZVu4YRXYPBYLAQM5FmMBgMFmJE12AwGCzEiK7BYDBYiBFdg8FgsBAjugaDwWAhRnQNBoPBQozoGgwGg4UY0TUYDAYLMaJrMBgMFvL/DCDfFismjj4AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "venn3([set(akker_kmers), set(shew1_kmers), set(shew2_kmers)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's hash!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Choose a hash function!\n", "\n", "We need to pick a hash function that takes DNA k-mers and converts them into numbers.\n", "\n", "Both the [mash](https://mash.readthedocs.io/en/latest/) software for MinHash, and the [sourmash](https://sourmash.readthedocs.io) software for modulo and MinHash, use MurmurHash:\n", "\n", "https://en.wikipedia.org/wiki/MurmurHash\n", "\n", "this is implemented in the 'mmh3' library in Python.\n", "\n", "The other thing we need to do here is take into account the fact that DNA is double stranded, and so\n", "\n", "```\n", "hash_kmer('ATGG')\n", "```\n", "should be equivalent to\n", "```\n", "hash_kmer('CCAT')\n", "```\n", "Following mash's lead, for every input k-mer we will choose a *canonical* k-mer that is the lesser of the k-mer and its reverse complement." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "import mmh3\n", "\n", "def hash_kmer(kmer):\n", " # calculate the reverse complement\n", " rc_kmer = screed.rc(kmer)\n", " \n", " # determine whether original k-mer or reverse complement is lesser\n", " if kmer < rc_kmer:\n", " canonical_kmer = kmer\n", " else:\n", " canonical_kmer = rc_kmer\n", " \n", " # calculate murmurhash using a hash seed of 42\n", " hash = mmh3.hash64(canonical_kmer, 42)[0]\n", " if hash < 0: hash += 2**64\n", " \n", " # done\n", " return hash" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is now a function that we can use to turn any DNA \"word\" into a number:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "13663093258475204077" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_kmer('ATGGC')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same input word always returns the same number:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "13663093258475204077" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_kmer('ATGGC')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "as does its reverse complement:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "13663093258475204077" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_kmer('GCCAT')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and nearby words return very different numbers:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1777382721305265773" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_kmer('GCCAA')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Note that hashing collections of k-mers doesn't change Jaccard calculations:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "def hash_kmers(kmers):\n", " hashes = []\n", " for kmer in kmers:\n", " hashes.append(hash_kmer(kmer))\n", " return hashes" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "shew1_hashes = hash_kmers(shew1_kmers)\n", "shew2_hashes = hash_kmers(shew2_kmers)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.23675152210020398\n" ] } ], "source": [ "print(jaccard_similarity(shew1_kmers, shew2_kmers))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.2371520123045373\n" ] } ], "source": [ "print(jaccard_similarity(shew1_hashes, shew2_hashes))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(ok, it changes it a little, because of the canonical k-mer calculation!)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementing subsampling with modulo hashing\n", "\n", "We are now ready to implement k-mer subsampling with modulo hash.\n", "\n", "We need to pick a sampling rate, and know the maximum possible hash value.\n", "\n", "For a sampling rate, let's start with 1000.\n", "\n", "The MurmurHash function turns k-mers into numbers between 0 and `2**64 - 1` (the maximum 64-bit number).\n", "\n", "Let's define these as variables:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "scaled = 1000\n", "MAX_HASH = 2**64" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, choose the range of hash values that we'll keep." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.844674407370955e+16\n" ] } ], "source": [ "keep_below = MAX_HASH / scaled\n", "print(keep_below)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and write a filter function:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "def subsample_modulo(kmers):\n", " keep = []\n", " for kmer in kmers:\n", " if hash_kmer(kmer) < keep_below:\n", " keep.append(kmer)\n", " # otherwise, discard\n", " \n", " return keep" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Now let's apply this to our big collections of k-mers!" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "akker_sub = subsample_modulo(akker_kmers)\n", "shew1_sub = subsample_modulo(shew1_kmers)\n", "shew2_sub = subsample_modulo(shew2_kmers)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "499970 502\n", "499970 513\n", "499970 503\n" ] } ], "source": [ "print(len(akker_kmers), len(akker_sub))\n", "print(len(shew1_kmers), len(shew1_sub))\n", "print(len(shew2_kmers), len(shew2_sub))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So we go from ~500,000 k-mers to ~500 hashes! Do the Jaccard calculations change??" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "akker vs akker, total 1.0\n", "akker vs akker, sub 1.0\n" ] } ], "source": [ "print('akker vs akker, total', jaccard_similarity(akker_kmers, akker_kmers))\n", "print('akker vs akker, sub', jaccard_similarity(akker_sub, akker_sub))" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "akker vs shew1, total 0.0\n", "akker vs shew1, sub 0.0\n" ] } ], "source": [ "print('akker vs shew1, total', jaccard_similarity(akker_kmers, shew1_kmers))\n", "print('akker vs shew1, sub', jaccard_similarity(akker_sub, shew1_sub))" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shew1 vs shew2, total 0.23675152210020398\n", "shew1 vs shew2, sub 0.2281795511221945\n" ] } ], "source": [ "print('shew1 vs shew2, total', jaccard_similarity(shew1_kmers, shew2_kmers))\n", "print('shew1 vs shew2, sub', jaccard_similarity(shew1_sub, shew2_sub))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And you can see that the numbers are different, but not very much - the Jaccard similarity is being *estimated*, so it is not exact but it is close." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's visualize --" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAACpCAYAAACI/O4MAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAkU0lEQVR4nO2deXyU1bnHv89MMllJAgQS9kVEFhFREVERcUFtrdpr3WqFqtVyba1tvbValxCt3taq3XvrdalLtbdYl6ptFRVXVEBAZBcEwpKEECAh+2Rmzv3jTIAqgZlk5l3P9/OZT4YJ857nzXve3/uc5zznOaKUwmAwGAzWELDbAIPBYPATRnQNBoPBQozoGgwGg4UY0TUYDAYLMaJrMBgMFmJE12AwGCzEiK7BYDBYiBFdg8FgsBAjugaDwWAhRnQNBoPBQozoGgwGg4UY0TUYDAYLMaJrMBgMFmJE12AwGCzEiK7BYDBYiBFdg8FgsBAjugaDwWAhRnQNBoPBQozoGgwGg4UY0TUYDAYLMaJrMBgMFmJE12AwGCzEiK7BYDBYSIbdBhjchQgZ6Id1DFCAUoqYvVYZuoRICCgGCoBcIC/+s+PVca2F+LUGokAr0Aw0xX82A43ATpRqsPYk3Icopey2wdFIuQSAfKBH/FWw3/sQ+zplR8eMogUpgu6Ie/Z/qTLVaPEpJIwIeUAh+hwL4u8L0effcQMeiI5zbYi/9n/foBQt6bXccEhEgkAJWmT7xH8WpqGlVqB2v1c1SjWnoR3XYkT3c0i55AMDgQHoTpqHFtRUEUWLUS2wDdiqylRTCo+fECII+vwGxl+9SN/IpwF9rpXANiPCFiGSDQwGhqL7c6ZNluwAKoAKlNppkw2OwfeiK+WSDfRHd8oBaA/PanYDW+OvKlWmIuloRISOB8og9LmG0tFOAuwiLsBApVK022SH9xDJAkYCw4G+pNZhSAWNwCZgrV8F2JeiGw8ZDANGowXXSUTRgrQaqFBl3btAImQCRwCj0N6s02gH1gOrlMKXN2FKECkGxgKH4Z65mu3AKmADSkXtNsYqfCW68dDBaLQA5dhsTiI0ojvlGlWmWpP5oghF6JtwJPYNK5OlBv2w+Uwp0uLtewqRADACfZ372GxNd2gF1gAr/BD/9bzoSrkIejg9Jv7TacOtRIgCG4CVqkzVdPaf4nHaQcCR6DCCWwkDnwIrlaLebmMcichwYCLpmQyziwiwEvgYpdrsNiZdeFp0pVyGAMcDPe22JYXUAAtUmara/0MRhqPP1Y6YdLqIAWuBxUrheQ8oIUQGoK+zmz3bQxEGPkZ7vp4b8XhSdKVc+gKTgH5225JGNgELmK2ygMnoTASvEgGWA0t9G3YQKQRORk+A+oVmYAFKrbPbkFTiKdGNZyJMQk8ceZtIQZTq77XwydeExWNziGb4YXVhI/ChUmyw2xDLEBFgHHAc7pkgSzUVwLteifd6RnSlXEajh11ZdtuSdnZ/qYUdM0Oo3CAAraF2Fo+NsHmAGyYHU8FW4F2l8PbqJ+3dTgVK7TbFAbQB73vB63W96Eq5ZAHT0Eng3iaaH2VrWZjWUQcW18o+zbx3bLZPvN424C2lqLDbkLQgciTaifCrd9sZFcDbqOSyeZyEq0VXyqUPcAZ6Sa63aTm8jW13BIgWHTz9qzEnzFuToCHfroUPVvMJsNAz9R9EMoBT0KlghgPTCLzq1sUVrhVdKZcx6AmkoN22pJ3d5zRTc002ZCbmwUYCURaOD1Phm3BDNfCGUli+nDqliOQBZ6HrIhgOTgR4E6U22m1IsrhOdKVcMoEp+METUEFF1Q9aaJia26Xvrx/czKJxOSBuzE1OllZgnlJstduQLiFSApyJru5lSJzFKLXYbiOSwVWiK+XSAzgHKLLZlPQT6RVhy0+jhAd1b2Jwd0Erb07KpC3L+yMCXeXtA6VYYbchSSFyGHAqfhi1pYcNwDyUckWIyTWiGxfcr6DLDHqbSK8Im36liPZMzfLdxpwwc08O+kR4QQvvcruNSAiRkegMBT+MRtLJZuA1N9RwcMUst/8E94FYygQXIL8lxPT3omS1Ob5DpojJIhxltxGHRGQU2sM1gtt9BgNnxesGOxrHi66/BLcoQsX9MaK9U595kN8SYvr8iI+E9wRHC6/2cE+x2wyPMRA4M14IyLE42jgplwJ8I7gFUSrujxEpTl+qV35zFtPnRwiF/SS84+024guIDEOHFAypZzBwRnwlnyNxrOjGBfdcfCO4D0SJ9E1/bm1+cxbT3/OT8E5ylPDqurfTMCGFdDIUXYHNkThSdKVcMtD5it4XXBVUbPlZhEiJdYsZejRncfoH7UjMHbOo3WeSiANWLIrkoPu1WWWWfo5GxJFppY4UXXSsy0vlGDun+vqWbqeFdYWihmyOW+GJAiIJMk3ExpWLOs44Hb3nnsEapiLiuBKYjhNdKZex+GHhA8CeqS3sOd2+ZPgRm/MYVOmXTSKzgDNFbMuFnYK3y286kSAwHRFHLThxlOjG6+BOttsOSwj3a6f6evvrI5zwcYi8Jr9sDFkMnGR5qzpTwfvlRp1JHnCa3Ubsj2NEN14L9wwcZFPaiGXG2Do7hnLAYoWMWJBpC2IEoq5YzZMCRolYKIC6nsKJlrVnOBD9ERljtxEdOELg4vuYnYYfJs4Aqm9opb2/c+r+9mjOYtIy15bK6wInidDboramYN9W94Z9nICII6oROkJ00Tv0unkjxcRpOKnrBWzSydDKXB/FdzOAqfGNPNOHDivYnzVhgPg1t9sIcIDoSrnkoIs1e59YZozts5ybLnTcigyCEb+EGYrRO0SnBxNWcCKOCDPYLrroPc38Mfyqndl6yCLkdpIdzmT8Gr94uwATRUhXzeHj8Uu/dhfHI2JraM9W0ZVyKQFG2mmDZYRL2tn95Wy7zTgkh1fk+CibIUQ6RlkivfBL2qP7CAET7DTAbk/3BJvbt46aWe3ggr3LAirAcSv8tM35SBF6pfiYx2OW+TqZsYjYNmlvmwhIuQzDL8niLYe30XSc8ybPOqP/jhx6726z2wyLEFL58BcpxUyeOZ0gcKxdjdsiuvEUMX9MngHUzHLf5NRxy/1SlwFgoAj9U3SsSSk6jiG9jESkyI6G7fJ0BwOFNrVtLa3DwrSOdN8Gkb32ZNOrzi/eLsCR3T6CSH/8MnpzPwIcbUfDdomu7WkblrHrQvfGR0ev90v5R4AhIt0uRuOffu0NDkPE8slty0U3Xid3kNXt2kIsO0bjZOdnLHTGgO3ZZLb7RXgFvUini9+WXHQdV4N7CGJDTQw7PF3/eAN157SiQs7PWOiMoApw+CY/hRhGi3T5nhiN/dlAhuTp+oO2i1jaSaRcbHmy2MbuLzt39VmijKhw/zkkTg4wLOlv6Vq5o1JujcEKChCxdORt9ZP5MHRdU+/TPLbV0t0g0kVea4iSHX4qhjO2C98ZhClO7mYs9XatFl3/hBZ2X+C+NLHOGL3BO+dyaEpFkt61JHnv2OAkBiJi2YjOMtGNF7bpa1V7tqJE0TTBOx59353ZPtpPDZJZ3KB3nTWLIdxNBqQsT/uQWOnpDrCwLXtpGRV2RIHyVBGMBeizy08Tasn01RLAvRkqhg6GWtWQlaLrj3q5AE3Huzc3tzMGbPdTiKFfEnupDUmrJQarsGy0YjzddNA0wTtebgeltd47p84JAqUJ/l8jut4gFxFLwp+WiK6USxF+md2NZcVoG+L+rIXPU9AQ8tFCCUhkZCaSAxSl3RKDVfSzohGrPF3/hBaaj2pzRQnHZAkglO4I222GhSQyMitOuxUGK7HkelolDv4JLTRO8u4s/4Aa757bFykWOeQEWR9LLDFYhSXX0yrR9U/nbDnCe15uB73q/BTXhUN7PsbT9RYFiKQ9NJh2gZByyQDcU8C7u0R6e3fZbG6Ld8/twBQc4vf+cSb8Q9ofpFZ4ZY7Ya94SYpkxYj28K0yZ0SAZvtktGA7Wd3VJQH9MDvsLI7quor3U+7P7+U3ey0HunIN5uv7p1/4i7dfVCtE91BDNO4T7e1+QChq9/2DZx8FuQP+EzPxF2q+rEd1U0t7f+7P7PZr8FF44WN81outN0h4yMuGFVBIe6AfR9dPW4iGRTkuRmniuN0n7w9SKSZ/ERXcTPXmeKwnHPYzhvMNFzGMZA3mdy4mSRTY7uYJH6Imu8foJA3iNbxAhByHGd7mHXCL8k4ms4BwAsqjjUh6lhEae5DyqOBpQhNjDhTzGIOqZz0jmcR1Z7ARgAEu4nH8kdabtfRIXpFe+P5M9W8YRzGrgwqfLAdg4byAfP/YNVCwTkSjjLn+aEWdvYvVzI/nkyevIzNO29Tp8CaeWaduW/2Usa/9+CagAfce9xym3vQLA+/edSuWiM2hv7sM5v72RoqGNADRU5fD27KsIN/ZCqSCDJs/l+OvfT9ju/OYkRfednnDFldAY9xrPfAf+bx78eSD8+HJoy4JeO+HVR2BYK9QHYdo3YPMQEAU3/RV+9Kn+bn0QzrwM1h0BEoOrXoD7lsKMU+Cfp0JAQagVfvlnuKgKHhwKt1+hv6uAK1+Cez9Ozn56AAcq9uMaT7cVZDDcWgh16+B3P4cjHoCvRSFjEFTMhydyIbYWcs+HmTuhTwa0/xIevxQqAdZBzvkwoxr6C3AnPP4d2PAEDLwZvhGBzABEy+Hpb8OmW2D8w3C+gApA9BaYcwOst/lPkQieEN3E28ggxqn8jQlspo4s/shtrGQ1rzGDE3mGE1nHi5zIy0znCl6knQCvcDVn8Sjj2UoNeYSI0k6AJVzCNcymhEYe40JeYxrf4CXOZS49eRGAv3Ear3Au1/AUAIWs53v8rstnGstJXJCGnvo+ofw3WfrolXs/W/70hYw4+yXGfX0ly586kpV/vZARZ98PQG6f9Xzlf//dtmhYWPvCZZz041/R+4jdvHztT9j83jIGn1xF6YTPGH7Gct6958Z/+87Sh08lp7iKcx/8Pbs35jP3xrs4+qoFhPISi9UGo0mKblYMZv8NrtwMFVlw9G3wzGr40Qz4r2fgxnXwrRNh1nR49UW4bor+Xu2dsLQHnP09+P49kKngki9BUQPsvh3aBdbEvc3yhfDEO/r9T46Cmy+Ci34D51fCFXdDbgwWFMK022H2J/rfCdNZ/3XNDs9XwOmlUNUCOe0gP4Urn4IHzoOas+G862HyIzB/FpwzHLasgf95Hkp/AJddCr8EuBguOQFWPgoP1kOwBkIAs+HCa+Gl2bCyDI68Gy78Ntx/Haz5KSwLAk/BgOvh2hugzMY/Q6IEEMlCqbRV1bMivJB4Qv1A6pnAZgCKaCOPKnZSRAt9OYF1ABzFaio5BoD3GEMPtjKerQD0pYkMFDEEEJoJEQPaySafOoC9HjJAOyG0C5QaVDBxQRp1wTpyi5u+8Hm4Sd/M4cYcMvPqDnqMz14bRqjHDvodU0soL0rx6EVsems8AMNP30Lp0Tu/+CVRRNuyUTForcsimNlERlbiIhRI9s81qV4LLsCQNuhbBZ8Wwa6+8H19TZmxGj7S15QN/WDSGv1+QgPkNMOj8aIy80+Cx/6l32cqGKe9d4btt7NF437hgNLwPoGt76qD0dk94oqFIu9C0SIYdzm8B7AK8oIQOQ9qAM6CVW+j76et0O9MWAPwVaiug96LocdnkLMBRj4UP0YhRA+Hlo426uIPoN2QU4S+zwZBW3Df77NcFpNK67W1wtPtmrBvoDcNDGIcG1lEFW9wNGfyMQs5ljZ6AVBLCQC/4gbayWcQi7iUuWQRZSJP8SRlBGkjhxqu5Om9x36CC9jKCQRpYSb37/28nuHcy+1kUc/pPMORVCVndEb3+taEq/7Kh7+8gQ1zv4ZCmHr7z/f+rrl2OM9dfjuZufWMn/EMg6dU0VhdRFaPXXv/T06v3dRtHH7QNo699k1ev/k7PHPxvcQi2Yz52kMEMhJX0kCsGw/q13tD5SD4+kb4QxXcejT87GP47bHQoK8pR2yFd8ZD8yJ4rydUDYF1PWHddv37S8+HtSOheAc88Rc4tkF/fvmp8I8zIJoBf3lgX5t/HAa3zoQ9veAHjybp5ULnN6ArVh5+Gy65HZ7dFa/5eyQ0xiD4MAz5FlQ8C8fWoXfKGAJbn4djfgDrH4She6D3x9AzE2K50DARvrkVBg6Gihfhr/0h/Av46zfhhkfgawrkRdjbZ2+Cox+GrzZDwf3wW1v+AF0jrc8IKzpO8m3Uk8WzzOI45tCTVr7E46xkKr/gVsJkI+jUrBgB6hjBZTzCf/ILtjGBdxlFG0FWMZXL+Cm3cBMFbGNOPL4LMIMX+Ak3M5gFvM40AMawme9xCzdxF0cxj5e5Lmm7VbB7XvOaF6Yy4pw5XPTMzYyYPocFv54JwKCTNvOV/72F/3jqLoaeOo+Fv0/etg7WvjSWvD5buWjOTZxy612sffEyGrcnXoRbVBfPcUsWXD4LZs3RnunvHoc5U6HkVmjKhkA83e4P86F3HQy9Fa6/BAZ9BhkKWoPQ2BMmfgbb74YxG+Dqi/Yd/6m3oO42+OZzcPuX9n0+ayPsnA3/dw88eQ7UJutodHYDOt55uw3GFULD1cRHj+gnyH/DQ+VwcT+4JRdaA/HR3kPwSiPk9IXb/wdOK4EtmRALQ3A7DL4W3q6Bn2ZD+Eo4G+ABmPptmNMAN18Dc66GmR1t3Qsf74Kye+AP98H5lv8Buo7rRTc5z6KNIH9iFkNYwHSWAjCaar7Pr/kRdzORheSwA4ACdtOTTymhkXzClLKCSgazIl7V7HB2EADG8hE7OOwLbU1mIdvioYqetFIYnzCZxgpiBNlOflK2S7Lxzs+xa91kjv6mPucJVy+mZddQAPJLWskt1raNu3wFKhakblM++aV1tHV4iEDLrp5kFe4+aBvbFpzIwBOXIAHoP3EHofxaqpcmWjsWlHThHBuCMHUWnLIAfqHPj69Ww4ZfawGdtRB66mtKbgzemQM1d8HaP0BLLhy7HcY0QkYYfhb//vWLYcsBCk8/sAhWHf3Fzy+shqw2eCHZ4kudPWQcn6nyIYxYDuML4J7ZcM0mOGI8XPUd2LAFflEF/30KrOsD2wGGQesSeLwG7loMjzZB/hSoHQ+782H3LNgIcCEs3hAv+v0RTP45+j69DxZvO8AODD+EdbugeDlJ3k/2kdZr6yzRjQF/YgYFVHExr+/9vCqeARFFeJMvM5K3AZjIKhoYQCMh2gmwg5H0oYq+1NFMP6rjF3k9oymIhwrW7rdP20eMJ49qALZRsNfSjxgKCH1oTO5UI927WBnZ9ax5YSQAq58dRSi/BoCd6wpQcePWvzIUlFA4uJHhZ2wi3NCXqqW9CTcFqV09kaFTlx20jaweu6heqnc/3bW+B20NJRSPqk3YxlggyeF5FJgyAwZUwTP7rilL41kt7QJ3fBm+rK8p1SGojBcduXs0BKI6EyEIjPoEHtB/H/48Ckor9fuX9ys+PXsc9NR/N+b2huZ4H5/XC3aUwqQDxLkPeQIHwvH5yq/D843w4z3wk9nw0FBYuwweXRrPKNoNGQ/DWTPQ99M6yKmPh1O+BScPh3XDoHUi7CmE3S/orYmYC6MHou+nfKh/AEYC3AujesZjxf+APh1/uD/B4ChkjCHZ+8k20iq6VsR0E1/BtJARVHMCuWzjXm4HYCLPU0tfPouHAfqxhHPRKU7FNDOG1/k9PwEUJazgNJYDMIaXeZwfIUTJZieX8hgAr/EfvEAJgiKbnVwQz1z4kGNYz6kIUQK0czoPJf1IkmjiF+uf3/0WjdUjiYbzeeainzN4yoscedmTrHrmElY/G0CC7Rw140kA1v79GKqWnIpIFAm2M37mQ0gAMrJiHHHeX5j/8++DEvqMnc/gKfrhMv/e06hcdBaRtgLm/tcdFA5azlm/fJJjrvkH79/3TZ69/A5QwrDTntubTpYIsWQd3d+NgGUnQPE26KuvKf/5PKztC6/qa8oxS+DBeNraih5w8Q06XaygDh5/dN+xfv0szLwK7rsE8hrgscf15z+bBjNHQzAKOU3wmz/pz58fAZedoz+XGFz/9L7Jt8TPuJPPXbv68Icw/RM4SoGcCW//GNYCzIV+t8GVAqoEKl+AJzq+czf8ZRZcfQ1k9IYdf4fHAe6AJ++BS+6FQCa03wNPAjwMx8yAyUGIZkL4bnjIFTOPmrReW1FdDdEl2kC5XAj0TmsjTmHLnc00T3BN/maXqOnVwhsnuiZdKgU8qxRf9I5FpgCjrTfHkGaiKPVIOhuwIrzgliFF98nc7vg4X7dpyvX+Of47DZ183mypFQarSPt1tUJ091jQhjMIVTp+RrvbNPhKdNuUorMtiozoehNPiG5nnoL3CG31gejmuyg0120O1neN6HoTT4iufzzdzErvC9KePFcsCkgRB+u7RnS9iRFdVxGqzoSYt4ffDXne3RnjixzM0/XPCM5fpP26mvBCKpGoEGzwbpHvcEaUqAe3l++czh0GpVrx0ySxf0g8Z72LpP0GUmUqip+GYhk7XZu/eUhasr17bgfmUA5D2m9Qg+W4X3Tj1FjUjv3krPGup7uzyE+iqyC+3LxzDvV7g7uoR6nOslVShlWiu9Widuwnb6F3J9O2lfgptFCr1AGLl++PEV1vYcn1tOom2mZRO/aT+0kIPLhNeQxFdZ/Otq7xIon0WRNe8BaWXE9LRFeVqXr8MukQaA+QtTHtQxTL2dOjjYivJtEOPTrTk2kHr+pmcBOVVjRi5U3koxDDUu/Fdav6eO+cOicC8epzh6YinYYYLKMJpbzj6cbxT4ghf6H3clm3lXg3Vv1FqpRKuHTjpnQaYrAMyx6eRnTTQfanIaTVO55hJBCltqeJ5x4IpWrYb78wg2vxnuiqMtVK4kM2dyNKyFuatt1ELaemdxsq4P26EvvYfOj/8m+YEIO7iWBRPBes31xvtcXt2Uev570TYlh9mJ8m0CqViu8cnTib0mCHwTo2o5RlI1Orb6YNsN8W6F4mZ3WIzEr3e7uNOWFqihPfuNL9rOrCd7bgl+wcb7LGysYsFd34kmBLT9BWer7s/rjuuqF+WoXWTFe8Vr39in9Gcd6iHqUszayyY9i4GhfspJoSCudmI23uFd5oIMb6IX7yclcnkbXwedbggs0qDV+gKyObbmG56Koy1YAejnmfQFuAHu+7N8SwpbTVRwsiYnTHW1WqhfgW5QbXECG+KaeV2HVDrbSpXevp9ax7J9RWH+Ze25Nnk1Ldrobnn37tDdZbUeDm89giuqpMbcEvyyezKkJkr3JfHmdtUQt1hSG7zbCQFd0+glLVWJh6ZOgWMeBjOxq2c+i40Ma2raXkj0FX7SihUCwa56cVaBVKpSyH3D/92t2sQSlbdrWxTXRVmaoAquxq31KyN4bIX+geb3dbiZ+83BiwIGVH0yvUNqXseIZ0EAGW2NW43ZMkH9rcvnX0fTAE7c6f3Y5KjMVjM+02w0LWdGExxKFYiF8ydNzJCpSybTcbW0VXlakd+CVvN7M2g15/d763++mwVppz/SK6rcBHKT+qUnXApyk/riEVtGFTLLcDuz1d0EM7f6xSK/5zLsGdzq2125wV5pMj/JSXu1CptPW9hfilX7uLBXZkLOyP7aKrylQbfpl8kKhQ+nvnLpZYdFSUWND2PmER25VK4yhL5+2+n7bjG7rCVpSyfWTtiBtMlak1+KVSU/6iHAped97uyBsGNlFZkmO3GRYRAd5JeytKrcdMqjmFMFZc8wRwhOjGeROwJYXDckp/l0Noi3NWqtXnt7LwqFy7zbCQd5SyLE/8XUyYwQl8iFKOKErkGNFVZSoMvA44d/idKiQqDCwLOKLQeXswylvHB31UL3elUqy3rDUdZphvWXuGA7HFCWGFDhwjugCqTNXilw6auSOTfg/YP6n2wYSwj7IVaoAPLG9Vqc+wobCKAYAG9CjaMThKdGFvfNfyIhS20OODHAr/ZV98d+3QJraV+iWO2wq81o0qYt3lfcwSYauJAK/Gd212DI4T3TjvATvtNsISSh7MIWuD9Z1iZ2ErS8b6JY6rgDeUosk+C1QMHT5rsM0G//EmSu2y24jP40jRjRc7fxU/TKxJVBh0SyaZ26ybWKvPb2Xe5BCIX+K47yvlgI1Rtcf1KtButyk+YDFKObLUpiNFF0CVqUbgJfwgvMHmIEP+K8OS7X3q89t47aRMH9XJna+Ug0ouas/rDUzB83SyHqUW221EZzj6xlNlqgnfCG9jkCE3ZpBZnT7h3ZPXxmsnZdCe6ZcKYu85SnA7UGozMA9TnyEdbATestuIg+Fo0QUfCu/gGzPI2J76rIaG3Dbmnuw3wXVuxoBSG9DiYIQ3dVQAb8Tj545F9J56zkfKJQ84Fyi025a0EymKUHF/jEjf1JRX7BDccMgvgvuuUi7ZKFJkOHAaLnCAHM4GYJ7TBRdcJLoAUi65wDlAb7ttSTuRgihb72yn7bDuFaCpLWrh7eNDPhHcGNrDdUwifEKIDEULr5+2R0olnwJv4xIxc5XoAki5BIGTgFF225J2VFCx/bpm6qfnJf9dFGuHNbN0TK5PshSagddTuAOEtYgUA9OBfLtNcREKWIhSy+w2JBlcJ7odSLmMBE7GD95B/bRmtn83GxVKbAgaCUZ5f0LYRwsftqHzcB2VBJ80IjnAmUCp3aa4gDA6nLDZbkOSxbWiCyDl0gvdSb0f520bFGbrnRApPnicd09eG29OCvhkaa9Cb7uyRCmPTEiJBNDOhPdHcl2nHr3SrM5uQ7qCq0UXQMolE5gKDLfblrQTy46x7bY2mscf2IPdXNrMBxOyfVITtxWYpxRb7TYkLYiMAiYDfnh4JsNnwHso5ZwqfUnietHtQMrlcOAEwPtD6vppzdRcGyKWr0MrLVntLDwq4qN6uJvQE2bOq0ucSkTy0Q7FALtNcQAtwLsotcluQ7qLZ0QXQMolBBwHjAW8PXkUzY2y/TvNLJoZYNmoHJ94t3vQK8y22G2IpYiMRjsUfvV6PwPmO61wTVfxlOh2EI/1TgIG2W1LGlkPLGS2CqGHoV72hsLozQSXK+WDessHQnu9JwJDbbbESvagi49vstuQVOJJ0e1AyqU/2kMottuWFFIJLIjvpLwXEQYDxwO9bLEqPcSAlcBS12cmpAqREvR17me3KWmkBT1ButoNix2SxdOi20FcfMegvQQ3DsMj6ATwVars4KXqROgPHAkMwb0hlhZ0TeVVSuGILVYch4gXH7LtwDJgOUp5thKbL0S3AymXHHQqzmjckYRej/b0Po1vZ5QwIuSjHzSjALdsq16J3mFhk43Fxt2DiKAdibFAf3uN6RZNwGpglVfitgfDV6LbgZSLoOO9Y4CBOMv7bQe2AqtVmep2OpQIGcBh6AdN3+4eLw20ob341UpRZ7Mt7kWkCC2+hwOpqdmRfrahnYoKtyzhTQW+FN39ief59kNPRA3A+uGaAmrRQrsFqFFl6YljiZCNfsh0vOzYOUIBO9Be7Tag2reTY+lAJAMYgc5b74+zHAqAOnTK31qUqrfXFHvwveh+nngIoj9agEuBHkAqi8WE0bOyHUK7TZXZk+gtQm+0+A5CP2zSFYbYxT6RrVIK+zfk9AMimehrOwQYDGTZYIUCqtFlFzehlPdLtB4CI7oJEK9u1uMArxDakwigJ60Uegv5jlcDWmD3vuwS2EQQIYReUl0IFMRfhehzzUA/fPZ/AKn4K4w+18b4z4b9/62U2Z7GdnT8tw86k6c4/r4nqfeEG9EORS16RFPj5tVj6cCIriFpRBDP1DrwMyJB9AinEB1q+vwrg31OhUKn8EXRS7CbP/dqBHaiVIu1J+E+jOgaDAaDhTgtyG4wGAyexoiuwWAwWIgRXYPBYLAQI7oGg8FgIUZ0DQaDwUKM6HYDEdkkIi0i0igiu0XkHyLi5XKSiMjXReSj+DlXici/RORku+0yGNyCEd3u8xWlVD56KfF24Lc225M2ROSHwK+Ae4AS9CqnPwDn22iWIc2IyFtxp8KOFW2ew4huilC6OtLf0EV0PIeIFAJ3At9RSj2nlGpSSrUrpV5SSv3IbvsM6UFEhgJT0IsjzrPXGm9gRDdFiEgucAnwod22pInJ6NoMz9ttiMFSZqD79GPATHtN8QYZdhvgAV4QkQiQh15rfpbN9qSL3kCtUipityEGS5kBPAAsAD4UkRKl1HabbXI1xtPtPhcopYrQXuB3gbdFpNRek9LCTqBYdOlAgw+IT5AOAeYopRajN4j8ur1WuR8juilCKRVVSj2HLgjixdn8D9AFxy+w2Q6DdcwE5iqlauP/fhoTYug2xmtJEaJL552HLpe32mZzUo5Sql5E7gB+Hw+nzEXvcnEGME0pdZOtBhpSiojkABcDQRGpjn+cBRSJyHil1DL7rHM3RnS7z0siEkXP7lYAM5VSK222KS0ope6P34C3AU+ha+YuBu621TBDOrgAPWobB/9WdH4OOs57ow02eQJT2tFgMHwBEXkFWKmUuvFzn18M/AYYaCZVu4YRXYPBYLAQM5FmMBgMFmJE12AwGCzEiK7BYDBYiBFdg8FgsBAjugaDwWAhRnQNBoPBQozoGgwGg4UY0TUYDAYLMaJrMBgMFvL/DCDfFismjj4AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "venn3([set(akker_kmers), set(shew1_kmers), set(shew2_kmers)])" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAACoCAYAAABDoD2pAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAeMklEQVR4nO2de5TdVZXnP/vW+5GkklTeTyBAyIOEkIiBKIioKC3S0ugMDeJjdBbitDo2ODow1de02O20MCNL1pqZtVC7hW5RGJYYRGAAeSWBRF55PyCpJJWkkkoqqapbr3vvnj/OLQikUo/U/Z3ze5zPWr9VN7du7tm3fud+f/u39z77iKri8Xg8HjukXBvg8Xg8ScKLrsfj8VjEi67H4/FYxIuux+PxWMSLrsfj8VjEi67H4/FYxIuux+PxWMSLrsfj8VjEi67H4/FYxIuux+PxWMSLrsfj8VjEi67H4/FYxIuux+PxWMSLrsfj8VjEi67H4/FYxIuux+PxWMSLrsfj8VjEi67H4/FYxIuux+PxWMSLrsfj8VjEi67H4/FYxIuux+PxWMSLrsfj8Vik1LUBUULSkgJqCkdt4WcJIIUDIHvC0QMcB45pg/ZaN7hIiFABjMLMl74jBWjhAPNZ24F2VfIu7PQMExEBxgLjMPO5+oSjBqjg3fkN757vHiDTz3EUaEG1x96HiB6iqoO/KmEUxHUiMAUYj5mQtUAV707A4ZIBWgvHscLRrA3aNUJzi4IIpUAdMLqfn5XDeCsFOjEC3HbCz2agRRU/4VwhMhqYDNQDEzBzOwjH6xhwCDiMOe/NqPoLcQEvurwjsvXA1MIxGXt3AYeARmCPNmizpTEBEKEOmFE4pmC8miDpBvYDTUCTKkcCHi/ZGE92EjCrcNQ5sqQHM8d3A3uS7gknVnQLQjsTOBcjtGVuLQKMh7iXd0W4qJOz4M1OxXzuGZiQgUs6MSK8C3hblZxbc2KCyGTMvJ7F8O5SbJDHnPMdwE5Us47tsU7iRFfSMgqYi5mU1Y7NGYgcsB3YoA06Io9QhLHAAuBswhvH7wK2AVtUaXVsS/QQKcWc33mYsEEU6Mac802oHnNtjC0SIboFr3YWcB4wjdOPy7piH/CmNmjjcP6TCDOBhZjPHCWagM0Y79fHAgdCpBY4HzgHKHdszUjYC2xAhzfHo0isRbcgtnOBJYTbqx0qrcAGYJs29H9bJkIZxotfgEmCRZkuYBPwhiqJjgOehEglZl7PI16lnweAl1E94NqQoIit6Epa5gBLib7w9EcXsA7YrA3mBIqQwgjtEqLt8fRHF/AqsCnxcV+RMoxnez7hyEMERSNGfGOXbI2d6EpaJgCXYEq+4k4LsJq/03Lgg8TzAnMibcAaVd52bYgTRM4BLsKULiYBxcR8V8ep4iE2oitpqQQ+gAknJIPeCb003drLzkuFtYvK6a4IuuQrLOwFXkpMwk2kBvgQpuokiWSA5+IS742F6EpapgMfISkegIrS8u8ytFxXBWUmntdbkuPVed3snBWH2PVQyAOvqPK6a0MCxXi3FxO/kNHpsA14Kepeb6RFV9IiwIXABUSvIuH0yNZl2ZvO0n1m//WX+ydkeP7CSnKlcUquDEQj8Iwq3a4NKSomUXYZyfVuT0UGeAbVfa4NOV0iK7qSlmrgckyxfzLInNfNvjtKyI8auNa2vaqHpz8odNTEOdFyIu3AU6pYXdEXGCLjgE/gfvFKWFFgDapvujbkdIik6EpapmEENxnhBICjn87Q/OVKGKIHmy3J8eKSHpomJeVvlAfWqhLJL+I7iMzGhMqScsEcCduA51GNVEVL5ERX0rIEE1JIRjhBS5T9f9tF24rhi6eibDw7w5vn1gRgWVjZBTwbybpekSWYMkfP0DkIPIlqxrUhQyVSoitpWY5ZYZUMsnVZ9tyZo2dGxYjeZ/+EDM8trSRfkpQ47yFgVWSEVySFid/OcWxJVGkHVkVlKXFkRFfScjGm+D8ZZEfn2H13juzE4mStD43t5P8tr0RTybhDMG0FV4U+wWYE96PAGa5NiTgZjPAedW3IYETC85G0rCBJgpurzdH4k2zRBBdgwtEqLlvbBRG5yo6ceuAvRELXZetdjOB+DC+4xaAa+HQhCRlqQi+6kpYPY9aXJ4NcdY7dP8nSO3lkIYX+mNxSxaUvdyZIeMcTVuE1vW4vxzRi8hSHSuAqROpcGzIQoQ0vFGpwP4xp3pIM8pV5dt/VO+IY7mDsmZzhhaVJWUQBcAQTauh0bcg7iFyG6QzmKT4dwCOodrg2pD/C7OkuI1GCW5Fn93/vCVxwAWYcqGb5nyOT7S0C44BPFZq4u0ekrxWjJxhqgI8XegyHjlCKrqRlNrDYsRl22XdHNz2z7d0Gz26qZuHWUHoCATEe07/ALSIzME1rPMEyAXOnHDpCJ7qSljGY8pnk0PJXGTKL7C9imLe9mgktodgY0xJnizjMD5hY40dJSo25e+Ygsti1Ee8nVKIraSnFZHOT09yj8+xuDt/gZtVYCmHF+hLKeyK1omeEXCzioO2nSDlmaW9y5nY4WIZIqPpXhEp0MbcDoS/5KBq56hz77khBiTvPp7KnjA+ti8YiguKQAq5wUNFwCTDG8pgec1dxGSKhWQ4fGtGVtCwgaStymr7fQ26s+zX2E49UMX9bkuK7tcDlIpZu80VmYTaN9LihkjDE8wuEQnQlLWMxOx8kh5br3MRxT8WCbdWMPxru1VvFZTo2krUiFYToC59gZiMSCqcuFKILrCA8tgRPz5ReDl8froJ9E98VJB/Owu1gWCIS+BZHFxOPTVHjwMVhCDM4FzpJyznAFNd2WOXAN3NDbtFok+qucuZvT1L9bgkm1hoMJoHjwwrhoRLj4DnF6Rdf0tK3oWJyaFveSef8cHm5JzJvZxVVXf1u7x5TZogE0PvA9FVYXvT39YyUMxBx6uS59rYugBCuiw8KLVEO/sdwbx5Zkk+x9M0kVTMAXFTYwr6YnIOvVggrH3A5uDPRlbSMIkmdwwCOXt1Jbnz46zSnHaxizPEkCe9oYH7R3s0sP/XNyMPLpMIOHU5w6ekuw8TUkkG+Ik/L58IvuACCsHRDkhZMgEmqFavvxQJ88izsLCt0erOOE9EteLlnuRjbGS2f6yRfG8oGHP0y8UgV9UeSVEJWAZw34ncxK88Wjfh9PEEzFkdJTlee7jyStP5cRWn9ZPDdw4rN/O1J83bnFWHBxFwomsfsCZbzXQxqXXQlLSUkqWUjQNuKrkG3TQ8jkw9VUdGdJOGtZeRNxUfuLXtsMQ6RybYHdeHpnkWSKhYAjl4dTa8+hXDOriSFGGAku5SITMdXLESN4iVQh4gL0bX+IZ3SM6mXrrnRvcic1ei+N4RdpouctnAmZ1up+HCG7VVqVkVX0jIR01w4ORz9bLRLr6q6y5jSnKSeu3A6joFIMUITHvukMHF4qwPaJFmegJYoxy+NflJl7lt51yZY5pzT2NpnDklKDscLq1snWRNdSUuKpG013baii3xN9BJo72fi4aQl1MqB4Ta+9l5udBmDiLVYvE1PdxKQrPhg2/J4dOxKIUxtTlpCbfqQX2ligvZ3o/AUE2sXTZuiO/RJHBc658fnIjP1oGsLbDOc+ToTH1qIOrNtDWRTdKdZHMs9PVN6ydXFR3QnHI3PZxkatcOoYvChhegzCRErVUZWRFfSUkHSqhbal8WrPWJVdxk1Hb2uzbDM4N6uaeGYvLu4+CFYOo+2PN1pJO32q2NpPOK5JzKtOWmiO5S7s3Ew7EoHTziZZGMQm6KbLDrPjX6p2PuZ0uzaAttMHUKf3WTdwcUbK+fSlugmazuerjN60Or4ta2sPxqN1pTFoxyoH+Q1XnTjwzgb7R4DF11Ji0Dgm/+Fi6458Yrn9lGeLU1YvS4M3kthMFH2RIdSTMvHQLHh6dZaGic89E6NXzy3j9pMPC8op2bUKX9jkmjj7JnisUDgF1EbYpgsLxegZ2p8LzKjOpLm6Q40f+tImkMRfwK/iNqYMKf2FOJK76T4VmqMbo+vF98/A4mu35InftQEPYD3dIMgOz5+SbQ+ajvie0Hpn4GcBi+68SPwc+pFt9hoiZIbHd+6zdrOpIlujcgpN1D1ohs/Aj+nNsQhmPBChlLu5VbylKKUMI313MCjPMRlbOMKupnAzXyHSbQDsJnJPMZNtDOTuTzC53kyELt667OQKu6S2ce/dRPH9yykpKKNax9IA/D209N57Rc3oPkyRHIs/OsHmHPlLl775SLeeuIzIIpIjnnXPci5V+8omi1VnQF68YdLYeGtkC2FfAksWw+PPwpPjIcvfhUytTB9N7x4H4zJwbPj4KYvQmc1aAq+/jCkNwRg2CigtZ/nA78VtclouLMculLmj5k7AHduhupPwddaYXwdtDwO//tcyNwCH3gQrlSgArr/Ee6/AfY6/gjFIBaebjBd2SvJ8lXu4jZW8i1WcoD5vMIZnMVOPs/dlNPyntfX0cFl/BtnBCS2fWTri59omn3ZSyy9+afvee7NB65lzpWP8tlfrWTOlb9j46+vBeCcq7bwl//yAz77q5Vc8JVfsuFfbyyqLRW9AV6ox2bh5bvg0ErYtRLemA/3ngHfvhauewpab4eaDHxjhXn9bZ+CD6+H5r+He/8P3H19QIad6osYO0/3GbirGVYegDsBvg6fXARbjsIdi2DLzXAlwFw4/AL802H4wVfh97fBDW4tLxpliATaZ8SG6AYzRgoYg2k32EsJWrgFXMweznyf4AJMoY0L2U2KYLPvGsD6gbnXbKe6vuOk53s6zAWtp72KsppWAKrru5FU3++LvypO8gGGF0qAGYUWkh0lkCsxJ/qtc+FHfzbPf2E1vLj43f/TXmhS0lwFo44FZNip5nDsF4u8Dou+B6sBvgerX4PFAP/JnJQMwL+Ht49bqG+1SKDn1UZ4Ibjb0SzCXdxOFxOYybMs4+3AxhoqWmonu3/Bl3/Nmru/yVtP/BWKcOkd//jO7177+WJ2PvGXZLtHc8FX7inquKmge2h0Ccy8HY5OgA89C5ccgopOqC7sXrHwKByvM4/veRSu+ibUXg7ZcvjZ3QEZdSrRjVW5mABXwDcBPgnP/QKe74DRF8ExgKVwrKOfHM334ZK5EERYxxWBntfoeroApSi3sZJb+C5Hmc0GpgY21lBRSzm0LY9cypxPPsh1v/kvzPn4g6z9nze987vFX3qNa/+1gUVfuJctD3+m6GNLPsALS6VC80rY9F3YORueHmCL7H9aBpethvbvwk/ugdu+DL1BXBRONYdjlVR8DH58EH74OPx0FVx2F5x94u9LAIH3nPsfw7nPwopfwsM2bQ2YyItu8Iynk4lsZWOCdho+sn05i7/4KgAXfGU9nUdmn/Sauddsp6e9ntZdtZatKwJnd8KCrfCnM6G7CjKFufrmWBjdah4/vQL+Zp15fMtbkC2DDRH8rOHgkkKy8AJoWwqvvQiza+D42sJS6LUwphra+l5/P0y7E278OfxsPpwc/vL0iw3RDWZTwwPU0lJI0nVQxkHmMYEDgYw1HMTSKtnSymNsecRsqLf5obmU15oWYE2vTEALf/K3nppJPl/KmJntRR1bUwF5eK/XwvZC4rW5DN6YB/P2w5lb4XtLzPP/vBwufs08rjsCvyrs5PrQZCO657ed9LYj51RzODYLRZqgfDdU9D1+A+adD02L4PUfwXKAH8HyRfA6wLMw7ha4eSXcdzXErf1coBuximqw80bScgNBZHnfYBp/5EsoKRRhOuv4a1bxWy5nK5+gl9GU0cYE3uRr/Av7GM0v+K/kqERQUnTzdRoYS3G3F+9Y3M3elcVNYD32jf9A+4FzyPXUUlrRxswP/Y662QfZ9JvPo/kUUtLL+Tc+wFkfa+S5H36CQxuXI6kcqZIezrv2oaKWjOVRfv0XAYnu/dPg218yFUsqcNE6WLUK/lhvSsY6a2BaI7xwn6l0+M0U+PaN0F1h7vT/5iG4fVMAhj2m2k85lMhVxKRt6R+h/ga4GSAPJRfD2kfhDxuh5ir42jEYVwdHHoP/dR5klsKNG2DJaDgC0Fdi5vRDFI/7UQ3Mc7chutdjmt4kg8z8Lvb8g5VtP5yQS+V58FPxCEsNnUdV2X/SsyJXAGfaN8cTMPehGtgtq40vT6eFMcJD6eH4LgEG6C5LWpcxOPUczli1wmOD3iAFF+yI7nELY4SHsuZSyMUm1ncSnVVJ6zKmnJA8eh9edONH4AlBL7rFRlQoOR5fb7CtOr4XlP7JqJ5yQY0X3fgR+Dm1IbpBZJPDTenh+HqD7TVJE92BnAYvuvEjFqKbLE8XoKw50JITp7TVxGpBwBAYyGnwtanxw4tuJCnfH19vsK0m3onCkxlo/h6DgHt5eGxzJOgBbIhuBwEXG4eOsn3xLalqr45vr+D+ObXoquax8CX1WOVQ0AMELg7aoErSvN2q7fH0BntKs3RXxPOznZrWQX4f+JfUY40sg5/vEWPLI2uyNE44qNhdTqojfhUMh8b1uDbBMt3QT5vQ93LYhiEeKxwm6NVi2BPdOHSUHx5Vm+MnUE0TXVtgmybVQfsreNGND1bOpU1PN77Jpf6oWefaguLTNDHQjvohZCjOwhGgN2hDPFY4aGMQK6KrDdpD/DoRDUzty/ESqExFD5nqeH2mwdk36CtMMi15d3LxIw/ssTGQzSx7siZm2aEySo7GxwNqHh+/GPXAHFcdcgJ4d6CWeGxwAFUrIUGboju41xA3qjfER3T3Jy6eO5z52kjSwmfxw9qF06boNgPxSy4NxKgX47F6Ky95miYWf5PLcDP0OzPVLizFAz2BET/R1QbNA2/ZGi8U1K6pJNUe/dvyA/Vd9JQnqT63m+HH93YFYIfHDkdRtbaWwPbKqSC6+ocXyQmjn+52bcaI2XJmfFfY9c9WVYZ7sdxB0lZexodtNgez+mXSBj1M0m7Dxj1SDkHunhswmcoeDk6I704YJ6OcjnOgmsF7u1EkB2y1OaALD2ajgzHdUXaojKrNxd2HzSbbZ0U/PDI89gyjauH9JOtOLh7sLMTkreFCdN8iaVv4jH0kmgm1vOTZMStpCbTTF07VJiys3fcUFesXSuuiW0iobbE9rlNq11ZQ0hq98rGmiUlLoB1n5AXy3tuNDodRtb5oy1WCZBNJqmsUFeoei1653MazkyS4AJuG0GthMLYC0Q0nJYs3XAzqRHS1QTuA7S7Gdsa431aRaotOfPRAfYYjdUkKLXQCm0f8Lqq9wKsjfh9P0LSgusPFwC5LgV6BYZflRJdUb4r6+6Ph7eZR1i1IWrPy9apFa1yzCWgv0nt5guFlVwM7E92Ct/umq/GdUPdYFaXN4RfePVM6aastd22GRVopZp5BNQfEsM1cbGhC1Upzm/5wXfT+GknaUVVUmHRvuAvos6kc6xckKawAsEa16AsbtuO38gkrzrxccCy62qC9wGqXNlindn0l1a+G90Kz8ezuhG3Js0uVxqK/q9mBIFlzOxrscFGxcCKuPV20QXeStLaPk+8pg97webztVT1smlPl2gyLZIGXAnt31X0krTwy3HQS5PkeIs5Ft8ALJGkr67JDZUz8ebjKivKS5/mlgERzIcfp8Ypq4AmvNfikWlh43vbqs/4Ihehqgx7HCG9yGPtoNTUvhyfM8Np5XbSOSVLybLeqhUSuaYz9XODjeAZjB6q7XBsBIRFdAG3QrSTtVmzqjyspPey+mmH/hAxbz6x2bYZFjgPPWBtNdS9Jm9vhIhRhhT5CI7oFXgQOuTbCGqnuFNN+gNP4bmdFLy9cmKQuYlngSVXrDfVX46sZXKDA02EIK/QRKtHVBs0BT5KkZZSVb5cz8T43PXfzkue5ZXmypaGaBwHzgiot1kc1K9X+SJLmdjhYXUhohobQfdm0QduBp0lSb4axv69yEt9949yuhC313axqt2H1e1BtA57CNzu3xRZUN7g24v2ETnQBtEH3Autd22GVqf9QSeU2ey0vd8zsYPOcJMVxmwlDXM+0f3RvR/w5QEiT86EUXQBt0D8DobtKBUaqN8WM71dQsTP428+3p2d45fyawMcJD4eAP6iGpCxRdRNJa+ZvlzbgSVRDeUcRWtEF0AZ9iST1Z0h1p5j53XLKdwUnvI2TM6xZnDQPd5Uq4dqrTvVFLO/NlRA6gN+jGtqNEkItugDaoKuB113bYY1Ud4pZt5ZTvqf4IrFvYoYXL0zSirODwGMOKhWGyp+Ana6NiBEZjOC2uTZkIEIvugDaoGtJUo/SVFeKmX9bSllT8YT3QH2G55ZVJWjF2QHCLbh9/RmeJmm9pYOhHfgdqsdcGzIYYs57NJC0LAWWuLbDGrnaHHv+vpfus0ZWR9s4OcNLS6rQVJIE9w9F7I8bLCICrADOc21KRDkOrAq7h9tHpEQXQNKyELiIiHjpI0ZFOfiNDMc+PvzEVx7l9fM62XJWkmK42zC1uNFrkC+yEPggkJSLYzHYBzyFarhi9gMQOdEFkLRMAj4K1Lq2xRqtH8tw8OZKKBvaxaanNMvzS7M01ydltVkWeEk14sttRaYDVwBJ6oNxumwA1oS1SuFURFJ0ASQtFcBHgJmubbFG1xk97E0LubFlA77uWG03z1xUQmdVUrbcOYZZ2huPZbYiY4BPAHWOLQkreeAFVCN5gY2s6PYhaVkELCMp4YZcbY69DT10ze2/CqFxcobVF1SSL0nG3wN2AM9HJn47VETKgQ8BZ7k2JWS0Ac+gesC1IadL5EUX3gk3XAEkp+D/yGcyHL6xAi3s8tBdlmXdgl4apyWlJCyL2WZnk2tDAkXkDEySLSnndSA2AmtRjV68/gRiIboAkpZy4EJgPknxerPjsuz/Vg+broJ1CyvoLUvKNjtvA6stNCAPByKVGOE907UpjmgD/lRYQh15YiO6fUhaxgKXAFNd22KBg8BL/J2WARcD4xzbEzStmGRZsrZ36sN4vctJTgI5j9nO/uWoe7cnEjvR7UPSMhP4APEUog5grTbojr4nRBBgLrCU+N2KdmAaIG1VTVD3uf4QKcHczS0G4lyZsgNYh+px14YUm9iKLoCkRYA5mLDDaMfmFIPDmF4UO7Wh/zIZEUown3kBMN6ibUHQjikL2hiaZjVhwSTaFgELgThVqezBeLb2ex5bItaieyKSlukYT3A20Yr5KrAbeFMbdP9w/qMIUzBfyllEp+BegUZgM7An8Z7tYIhUYzzfuUT3DiePmeMb0OHN8SiSGNHtQ9JSBZyLmaRh9n57MftqbdCGkS1vFGEU734xw1p034H5vFtU6XBtTOQQSWESbfOAyY6tGSoZzDnfjGpiznniRPdEJC3TMAI8jXB4CW2Y26tGoEkbips8ECEFTAJmYBaVuI53HweaMF5Oo/dqi4TIOMwFdhYwyrE17ycL7MXEbHdFbTVZMUi06J5IoephKjCl8NNGkiKPac7SCDRqg7ZaGPMdRKjBCPAMzIUnaC+4HSOyTUBTYkq+XGIEeFbhmOjIigzmwrob2IdqouPzXnRPgaRlHEZ8x2FKdPqO00la5DFeXStmyeqxwuMWbdBQrKQqVD+MAsb0c9Qy9JhwFiOuJx5twEFVYpeJjhQiVZg7nQmFo57iOxd5oAWT9D0EHIpzUux08KI7TAo9H/oEuAYo4b2ClMMITxbowQhsmzZE9w9dqIjou+CUFH72JSO1cPQCbap+t9tIIVKLqXKpKRzVJxyVmPOcwszxfOHoxXivGUwsvu9xK3AkiSGD4eBF1+PxeCwSpdIpj8fjiTxedD0ej8ciXnQ9Ho/HIl50PR6PxyJedD0ej8ciXnRHiIjsEpFOEWkXkaMiskpEZri2K0hE5HoRWVf4zPtF5A8issK1XR5PFPCiWxw+raq1mNVsB4F7HNsTGCLyn4H/AdyJKbSfCdwLfMahWZ4AEZFnCw5FhWtb4oAX3SKiql3AbzFNR2KHmA0TfwDcoqoPq2qHqvaq6qOqeqtr+zzFR0RmY/ZqU+Bqt9bEAy+6RURMm73PA2tc2xIQyzGrlP6va0M81vgCZj7/ArjJrSnxIE7Nj13yiIhkMcsoD2G2z44j44HDGqOtUzyD8gXgLmAtsEZEJqnqQcc2RRrv6RaHa1S1DuMFfgP4k4hEpafpcGgB6kXEX6wTQCE5Ogt4UFXXAzuB691aFX286BYRVc2p6sOYpjdxzOavBrqBaxzb4bHDTcATqnq48O8H8CGGEeM9liIiIoJJNozFbDcTK1T1mIj8N+BnhXDKE5iOU1cAH1HV25wa6CkaYtpAfg4oEZEDhacrgDoRWaSqr7uzLtp40S0Oj4pIjnf3M7tJVTc6tikQVPUnhS/h7cD9mF6564EfOjXMU2yuwdyxLcS0KO3jQUyc9zsObIoFvrWjx+M5CRF5HNioqt953/OfA34KTPcJ1dPDi67H4/FYxCfSPB6PxyJedD0ej8ciXnQ9Ho/HIl50PR6PxyJedD0ej8ciXnQ9Ho/HIl50PR6PxyJedD0ej8ciXnQ9Ho/HIv8f9BK4usIZwMAAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "venn3([set(akker_sub), set(shew1_sub), set(shew2_sub)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other pointers\n", "\n", "[Sourmash: a practical guide](https://sourmash.readthedocs.io/en/latest/using-sourmash-a-guide.html)\n", "\n", "[Classifying signatures taxonomically](https://sourmash.readthedocs.io/en/latest/classifying-signatures.html)\n", "\n", "[Pre-built search databases](https://sourmash.readthedocs.io/en/latest/databases.html)\n", "\n", "## A full list of notebooks\n", "\n", "[An introduction to k-mers for genome comparison and analysis](kmers-and-minhash.ipynb)\n", "\n", "[Some sourmash command line examples!](sourmash-examples.ipynb)\n", "\n", "[Working with private collections of signatures.](sourmash-collections.ipynb)\n", "\n", "[Using the LCA_Database API.](using-LCA-database-API.ipynb)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "py38arm", "language": "python", "name": "py38arm" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 2 }