You can get the latest development master branch with:
git clone https://github.com/dib-lab/sourmash.git
sourmash runs under both Python 2.7.x and Python 3.5+. The base
requirements are screed and ijson, together with a Rust environment (for the
extension code). We suggest using
rustup to install the Rust environment:
curl https://sh.rustup.rs -sSf | sh
To install all of the necessary Python dependencies, do:
pip install -r requirements.txt
Briefly, we use
cargo test for testing, and
coverage for code
We suggest working on sourmash in a virtualenv; e.g. from within the sourmash clone directory, you can do:
python -m virtualenv dev
. dev/bin/activate
pip install -e .
You can run tests by invoking make test in the sourmash directory;
python -m pytest will run the Python tests, and cargo test will run the Rust tests.
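If you want to drive both test suites from a single Python script, the two commands named above can be wrapped as in the sketch below. This is an illustration only: the helper name run_test_suites and the TEST_COMMANDS list are hypothetical and not part of sourmash, and the Makefile remains the authority on what make test actually does.

```python
import subprocess

# The two commands named in the text above; both are expected to be run
# from the root of the sourmash clone.
TEST_COMMANDS = [
    ["python", "-m", "pytest"],  # Python tests
    ["cargo", "test"],           # Rust tests
]


def run_test_suites(runner=subprocess.call):
    """Run each test command in order.

    `runner` takes a command list and returns its exit code; it defaults to
    subprocess.call and can be replaced with a stub for dry runs.
    Returns a dict mapping the command string to its exit code.
    """
    results = {}
    for command in TEST_COMMANDS:
        results[" ".join(command)] = runner(command)
    return results
```

Calling run_test_suites() from the clone directory runs both suites even if the first one fails, so a single run reports the state of the Python and Rust tests together.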
If you’re having trouble installing or using the development environment
If you are getting an error that contains
ImportError: cannot import name 'to_bytes' from 'sourmash._minhash', then it’s likely you need to update Rust and clean up your environment. Some installation issues can be solved by simply removing the intermediate build files with:
Automated tests and code coverage calculation
Code coverage can be viewed interactively at codecov.io.
There are three main components in the sourmash repo:
Python module (in
The command-line interface (in
The Rust core library (in
setup.py has all the configuration to prepare a Python package containing these three components.
First it compiles the Rust core component into a shared library,
which is wrapped by CFFI and exposed to the Python module.
A short description of the high-level files and dirs in the sourmash repo:
.
├── benchmarks/         | Benchmarks for the Python module
├── binder/             | mybinder.org configuration
├── data/               | data used for demos
├── doc/                | the documentation rendered in sourmash.bio
├── include/            | C/C++ header files for using core library
├── sourmash/           | The Python module and CLI code
├── sourmash_lib/       | DEPRECATED: previous name of the Python module
├── src/                |
│   └── core            | Code for the core library (Rust)
├── tests/              | Tests for the Python module and CLI
├── utils/              |
├── asv.conf.json       | benchmarking config file (for ASV)
├── Cargo.toml          | Rust definition for a workspace
├── CITATION.cff        | Citation info
├── codemeta.json       | Metadata for software discovery
├── CODE_OF_CONDUCT.rst | Code of conduct
├── CONTRIBUTING.md     | Instruction for contributing to development
├── LICENSE             | License for the repo
├── Makefile            | Entry point for most development tasks
├── MANIFEST.in         | Describes what files to add to the Python package
├── matplotlibrc        | Configuration for matplotlib
├── netlify.toml        | Configuration for netlify (build docs for preview)
├── paper.bib           | References in the JOSS paper
├── paper.md            | JOSS paper content
├── pytest.ini          | pytest configuration
├── README.md           | Info to get started
├── requirements.txt    | Python dependencies for development
├── setup.py            | Python package definition
└── tox.ini             | Configuration for test automation
The Python module (and CLI)
sourmash
├── cli/                | Command-line parsing, help messages and overall infrastructure
├── command_compute.py  | compute command implementation
├── commands.py         | implementation for other CLI commands
├── compare.py          | Signature comparison functions
├── _compat.py          | Py2/3 compatibility functions
├── exceptions.py       | Mapping from core library errors to Python exceptions
├── fig.py              | Plotting functions
├── index.py            | Index base class and definitions
├── lca/                | LCA index and utility functions
├── logging.py          | Logging functions (notify, error, set_quiet)
├── __main__.py         | Entry point for the CLI
├── _minhash.py         | MinHash sketch implementation (calls the core library)
├── np_utils.py         | NumPy utils
├── sbt*.py             | SBT implementation
├── search.py           | search functions for indices (search, gather)
├── sig                 | signature manipulation functions
│   └── __main__.py     | implementation for `sourmash sig` commands
├── signature_json.py   | signature parsing code (to/from JSON)
├── signature.py        | signature class and methods
├── sourmash_args.py    | convenient shortcuts for CLI usage
└── utils.py            | Convenience functions to interact with core library
The Rust core library
The core library is defined entirely in src/core,
which keeps it separate from the code of the other components
and makes changes easier to reason about.
If you’re only working on the core,
you don’t need to change any files outside this directory.
src/core
├── benches/            | Benchmarks for the core library
├── Cargo.toml          | Crate definition and metadata
├── cbindgen.toml       | Configuration for cbindgen (the C header generator)
├── examples/           | Examples using the crate API
├── README.md           | Links to CI, docs and general info about the crate
├── src                 |
│   ├── cmd.rs          | High-level commands (search, index, compute...)
│   ├── errors.rs       | All the errors generated by this crate
│   ├── ffi/            | FFI-related functions. They are exported to a C header by cbindgen.
│   ├── from.rs         | Conversion methods for other crates
│   ├── index/          | Index methods. An index is a collection of signatures, optimized for searching.
│   ├── lib.rs          | Entry point for the library; controls the exposed public API.
│   ├── signature.rs    | Signature methods. A signature is a collection of sketches.
│   ├── sketch/         | Sketch methods. A sketch is a compressed representation of data.
│   └── wasm.rs         | WebAssembly API.
└── tests/              | Integration tests (using the public API of the crate)
Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards compatible manner, and
PATCH version when you make backwards compatible bug fixes.
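As a rough illustration of these rules, the sketch below bumps one component of a MAJOR.MINOR.PATCH string. The function name bump_version is hypothetical and not part of sourmash or its release tooling. Resetting the lower components to zero follows the Semantic Versioning specification rather than anything stated above, and pre-release or build suffixes are not handled.

```python
def bump_version(version, part):
    """Return `version` (a 'MAJOR.MINOR.PATCH' string) with `part` incremented.

    `part` is one of 'major', 'minor' or 'patch'. Components to the right of
    the bumped one are reset to 0, as the Semantic Versioning spec requires.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":    # incompatible API change
        return "{}.0.0".format(major + 1)
    if part == "minor":    # backwards compatible new functionality
        return "{}.{}.0".format(major, minor + 1)
    if part == "patch":    # backwards compatible bug fix
        return "{}.{}.{}".format(major, minor, patch + 1)
    raise ValueError("part must be 'major', 'minor' or 'patch'")
```

For example, bump_version("2.3.1", "minor") gives "2.4.0", while bump_version("2.3.1", "patch") gives "2.3.2".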
For the Rust core library we use
(note it starts with
r, and not
The Rust version is not automated,
and must be bumped in