We do our best to support sourmash users! Users have found important bugs, and some of our best features have come from user requests. Please help us improve sourmash for everyone by asking questions as you have them!
Please ask questions and file bug descriptions on the GitHub issue tracker for sourmash, sourmash-bio/sourmash/issues.
You can also ask questions of Titus on Twitter at @ctitusbrown.
We do our best to guarantee stability of features and APIs within major versions - because of this, upgrading from (e.g.) sourmash v3.4 to sourmash v3.5 should be a simple matter of installing the new version.
We also recommend using version pinning for software and workflows that depend on sourmash, e.g. specifying sourmash >=3,<4 for software that is tested with sourmash 3.x. Read on for details!
Upgrading major versions (to sourmash 4.0, for example) will often involve more work; see below for our suggested process.
Our goal is to support the use of sourmash in pipelines and applications by communicating clearly about bug fixes, feature additions, and feature changes in sourmash. Versions are tagged in a vMAJOR.MINOR.PATCH format, following the Semantic Versioning convention. From their definition:
“Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards compatible manner, and
PATCH version when you make backwards compatible bug fixes.”
So, for example,
Major releases, like v4.0.0, may break backwards compatibility at the command line as well as top-level Python/Rust APIs.
Minor releases, like v4.1.0, will remain backwards compatible but may introduce significant new features.
Patch releases, like v4.1.1, are for minor bug fixes; full backwards compatibility is retained.
We do sometimes (rarely!) alter behavior in minor versions by fixing bugs; this will be documented in release notes.
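As a rough illustration of the scheme above, the following sketch parses vMAJOR.MINOR.PATCH tags and reports which component changed between two releases. The helper names `parse_version` and `classify_upgrade` are made up for this example and are not part of sourmash; pre-release suffixes are not handled.

```python
import re

# Matches tags such as "v4.1.0" or "4.1.0"; pre-release suffixes are not handled.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag):
    """Return (major, minor, patch) as integers for a vMAJOR.MINOR.PATCH tag."""
    match = _VERSION_RE.match(tag.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {tag!r}")
    return tuple(int(part) for part in match.groups())


def classify_upgrade(old, new):
    """Name the most significant component that differs between two versions."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v[0] != old_v[0]:
        return "major"  # may break backwards compatibility
    if new_v[1] != old_v[1]:
        return "minor"  # new features, backwards compatible
    if new_v[2] != old_v[2]:
        return "patch"  # bug fixes only
    return "same"


print(classify_upgrade("v3.4.0", "v3.5.0"))  # minor
print(classify_upgrade("v3.5.1", "v4.0.0"))  # major
```

A check like this could be used in a deployment script to flag major upgrades for manual review while letting minor and patch upgrades through.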
For software and workflows that depend on sourmash, we recommend pinning versions to the current major release of sourmash.
For example, with Python toolchains such as pip, you should be able to use sourmash >=3,<4 to pin the version requirement to any sourmash v3.x release.
For conda, the same syntax should work.
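A pin in your requirements can be complemented by a runtime guard inside a workflow. The sketch below uses only the standard library's `importlib.metadata` to check which major version of a distribution is installed; the helper names `major_of`, `installed_major`, and `check_pin` are hypothetical, and the distribution name "sourmash" is assumed to match the name you install with pip.

```python
from importlib.metadata import PackageNotFoundError, version


def major_of(version_string):
    """Return the leading MAJOR component of a version string such as '3.5.1'."""
    return int(version_string.lstrip("v").split(".")[0])


def installed_major(dist_name):
    """Return the installed major version of a distribution, or None if absent."""
    try:
        return major_of(version(dist_name))
    except PackageNotFoundError:
        return None


def check_pin(dist_name, expected_major):
    """Return True only if dist_name is installed with the expected major version."""
    return installed_major(dist_name) == expected_major


# Example guard for a workflow that was tested against sourmash 3.x.
if not check_pin("sourmash", 3):
    print("warning: this workflow was tested with sourmash 3.x")
```

This does not replace pinning in your dependency specification; it only makes a mismatch visible early when an environment was built some other way.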
We intend that all command-line commands, command-line options, input formats, and output formats will be fully backwards compatible within major versions. That is, you should never see old behavior change when you upgrade within a major sourmash release (barring bug fixes!). Moreover, if you rely on a feature introduced in v3.3.0, that feature will not break in v3.4.0, but will also not be backported to version 3.2.0.
We intend to guarantee the Python API at the top level, i.e. functions and classes available from the sourmash top-level module will be stable within major versions.
The latest minor release (e.g. v3.5) before a new major release (v4.0) will contain deprecations for all top-level API changes at the time of the first major release. See below for our suggested migration procedure.
sourmash v3.x supports Python 2.7 as well as Python 3.x, through Python 3.8.
sourmash v4.0 dropped support for versions of Python before Python 3.7, and our intent is that it will support as-yet unreleased versions of Python 3.x (e.g. 3.10) moving forward.
For future versions of sourmash, we plan to follow the Scientific Python SPEC 0 proposal for Python version support. For example, this means that we dropped support for Python 3.9 on October 10, 2023.
The Rust API is not yet at 1.0 and should not be regarded as stable.
If you depend on sourmash, we recommend using the following process:
pin sourmash to the major version you developed against, e.g. sourmash >=3,<4.
when ready to upgrade sourmash, upgrade to the latest minor release within that major version (e.g. sourmash 3.5.x).
scan for deprecations that affect you, check the release notes, and fix any major issues noted.
upgrade to the next major version (e.g. sourmash 4.0) and run your integration tests or workflow.
fix outstanding issues.
In particular, we recommend upgrading major versions of sourmash in isolation, without adding any new features to your software.
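For the "scan for deprecations" step, one option is to collect warnings while running your integration tests. The following is a minimal sketch, assuming the deprecations you care about surface as standard Python warnings (`DeprecationWarning`, `PendingDeprecationWarning`, or `FutureWarning`); `collect_deprecations` and `legacy_call` are hypothetical names, and `legacy_call` merely stands in for a workflow step that uses a deprecated API.

```python
import warnings

DEPRECATION_CATEGORIES = (DeprecationWarning, PendingDeprecationWarning, FutureWarning)


def collect_deprecations(func, *args, **kwargs):
    """Call func and return (result, list of deprecation warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones normally hidden or shown once.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    messages = [
        str(w.message)
        for w in caught
        if issubclass(w.category, DEPRECATION_CATEGORIES)
    ]
    return result, messages


def legacy_call():
    """Hypothetical stand-in for a workflow step that uses a deprecated API."""
    warnings.warn("old_function() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


result, messages = collect_deprecations(legacy_call)
print(result, messages)  # 42 ['old_function() is deprecated']
```

An empty list of messages on the latest minor release is a reasonable signal that the Python side of your code is ready for the next major version; command-line deprecation notices still need to be checked in the program output.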
If you want to upgrade workflows and scripts from prior releases of sourmash to sourmash v4.0, we suggest doing this in two stages.
First, upgrade to the latest version of sourmash 3.5.x (currently v3.5.1), which is compatible with all files and command lines used in previous versions of sourmash (v2.x and v3.x). After upgrading to 3.5.x, scan the sourmash output for deprecation warnings and fix those.
Next, upgrade to the latest version of 4.x, which will introduce some backwards incompatibilities based upon the deprecation warnings.
The major changes are detailed below; please see the full release notes for 4.0 for all the details and links to the code changes.
If you use sourmash from the command line, there are a few major changes in 4.0 that you should know about.
First, sourmash compute is deprecated in favor of sourmash sketch, which provides quite a bit more flexibility in creating signatures.
Second, sourmash index will now save databases in the Zip format (.sbt.zip) instead of the old JSON+subdirectory format (see updated docs). You can revert to the old behavior by explicitly specifying a .sbt.json filename for the output.
Third, all sourmash commands that operate on signatures should now be able to directly read from lists of signatures in signature files, SBT databases, LCA databases, directories, and files containing lists of filenames (see updated docs).
Fourth, if you use sourmash lca commands, note that sourmash lca gather has been removed. In addition, there are some changes in how summarize works: it now uses abundances by default, and no longer combines all signatures before summarizing. Specify --ignore-abundance and combine your signatures using sourmash sig merge to recover the old behavior. Note also that lca summarize now includes a new column, filename, in the CSV output.
Finally, k-mer sizes have changed for amino acid sequences in v4. If you use protein, Dayhoff, or HP signatures, we now interpret k-mer sizes differently on the command line. Briefly, k-mer sizes for protein/dayhoff/hp signatures are now the size of the k-mer in amino acid space, not the size of the k-mer in DNA space (as previously used). In practice this means that you need to divide all your old k-mer sizes by 3 when working with k-mers in amino acid space!
Note also that while sourmash compute still behaves the same way in v4.x as it did in sourmash 3.5.x, sourmash sketch translate and sourmash sketch protein both use the new approach to amino acid k-mer sizes, as do all of the command line options for searching, manipulation, and display. Again, in practice this means that you need to divide all your old k-mer sizes by 3 if they apply to amino acid k-mers.
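The divide-by-3 rule can be written down as a small helper for updating scripts and configuration files. This is only a sketch: `protein_ksize_from_v3` is a hypothetical name, and the check that the old size is a positive multiple of 3 is an assumption of this example rather than documented sourmash behavior.

```python
def protein_ksize_from_v3(old_ksize):
    """Convert a v3-style (DNA-space) k-mer size to an amino acid k-mer size."""
    if old_ksize <= 0 or old_ksize % 3 != 0:
        raise ValueError(f"expected a positive multiple of 3, got {old_ksize}")
    return old_ksize // 3


for old in (21, 30, 57):
    print(old, "->", protein_ksize_from_v3(old))
# 21 -> 7
# 30 -> 10
# 57 -> 19
```

Raising an error on sizes that do not divide evenly is a deliberate choice here: silently rounding would hide a k-mer size that was probably a DNA k-mer size to begin with.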
There are several minor changes where error messages should occur appropriately:
--traverse-directory is no longer needed on the command line for sourmash index or other functions; directory traversal happens automatically.
the command lines for sourmash lca index no longer require signature files to be specified, which can break existing command lines. To fix this, reorder arguments so that any signatures are specified at the end of the command line.
First, all k-mer sizes for hp signatures have changed in the Python layer to be “correct”, i.e., to be the size of the protein k-mer. Previously they were 3*k, i.e. based on the size of the DNA k-mer from which the protein sequence would have been created.
Second, the MinHash class API has changed significantly!
get_mins() has been deprecated in favor of .hashes, which is a dictionary that contains abundances.
merge now just modifies MinHash objects in-place, and no longer returns the merged object; use += for the old behavior, or + to create a new merged object.
max_hash has been deprecated in favor of
is_molecule_type has been replaced with a property, moltype; compare it directly, e.g. moltype == t.
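The difference between an in-place merge, `+=`, and `+` is easiest to see on a toy object. The sketch below does not use sourmash at all: `ToySketch` is a hypothetical stand-in whose `hashes` dictionary maps hash values to abundances, and summing abundances on merge is an assumption made for illustration.

```python
class ToySketch:
    """Hypothetical stand-in for a sketch object; not sourmash's MinHash class."""

    def __init__(self, hashes=None):
        # Maps hash value -> abundance, in the spirit of a dictionary-style .hashes.
        self.hashes = dict(hashes or {})

    def merge(self, other):
        """Add other's hashes into this object in place; returns None."""
        for hashval, abund in other.hashes.items():
            self.hashes[hashval] = self.hashes.get(hashval, 0) + abund

    def __iadd__(self, other):
        """In-place merge that also hands back the merged object (self)."""
        self.merge(other)
        return self

    def __add__(self, other):
        """Return a new merged object, leaving both operands untouched."""
        merged = ToySketch(self.hashes)
        merged.merge(other)
        return merged


a = ToySketch({1: 2, 5: 1})
b = ToySketch({5: 3, 9: 1})

print(a.merge(b))                # None: the merge happened in place
print(sorted(a.hashes.items()))  # [(1, 2), (5, 4), (9, 1)]

c = a + b                        # new object; a and b are unchanged
print(sorted(c.hashes.items()))  # [(1, 2), (5, 7), (9, 2)]
```

Code that previously relied on the return value of merge is the case to look for when migrating, since an in-place merge returns nothing.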
Third, SourmashSignature objects no longer have a name() method but instead a name property, which can be assigned to. This property is now None when no name has been assigned. Note that str(sig) should now be used to retrieve a display name, replacing all previous uses of the name() method.
Fourth, a few top-level functions have been deprecated:
load_one_signature(...) should be replaced with load_file_as_signatures(...). Note there is currently no top-level way to load signatures from strings. For now, if you need that functionality, you can use sourmash.signature.load_one_signature(...), but please be aware that such functions are not considered part of the public API that is under semantic versioning, so they may change in the next minor point release; this is tracked in https://github.com/sourmash-bio/sourmash/issues/1312.
load_sbt_index(...) has been deprecated. Please use
create_sbt_index(...) has been deprecated. There is currently no replacement, although you can use it directly from
Fifth, directory traversal now happens by default when loading signatures, so remove any traverse=True arguments from your function calls.
Please post questions and concerns to the sourmash issue tracker and we’ll be happy to help!