Using sourmash output with R and other languages

Most of the sourmash shell commands output CSV files upon request; these files have headers and are straightforward to load into R. Below are some code snippets and links that might be useful.

R code for working with compare output

(by Taylor Reiter)

sourmash compare can write its comparison matrix as a CSV file, which is easy to read into R. For example, if you download the Escherichia signature collection as in the sourmash tutorial, then the shell command

sourmash compare ecoli_many_sigs/*.sig --csv ecoli.cmp.csv

will output a file ecoli.cmp.csv that can be loaded into R like so:

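# Read the comparison matrix; the header row holds the signature names.
# check.names = FALSE keeps those names as written instead of converting
# them to syntactically valid R names.
sourmash_comp_matrix <- read.csv("ecoli.cmp.csv", check.names = FALSE)

# Label the rows (the CSV has column names but no row names)
rownames(sourmash_comp_matrix) <- colnames(sourmash_comp_matrix)

# Convert the data frame to a numeric matrix for plotting
sourmash_comp_matrix <- as.matrix(sourmash_comp_matrix)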

See the output of plotting and clustering this matrix, produced by this RMarkdown file.
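
As a rough illustration of what plotting and clustering such a matrix can look like, here is a minimal base R sketch. It is not the code from the RMarkdown file mentioned above. To stay self-contained it uses a small made-up similarity matrix (the names genome_A to genome_D and all values are hypothetical) in place of sourmash_comp_matrix, and it assumes the matrix holds similarities between 0 and 1, so that 1 - similarity can serve as a distance. The average linkage method and the choice of two clusters are arbitrary.

```r
# Hypothetical stand-in for sourmash_comp_matrix: a small symmetric
# similarity matrix with 1 on the diagonal. Values are made up.
sig_names <- c("genome_A", "genome_B", "genome_C", "genome_D")
sim <- matrix(
  c(1.00, 0.80, 0.10, 0.20,
    0.80, 1.00, 0.15, 0.25,
    0.10, 0.15, 1.00, 0.70,
    0.20, 0.25, 0.70, 1.00),
  nrow = 4, byrow = TRUE,
  dimnames = list(sig_names, sig_names)
)

# Turn similarities into distances (assumes similarities lie in [0, 1]).
dist_mat <- as.dist(1 - sim)

# Hierarchical clustering; average linkage is an arbitrary choice here.
hc <- hclust(dist_mat, method = "average")

# Assign each signature to one of two clusters (k = 2 is also arbitrary).
clusters <- cutree(hc, k = 2)
print(clusters)

# Draw the dendrogram, then a heatmap of the unscaled similarity matrix.
plot(hc, main = "Clustering of signatures", xlab = "", sub = "")
heatmap(sim, symm = TRUE, scale = "none")
```

To try this on real data, replace sim with the sourmash_comp_matrix loaded earlier and pick a number of clusters that suits the collection.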

You can download the ecoli.cmp.csv file itself here.