Using sourmash output with R and other languages
Most of the sourmash shell commands output CSV files upon request; these files have headers and are straightforward to load into R. Below are some code snippets and links that might be useful.
R code for working with compare output
(by Taylor Reiter)
sourmash compare can output matrices in CSV format, which can easily be read into R. For example, if you download the Escherichia signature collection as in the sourmash tutorial, then the shell command
sourmash compare ecoli_many_sigs/*.sig --csv ecoli.cmp.csv
will output a file ecoli.cmp.csv that can be loaded into R like so:
sourmash_comp_matrix <- read.csv("ecoli.cmp.csv")
# Label the rows
rownames(sourmash_comp_matrix) <- colnames(sourmash_comp_matrix)
# Transform for plotting
sourmash_comp_matrix <- as.matrix(sourmash_comp_matrix)
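One thing to watch for when labeling: by default, read.csv rewrites column names that are not syntactically valid R names, so characters such as / and - become dots. The sketch below shows the difference on a small toy file; the file contents and labels are made up for illustration and are not real sourmash output. Passing check.names = FALSE keeps the labels as written in the CSV header.

```r
# Toy stand-in for a compare CSV: a header row of labels, then one row of
# similarities per signature. Labels and values are made up for illustration.
toy_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "dir/genome-1.sig,dir/genome-2.sig",
  "1.0,0.5",
  "0.5,1.0"
), toy_csv)

# Default behaviour: names are passed through make.names()
default_names <- colnames(read.csv(toy_csv))
print(default_names)  # "dir.genome.1.sig" "dir.genome.2.sig"

# check.names = FALSE keeps the header labels unchanged
toy_matrix <- read.csv(toy_csv, check.names = FALSE)
raw_names <- colnames(toy_matrix)
print(raw_names)      # "dir/genome-1.sig" "dir/genome-2.sig"

# Label the rows and convert to a matrix, as in the snippet above
rownames(toy_matrix) <- raw_names
toy_matrix <- as.matrix(toy_matrix)
```

Whether this matters depends on how your signatures are named; if the labels are already valid R names, both calls give the same result.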
See the output of plotting and clustering this matrix, produced by this RMarkdown file.
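The RMarkdown file is not reproduced here. As a rough idea of what clustering and plotting such a matrix can look like, here is a minimal base R sketch. It is not necessarily what that file does: the toy labels and values are made up, converting similarity to distance as 1 - similarity is an assumption, and average linkage is an arbitrary choice.

```r
# Toy stand-in for ecoli.cmp.csv (made-up labels and values)
toy_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "sigA,sigB,sigC",
  "1.0,0.8,0.1",
  "0.8,1.0,0.2",
  "0.1,0.2,1.0"
), toy_csv)

# Load and label the matrix in the same way as the ecoli example
comp <- read.csv(toy_csv)
rownames(comp) <- colnames(comp)
comp <- as.matrix(comp)

# The matrix holds similarities; turn them into distances for clustering
# (assumption: distance = 1 - similarity)
comp_dist <- as.dist(1 - comp)

# Hierarchical clustering; the linkage method is an arbitrary choice
comp_clust <- hclust(comp_dist, method = "average")

# Split the tree into two groups to inspect the clustering
groups <- cutree(comp_clust, k = 2)
print(groups)

# Dendrogram and heatmap with base R graphics
plot(comp_clust, main = "Toy sourmash comparison")
heatmap(comp, symm = TRUE, scale = "none")
```

To try this on real data, replace toy_csv with the path to ecoli.cmp.csv and skip the writeLines step.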
You can download the ecoli.cmp.csv file itself here.