Mash Implementation

Types

PanGraph.Graphs.Mash.Minimizer (Type)
struct Minimizer
    value    :: UInt64
    position :: UInt64
end

A minimizer is the kmer with the smallest hash value within a given set of kmers, under a hash function that maps kmers to integers. The value field is the result of applying the hash function to the kmer. The position field is a bitpacked integer that encodes the reference ID, locus, and strand.
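
To make the bitpacked position concrete, here is a minimal sketch of packing and unpacking the three fields. The bit layout (ID in the upper 32 bits, locus in bits 31 to 1, strand in bit 0) and the names pack and unpack are assumptions made for illustration; the layout PanGraph actually uses may differ.

```julia
# Hypothetical layout (an assumption, not PanGraph's confirmed encoding):
#   bits 63..32 : reference ID
#   bits 31..1  : locus
#   bit  0      : strand (false = forward, true = reverse)
pack(id, locus, strand) = (UInt64(id) << 32) | (UInt64(locus) << 1) | UInt64(strand)

# Recover the three fields from a packed position.
unpack(p::UInt64) = (
    id     = Int(p >> 32),
    locus  = Int((p & 0xffffffff) >> 1),
    strand = Bool(p & 0x01),
)

p = pack(7, 1234, true)
u = unpack(p)   # named tuple with fields id, locus, strand
```

Packing all three fields into one UInt64 keeps each minimizer at 16 bytes and lets a single integer comparison order positions by reference first and locus second.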


Functions

PanGraph.Graphs.Mash.distance (Method)
distance(graphs...; k=15, w=100)

Compute the pairwise distance between all input graphs. The distance between two graphs is a set distance between their minimizers. The algorithm runs in linear time by finding shared minimizers through hash collisions.
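
The idea of finding shared minimizers through hash collisions can be sketched as follows. This example assumes the Jaccard distance as the set distance and takes plain vectors of hash values as input; both choices, and the name jaccard_distances, are illustrative assumptions rather than the package's implementation.

```julia
# Jaccard distance between sets of minimizer hash values (an assumed metric).
function jaccard_distances(sketches::Vector{Vector{UInt64}})
    n = length(sketches)
    sets = [Set(s) for s in sketches]

    # Map each hash value to the indices of the sketches that contain it.
    owners = Dict{UInt64,Vector{Int}}()
    for (i, s) in enumerate(sets), v in s
        push!(get!(owners, v, Int[]), i)
    end

    # Every value shared by sketches a and b adds one to shared[a, b].
    shared = zeros(Int, n, n)
    for ids in values(owners), a in ids, b in ids
        shared[a, b] += 1
    end

    D = zeros(Float64, n, n)
    for i in 1:n, j in 1:n
        i == j && continue
        unionsize = length(sets[i]) + length(sets[j]) - shared[i, j]
        D[i, j] = unionsize == 0 ? 0.0 : 1 - shared[i, j] / unionsize
    end
    return D
end

D = jaccard_distances([UInt64[1, 2, 3, 4], UInt64[3, 4, 5, 6], UInt64[7, 8]])
```

Collecting owners per hash value requires one pass over all minimizers; no sketch is ever compared element by element against another.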

PanGraph.Graphs.Mash.hash (Method)
hash(x::UInt64, mask::UInt64)

A transliteration of Jenkins' invertible hash function for 64-bit integers. It maps each kmer bijectively to an integer.
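
For illustration, the following is a sketch of an invertible 64-bit integer mix function of the kind widely used for minimizer hashing (the same sequence of steps appears as hash64 in minimap2). That PanGraph uses exactly these constants is an assumption. The function is named hash64 here to avoid shadowing Base.hash.

```julia
# Each step is invertible modulo 2^n, where mask = 2^n - 1:
# multiplication by an odd constant, or xor with a right-shifted copy.
function hash64(key::UInt64, mask::UInt64)
    key = (~key + (key << 21)) & mask              # key * (2^21 - 1) - 1
    key = xor(key, key >> 24)
    key = (key + (key << 3) + (key << 8)) & mask   # key * 265
    key = xor(key, key >> 14)
    key = (key + (key << 2) + (key << 4)) & mask   # key * 21
    key = xor(key, key >> 28)
    key = (key + (key << 31)) & mask               # key * (2^31 + 1)
    return key
end

# With 2 bits per base, a kmer of length k fits in 2k bits.
k = 4
mask = (UInt64(1) << 2k) - UInt64(1)

# Hash every possible 4-mer; a bijection yields no repeated outputs.
images = [hash64(x, mask) for x in UInt64(0):mask]
```

Because the map is a bijection on the masked range, two distinct kmers never collide, so equal hash values imply equal kmers.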

PanGraph.Graphs.Mash.sketch (Method)
sketch(seq::Array{UInt8}, k::Int, w::Int, id::Int)

Sketch a linear sequence into a vector of minimizers. k sets the kmer size. w sets the number of contiguous kmers compared within each minimizer window. id is a unique integer that identifies the sequence; it is bitpacked into the minimizer position.
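
The roles of k and w can be illustrated with a deliberately naive window-minimizer sketch. It is a simplification under several assumptions: the 2-bit packed kmer is used directly as its hash value, only the forward strand is considered, only the bases A, C, G, T are handled, and no id is packed into the position. The names encode and window_minimizers are hypothetical.

```julia
# Map A, C, G, T to the 2-bit codes 0, 1, 2, 3 (other bytes are not handled).
encode(b::UInt8) = UInt64(findfirst(==(b), b"ACGT") - 1)

function window_minimizers(seq::Vector{UInt8}, k::Int, w::Int)
    n = length(seq) - k + 1                      # number of kmers in seq
    n < w && return Tuple{UInt64,Int}[]

    # Pack every kmer into an integer; the packed value doubles as its hash.
    vals = [foldl((acc, b) -> (acc << 2) | encode(b), view(seq, i:i+k-1);
                  init=UInt64(0)) for i in 1:n]

    out = Tuple{UInt64,Int}[]
    for start in 1:(n - w + 1)
        window = start:(start + w - 1)
        j = window[argmin(view(vals, window))]   # leftmost minimum in the window
        m = (vals[j], j)
        # Adjacent windows often share a minimizer; record it only once.
        (isempty(out) || out[end] != m) && push!(out, m)
    end
    return out
end

mins = window_minimizers(Vector{UInt8}("ACGTACGT"), 3, 2)
# mins holds (value, kmer start index) pairs: (6, 1), (27, 2), (44, 3), (6, 5)
```

A larger w produces fewer minimizers per sequence, trading sensitivity for a smaller sketch.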
