Mash Implementation
Types
PanGraph.Graphs.Mash.Minimizer
— Type
struct Minimizer
    value :: UInt64
    position :: UInt64
end
A minimizer is the kmer that, under a hash function mapping kmers to integers, has the minimum hash value within a given set of kmers. The value is the result of applying the hash function to that kmer. The position is a bitpacked integer that encodes the reference ID, locus, and strand.
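The bitpacked position can be illustrated with a small sketch. The bit layout used here (reference ID in the upper 32 bits, locus in bits 1 to 31, strand in the lowest bit) is an assumption chosen for illustration, and the helper names `pack` and `unpack` are hypothetical; the package's actual layout may differ.

```julia
# Hypothetical layout (an assumption, not taken from the package):
#   bits 63..32  reference ID
#   bits 31..1   locus
#   bit  0       strand (false = forward, true = reverse)
pack(id::Integer, locus::Integer, strand::Bool) =
    (UInt64(id) << 32) | (UInt64(locus) << 1) | UInt64(strand)

# Recover the three fields from a packed position.
unpack(position::UInt64) = (
    id     = Int(position >> 32),
    locus  = Int((position >> 1) & 0x7fffffff),
    strand = (position & 0x01) == 0x01,
)

p = pack(3, 1200, true)
fields = unpack(p)   # named tuple with id, locus, strand
```

Packing all three fields into one UInt64 keeps each minimizer at 16 bytes and lets positions be compared or sorted as plain integers.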
Functions
PanGraph.Graphs.Mash.distance
— Method
distance(graphs...; k=15, w=100)
Compute the pairwise distance between all input graphs. The distance between two graphs is a set distance computed over their minimizers. The algorithm runs in linear time by detecting shared minimizers through hash collisions.
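The idea of finding shared minimizers through hash collisions can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name `toy_distance` is hypothetical, it takes plain vectors of minimizer values rather than graphs, and it assumes the set distance is the Jaccard distance, which the documentation above does not state.

```julia
# Assumption: "set distance" is taken to be the Jaccard distance 1 - |A ∩ B| / |A ∪ B|.
function toy_distance(sketches::Vector{Vector{UInt64}})
    n = length(sketches)
    # One pass over all minimizers: value => indices of the inputs that contain it.
    owners = Dict{UInt64, Vector{Int}}()
    for (i, s) in enumerate(sketches), v in unique(s)
        push!(get!(owners, v, Int[]), i)
    end
    # Every value held by inputs a and b is a collision that counts toward shared[a, b].
    shared = zeros(Int, n, n)
    for ids in values(owners), a in ids, b in ids
        shared[a, b] += 1
    end
    D = zeros(Float64, n, n)
    for a in 1:n, b in 1:n
        total = shared[a, a] + shared[b, b] - shared[a, b]   # size of the union
        D[a, b] = total == 0 ? 0.0 : 1 - shared[a, b] / total
    end
    return D
end

D = toy_distance([UInt64[1, 2, 3], UInt64[2, 3, 4], UInt64[9]])
```

Grouping by value in a dictionary avoids comparing every pair of sketches element by element; only values that actually collide contribute work.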
PanGraph.Graphs.Mash.hash
— Method
hash(x::UInt64, mask::UInt64)
A transliteration of Jenkins' invertible hash function for 64-bit integers. Bijectively maps any kmer to an integer.
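For illustration, an invertible integer mix of this general style can be written as below. The name `mix64` is hypothetical, and the shift constants follow a widely circulated invertible 64-bit mix; whether the package uses exactly these steps is an assumption. The mask is taken to be `2^(2k) - 1`, so that the result stays within the range of a 2-bit-encoded kmer.

```julia
# Each step is either multiplication by an odd constant (mod 2^n) or an
# xor-shift, so every step is invertible and the whole map is a bijection
# on 0:mask when mask has the form 2^n - 1.
function mix64(x::UInt64, mask::UInt64)
    x = (~x + (x << 21)) & mask
    x = x ⊻ (x >> 24)
    x = (x + (x << 3) + (x << 8)) & mask
    x = x ⊻ (x >> 14)
    x = (x + (x << 2) + (x << 4)) & mask
    x = x ⊻ (x >> 28)
    x = (x + (x << 31)) & mask
    return x
end

# Check bijectivity on a small domain: all 4-mers (8 bits).
k = 4
mask = (UInt64(1) << 2k) - UInt64(1)
images = Set(mix64(UInt64(x), mask) for x in 0:Int(mask))
```

If the map is a bijection, `images` contains exactly as many elements as there are inputs, so no two kmers share a hash value.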
PanGraph.Graphs.Mash.sketch
— Method
sketch(seq::Array{UInt8}, k::Int, w::Int, id::Int)
Sketch a linear sequence into a vector of minimizers. k sets the kmer size. w sets the number of contiguous kmers used in each window when selecting a minimizer. id is a unique integer that identifies the sequence; it is bitpacked into the minimizer position.
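The window-minimizer selection described above can be sketched in a simplified form. This is an illustration under several assumptions, not the package's code: `toy_sketch` and `CODE` are hypothetical names, only the forward strand is considered, only the bytes A, C, G, T are handled, the hash is passed in as `h` (identity by default, purely for readability), and the position layout (id in the upper 32 bits, zero-based locus shifted left by one) is assumed.

```julia
struct Minimizer
    value::UInt64
    position::UInt64
end

# 2-bit code per nucleotide; other bytes are not handled in this sketch.
const CODE = Dict(UInt8('A') => 0x00, UInt8('C') => 0x01,
                  UInt8('G') => 0x02, UInt8('T') => 0x03)

function toy_sketch(seq::Vector{UInt8}, k::Int, w::Int, id::Int; h = identity)
    mask = (UInt64(1) << 2k) - UInt64(1)
    # Hash every forward-strand kmer with a rolling 2-bit encoding.
    hashes = UInt64[]
    kmer = UInt64(0)
    for (i, b) in enumerate(seq)
        kmer = ((kmer << 2) | CODE[b]) & mask
        i >= k && push!(hashes, h(kmer))
    end
    # Slide a window of w kmers; emit the minimum whenever it changes.
    out = Minimizer[]
    prev = 0
    for start in 1:(length(hashes) - w + 1)
        j = start + argmin(view(hashes, start:start+w-1)) - 1
        if j != prev
            pos = (UInt64(id) << 32) | (UInt64(j - 1) << 1)   # assumed layout
            push!(out, Minimizer(hashes[j], pos))
            prev = j
        end
    end
    return out
end

mins = toy_sketch(Vector{UInt8}("ACGTACGTAC"), 3, 4, 1)
```

Because adjacent windows overlap in all but one kmer, consecutive windows usually share the same minimum, which is why the output is much shorter than the list of all kmers.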