Blocks

Types

PanGraph.Graphs.Blocks.BlockType
mutable struct Block
    uuid     :: String
    sequence :: Array{UInt8}
    gaps     :: Dict{Int,Int}
    mutate   :: OrderedDict{Node{Block},SNPMap}
    insert   :: OrderedDict{Node{Block},InsMap}
    delete   :: OrderedDict{Node{Block},DelMap}
end

Store a multiple sequence alignment of contiguous DNA related by homology. Use as a component of a larger, branching multiple genome alignment. uuid is a string identifier unique to each block sequence is the consensus (majority-rule) sequence gaps recapitulate all locations of insertions for generating the full sequence alignment. mutate, insert, and delete store polymorphisms of each genome contained within the block.

source
PanGraph.Graphs.Blocks.BlockMethod
Block(sequence,gaps)

Construct a block with a unique uuid with fixed sequence and gaps. No individuals and thus polymorphisms are initialized.

source
PanGraph.Graphs.Blocks.BlockMethod
Block(b::Block, slice)

Return a subsequence associated to block b at interval slice. The returned block has a newly generated uuid.

source
PanGraph.Graphs.Blocks.BlockMethod
Block(bs::Block...)

Concatenate a variable number of blocks into one larger block. The returned block has a newly generated uuid.

source
PanGraph.Graphs.Blocks.PairPosType
mutable struct PairPos
    qry :: Maybe{Pos}
    ref :: Maybe{Pos}
end

Representation of matched pair of intervals within a pairwise alignment. qry can be of type Pos or Nothing ref can be of type Pos or Nothing If either ref or qry is nothing, the PairPos corresponds to an insertion or deletion respectively.

source
PanGraph.Graphs.Blocks.PosType
mutable struct Pos
    start :: Int
    stop  :: Int
end

Representation of a single interval within a pairwise alignment. Inclusive on both ends, i.e. includes start and stop Used internally to unpack cigar strings.

source

Functions

Base.append!Method
append!(b::Block, node::Node{Block}, snp::Maybe{SNPMap}, ins::Maybe{InsMap}, del::Maybe{DelMap})

Adds a new genome at node to multiple sequence alignment block b. Polymorphisms are optional. If nothing is passed instead, an empty dictionary will be used.

source
Base.lengthMethod
length(b::Block, n::Node)

Return the length of the sequence of node n within the multiple alignment of block b.

source
Base.lengthMethod
length(b::Block)

Return the length of consensus sequence of the multiple alignment of block b.

source
Base.pop!Method
pop!(b::Block, n::Node)

Remove genome of Node n from Block b.

source
PanGraph.Graphs.Blocks.allele_positionsMethod
allele_positions(snp::SNPMap, ins::InsMap, del::DelMap)

Return an iterator over polymorphic loci, i.e. SNPs and Indels. The iterator will be sorted by position in ascending order.

source
PanGraph.Graphs.Blocks.allele_positionsMethod
allele_positions(b::Block, n::Node)

Return an iterator over polymorphic loci for node n contained within block b The iterator will be sorted by position in ascending order.

source
PanGraph.Graphs.Blocks.applyallelesMethod
applyalleles(seq::Array{UInt8}, mutate::SNPMap, insert::InsMap, delete::DelMap)

Take a sequence and apply polymorphisms, as given by mutate, insert, and delete. Return the brand new allocated sequence.

source
PanGraph.Graphs.Blocks.combineMethod
combine(qry::Block, ref::Block, aln::Alignment; minblock=500)

Take a pairwise alignment aln from the consensus of qry to ref and merge both. The resultant new block, with a novel uuid is returned. Alignment aln is a segmented set of intervals mapping homologous regions of one block into the other. Parameter minblock is the cutoff length of an indel, above which a new block will be created.

source
PanGraph.Graphs.Blocks.partitionMethod
partition(alignment; minblock=500)

Parse the alignment into matched intervals of a pairwise alignment. If any insertion or deletion is larger than minblock, a new block is created to hold the homologous interval. This ensures that all blocks are at least minblock long and no block contains an insertion or deletion longer than itself.

alignment is assumed to be an data structure from the Utility module

source
PanGraph.Graphs.Blocks.rereferenceMethod
rerefence(qry::Block, ref::Block, aligment)

Take a pairwise alignment segments from the consensus of qry to ref and rereference all polymorphisms of qry to the consensus sequence of `ref. Low-level function used by higher-level API.

source
PanGraph.Graphs.Blocks.swap!Method
swap!(b::Block, oldkey::Array{Node{Block}}, newkey::Node{Block})

Remove all polymorphisms associated to all keys within oldkey. Concatenate and reassociate them to newkey.

source
PanGraph.Graphs.Blocks.swap!Method
swap!(b::Block, oldkey::Node{Block}, newkey::Node{Block})

Remove all polymorphisms associated to oldkey and reassociate them to newkey.

source
PanGraph.Graphs.marshal_fastaMethod
marshal_fasta(io::IO, b::Block; opt=nothing)

Serialize the multiple sequence alignment of block b to a fasta format to IO stream io. Each sequence will be serialized as-is, i.e. with no gaps.

If opt is not nothing, the output will be an aligned fasta file. Futhermore, opt is interpreted as a function to be called per internal node that gives a unique name for each fasta record that is generated per node.

source
PanGraph.Graphs.reverse_complementMethod
reverse_complement(b::Block; keepid=false)

Return the reverse complement of the multiple sequence alignment within Block b. By default, will return a block with a new uuid, unless keepid is set to true.

source
PanGraph.Graphs.sequence!Method
sequence!(seq, b::Block, node::Node{Block}; gaps=false)

Mutate the sequence buffer seq in place to hold the sequence associated to genome node within sequence alignment of block b. By default, gaps (charater '-') will not be returned, unless gaps is set to true. Return the sequence with gap characters to generate the full sequence alignment.

source
PanGraph.Graphs.sequenceMethod
sequence(seq, b::Block, node::Node{Block}; gaps=false, forward=false)

Return the sequence associated to genome node within sequence alignment of block b. By default, gaps (charater '-') will not be returned, unless gaps is set to true. Return the sequence with gap characters can be used to generate the full sequence alignment. If forward is true, the true orientation of the genome is ignored and will be returned to align to the forward consensus.

source
PanGraph.Graphs.sequenceMethod
sequence(b::Block; gaps=false)

Return the consensus of the multiple sequence alignment within block b. By default, gaps (charater '-') will not be returned, unless gaps is set to true. Return the consensus alignment with gaps is useful for generating the full sequence alignment.

source