Blocks
Types
PanGraph.Graphs.Blocks.Block
— Typemutable struct Block
uuid :: String
sequence :: Array{UInt8}
gaps :: Dict{Int,Int}
mutate :: OrderedDict{Node{Block},SNPMap}
insert :: OrderedDict{Node{Block},InsMap}
delete :: OrderedDict{Node{Block},DelMap}
end
Store a multiple sequence alignment of contiguous DNA related by homology. Use as a component of a larger, branching multiple genome alignment. uuid
is a string identifier unique to each block sequence
is the consensus (majority-rule) sequence gaps
recapitulate all locations of insertions for generating the full sequence alignment. mutate
, insert
, and delete
store polymorphisms of each genome contained within the block.
PanGraph.Graphs.Blocks.Block
— MethodBlock(sequence,gaps,mutate,insert,delete)
Construct a block with a unique uuid
.
PanGraph.Graphs.Blocks.Block
— MethodBlock(sequence,gaps)
Construct a block with a unique uuid
with fixed sequence
and gaps
. No individuals and thus polymorphisms are initialized.
PanGraph.Graphs.Blocks.Block
— MethodBlock(sequence,gaps)
Construct a block with a unique uuid
with fixed sequence
.
PanGraph.Graphs.Blocks.Block
— MethodBlock(b::Block, slice)
Return a subsequence associated to block b
at interval slice
. The returned block has a newly generated uuid
.
PanGraph.Graphs.Blocks.Block
— MethodBlock(bs::Block...)
Concatenate a variable number of blocks into one larger block. The returned block has a newly generated uuid
.
PanGraph.Graphs.Blocks.Block
— MethodBlock(sequence,gaps)
Construct a block with a unique uuid
. All fields are empty.
PanGraph.Graphs.Blocks.PairPos
— Typemutable struct PairPos
qry :: Maybe{Pos}
ref :: Maybe{Pos}
end
Representation of matched pair of intervals within a pairwise alignment. qry
can be of type Pos
or Nothing
ref
can be of type Pos
or Nothing
If either ref
or qry
is nothing, the PairPos corresponds to an insertion or deletion respectively.
PanGraph.Graphs.Blocks.Pos
— Typemutable struct Pos
start :: Int
stop :: Int
end
Representation of a single interval within a pairwise alignment. Inclusive on both ends, i.e. includes start
and stop
Used internally to unpack cigar strings.
Functions
Base.append!
— Methodappend!(b::Block, node::Node{Block}, snp::Maybe{SNPMap}, ins::Maybe{InsMap}, del::Maybe{DelMap})
Adds a new genome at node
to multiple sequence alignment block b
. Polymorphisms are optional. If nothing is passed instead, an empty dictionary will be used.
Base.length
— Methodlength(b::Block, n::Node)
Return the length of the sequence of node n
within the multiple alignment of block b
.
Base.length
— Methodlength(b::Block)
Return the length of consensus sequence of the multiple alignment of block b
.
Base.pop!
— Methodpop!(b::Block, n::Node)
Remove genome of Node n
from Block b
.
PanGraph.Graphs.Blocks.allele_positions
— Methodallele_positions(snp::SNPMap, ins::InsMap, del::DelMap)
Return an iterator over polymorphic loci, i.e. SNPs and Indels. The iterator will be sorted by position in ascending order.
PanGraph.Graphs.Blocks.allele_positions
— Methodallele_positions(b::Block, n::Node)
Return an iterator over polymorphic loci for node n
contained within block b
The iterator will be sorted by position in ascending order.
PanGraph.Graphs.Blocks.applyalleles
— Methodapplyalleles(seq::Array{UInt8}, mutate::SNPMap, insert::InsMap, delete::DelMap)
Take a sequence and apply polymorphisms, as given by mutate
, insert
, and delete
. Return the brand new allocated sequence.
PanGraph.Graphs.Blocks.assert_equals
— Methodassert_equal(b₁::Block, b₂::Block)
Throw an error in block b₁
is not equivalent to block b₂
. Useful for internal debugging.
PanGraph.Graphs.Blocks.check
— Methodcheck(b::Block)
Check whether block b
is internally self-consistent. Useful for debugging internals.
PanGraph.Graphs.Blocks.combine
— Methodcombine(qry::Block, ref::Block, aln::Alignment; minblock=500)
Take a pairwise alignment aln
from the consensus of qry
to ref
and merge both. The resultant new block, with a novel uuid is returned. Alignment aln
is a segmented set of intervals mapping homologous regions of one block into the other. Parameter minblock
is the cutoff length of an indel, above which a new block will be created.
PanGraph.Graphs.Blocks.depth
— Methoddepth(b::Block)
Return the number of genomes contained within the alignment
PanGraph.Graphs.Blocks.diversity
— Methoddiversity(b::Block)
Return the averaged fraction of loci that are mutated within the multiple sequence alignment of block b
.
PanGraph.Graphs.Blocks.partition
— Methodpartition(alignment; minblock=500)
Parse the alignment into matched intervals of a pairwise alignment. If any insertion or deletion is larger than minblock
, a new block is created to hold the homologous interval. This ensures that all blocks are at least minblock
long and no block contains an insertion or deletion longer than itself.
alignment
is assumed to be an data structure from the Utility module
PanGraph.Graphs.Blocks.reconsensus!
— Methodreconsensus!(b::Block)
Update the consensus sequence of block b
by majority-rule over the multiple sequence alignment.
PanGraph.Graphs.Blocks.regap!
— Methodregap!(b::Block)
Recompute the positions of gaps within the multiple sequence of block b
PanGraph.Graphs.Blocks.rereference
— Methodrerefence(qry::Block, ref::Block, aligment)
Take a pairwise alignment segments
from the consensus of qry
to ref
and rereference all polymorphisms of qry
to the consensus sequence of `ref. Low-level function used by higher-level API.
PanGraph.Graphs.Blocks.swap!
— Methodswap!(b::Block, oldkey::Array{Node{Block}}, newkey::Node{Block})
Remove all polymorphisms associated to all keys within oldkey
. Concatenate and reassociate them to newkey
.
PanGraph.Graphs.Blocks.swap!
— Methodswap!(b::Block, oldkey::Node{Block}, newkey::Node{Block})
Remove all polymorphisms associated to oldkey
and reassociate them to newkey
.
PanGraph.Graphs.marshal_fasta
— Methodmarshal_fasta(io::IO, b::Block; opt=nothing)
Serialize the multiple sequence alignment of block b
to a fasta format to IO stream io
. Each sequence will be serialized as-is, i.e. with no gaps.
If opt
is not nothing
, the output will be an aligned fasta file. Futhermore, opt is interpreted as a function to be called per internal node
that gives a unique name for each fasta record that is generated per node
.
PanGraph.Graphs.reverse_complement
— Methodreverse_complement(b::Block; keepid=false)
Return the reverse complement of the multiple sequence alignment within Block b
. By default, will return a block with a new uuid
, unless keepid is set to true
.
PanGraph.Graphs.sequence!
— Methodsequence!(seq, b::Block, node::Node{Block}; gaps=false)
Mutate the sequence buffer seq
in place to hold the sequence associated to genome node
within sequence alignment of block b
. By default, gaps (charater '-') will not be returned, unless gaps
is set to true
. Return the sequence with gap characters to generate the full sequence alignment.
PanGraph.Graphs.sequence
— Methodsequence(seq, b::Block, node::Node{Block}; gaps=false, forward=false)
Return the sequence associated to genome node
within sequence alignment of block b
. By default, gaps (charater '-') will not be returned, unless gaps
is set to true
. Return the sequence with gap characters can be used to generate the full sequence alignment. If forward
is true, the true orientation of the genome is ignored and will be returned to align to the forward consensus.
PanGraph.Graphs.sequence
— Methodsequence(b::Block; gaps=false)
Return the consensus of the multiple sequence alignment within block b
. By default, gaps (charater '-') will not be returned, unless gaps
is set to true
. Return the consensus alignment with gaps is useful for generating the full sequence alignment.