Graphs
Types
PanGraph.Graphs.DelMap
— Type
DelMap = Dict{Int,Int}
A sparse array of deletion events relative to a consensus. The key is the starting locus of the deletion (inclusive); the value is the length of the deletion.
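To make the key/value convention concrete, here is a minimal sketch in Base Julia. It assumes 1-based loci, with each key being the first deleted position. The helper apply_deletions is hypothetical and not part of PanGraph; the local DelMap alias simply mirrors the one documented above.

```julia
# Hypothetical helper, not part of PanGraph: apply a DelMap-style dictionary
# (start locus => length) to a consensus sequence. Assumes 1-based loci.
const DelMap = Dict{Int,Int}

function apply_deletions(consensus::Vector{UInt8}, dels::DelMap)
    keep = trues(length(consensus))          # mask of positions to retain
    for (locus, len) in dels
        keep[locus:locus+len-1] .= false     # locus is the first deleted position
    end
    return consensus[keep]
end

consensus = Vector{UInt8}("ACGTACGT")
dels = DelMap(3 => 2)                        # delete positions 3 and 4
println(String(apply_deletions(consensus, dels)))   # ACACGT
```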
PanGraph.Graphs.Graph
— Type
struct Graph
    block    :: Dict{String, Block}
    sequence :: Dict{String, Path}
end
Representation of a multiple sequence alignment. Alignments of homologous sequences are stored as blocks. A genome is stored as a path, i.e. a list of blocks.
PanGraph.Graphs.Graph
— Method
Graph(name::String, sequence::Array{UInt8}; circular=false)
Create a singleton graph from sequence. name is assumed to be a unique identifier. If circular is unspecified, the sequence is assumed to be linear.
PanGraph.Graphs.InsMap
— Type
InsMap = Dict{Tuple{Int,Int},Array{UInt8,1}}
A sparse array of insertion sequences relative to a consensus. The key is the tuple (locus, offset) of the insertion, where locus is the position after which the sequence is inserted; the value is the inserted sequence.
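The sketch below shows one way such a map could be applied. It assumes that locus is the 1-based consensus position after which the sequence is inserted (0 meaning before the first base), and that offset only orders insertions that share a locus. Both the helper apply_insertions and that reading of offset are assumptions, not documented PanGraph behavior.

```julia
# Hypothetical helper, not part of PanGraph. Assumes `locus` is the 1-based
# consensus position after which the insertion is placed (0 = before the
# first base) and that `offset` only orders insertions sharing a locus.
const InsMap = Dict{Tuple{Int,Int},Array{UInt8,1}}

function apply_insertions(consensus::Vector{UInt8}, ins::InsMap)
    keys_sorted = sort!(collect(keys(ins)))      # ascending (locus, offset)
    out = UInt8[]
    k = 1
    for pos in 0:length(consensus)
        pos > 0 && push!(out, consensus[pos])    # copy the consensus base
        # append every insertion placed right after this position
        while k <= length(keys_sorted) && keys_sorted[k][1] == pos
            append!(out, ins[keys_sorted[k]])
            k += 1
        end
    end
    return out
end

consensus = Vector{UInt8}("ACGT")
ins = InsMap((2, 0) => Vector{UInt8}("TT"), (4, 0) => Vector{UInt8}("G"))
println(String(apply_insertions(consensus, ins)))   # ACTTGTG
```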
PanGraph.Graphs.SNPMap
— Type
SNPMap = Dict{Int,UInt8}
A sparse array of single nucleotide polymorphisms relative to a consensus. The key is the locus of the mutation; the value is the modified nucleotide.
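Applying a SNPMap amounts to a per-position substitution. The sketch below assumes 1-based loci. apply_snps is a hypothetical helper that works on a copy, so the consensus itself is left unchanged.

```julia
# Hypothetical helper, not part of PanGraph: apply a SNPMap-style dictionary
# (locus => replacement nucleotide) to a copy of the consensus.
const SNPMap = Dict{Int,UInt8}

function apply_snps(consensus::Vector{UInt8}, snps::SNPMap)
    seq = copy(consensus)            # leave the consensus untouched
    for (locus, nuc) in snps
        seq[locus] = nuc             # substitute the modified nucleotide
    end
    return seq
end

consensus = Vector{UInt8}("ACGT")
snps = SNPMap(2 => UInt8('T'))
println(String(apply_snps(consensus, snps)))   # ATGT
```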
Functions
PanGraph.Graphs.check_duplicate_names
— Method
Utility function that raises an error if the list of records has entries with duplicated names. The error message contains the name in question.
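The check can be done in a single pass by keeping a set of the names already seen. This sketch is hypothetical: it takes a plain vector of name strings rather than PanGraph's record type, and check_duplicate_names_sketch is an illustrative name.

```julia
# Hypothetical sketch: operates on a plain vector of record names rather than
# PanGraph's actual record type.
function check_duplicate_names_sketch(names::Vector{String})
    seen = Set{String}()
    for name in names
        # report the offending name, as the docstring above describes
        name in seen && error("duplicated record name: $name")
        push!(seen, name)
    end
    return nothing
end

check_duplicate_names_sketch(["a", "b", "c"])      # passes silently
# check_duplicate_names_sketch(["a", "b", "a"])    # throws: duplicated record name: a
```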
PanGraph.Graphs.consistency_check
— Method
consistency_check(G::Graph)
Perform final consistency checks on graph G. The checks currently implemented are:
- a 1-1 correspondence between gaps and insertion positions in block alignments.
PanGraph.Graphs.copy_graph
— Method
copy_graph(G::Graph)
Returns a deep copy of G.
PanGraph.Graphs.detransitive!
— Method
detransitive!(G::Graph)
Find and remove all transitive edges in graph G. A chain of edges is considered transitive only if it is unambiguous: all sequences must enter on one shared edge and leave on another. Consequently, this function does not perform paralog splitting.
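The following hypothetical sketch illustrates the unambiguity criterion. Each path is represented as a vector of block names, and an edge a → b is reported only when every occurrence of a is followed by b and every occurrence of b is preceded by a. The sketch ignores strand and circular genomes. It only detects such edges; merging the blocks on either side is not shown.

```julia
# Hypothetical sketch, not PanGraph's implementation: paths are plain vectors
# of block names; strand and circular wrap-around are ignored.
function transitive_edges(paths::AbstractDict{String,Vector{String}})
    Nbrs = Set{Union{String,Nothing}}
    succ = Dict{String,Nbrs}()   # blocks seen immediately after each block
    pred = Dict{String,Nbrs}()   # blocks seen immediately before each block
    for path in values(paths)
        for (i, b) in enumerate(path)
            push!(get!(succ, b, Nbrs()), i < length(path) ? path[i+1] : nothing)
            push!(get!(pred, b, Nbrs()), i > 1 ? path[i-1] : nothing)
        end
    end
    edges = Tuple{String,String}[]
    for (a, s) in succ
        # every sequence leaving a must enter the same block b ...
        length(s) == 1 || continue
        b = only(s)
        (b === nothing || b == a) && continue
        # ... and every sequence entering b must come from a
        length(pred[b]) == 1 && only(pred[b]) == a && push!(edges, (a, b))
    end
    return sort!(edges)
end

paths = Dict("g1" => ["A", "B", "C"], "g2" => ["A", "B", "D"])
println(transitive_edges(paths))   # [("A", "B")]
```

Here A → B qualifies, but B does not, because sequences leave B on two different edges (to C and to D).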
PanGraph.Graphs.finalize!
— Method
finalize!(G::Graph)
Compute the breakpoint positions of each homologous alignment across all sequences in graph G. Intended to be run after multiple sequence alignment is complete.
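Here is a rough sketch of the bookkeeping involved. It assumes the lengths of the blocks along one genome's path are already known; the 1-based start and end of each block then follow from a cumulative sum. The function breakpoints is hypothetical and ignores strand and circularity.

```julia
# Hypothetical sketch: given the length of each block along one genome's path,
# return the 1-based (start, stop) coordinates of every block.
function breakpoints(lengths::Vector{Int})
    stops = cumsum(lengths)              # last position covered by each block
    starts = stops .- lengths .+ 1       # first position covered by each block
    return collect(zip(starts, stops))
end

println(breakpoints([5, 3, 4]))   # [(1, 5), (6, 8), (9, 12)]
```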
PanGraph.Graphs.graphs
— Method
graphs(io::IO; circular=false)
Parse a FASTA file from stream io and return an array of singleton graphs. If circular is unspecified, all genomes are assumed to be linear.
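Reading name/sequence pairs from a FASTA stream needs only Base Julia. This hypothetical read_fasta helper takes each header's identifier up to the first whitespace. It stops short of building the singleton Graph objects that graphs returns.

```julia
# Hypothetical sketch: read name => sequence pairs from a FASTA stream using
# only Base Julia. Building singleton graphs from them is not shown.
function read_fasta(io::IO)
    records = Pair{String,Vector{UInt8}}[]
    for line in eachline(io)
        line = strip(line)
        isempty(line) && continue
        if startswith(line, ">")
            name = first(split(line[2:end]))     # identifier up to first whitespace
            push!(records, String(name) => UInt8[])
        else
            isempty(records) && error("sequence data before first header")
            append!(last(records[end]), codeunits(line))   # multi-line sequences
        end
    end
    return records
end

io = IOBuffer(">seq1 sample\nACGT\nAC\n>seq2\nTTGA\n")
for (name, seq) in read_fasta(io)
    println(name, " ", String(copy(seq)))   # copy: String() takes ownership of its input
end
# seq1 ACGTAC
# seq2 TTGA
```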
PanGraph.Graphs.keeponly!
— Method
keeponly!(G::Graph, names::String...)
Remove from graph G every sequence that is not passed in the variadic argument names. This marginalizes the graph, i.e. it returns the subgraph that contains only the isolates listed in names.
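Marginalization keeps exactly the listed isolates. In this hypothetical sketch, a plain dictionary stands in for G.sequence (name => path). Bookkeeping for blocks that may no longer be referenced afterwards is left out.

```julia
# Hypothetical sketch: `paths` stands in for G.sequence (name => path).
function keeponly_sketch!(paths::AbstractDict{String}, names::String...)
    keep = Set(names)
    filter!(p -> first(p) in keep, paths)   # drop isolates not listed in names
end

paths = Dict("iso1" => [:A, :B], "iso2" => [:A, :C], "iso3" => [:B])
keeponly_sketch!(paths, "iso1", "iso3")
println(sort(collect(keys(paths))))   # ["iso1", "iso3"]
```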
PanGraph.Graphs.marshal_fasta
— Method
marshal_fasta(io::IO, G::Graph; opt=nothing)
Serialize graph G in FASTA format to the output stream io. Importantly, this serializes only the consensus sequence of each block, not the full multiple sequence alignment.
opt is currently ignored. It is kept so that the signature matches the other marshal functions.
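Here is a hypothetical sketch of consensus-only FASTA output. A dictionary mapping block names to consensus sequences is written in sorted order and wrapped at a fixed line width. The helper name and the width keyword are illustrative and are not part of marshal_fasta's signature.

```julia
# Hypothetical sketch: `consensus` maps block names to consensus sequences;
# only these consensus sequences are written, not the alignments.
function write_consensus_fasta(io::IO, consensus::AbstractDict{String,Vector{UInt8}}; width::Int=80)
    for name in sort(collect(keys(consensus)))   # deterministic output order
        println(io, '>', name)
        seq = consensus[name]
        for i in 1:width:length(seq)             # wrap lines at `width` bases
            write(io, seq[i:min(i + width - 1, length(seq))])
            println(io)
        end
    end
end

buf = IOBuffer()
write_consensus_fasta(buf, Dict("blockA" => Vector{UInt8}("ACGTAC")); width=4)
print(String(take!(buf)))
# >blockA
# ACGT
# AC
```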
PanGraph.Graphs.marshal_json
— Method
marshal_json(io::IO, G::Graph; opt=nothing)
Serialize graph G in JSON format to the output stream io. This is PanGraph's main storage and export format, and currently the only format from which an in-memory pangraph can be reconstructed.
opt is currently ignored. It is kept so that the signature matches the other marshal functions.
PanGraph.Graphs.prune!
— Method
prune!(G::Graph)
Remove all blocks from graph G that are not currently used by any extant sequence. Internal function used during guide tree alignment.
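Pruning reduces to collecting the set of blocks referenced by any path and dropping the rest. In this hypothetical sketch, blocks and paths are plain dictionaries keyed by name, standing in for G.block and G.sequence.

```julia
# Hypothetical sketch: `blocks` and `paths` stand in for G.block and
# G.sequence; each path is a vector of block names.
function prune_sketch!(blocks::AbstractDict{String}, paths::AbstractDict{String,Vector{String}})
    used = Set{String}()
    for path in values(paths)
        union!(used, path)                      # every block some genome still visits
    end
    filter!(p -> first(p) in used, blocks)     # drop orphaned blocks
end

blocks = Dict("A" => "ACGT", "B" => "TT", "C" => "GGA")
paths  = Dict("g1" => ["A", "B"], "g2" => ["A"])
prune_sketch!(blocks, paths)
println(sort(collect(keys(blocks))))   # ["A", "B"]
```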
PanGraph.Graphs.purge!
— Method
purge!(G::Graph)
Remove all zero-length blocks from the paths in graph G. Internal function used during guide tree alignment.
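Here is a hypothetical sketch of the purge step. It assumes each path is a vector of block names and that a per-block length lookup is available; zero-length entries are filtered out of every path in place.

```julia
# Hypothetical sketch: remove zero-length blocks from every path, given a
# per-block length lookup (an assumption made for illustration).
function purge_sketch!(paths::AbstractDict{String,Vector{String}}, blocklen::AbstractDict{String,Int})
    for path in values(paths)
        filter!(b -> blocklen[b] > 0, path)   # mutate each path in place
    end
    return paths
end

paths = Dict("g1" => ["A", "B", "C"])
purge_sketch!(paths, Dict("A" => 10, "B" => 0, "C" => 4))
println(paths["g1"])   # ["A", "C"]
```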
PanGraph.Graphs.realign!
— Method
realign!(G::Graph; accept)
Realign the blocks contained in graph G. This function requires MAFFT to be on the system PATH. accept should be a function that returns true for the blocks you wish to realign; by default, all blocks are realigned.
PanGraph.Graphs.sequence
— Method
sequence(G::Graph, name::String)
Return the sequence corresponding to genome name within graph G.
PanGraph.Graphs.sequence
— Method
sequence(G::Graph)
Return all pairs of name => sequence encoded within graph G.
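Conceptually, a genome is recovered by walking its path and concatenating the sequence each block contributes. This hypothetical sketch stores one plain string per block. It ignores strand and per-genome variation (SNPs, insertions, deletions), all of which a real reconstruction has to account for.

```julia
# Hypothetical sketch: each path lists block names and `blockseq` gives the
# sequence each block contributes; strand and per-genome edits are ignored.
function sequences_sketch(paths::AbstractDict{String,Vector{String}},
                          blockseq::AbstractDict{String,String})
    out = [name => join(blockseq[b] for b in path) for (name, path) in paths]
    return sort!(out; by=first)   # deterministic order by genome name
end

paths = Dict("g1" => ["A", "B"], "g2" => ["B", "A"])
blockseq = Dict("A" => "AC", "B" => "GT")
for (name, seq) in sequences_sketch(paths, blockseq)
    println(name, " => ", seq)
end
# g1 => ACGT
# g2 => GTAC
```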
PanGraph.Graphs.test
— Function
test(path)
Align all sequences found in the FASTA file at path into a pangraph, then verify that every sequence is correctly reconstructed once the alignment is complete.
PanGraph.Graphs.unmarshal
— Method
unmarshal(io::IO)
Deserialize the JSON-formatted input stream io into a Graph data structure and return it.