scripts.trees package¶
scripts.trees.build_treebest_trees module¶
Script to build starting gene trees with TreeBeST best, from CDS back-translated nucleotide alignments, given a species tree and a gene-to-species mapping file.
Example:
$ python -m build_treebest_trees -a alis_v89.fa.gz -sp species_tree_v89.nwk
-m genesp_v89.txt [-o treebest_forest_v89.nhx] [-nc 1] [-tmp tmp]
-
scripts.trees.build_treebest_trees.
init_worker
()¶
-
scripts.trees.build_treebest_trees.
worker_build_tree
(ali, genes_sp, sptree, ali_id, tmp_folder='', X=10)¶ Build a gene tree from the multiple alignment string in ali, while accounting for the species tree sptree, using treebest best.
If the output tree file already exists, it is not updated. This makes it possible to re-execute a SCORPiOs snakemake run without recomputing all trees after an error.
- Parameters
ali (str) – the fasta multiple alignment
genes_sp (str) – the corresponding genes to species mapping
sptree (str) – path to the newick species tree
ali_id (str) – identifier of the tree, used in the output .nhx file name.
tmp_folder (str) – path to the folder for the temporary individual alignment; the temporary individual tree is stored there as well.
X (int, optional) – -X parameter for treebest best (default=10).
- Returns
True if no Exception was raised.
- Return type
bool
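The skip-if-present behaviour described above can be sketched as follows. This is only an illustration of the pattern, not the SCORPiOs code: build_if_missing and the build callback are hypothetical names, and the real function runs treebest best where the placeholder writes a dummy tree.

```python
import os
import tempfile


def build_if_missing(outfile, build):
    """Run build(outfile) only if outfile does not exist yet.

    Hypothetical helper: `build` stands in for the call to treebest best.
    Returns True if the file was built, False if it was left untouched.
    """
    if os.path.exists(outfile):
        return False  # keep the tree written by a previous run
    build(outfile)
    return True


def fake_build(path):
    # Placeholder for the real tree-building step.
    with open(path, "w") as handle:
        handle.write("(a,b);\n")


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "tree_1.nhx")
    first = build_if_missing(out, fake_build)   # file is created
    second = build_if_missing(out, fake_build)  # file already there, skipped
```

With this pattern, a rerun after a failure only pays for the trees that are still missing.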
scripts.trees.convert_ids module¶
Script to convert gene IDs in the trees and alignment files to shorter IDs. This will allow the alignment to be converted to the phylip format so that phyml can be run with correct input formats (trees and ali). Converted output filenames are input filenames prefixed with tmp_.
Example:
$ python -m scripts.trees.convert_ids -t gene_tree1.nh gene_tree2.nh -a ali.fa
-
scripts.trees.convert_ids.
convert_ali
(fastafile, output, d_conv)¶ Converts gene IDs in an input multiple gene alignment in fasta format. The conversion dictionary must be given.
- Parameters
fastafile (file) – input multiple alignment in fasta format.
output (str) – name for the output file.
d_conv (dict) – Conversion from old to new IDs.
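As an illustration of the ID conversion, here is a minimal sketch that works on a fasta string instead of files. The function name convert_fasta_ids and the gene IDs are made up for the example; the actual convert_ali reads fastafile and writes to output.

```python
def convert_fasta_ids(fasta_text, d_conv):
    """Replace sequence names in a fasta string using d_conv (old -> new)."""
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            old_id = line[1:].strip()
            # A KeyError here means an ID is missing from the dictionary.
            line = ">" + d_conv[old_id]
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


# Made-up identifiers, for illustration only.
ali = ">ENSG0001_Homo.sapiens\nATG-CC\n>ENSG0002_Mus.musculus\nATGTCC\n"
short_ids = {"ENSG0001_Homo.sapiens": "g1", "ENSG0002_Mus.musculus": "g2"}
converted = convert_fasta_ids(ali, short_ids)
```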
-
scripts.trees.convert_ids.
convert_tree
(treefile, output, d_conv=None, text='')¶ Converts gene IDs in an input tree. A conversion dictionary can be given, otherwise it is generated.
- Parameters
treefile (file) – input tree in newick format.
output (str) – name for the output file.
d_conv (dict, optional) – Conversion from old to new IDs.
text (str, optional) – Debug information
- Returns
Conversion old to new IDs.
- Return type
dict
scripts.trees.cut_subtrees module¶
From a synteny-derived constrained tree topology, extract genes together in an orthogroup and their sequence alignment, for treebest phyml independent resolution of each orthogroup.
Example:
$ python -m scripts.trees.cut_subtrees -t ctree.nh -a ali.fa -og outgr_gene_name
-oa outali -ot outtree
-
scripts.trees.cut_subtrees.
get_orthogroups_genes
(ctree, outgr_gene_name)¶ Finds the polytomies (one or two) in the constrained tree topology.
- Parameters
ctree (str) – input tree file in newick format.
outgr_gene_name (str) – gene name of the outgroup gene.
- Returns
the 1 or 2 polytomy node(s) and their corresponding size (dict), and the full outgroup gene name with species tag (str)
- Return type
dict
-
scripts.trees.cut_subtrees.
write_resolved_tree
(orthog_tree, outgr_gene_name, out)¶ Writes solution trees for orthogroups with only 2 genes.
- Parameters
orthog_tree (orthogroup) – Node with the 2 descendants of the orthogroup.
outgr_gene_name (str) – full outgroup gene name (with species tag).
out (str) – filename to write the tree.
scripts.trees.genetree module¶
Module with functions to work with a gene tree.
Adds leaf attributes stored in a dict as .nhx tags (i.e. as features in the ete3.Tree object). Leaf names in the dict can be with or without species tags, but leaves should have name and S attributes.
- Parameters
wtree (ete3.Tree) – Tree to which new leaf attributes are added (modified in-place)
features (dict) – A dictionary giving, for each leaf, the (attribute, value) pairs to add.
- Returns
the list of names of all added features
- Return type
list
-
scripts.trees.genetree.
branch_length_closest
(tree, gene, group_of_genes)¶ Finds the gene closest to gene in tree amongst group_of_genes, i.e. the one with the shortest branch-length distance.
- Parameters
tree (ete3.Tree) – input tree
gene (ete3.TreeNode) – gene for which to search a neighbour
group_of_genes (list of str) – list of candidate neighbour genes
- Returns
name of the closest neighbour, in terms of branch lengths
- Return type
str
-
scripts.trees.genetree.
closest_gene_in_tree
(tree, node, group_of_genes, attr='name')¶ Finds the gene closest to node in tree amongst group_of_genes, i.e. the gene with (i) the shortest topological distance to node and (ii) the shortest branch-length distance in case of ties in (i).
- Parameters
tree (ete3.Tree) – input tree
node (ete3.TreeNode) – gene for which to search for the closest neighbour
group_of_genes (list of str) – list of candidate neighbour genes
attr (str, optional) – name of the attribute storing gene names
- Returns
name of closest neighbour
- Return type
str
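The two-level criterion (topological distance first, branch-length distance to break ties) maps naturally onto tuple comparison. The sketch below assumes the two distances have already been computed for each candidate; closest_candidate and the distance values are hypothetical and only illustrate the ordering.

```python
def closest_candidate(distances):
    """Pick the candidate with the smallest (topological, branch-length) pair.

    `distances` maps a candidate gene name to a tuple:
    (topological distance to the query, branch-length distance to the query).
    Tuples compare element by element, so the branch-length distance is only
    used when topological distances are equal.
    """
    return min(distances, key=lambda name: distances[name])


# Hypothetical distances from a query gene to three candidates.
candidates = {
    "geneA": (3, 0.10),
    "geneB": (2, 0.45),
    "geneC": (2, 0.30),
}
best = closest_candidate(candidates)
```

Here geneA has the shortest branch-length distance but is topologically farther, so it loses to geneB and geneC, and geneC wins the tie on branch length.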
Copies nhx tags stored in leaves of tree_ref_tags to leaves of tree_target. tree_target is modified in-place.
- Parameters
tree_ref_tags (ete3.Tree) – tree with nhx tags to copy
tree_target (ete3.Tree) – tree to copy tags to
-
scripts.trees.genetree.
find_node_with_most_desc
(tree1, corrected_leaves)¶ Finds the node in tree1 with all its descending leaves in corrected_leaves and the maximum number of descending leaves.
- Parameters
tree1 (ete3.Tree) – input tree
corrected_leaves (list of str) – list of leaf names to maximize in descendants
- Returns
the identified node
- Return type
ete3.TreeNode
-
scripts.trees.genetree.
find_sister_of_outgroup
(leaf_outgr, authorized_sp, sister_outgroup_genes)¶ Extracts a list of genes related to an outgroup gene leaf_outgr: genes belonging to related species authorized_sp and grouped together in the gene tree.
- Parameters
leaf_outgr (ete3 TreeNode) – node of the outgroup gene in the tree
authorized_sp (list of str) – list of related species
sister_outgroup_genes (list of str) – list of related genes (to update in-place)
-
scripts.trees.genetree.
get_solution_subtree
(corr_dict, subtree_name)¶ Loads a SCORPiOs-corrected subtree.
- Parameters
corr_dict (dict) – SCORPiOs-corrected subtrees
subtree_name (str) – name of the corrected subtree to load
- Returns
gene tree in an ete3.Tree object
- Return type
ete3.Tree
-
scripts.trees.genetree.
keep_sis_genes_together
(duplicated_sp_subtree, outgr, sister_outgroup_genes, outgroup_subtree, node_max='node_max')¶ Keeps genes of all outgroup species together when modifying a gene tree, so that the new tree remains species tree consistent for these species that branch between the outgroup and duplicated species.
- Parameters
duplicated_sp_subtree (ete3.Tree) – Synteny-corrected subtree
outgr (str) – name of the non-duplicated outgroup gene
sister_outgroup_genes (list of str) – genes that are grouped with the outgroup gene in the original tree and in related species
outgroup_subtree (ete3.Tree) – subtree with only outgroup and related genes
node_max (str, optional) – internal node name in the outgroup subtree where to paste the duplicated species subtree. If empty, a new tree combining both is created.
- Returns
a new tree where the outgroup gene in the synteny-corrected subtree is replaced by the subtree of all outgroup genes
- Return type
ete3.Tree
-
scripts.trees.genetree.
keep_subsequent_wgd_species
(stree, ensembl_tree, missing_leaves_keep, sp_current_wgd, authorized_sp)¶ When re-grafting a subtree corrected for species descending from ‘WGD1’ only, keep positions of species with subsequent WGDs consistent in the tree. To do so, find the closest ‘WGD1-only’ species gene in ensembl tree and keep subsequently duplicated species genes at the same position (relative to it).
Modifies stree in-place.
- Parameters
stree (ete3.Tree) – Tree object for the synteny corrected tree of WGD1
ensembl_tree (ete3.Tree) – Tree object for the full original gene tree
missing_leaves_keep (list of ete3.TreeNode) – Genes of subsequently duplicated species
sp_current_wgd (list of str) – List of WGD1 duplicated species
authorized_sp (dict) – Dict used to keep the tree consistent with the species tree. For a ‘WGD1’ species, a list of WGD1 species that are closer to it than are 4R species.
-
scripts.trees.genetree.
load_corrections
(files)¶ Gets the name and path of SCORPiOs-corrected subtrees (i.e. those accepted by AU-tests).
- Parameters
files (str) – Comma-delimited list of files with accepted corrections.
- Returns
for each corrected subtree, the path to the tree file and the name of the corrected WGD.
- Return type
dict
Stores node features in a dictionary. This dictionary can be used to add .nhx tags to a tree.
- Parameters
attribute_names (list of str) – list of names of features to add
groups_of_leaves (nested list) – For each feature, sub-lists of nodes to annotate with the same feature value, the value being the index of the sub-list.
- Returns
For each node, the attribute name and its associated value.
- Return type
dict
Example
If attribute_names is ['a', 'b'] and groups_of_leaves is [[['gene1', 'gene2'], ['gene3']], ['gene4']], the function returns:
features = {'gene1': [('a', 1)], 'gene2': [('a', 1)], 'gene3': [('a', 2)], 'gene4': [('b', 1)]}
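The mapping in this example can be reproduced with a few lines of Python. The sketch below is an assumption-laden illustration, not the module's code: group_features is a hypothetical name, values are 1-based sub-list indices as in the example output, and each feature's groups are given as a list of sub-lists (so the group for 'b' is written [['gene4']]).

```python
def group_features(attribute_names, groups_of_leaves):
    """Build {leaf: [(attribute, group index), ...]} from grouped leaves."""
    features = {}
    for attr, groups in zip(attribute_names, groups_of_leaves):
        # Index sub-lists from 1, as in the documented example output.
        for index, group in enumerate(groups, start=1):
            for leaf in group:
                features.setdefault(leaf, []).append((attr, index))
    return features


features = group_features(
    ["a", "b"],
    [[["gene1", "gene2"], ["gene3"]], [["gene4"]]],
)
```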
scripts.trees.inconsistent_trees module¶
Script to load orthogroups defined in the synteny analysis, transform them into a constrained tree and find trees that are inconsistent with the constraints. These constrained trees will be saved to file, along with the corresponding original subtree and sub-alignment for later correction purposes.
Example:
$ python -m scripts.trees.inconsistent_trees -i GraphsOrthogroups -t forest_v89.nhx
-a alis_v89.fa -n Lepisosteus.oculatus -f OrthoTable [-oc ctrees] [-oa subalis]
[-ot subtrees] [-gs GraphsCutSummary] [-s outsummary] [-wgd ''] [-fcombin out]
-
class
scripts.trees.inconsistent_trees.
FamilyOrthologies
¶ Bases:
object
FamilyOrthologies object containing the outgroup gene, the genes in each orthogroup, all genes in the family in the orthologytable, and the corresponding constrained gene tree topology.
-
is_multigenic
()¶ Filters multigenic subtrees, where more duplications than just the 3R duplication are involved. These families are often full of errors in original gene trees and difficult to solve.
- Returns
Is the subtree multigenic (True) or not (False)
- Return type
bool
-
to_constrained_tree
()¶ Transforms the orthogroups + outgroup into a constrained topology, represented by an ete3.Tree object.
-
update_constrained_tree
(leaves_to_place, ensembl_tree)¶ Adds, to the constrained tree, leaves that are under the lca in the original subtree and were predicted to be in the family (orthotable). These can be, for instance, genes of lowcov species that were discarded from the synteny analysis. Each leaf is placed in the same orthogroup as its closest neighbour in the original ensembl tree.
- Parameters
leaves_to_place (list) – list of names of the genes to add to the ctree.
ensembl_tree (ete3 Tree) – original gene tree.
-
update_orthologies
(outgroup_gene, orthogroup)¶ Adds genes of one orthogroup. a’s and b’s are arbitrary.
- Parameters
outgroup_gene (str) – name of the outgroup gene
orthogroup (list) – list of names of genes in one orthogroup
-
-
scripts.trees.inconsistent_trees.
get_inconsistent_trees
(tree, ali, outgroups, all_families, sfile, octr, otr, oal, stats=None, discard_sp=None, no_ctree=False)¶ For a given ensembl tree, check whether synteny-derived constrained topologies are consistent with it. If not, the corresponding constrained trees, ensembl subtrees and ensembl sub-alignments will be saved to file.
- Parameters
tree (str) – ensembl tree in newick format
ali (str) – ensembl ali in fasta format
outgroups (list) – list of outgroup species used in the synteny-analysis
all_families (dict of FamilyOrthologies instances) – for each outgroup gene (key), a FamilyOrthologies instance (value)
cfile (str) – file to write name of synteny consistent subtrees
mfile (str) – file to write name of multigenic subtrees
stats (dict, optional) – dict to count the number of consistent and inconsistent trees
-
scripts.trees.inconsistent_trees.
load_pred_file
(input_file, outgr, d_orthotable)¶ Loads predicted orthogroups after community detection in graphs. Stores everything in a FamilyOrthologies object.
- Parameters
input_file (str) – Input filename
outgr (str) – Name of the corresponding outgroup species
d_orthotable (dict) – Orthologytable with outgroup-duplicated species families definition
- Returns
For each outgroup gene (key), orthogroups in a FamilyOrthologies object (value).
- Return type
dict
-
scripts.trees.inconsistent_trees.
print_out_stats
(stats_dict, wgd='')¶ Prints to stdout some statistics on the number of synteny consistent subtrees.
- Parameters
stats_dict (dict) – a dict counting the number of consistent and inconsistent subtrees
wgd (str, optional) – the wgd for which the synteny graphs were computed
scripts.trees.iteration_nhx_tags module¶
This script recovers correction tags for trees that have been corrected multiple times during iterative correction (currently, .nhx tags are wiped out if the same tree is corrected again). Optionally, it also adds tags to internal nodes that are corrected subtrees.
Example:
$ python -m scripts.trees.iteration_nhx_tags -i 5
-c SCORPiOs_example/corrected_forest_%d.nhx [-o out.nhx] [--internal]
Searches for the presence of .nhx tags in leaves below node.
- Parameters
node (ete3 TreeNode) – the input node
tags_corr (list of str) – list of tags to search for
- Returns
True if at least one of the input tags_corr is in leaves below node
- Return type
bool
scripts.trees.make_tree_images module¶
This script generates figures of SCORPiOs corrected trees and their uncorrected counterpart. It is specifically designed to visualize SCORPiOs subtree corrections. Therefore, it assumes that inputs are SCORPiOs-generated files, with SCORPiOs file naming and format conventions.
Leaves of each SCORPiOs-corrected wgd subtree are printed with the same color in both the original and the corrected tree. Optionally, --show_moved also assigns a matching lighter color to leaves of non-wgd species that have been rearranged to reinsert the wgd subtree.
Input can either be a list of files, in any order, containing any number of corrected/uncorrected tree pairs, or a directory. The name of the corrected wgd and of the outgroups used should also be provided.
Examples:
$ python scripts/trees/make_tree_images.py -i SCORPiOs_example/Corrections/tmp_whole_trees_0/cor_27 SCORPiOs_example/Corrections/tmp_whole_trees_0/ori_27 --wgd Salmonidae --outgr Gasterosteus.aculeatus
$ python scripts/trees/make_tree_images.py -i SCORPiOs_example/Corrections/tmp_whole_trees_0/ --wgd Clupeocephala --outgr Lepisosteus.oculatus,Amia.calva -f pdf -o img_clup --color_outgr
-
scripts.trees.make_tree_images.
color_internal_node
(node, is_corrected_wgd=False)¶ Colors an internal node with convention colors: red for duplication, blue for speciation, cyan for dubious duplication.
- Parameters
node (ete3.TreeNode) – node to color
is_corrected_wgd (bool, optional) – set special style if node is corrected wgd node
-
scripts.trees.make_tree_images.
color_leaves
(node, palette, edit_d, wgd, usedict=False, moved=False, ignore=False)¶ Colors names of leaves of corrected subtrees.
- Parameters
node (ete3.TreeNode) – leaf to color
palette (list) – pre-defined list of colors to use
edit_d (dict) – dictionary to store/load colors
wgd (str) – restrict coloring to specified wgd
usedict (bool, optional) – should dictionary be used to load colors
moved (bool, optional) – color rearranged non-wgd species leaves
-
scripts.trees.make_tree_images.
get_corrected_wgd_nodes
(tree, wgd, outgroups)¶ Finds all nodes that correspond to corrected wgd nodes.
- Parameters
tree (ete3.Tree) – input tree
wgd (str) – restricts search to specific wgd
- Returns
the list of matched ete3.TreeNodes
- Return type
list
-
scripts.trees.make_tree_images.
identify_outgroup_vs_wgd_subtree
(subtrees, outgroups, tree)¶ Identifies outgroup subtree amongst given trees.
- Parameters
subtrees (list of ete3.TreeNode) – input subtrees
outgroups (list of str) – name of outgroups
tree (ete3.Tree) – whole tree
- Returns
the node corresponding to wgd corrected subtree
- Return type
ete3.TreeNode
-
scripts.trees.make_tree_images.
make_all_figures
(files, palette, edit_d, outfolder, wgd, outgrs, usedict=False, outformat='png', moved=False, coutgr=False)¶ Generates images for a list of input files.
- Parameters
files (list) – list of input files
palette (list) – pre-defined list of colors to use
edit_d (dict) – dictionary to store/load colors
wgd (str) – restrict coloring to specified wgd
usedict (bool, optional) – should dictionary be used to load colors
outformat (str, optional) – output format for the image: png (default), svg or pdf
moved (bool, optional) – color rearranged non-wgd species leaves
- Returns
colors to leaf names mapping
- Return type
dict
-
scripts.trees.make_tree_images.
make_tree_figure
(tree, palette, outfile, edit_d, wgd, outgroups, usedict=False, outformat='png', moved=False, coutgr=False)¶ Creates and saves an image for a tree object.
- Parameters
tree (ete3.Tree) – input tree
palette (list) – pre-defined list of colors to use
outfile (str) – name for the output image
edit_d (dict) – dictionary to store/load colors
wgd (str) – restrict coloring to specified wgd
usedict (bool, optional) – should dictionary be used to load colors
outformat (str, optional) – output format for the image: png (default), svg or pdf
moved (bool, optional) – color rearranged non-wgd species leaves
- Returns
colors to leaf names mapping
- Return type
dict
scripts.trees.merge_subtrees module¶
Script to merge together independently resolved orthogroups of the same family into a single tree.
Example:
$ python -m scripts.trees.merge_subtrees -t orthogroup_tree1.nh orthogroup_tree2.nh
-outgr gene_name [-o out]
-
scripts.trees.merge_subtrees.
merge_trees_and_write
(trees, outgr, outfile, keep_br=False)¶ Merges two subtrees independently resolved into a single tree and adds the outgroup gene. Writes the result to file.
- Parameters
trees (list of ete3.Tree) – Tree(s) to merge
outgr (str) – Outgroup gene name
outfile (str) – Output filename
-
scripts.trees.merge_subtrees.
remove_outgroup
(tree, outgr)¶ Loads a subtree and removes the outgroup gene.
- Parameters
tree (ete3.Tree) – Input tree
outgr (str) – Outgroup gene name
scripts.trees.orthologs module¶
Script to extract orthologous genes within a gene tree forest amongst a given list of species. All pairwise orthologies will be stored in the output folder (one file for each species pair).
Example:
$ python -m scripts.trees.orthologs -t gene_trees.nhx -d Clupeocephala -s sptree.nwk
[-o out] [-ow Salmonids] [-l lowcov_sp1,lowcov_sp2]
-
scripts.trees.orthologs.
get_speciation_events
(tree, species_pairs, sp_ortho_dict)¶ Extracts all orthology relationships in a gene tree involving the species given in input, and adds them to the orthology dict.
- Parameters
tree (ete3.Tree) – input Tree object.
species_pairs (list) – species pairs to consider.
sp_ortho_dict (dict) – dictionary to store orthologies.
-
scripts.trees.orthologs.
is_speciation
(node)¶ Is the node a speciation node?
- Parameters
node (ete3.TreeNode) – input node, with duplications annotated with the D attribute. D=Y if duplication, D=N otherwise. Note that dubious nodes (DD=Y or DCS=0) are considered speciation nodes.
- Returns
True if speciation, False otherwise.
- Return type
bool
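The rule stated for the D, DD and DCS tags can be written as a short predicate. The sketch below uses types.SimpleNamespace objects as stand-ins for annotated tree nodes; is_speciation_sketch is a hypothetical name, and treating a missing tag as "not set" is an assumption of this example.

```python
from types import SimpleNamespace


def is_speciation_sketch(node):
    """Return True unless the node is a non-dubious duplication."""
    if getattr(node, "D", "N") != "Y":
        return True  # not annotated as a duplication
    if getattr(node, "DD", "N") == "Y":
        return True  # dubious duplication, counted as speciation
    dcs = getattr(node, "DCS", None)
    if dcs is not None and float(dcs) == 0:
        return True  # duplication consistency score of 0, also dubious
    return False


speciation = SimpleNamespace(D="N")
duplication = SimpleNamespace(D="Y", DCS="0.75")
dubious_dd = SimpleNamespace(D="Y", DD="Y")
dubious_dcs = SimpleNamespace(D="Y", DCS="0")
```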
scripts.trees.parse_au_test module¶
Script to parse results of gene trees likelihood AU-test (here for comparison of 2 trees only), as written by CONSEL (Shimodaira, 2002).
Example:
$ python -m scripts.trees.parse_au_test -i inputs_polyS.txt [-o Accepted_Trees]
[-it inputs_treeB.txt] [-one n] [-wgd Clupeocephala] [-p path/tree.nh] [--lore]
-
scripts.trees.parse_au_test.
count
(filenames, name_sol='', alpha=0.05, item='1', parse_name=True, wgd='')¶ Parses all consel outputs in the input list and returns a list of ‘accepted trees’. Prints to screen the numbers and proportion of trees: (i) accepted by the likelihood test (ii) accepted with a better likelihood (iii) accepted with a significantly better likelihood.
- Parameters
filenames (list of str) – List of consel result files.
name_sol (str, optional) – Tag for tested trees that will be printed with the results. For instance, it can be the program used to build tested trees.
alpha (float) – alpha threshold for significance of the AU-test.
item (str, optional) – tested tree consel label, the other is considered the reference.
parse_name (bool, optional) – parse filename (expects SCORPiOs naming pattern)
wgd (str, optional) – name of the corrected wgd, to print out with the result summary
- Returns
list of names of accepted trees.
- Return type
list
-
scripts.trees.parse_au_test.
lore_aore_summary
(filenames, alpha=0.05, item_dict=None, parse_name=True, wgd='')¶ Parses all consel outputs in the input list and returns a summary, telling, for each tree, if the lore or aore (or both) topologies can be rejected. Also prints statistics to stdout.
- Parameters
filenames (list of str) – List of consel result files.
alpha (float) – alpha threshold for significance of the AU-test.
item_dict (dict, optional) – correspondence between items in CONSEL and tree labels
parse_name (bool, optional) – parse filename (expects SCORPiOs naming pattern)
wgd (str, optional) – name of the corrected wgd, to print out with the result summary
- Returns
dictionary with the AU-tests results summary.
- Return type
dict
-
scripts.trees.parse_au_test.
one_file_consel
(filename, alpha, item_test='1')¶ Parses one consel file record to obtain AU-test results. This function assumes that the AU-test was performed to compare two trees only.
By default, the tree labelled ‘1’ in consel is considered the tested tree; the result indicates whether this tree is as good as (or better than) the other tree (reference).
- Parameters
filename (str) – input filename.
alpha (float) – alpha threshold for significance of the AU-test.
item_test (str, optional) – tested tree consel label, the other is considered the reference.
- Returns
One of ‘error’, ‘rejected’, ‘equivalent lower lk’, ‘equivalent higher lk’ or ‘better sign. higher lk’.
- Return type
str
Note
The function returns the ‘error’ string in two cases:
the input file is empty; in the SCORPiOs workflow, this happens when the synteny-aware tree could not be built with ProfileNJ because fastdist failed to build the distance matrix.
CONSEL failed to compute the log-likelihoods from the phyml or raxml site likelihood file; in this case, logs should be checked.
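The five return values can be read as a small decision table. The sketch below is one plausible mapping from AU-test p-values and log-likelihoods to those labels; it is an assumption made for illustration, not a transcription of the SCORPiOs source. classify_au and its arguments are hypothetical, and parsing of the CONSEL file is left out.

```python
def classify_au(p_tested, p_ref, lk_tested, lk_ref, alpha=0.05):
    """Map AU-test p-values and log-likelihoods of two trees to a label.

    Assumed logic: a tree is rejected when its AU p-value is below alpha;
    if neither tree is rejected, the likelihoods decide the label.
    """
    if None in (p_tested, p_ref, lk_tested, lk_ref):
        return "error"  # values could not be read from the file
    if p_tested < alpha:
        return "rejected"
    if p_ref < alpha:
        return "better sign. higher lk"  # the reference tree is rejected
    if lk_tested > lk_ref:
        return "equivalent higher lk"
    return "equivalent lower lk"


# Hypothetical values for four comparisons.
results = [
    classify_au(0.01, 0.99, -1010.0, -1000.0),
    classify_au(0.98, 0.02, -1000.0, -1012.0),
    classify_au(0.60, 0.40, -1000.0, -1001.0),
    classify_au(0.40, 0.60, -1001.0, -1000.0),
]
```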
-
scripts.trees.parse_au_test.
one_file_consel_3_trees
(filename, alpha, item_dict=None)¶ Parses a CONSEL result file for a comparison of 3 trees.
- Parameters
filename (str) – name of CONSEL file to parse
alpha (float) – alpha threshold for significance of the AU-test.
item_dict (dict, optional) – correspondence between items in CONSEL and tree labels
- Returns
One of ‘error’, ‘convergence_pb’, ‘aore rejected’, ‘lore rejected’ or ‘lore and aore rejected’.
- Return type
str
scripts.trees.regraft_subtrees module¶
Script to re-graft corrected subtrees in their original tree and write the corrected gene trees forest.
Example:
$ python -m scripts.trees.regraft_subtrees -t trees.nhx -a alis.fa -s species_tree.nwk
-acc Accepted_trees.txt -o outtrees.nhx -anc Clupeocephala,Salmonidae
-ogr Lepisosteus.oculatus,Amia.calva_Esox.lucius [-n 1] [-tmp path/tmp] [-sa n] [-br y]
Adds nhx tags to corrected leaves and removes species names from gene names.
- Parameters
tree (ete3.Tree) – the whole gene tree
cor_leaves (list) – list of lists of leaves belonging to the same corrected subtree
moved_leaves (list) – list of lists of leaves rearranged to correct a subtree
tag (str) – name of the wgd, to include in the tag name
rm_species (bool, optional) – Whether species names should be removed
- Returns
names of all added .nhx tags
- Return type
set
-
scripts.trees.regraft_subtrees.
correct_wtrees
(tree, to_cor, res, tree_id, outfiles, outgroup_sp, sp_below_wgd=None, sp_current_wgd=None, tag='')¶ Re-graft all subtrees corrected for one WGD into the initial gene tree.
- Parameters
tree (ete3.Tree) – the initial gene tree
to_cor (dict) – for each corrected subtree (identified by the outgroup gene), the path to the corrected subtree (value)
res (dict) – Stores a summary of applied corrections for tree
tree_id (int) – Index of the gene tree in the forest, used as key in res
outfiles (str) – path to store the gene tree after subtree re-grafting
outgroup_sp (dict) – for each outgroup and WGD species, a list of its sister species
sp_below_wgd (dict, optional) – For each subsequent wgd, the duplicated species and their outgroups
sp_current_wgd (list, optional) – A list of duplicated species for the WGD for which the subtrees were corrected
tag (str, optional) – tag to add (WGD for instance) to document corrections in outputs
Note
The res dictionary is filled in-place with a correction summary for tree.
-
scripts.trees.regraft_subtrees.
init_worker
()¶
-
scripts.trees.regraft_subtrees.
multiprocess_rec_brlgth
(trees, alis, ncores, modified_trees, folder_cor, sptree, prefix='cor', brlengths=True, resume=False, raxml=False)¶ Reconciles with the species tree and optionally computes branch lengths for a subset of trees (modified_trees) of a gene tree forest, in parallel. Each output reconciled tree is written to folder_cor/prefix_treeid.nhx.
- Parameters
trees (str) – path to gene trees forest
alis (str) – path to the corresponding multiple alignments
ncores (int) – number of cores to use for parallel execution
modified_trees (dict) – gene trees to reconcile
folder_cor (str) – path to store each output reconciled tree file
sptree (str) – name of the species tree file
prefix (str, optional) – prefix to add to output trees
brlengths (bool, optional) – Whether branch-lengths should be computed
raxml (bool, optional) – if true, use raxml instead of treebest phyml to compute br. lengths.
-
scripts.trees.regraft_subtrees.
topo_changes
(lca, stree, leaves_to_move, outgr, authorized_sp)¶ Makes necessary topological changes in the initial tree to paste a corrected subtree. Genes in the corrected subtree are grouped as a clade in the new tree, while other branches are modified as little as possible.
- Parameters
lca (ete3.TreeNode) – the original tree topology below the node containing all corrected leaves
stree (ete3.Tree) – the corrected subtree to re-graft
leaves_to_move (list of str) – genes that split genes of the corrected subtree into several clades in the original tree
outgr (str) – Name of the outgroup gene used to build the corrected subtree
authorized_sp (list of str) – list of species whose genes should remain grouped with the outgroup
- Returns
The new gene tree with modified topology
- Return type
ete3.Tree
-
scripts.trees.regraft_subtrees.
worker_rec_brlgth
(tree, outfolder, treeid, sptree, ali='', prefix='cor', brlengths=True, resume=False, raxml=False)¶ Reconciles a given gene tree with the species tree using treebest sdi, and optionally computes branch lengths using treebest phyml. Also adds .nhx tags to corrected leaves if the corrections_summary dict is provided. The output tree is written to outfolder/prefix_treeid.nhx.
- Parameters
tree (ete3.Tree) – input tree to reconcile.
outfolder (str) – path to write the output
treeid (str) – identifier of the tree, used in the output .nhx file name.
sptree (str) – name of the species tree file
ali (str, optional) – the fasta multiple alignment, required if branch lengths have to be computed
prefix (str, optional) – string to add as prefix to the output file
brlengths (bool, optional) – Whether branch-lengths should be computed
raxml (bool, optional) – if true, use raxml instead of treebest phyml to compute br. lengths.
- Returns
True if no Exception was raised.
- Return type
bool
scripts.trees.speciestree module¶
Module with functions to work with a species tree.
-
scripts.trees.speciestree.
get_anc_order
(tree_file, ancestors=None, tips_to_root=False, prune=True)¶ Orders input ancestors with respect to their position in the species tree. Can be ordered from root to tips (default) or tips to root.
- Parameters
tree_file (str) – Path to the input newick formatted tree.
ancestors (list of str, optional) – List of ancestor names. If unspecified, all the ancestors in the tree will be returned.
- Returns
ancestor names in the requested order (keys) and list of ancestors in the input list that are below it (values).
- Return type
OrderedDict
-
scripts.trees.speciestree.
get_sister_species
(species_tree, species, anc)¶ Extracts a list of species related to a given species: species branching between species and the ancestor anc.
- Parameters
species_tree (ete3 Tree) – ete3 tree object
species (str) – name of the species
anc (str) – name of the ancestor
- Returns
species branching between species and anc
- Return type
list
-
scripts.trees.speciestree.
get_species
(species_tree, anc, other_wgd_anc='', lowcov_species='')¶ Extracts a list of species descending from a given ancestor in a species tree. Filters out species under particular ancestors (e.g. subsequent WGDs) given by other_wgd_anc, as well as low-coverage species given by lowcov_species.
- Parameters
species_tree (str) – Path to the input newick tree.
anc (str) – name of the ancestor
other_wgd_anc (str, optional) – Comma-delimited names of ancestors with subsequent WGDs.
lowcov_species (str, optional) – Comma-delimited names of ‘lowcoverage’ species to exclude.
- Returns
The list of species.
- Return type
list
-
scripts.trees.speciestree.
is_below
(node1, node2)¶ Checks if node2 is below node1 in the tree topology.
- Parameters
node1 (ete3 TreeNode) – node1
node2 (ete3 TreeNode) – node2
- Returns
True if node2 is below node1, False otherwise.
- Return type
bool
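A check of this kind can be done by walking up from node2 through parent pointers until node1 or the root is reached. The sketch below uses a tiny stand-in Node class with an up attribute instead of an ete3 TreeNode; is_below_sketch is a hypothetical name, and treating a node as not being below itself is an assumption of the example.

```python
class Node:
    """Tiny stand-in for a tree node: only a name and a parent pointer."""

    def __init__(self, name, up=None):
        self.name = name
        self.up = up


def is_below_sketch(node1, node2):
    """Return True if node2 is a strict descendant of node1."""
    current = node2.up
    while current is not None:
        if current is node1:
            return True
        current = current.up
    return False


root = Node("root")
clup = Node("Clupeocephala", up=root)
salm = Node("Salmonidae", up=clup)

below = is_below_sketch(root, salm)      # Salmonidae descends from the root
not_below = is_below_sketch(salm, root)  # the reverse does not hold
```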
-
scripts.trees.speciestree.
remove_anc
(tree_file, out_file)¶ Removes any internal node name, such as ancestor names, in the input tree and writes it to a new file.
- Parameters
tree_file (str) – Path to the input newick formatted tree.
out_file (str) – Path for the output file.
-
scripts.trees.speciestree.
search_one_node
(tree, node_name)¶ Searches for a node in the input tree given its name. Throws AssertionError if the node name is not found or is not unique.
- Parameters
tree (ete3 Tree) – input tree
node_name (str) – node to search
- Returns
the matched node
- Return type
ete3 TreeNode
scripts.trees.utilities module¶
Module with functions to work with gene trees and gene alignments.
-
scripts.trees.utilities.
delete_gaps_in_all
(ali)¶ Removes columns of an alignment with gaps in all sequences (in-place).
- Parameters
ali (dict) – dictionary storing the alignment with gene names as keys and aligned sequences as values.
Note
Throws an assertion error if the alignment is empty
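Removing all-gap columns amounts to keeping every column where at least one sequence has a residue. A minimal sketch, assuming gaps are written as '-' and all sequences have the same length (drop_all_gap_columns is a hypothetical name):

```python
def drop_all_gap_columns(ali):
    """Remove, in-place, alignment columns that are gaps in every sequence."""
    assert ali, "empty alignment"
    seqs = list(ali.values())
    # Indices of columns where at least one sequence is not a gap.
    keep = [i for i in range(len(seqs[0])) if any(s[i] != "-" for s in seqs)]
    for name in list(ali):
        ali[name] = "".join(ali[name][i] for i in keep)


ali = {"gene1": "A--TG", "gene2": "AC-T-"}
drop_all_gap_columns(ali)  # only the third column is a gap in both sequences
```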
-
scripts.trees.utilities.
get_subali
(ali_string, genes, d_names=None)¶ Extracts a sub-alignment of genes of interest from a fasta alignment string and removes columns with gaps in all sequences.
- Parameters
ali_string (str) – alignment string in fasta format.
genes (list) – list of genes to extract.
d_names (dict, optional) – dictionary to add suffix to gene names, for instance to add species. If used, all genes have to be in this mapping dictionary.
- Returns
the sub-alignment, with gene names as keys and aligned sequences as values.
- Return type
dict
-
scripts.trees.utilities.
read_multiple_objects
(file_object, sep='//')¶ Creates a generator to read a file with several entries (trees, alignments, or other…) one by one.
- Parameters
file_object (file) – python file object of the input file
sep (str, optional) – the separator between entries
- Yields
str – the next tree (or alignment).
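A generator of this kind lets a large forest file be processed one entry at a time without loading it fully in memory. The sketch below assumes the separator sits alone on its own line; read_entries is a hypothetical name, and an in-memory io.StringIO stands in for an open file.

```python
import io


def read_entries(file_object, sep="//"):
    """Yield the entries of a file one by one, splitting on separator lines."""
    entry = []
    for line in file_object:
        if line.strip() == sep:
            if entry:
                yield "".join(entry)
            entry = []
        else:
            entry.append(line)
    if entry:  # the last entry may lack a trailing separator
        yield "".join(entry)


forest = io.StringIO("(a,b);\n//\n((c,d),e);\n//\n")
trees = list(read_entries(forest))
```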
-
scripts.trees.utilities.
write_fasta
(ali, outfile, d_names=None)¶ Writes a fasta alignment file.
- Parameters
ali (dict) – dictionary storing the alignment {name1: seq1, name2: seq2}.
outfile (str) – name of the file to write.
d_names (dict, optional) – dictionary to transform names, for instance to add species. If used, all genes have to be in this dictionary.
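For illustration, writing such a dictionary to fasta could look like the sketch below. It assumes d_names maps each original name to its replacement, which is one reading of the description above; write_fasta_sketch and the gene names are hypothetical.

```python
import os
import tempfile


def write_fasta_sketch(ali, outfile, d_names=None):
    """Write {name: sequence} to outfile in fasta format."""
    with open(outfile, "w") as out:
        for name, seq in ali.items():
            if d_names is not None:
                # Assumption: d_names maps each name to its replacement;
                # a gene missing from d_names raises a KeyError.
                name = d_names[name]
            out.write(f">{name}\n{seq}\n")


ali = {"gene1": "ATG-CC", "gene2": "ATGTCC"}
with_species = {"gene1": "gene1_Danio.rerio", "gene2": "gene2_Oryzias.latipes"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ali.fa")
    write_fasta_sketch(ali, path, d_names=with_species)
    with open(path) as handle:
        content = handle.read()
```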
-
scripts.trees.utilities.
write_forest
(inforest, outforest, corrections, current_wgd='', cor_treefiles='', save_single_treefile=False)¶ Writes a gene tree forest after applying corrections to some gene trees, as described in corrections. Browses the input gene tree forest and writes, for each gene tree, either its unmodified input version or its corrected version if listed in corrections.
- Parameters
inforest (str) – Name of the .nhx file with the input gene tree forest
outforest (str) – Name of the output .nhx file for the corrected gene tree forest
corrections (dict) – For each corrected tree (described by its index in the forest), a list of 5-element tuples describing applied corrections: name of wgd, corrected tree file, and 3 correction descriptors. If current_wgd is not used, corrections can simply be the list of corrected trees, but cor_treefiles has to be specified.
current_wgd (str, optional) – If the forest is being corrected for one particular wgd, use corrections applied to this wgd. If not used, corrections apply to all trees in corrections but cor_treefiles has to be specified.
cor_treefiles (str, optional) – Path to corrected trees to use if current_wgd is not specified.
save_single_treefile (bool, optional) – Whether individual original input trees should be written to file and the individual corrected tree file kept.