scripts.trees package

scripts.trees.build_treebest_trees module

Script to build starting gene trees with TreeBeST best from CDS back-translated nucleotide alignments, given a species tree and a gene-to-species mapping file.

Example:

 $ python -m build_treebest_trees -a alis_v89.fa.gz -sp species_tree_v89.nwk
-m genesp_v89.txt [-o treebest_forest_v89.nhx] [-nc 1] [-tmp tmp]
scripts.trees.build_treebest_trees.init_worker()
scripts.trees.build_treebest_trees.worker_build_tree(ali, genes_sp, sptree, ali_id, tmp_folder='', X=10)

Build a gene tree from the multiple alignment string in ali, while accounting for the species tree sptree, using treebest best.

If the output tree file already exists, it is not updated. This allows a SCORPiOs snakemake run to be re-executed after an error without recomputing all trees.

Parameters
  • ali (str) – the fasta multiple alignment

  • genes_sp (str) – the corresponding genes to species mapping

  • sptree (str) – path to the newick species tree

  • ali_id (str) – identifier of the tree, used in the output .nhx file name.

  • tmp_folder (str) – path for the temporary individual alignment; the temporary individual tree is stored there too.

  • X (int, optional) – -X parameter for treebest best (default=10).

Returns

True if no Exception was raised.

Return type

bool
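The skip-if-exists behaviour described for worker_build_tree can be sketched as follows. This is a minimal illustration and not the module's code: build_if_missing and its builder argument are hypothetical names, and the output file name is a placeholder.

```python
import os
import tempfile


def build_if_missing(out_path, builder):
    """Write builder() to out_path unless out_path already exists.

    Hypothetical helper: returns True if the file was built, False if skipped.
    """
    if os.path.exists(out_path):
        # An earlier run already produced this tree: leave the file untouched.
        return False
    with open(out_path, "w") as out:
        out.write(builder())
    return True


# The second call finds the existing file and does not recompute anything.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_tree.nhx")  # placeholder file name
    first = build_if_missing(path, lambda: "(a,b);")
    second = build_if_missing(path, lambda: "(b,a);")
    with open(path) as infile:
        content = infile.read()
```

Checking for the output file before doing any work is what makes a re-run cheap: only trees missing from the previous attempt are rebuilt.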

scripts.trees.convert_ids module

Script to convert gene IDs in the trees and alignment files to shorter IDs. This will allow the alignment to be converted to the phylip format so that phyml can be run with correct input formats (trees and ali). Converted output filenames are input filenames prefixed with tmp_.

Example:

$ python -m scripts.trees.convert_ids -t gene_tree1.nh gene_tree2.nh -a ali.fa
scripts.trees.convert_ids.convert_ali(fastafile, output, d_conv)

Converts gene IDs in an input multiple gene alignment in fasta format. The conversion dictionary must be given.

Parameters
  • fastafile (file) – input multiple alignment in fasta format.

  • output (str) – name for the output file.

  • d_conv (dict) – Conversion from old to new IDs.
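As an illustration of an ID conversion driven by a dictionary, here is a small sketch that works on a fasta string rather than on files. The function name convert_fasta_ids and the gene names are hypothetical, and the real convert_ali may handle headers differently.

```python
def convert_fasta_ids(fasta_text, d_conv):
    """Return fasta_text with each header name replaced using d_conv.

    Hypothetical helper; raises KeyError if a gene is absent from d_conv.
    """
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            old_id = line[1:].strip()
            out_lines.append(">" + d_conv[old_id])
        else:
            # Sequence lines are copied unchanged.
            out_lines.append(line)
    return "\n".join(out_lines) + "\n"


# Placeholder long IDs mapped to short ones.
d_conv = {"long_gene_identifier_1_Species.one": "g1",
          "long_gene_identifier_2_Species.two": "g2"}
fasta = (">long_gene_identifier_1_Species.one\nATG-CC\n"
         ">long_gene_identifier_2_Species.two\nATGGCC\n")
converted = convert_fasta_ids(fasta, d_conv)
```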

scripts.trees.convert_ids.convert_tree(treefile, output, d_conv=None, text='')

Converts gene IDs in an input tree. A conversion dictionary can be given, otherwise it is generated.

Parameters
  • treefile (file) – input tree in newick format.

  • output (str) – name for the output file.

  • d_conv (dict, optional) – Conversion from old to new IDs.

  • text (str, optional) – Debug information

Returns

Conversion old to new IDs.

Return type

dict

scripts.trees.cut_subtrees module

From a synteny-derived constrained tree topology, extract the genes grouped together in an orthogroup and their sequence alignment, so that each orthogroup can be resolved independently with treebest phyml.

Example:

$ python -m scripts.trees.cut_subtrees -t ctree.nh -a ali.fa -og outgr_gene_name
-oa outali -ot outtree
scripts.trees.cut_subtrees.get_orthogroups_genes(ctree, outgr_gene_name)

Finds the two polytomies in the constrained tree topology.

Parameters
  • ctree (str) – input tree file in newick format.

  • outgr_gene_name (str) – gene name of the outgroup gene.

Returns

The 1 or 2 polytomy node(s) and their corresponding sizes (dict), and the full outgroup gene name, with species tag (str).

Return type

dict

scripts.trees.cut_subtrees.write_resolved_tree(orthog_tree, outgr_gene_name, out)

Writes solution trees for orthogroup with only 2 genes.

Parameters
  • orthog_tree (ete3.TreeNode) – Node with the 2 descendants of the orthogroup.

  • outgr_gene_name (str) – full outgroup gene name (with species tag).

  • out (str) – filename to write the tree.

scripts.trees.genetree module

Module with functions to work with a gene tree.

scripts.trees.genetree.add_nhx_tags(wtree, features)

Adds leaf attributes stored in a dict as .nhx tags (i.e. as features in the ete3.Tree object). Leaf names in the dict can be given with or without species tags, but leaves should have name and S attributes.

Parameters
  • wtree (ete3.Tree) – Tree for which to add new leaves attributes (modified in-place)

  • features (dict) – A dictionary giving, for each leaf, the (attribute, value) pairs to add.

Returns

the list of names of all added features

Return type

list

scripts.trees.genetree.branch_length_closest(tree, gene, group_of_genes)

Finds the gene closest to gene in tree amongst a group_of_genes, i.e. the one at the shortest branch-length distance.

Parameters
  • tree (ete3.Tree) – input tree

  • gene (ete3.TreeNode) – gene for which to search a neighbour

  • group_of_genes (list of str) – list of candidate neighbour genes

Returns

name of the closest neighbour, in terms of branch lengths

Return type

str

scripts.trees.genetree.closest_gene_in_tree(tree, node, group_of_genes, attr='name')

Finds the gene closest to node in tree amongst a group_of_genes, i.e. the gene with (i) the shortest topological distance to node and (ii) the shortest branch-length distance in case of ties in (i).

Parameters
  • tree (ete3.Tree) – input tree

  • node (ete3.TreeNode) – gene for which to search for the closest neighbour

  • group_of_genes (list of str) – list of candidate neighbour genes

  • attr (str, optional) – name of the attribute storing gene names

Returns

name of closest neighbour

Return type

str
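The two-level criterion (topological distance first, branch-length distance as tie-breaker) can be expressed with a tuple sort key. The sketch below assumes the distances are already available in two dicts; pick_closest, topo_dist and branch_dist are hypothetical names, not part of the module.

```python
def pick_closest(candidates, topo_dist, branch_dist):
    """Return the candidate with the smallest (topological, branch-length) distance.

    topo_dist and branch_dist are hypothetical precomputed dicts mapping each
    candidate name to its distance from the query gene.
    """
    # Tuples compare element by element, so branch length only matters on ties.
    return min(candidates, key=lambda name: (topo_dist[name], branch_dist[name]))


topo = {"geneA": 2, "geneB": 2, "geneC": 3}
brlen = {"geneA": 0.40, "geneB": 0.15, "geneC": 0.05}
closest = pick_closest(["geneA", "geneB", "geneC"], topo, brlen)
```

Here geneC has the shortest branch length but is topologically further away, so the tie between geneA and geneB is the only place where branch lengths are consulted.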

scripts.trees.genetree.copy_nhx_tags(tree_ref_tags, tree_target)

Copies nhx tags stored in leaves of tree_ref_tags to leaves of tree_target. tree_target is modified in-place.

Parameters
  • tree_ref_tags (ete3.Tree) – tree with nhx tags to copy

  • tree_target (ete3.Tree) – tree to copy tags to

scripts.trees.genetree.find_node_with_most_desc(tree1, corrected_leaves)

Finds the node in tree1 with all its descending leaves in corrected_leaves and the maximum number of descending leaves.

Parameters
  • tree1 (ete3.Tree) – input tree

  • corrected_leaves (list of str) – list of leaf names to maximize in descendants

Returns

the identified node

Return type

ete3.TreeNode
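To make the search criterion concrete, the sketch below applies it to a toy tree written as nested tuples (leaves are strings) instead of an ete3.Tree. The names leaves_of and largest_covered_clade are hypothetical, and the traversal order is an assumption. It only illustrates "all descending leaves in the list, with as many leaves as possible".

```python
def leaves_of(tree):
    """Return the leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves_of(child)]


def largest_covered_clade(tree, allowed):
    """Return the subtree with the most leaves, all of them in allowed."""
    allowed = set(allowed)
    best = None
    stack = [tree]
    while stack:
        node = stack.pop()
        node_leaves = leaves_of(node)
        if set(node_leaves) <= allowed:
            # Fully covered clade: no need to look at its sub-clades.
            if best is None or len(node_leaves) > len(leaves_of(best)):
                best = node
        elif not isinstance(node, str):
            stack.extend(node)
    return best


toy_tree = ((("g1", "g2"), "g3"), ("g4", "g5"))
best_clade = largest_covered_clade(toy_tree, ["g1", "g2", "g3", "g5"])
```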

scripts.trees.genetree.find_sister_of_outgroup(leaf_outgr, authorized_sp, sister_outgroup_genes)

Extracts a list of genes related to an outgroup gene leaf_outgr: genes belonging to related species authorized_sp and grouped together in the gene tree.

Parameters
  • leaf_outgr (ete3 TreeNode) – node of the outgroup gene in the tree

  • authorized_sp (list of str) – list of related species

  • sister_outgroup_genes (list of str) – list of related genes (to update in-place)

scripts.trees.genetree.get_solution_subtree(corr_dict, subtree_name)

Loads a SCORPiOs-corrected subtree.

Parameters
  • corr_dict (dict) – SCORPiOs-corrected subtrees

  • subtree_name (str) – name of the corrected subtree to load

Returns

gene tree in an ete3.Tree object

Return type

ete3.Tree

scripts.trees.genetree.keep_sis_genes_together(duplicated_sp_subtree, outgr, sister_outgroup_genes, outgroup_subtree, node_max='node_max')

Keeps genes of all outgroup species together when modifying a gene tree, so that the new tree remains species tree consistent for these species that branch between the outgroup and duplicated species.

Parameters
  • duplicated_sp_subtree (ete3.Tree) – Synteny-corrected subtree

  • outgr (str) – name of the non-duplicated outgroup gene

  • sister_outgroup_genes (list of str) – genes that are grouped with the outgroup gene in the original tree and in related species

  • outgroup_subtree (ete3.Tree) – subtree with only outgroup and related genes

  • node_max (str, optional) – internal node name in the outgroup subtree where to paste the duplicated species subtree. If empty a new tree combining both is created.

Returns

a new tree where the outgroup gene in the synteny-corrected subtree is replaced by the subtree of all outgroup genes

Return type

ete3.Tree

scripts.trees.genetree.keep_subsequent_wgd_species(stree, ensembl_tree, missing_leaves_keep, sp_current_wgd, authorized_sp)

When re-grafting a subtree corrected for species descending from ‘WGD1’ only, keep positions of species with subsequent WGDs consistent in the tree. To do so, find the closest ‘WGD1-only’ species gene in ensembl tree and keep subsequently duplicated species genes at the same position (relative to it).

Modifies stree in-place.

Parameters
  • stree (ete3.Tree) – Tree object for the synteny corrected tree of WGD1

  • ensembl_tree (ete3.Tree) – Tree object for the full original gene tree

  • missing_leaves_keep (list of ete3.TreeNode) – Genes of subsequently duplicated species

  • sp_current_wgd (list of str) – List of WGD1 duplicated species

  • authorized_sp (dict) – Dict used to keep the tree consistent with the species tree. For a ‘WGD1’ species, a list of WGD1 species that are closer to it than are 4R species.

scripts.trees.genetree.load_corrections(files)

Gets the name and path of SCORPiOs corrected subtrees (i.e. those accepted by AU-tests).

Parameters

files (str) – Comma-delimited list of files with accepted corrections.

Returns

for each corrected subtree, the path to the tree file and the name of the corrected WGD.

Return type

dict

scripts.trees.genetree.save_nhx_tags(attribute_names, groups_of_leaves)

Stores node features in a dictionary. This dictionary can be used to add .nhx tags to a tree.

Parameters
  • attribute_names (list of str) – list of names of features to add

  • groups_of_leaves (nested list) – For each feature, sub-lists of nodes to annotate with the same feature value, the value being the index of the sub-list.

Returns

For each node, the attribute name and its associated value.

Return type

dict

Example

If attribute_names is ['a', 'b'] and groups_of_leaves is [[['gene1', 'gene2'], ['gene3']], [['gene4']]], the function returns:

features = {'gene1': [('a', 1)], 'gene2': [('a', 1)], 'gene3': [('a', 2)], 'gene4': [('b', 1)]}
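The mapping in this example can be reproduced with a few lines of Python. This is a sketch and not the module's implementation: group_features is a hypothetical name, values are taken to be 1-based sub-list indices as in the example, and the groups for 'b' are written as a list of sub-lists ([['gene4']]), which is an assumption about the intended nesting.

```python
def group_features(attribute_names, groups_of_leaves):
    """Map each leaf name to a list of (attribute, value) pairs.

    Sketch only: values are 1-based indices of the sub-list holding the leaf.
    """
    features = {}
    for attr, groups in zip(attribute_names, groups_of_leaves):
        for index, group in enumerate(groups, start=1):
            for leaf in group:
                features.setdefault(leaf, []).append((attr, index))
    return features


features = group_features(
    ["a", "b"],
    [[["gene1", "gene2"], ["gene3"]], [["gene4"]]],
)
```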

scripts.trees.inconsistent_trees module

Script to load orthogroups defined in the synteny analysis, transform them into a constrained tree and find trees that are inconsistent with the constraints. These constrained trees will be saved to file, along with the corresponding original subtree and sub-alignment for later correction purposes.

Example:

$ python -m scripts.trees.inconsistent_trees -i GraphsOrthogroups -t forest_v89.nhx
-a alis_v89.fa -n Lepisosteus.oculatus -f OrthoTable [-oc ctrees] [-oa subalis]
[-ot subtrees] [-gs GraphsCutSummary] [-s outsummary] [-wgd ''] [-fcombin out]
class scripts.trees.inconsistent_trees.FamilyOrthologies

Bases: object

FamilyOrthologies object containing the outgroup gene, the genes in each orthogroup, all genes in the family in the orthologytable, and the corresponding constrained gene tree topology.

is_multigenic()

Filters multigenic subtrees, where more duplications than just the 3R duplication are involved. These families are often full of errors in the original gene trees and are difficult to solve.

Returns

Is the subtree multigenic (True) or not (False)

Return type

bool

to_constrained_tree()

Transforms the orthogroups + outgroup into a constrained topology, represented by an ete3.Tree object.

update_constrained_tree(leaves_to_place, ensembl_tree)

Adds, to the constrained tree, leaves that are under the lca in the original subtree and were predicted to be in the family (orthotable). These can be, for instance, genes of lowcov species that were discarded from the synteny analysis. Each added gene is placed in the same orthogroup as its closest neighbour in the original ensembl tree.

Parameters
  • leaves_to_place (list) – list of the name of genes to add to the ctree.

  • ensembl_tree (ete3 Tree) – original gene tree.

update_orthologies(outgroup_gene, orthogroup)

Adds genes of one orthogroup. a’s and b’s are arbitrary.

Parameters
  • outgroup_gene (str) – name of the outgroup gene

  • orthogroup (list) – list of names of genes in one orthogroup

scripts.trees.inconsistent_trees.get_inconsistent_trees(tree, ali, outgroups, all_families, sfile, octr, otr, oal, stats=None, discard_sp=None, no_ctree=False)

For a given ensembl tree, check whether synteny-derived constrained topologies are consistent with it. If not, the corresponding constrained trees, ensembl subtrees and ensembl sub-alignments will be saved to file.

Parameters
  • tree (str) – ensembl tree in newick format

  • ali (str) – ensembl ali in fasta format

  • outgroups (list) – list of outgroup species used in the synteny-analysis

  • all_families (dict of FamilyOrthologies instances) – for each outgroup gene (key), a FamilyOrthologies instance (value)

  • cfile (str) – file to write name of synteny consistent subtrees

  • mfile (str) – file to write name of multigenic subtrees

  • stats (dict, optional) – dict to count the number of consistent and inconsistent trees

scripts.trees.inconsistent_trees.load_pred_file(input_file, outgr, d_orthotable)

Loads predicted orthogroups after community detection in graphs. Stores everything in a FamilyOrthologies object.

Parameters
  • input_file (str) – Input filename

  • outgr (str) – Name of the corresponding outgroup species

  • d_orthotable (dict) – Orthologytable with outgroup-duplicated species families definition

Returns

For each outgroup gene (key), orthogroups in a FamilyOrthologies object (value).

Return type

dict

scripts.trees.inconsistent_trees.print_out_stats(stats_dict, wgd='')

Prints to stdout some statistics on the number of synteny consistent subtrees.

Parameters
  • stats_dict (dict) – a dict counting the number of consistent and inconsistent subtrees

  • wgd (str, optional) – the wgd for which the synteny graphs were computed

scripts.trees.iteration_nhx_tags module

This script recovers correction tags for trees that have been corrected multiple times during iterative correction (currently, .nhx tags are wiped out if the same tree is corrected again). Optionally, it also adds tags to internal nodes that are the roots of corrected subtrees.

Example:

$ python -m scripts.trees.iteration_nhx_tags -i 5
-c SCORPiOs_example/corrected_forest_%d.nhx [-o out.nhx] [--internal]
scripts.trees.iteration_nhx_tags.corr_tag_below_node(node, tags_corr)

Searches for the presence of .nhx tags in leaves below the node node.

Parameters
  • node (ete3 TreeNode) – the input node

  • tags_corr (list of str) – list of tags to search for

Returns

True if at least one of the input tags_corr is in leaves below node

Return type

bool

scripts.trees.make_tree_images module

This script generates figures of SCORPiOs corrected trees and their uncorrected counterpart. It is specifically designed to visualize SCORPiOs subtree corrections. Therefore, it assumes that inputs are SCORPiOs-generated files, with SCORPiOs file naming and format conventions.

Leaves of each SCORPiOs-corrected wgd subtree are printed with the same color in both original and corrected tree. Optionally, --show_moved also assigns a matching lighter color to leaves of non-wgd species that have been rearranged to reinsert the wgd subtree.

Input can either be a list of files, in any order, containing any number of corrected/uncorrected tree pairs, or a directory. The name of the corrected wgd and of the outgroups used should also be provided.

Examples:

$ python scripts/trees/make_tree_images.py -i SCORPiOs_example/Corrections/tmp_whole_trees_0/cor_27 SCORPiOs_example/Corrections/tmp_whole_trees_0/ori_27 --wgd Salmonidae --outgr Gasterosteus.aculeatus

$ python scripts/trees/make_tree_images.py -i SCORPiOs_example/Corrections/tmp_whole_trees_0/ --wgd Clupeocephala --outgr Lepisosteus.oculatus,Amia.calva -f pdf -o img_clup --color_outgr
scripts.trees.make_tree_images.color_internal_node(node, is_corrected_wgd=False)

Colors an internal node with convention colors: red for duplication, blue for speciation, cyan for dubious duplication.

Parameters
  • node (ete3.TreeNode) – node to color

  • is_corrected_wgd (bool, optional) – set special style if node is corrected wgd node

scripts.trees.make_tree_images.color_leaves(node, palette, edit_d, wgd, usedict=False, moved=False, ignore=False)

Colors names of leaves of corrected subtrees.

Parameters
  • node (ete3.TreeNode) – leaf to color

  • palette (list) – pre-defined list of colors to use

  • edit_d (dict) – dictionary to store/load colors

  • wgd (str) – restrict coloring to specified wgd

  • usedict (bool, optional) – should dictionary be used to load colors

  • moved (bool, optional) – color rearranged non-wgd species leaves

scripts.trees.make_tree_images.get_corrected_wgd_nodes(tree, wgd, outgroups)

Finds all nodes that correspond to corrected wgd nodes.

Parameters
  • tree (ete3.Tree) – input tree

  • wgd (str) – restricts search to specific wgd

Returns

the list of matched ete3.TreeNodes

Return type

list

scripts.trees.make_tree_images.identify_outgroup_vs_wgd_subtree(subtrees, outgroups, tree)

Identifies outgroup subtree amongst given trees.

Parameters
  • subtrees (list of ete3.TreeNode) – input subtrees

  • outgroups (list of str) – name of outgroups

  • tree (ete3.Tree) – whole tree

Returns

the node corresponding to wgd corrected subtree

Return type

ete3.TreeNode

scripts.trees.make_tree_images.make_all_figures(files, palette, edit_d, outfolder, wgd, outgrs, usedict=False, outformat='png', moved=False, coutgr=False)

Generates images for a list of input files.

Parameters
  • files (list) – list of input files

  • palette (list) – pre-defined list of colors to use

  • edit_d (dict) – dictionary to store/load colors

  • wgd (str) – restrict coloring to specified wgd

  • usedict (bool, optional) – should dictionary be used to load colors

  • outformat (str, optional) – output format for the image: png (default), svg or pdf

  • moved (bool, optional) – color rearranged non-wgd species leaves

Returns

colors to leaf names mapping

Return type

dict

scripts.trees.make_tree_images.make_tree_figure(tree, palette, outfile, edit_d, wgd, outgroups, usedict=False, outformat='png', moved=False, coutgr=False)

Creates and saves an image for a tree object.

Parameters
  • tree (ete3.Tree) – input tree

  • palette (list) – pre-defined list of colors to use

  • outfile (str) – name for the output image

  • edit_d (dict) – dictionary to store/load colors

  • wgd (str) – restrict coloring to specified wgd

  • usedict (bool, optional) – should dictionary be used to load colors

  • outformat (str, optional) – output format for the image: png (default), svg or pdf

  • moved (bool, optional) – color rearranged non-wgd species leaves

Returns

colors to leaf names mapping

Return type

dict

scripts.trees.merge_subtrees module

Script to merge together independently resolved orthogroups of the same family into a single tree.

Example:

$ python -m scripts.trees.merge_subtrees -t orthogroup_tree1.nh orthogroup_tree2.nh
-outgr gene_name [-o out]
scripts.trees.merge_subtrees.merge_trees_and_write(trees, outgr, outfile, keep_br=False)

Merges two subtrees independently resolved into a single tree and adds the outgroup gene. Writes the result to file.

Parameters
  • trees (list of ete3.Tree) – Tree(s) to merge

  • outgr (str) – Outgroup gene name

  • outfile (str) – Output filename

scripts.trees.merge_subtrees.remove_outgroup(tree, outgr)

Loads a subtree and removes the outgroup gene.

Parameters
  • tree (ete3.Tree) – Input tree

  • outgr (str) – Outgroup gene name

scripts.trees.orthologs module

Script to extract orthologous genes within a gene tree forest amongst a given list of species. All pairwise orthologies will be stored in the output folder (one file for each species pair).

Example:

$ python -m scripts.trees.orthologs -t gene_trees.nhx -d Clupeocephala -s sptree.nwk
[-o out] [-ow Salmonids] [-l lowcov_sp1,lowcov_sp2]
scripts.trees.orthologs.get_speciation_events(tree, species_pairs, sp_ortho_dict)

Extracts all orthology relationships in a gene tree involving the species given in input, and adds them to the orthology dict.

Parameters
  • tree (ete3.Tree) – input Tree object.

  • species_pairs (list) – species pairs to consider.

  • sp_ortho_dict (dict) – dictionary to store orthologies.

scripts.trees.orthologs.is_speciation(node)

Is the node a speciation node?

Parameters

node (ete3.TreeNode) – input node, with duplications annotated with the D attribute. D=Y if duplication, D=N otherwise. Note that dubious nodes (DD=Y or DCS=0) are considered speciation nodes.

Returns

True if speciation, False otherwise.

Return type

bool
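The rule above can be written out on a plain dict of NHX tags. This is a sketch under stated assumptions, not the module's code: is_speciation_tags is a hypothetical name, tag values are assumed to be strings as read from an .nhx file, and a node without a D tag is assumed to count as a speciation.

```python
def is_speciation_tags(tags):
    """Apply the D / DD / DCS rule to a dict of NHX tags (values as strings)."""
    if tags.get("D") != "Y":
        # Not annotated as a duplication.
        return True
    # Dubious duplications (DD=Y or DCS=0) are treated as speciations.
    dubious = tags.get("DD") == "Y" or float(tags.get("DCS", 1)) == 0
    return dubious


examples = [
    is_speciation_tags({"D": "N"}),
    is_speciation_tags({"D": "Y", "DCS": "0.8"}),
    is_speciation_tags({"D": "Y", "DD": "Y"}),
    is_speciation_tags({"D": "Y", "DCS": "0"}),
]
```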

scripts.trees.parse_au_test module

Script to parse results of gene trees likelihood AU-test (here for comparison of 2 trees only), as written by CONSEL (Shimodaira, 2002).

Example:

$ python -m scripts.trees.parse_au_test -i inputs_polyS.txt [-o Accepted_Trees]
[-it inputs_treeB.txt] [-one n] [-wgd Clupeocephala] [-p path/tree.nh] [--lore]
scripts.trees.parse_au_test.count(filenames, name_sol='', alpha=0.05, item='1', parse_name=True, wgd='')

Parses all consel outputs in the input list and returns a list of ‘accepted trees’. Prints to screen the numbers and proportion of trees: (i) accepted by the likelihood test (ii) accepted with a better likelihood (iii) accepted with a significantly better likelihood.

Parameters
  • filenames (list of str) – List of consel result files.

  • name_sol (str, optional) – Tag for tested trees that will be printed with the results. For instance, it can be the program used to build tested trees.

  • alpha (float) – alpha threshold for significance of the AU-test.

  • item (str, optional) – tested tree consel label, the other is considered the reference.

  • parse_name (bool, optional) – parse filename (expects SCORPiOs naming pattern)

  • wgd (str, optional) – name of the corrected wgd, to print out with the result summary

Returns

list of names of accepted trees.

Return type

list

scripts.trees.parse_au_test.lore_aore_summary(filenames, alpha=0.05, item_dict=None, parse_name=True, wgd='')

Parses all consel outputs in the input list and returns a summary, telling, for each tree, if the lore or aore (or both) topologies can be rejected. Also prints statistics to stdout.

Parameters
  • filenames (list of str) – List of consel result files.

  • alpha (float) – alpha threshold for significance of the AU-test.

  • item_dict (dict, optional) – correspondence between items in CONSEL and tree labels.

  • parse_name (bool, optional) – parse filename (expects SCORPiOs naming pattern)

  • wgd (str, optional) – name of the corrected wgd, to print out with the result summary

Returns

dictionary with the AU-tests results summary.

Return type

dict

scripts.trees.parse_au_test.one_file_consel(filename, alpha, item_test='1')

Parses one consel file record to obtain AU-test results. This function assumes that the AU-test was performed to compare two trees only.

By default, the tree labelled ‘1’ in consel is considered the tested tree. The result indicates whether this tree is as good as (or better than) the other tree (the reference).

Parameters
  • filename (str) – input filename.

  • alpha (float) – alpha threshold for significance of the AU-test.

  • item_test (str, optional) – tested tree consel label, the other is considered the reference.

Returns

One of ‘error’, ‘rejected’, ‘equivalent lower lk’, ‘equivalent higher lk’ or ‘better sign. higher lk’.

Return type

str

Note

The function returns the ‘error’ string in two cases:

  • the input file is empty; in the SCORPiOs workflow, this happens when the synteny-aware tree could not be built with ProfileNJ because fastdist failed to build the distance matrix.

  • CONSEL failed to compute the log-likelihoods from the phyml or raxml site likelihood file; in that case, logs should be checked.
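One plausible way to arrive at the four non-error categories from AU p-values and log-likelihoods is sketched below. The decision order and the use of the reference tree's p-value are assumptions made for illustration; they are not taken from the module, and classify_au and its arguments are hypothetical names.

```python
def classify_au(p_tested, p_reference, lk_tested, lk_reference, alpha=0.05):
    """Classify the tested tree from AU p-values and log-likelihoods (sketch)."""
    if p_tested < alpha:
        # The tested tree is significantly worse than the reference.
        return "rejected"
    if lk_tested > lk_reference:
        if p_reference < alpha:
            # The reference is itself rejected: the tested tree is significantly better.
            return "better sign. higher lk"
        return "equivalent higher lk"
    # Not rejected, but no likelihood gain (ties fall here as well).
    return "equivalent lower lk"


results = [
    classify_au(0.01, 0.99, -1200.0, -1100.0),
    classify_au(0.97, 0.03, -1000.0, -1020.0),
    classify_au(0.60, 0.40, -1000.0, -1002.0),
    classify_au(0.40, 0.60, -1002.0, -1000.0),
]
```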

scripts.trees.parse_au_test.one_file_consel_3_trees(filename, alpha, item_dict=None)

Parses a CONSEL result file for a comparison of 3 trees.

Parameters
  • filename (str) – name of CONSEL file to parse

  • alpha (float) – alpha threshold for significance of the AU-test.

  • item_dict (dict, optional) – correspondence between items in CONSEL and tree labels

Returns

One of ‘error’, ‘convergence_pb’, ‘aore rejected’, ‘lore rejected’ or ‘lore and aore rejected’.

Return type

str

scripts.trees.regraft_subtrees module

Script to re-graft corrected subtrees in their original tree and write the corrected gene trees forest.

Example:

$ python -m scripts.trees.regraft_subtrees -t trees.nhx -a alis.fa -s species_tree.nwk
-acc Accepted_trees.txt -o outtrees.nhx -anc Clupeocephala,Salmonidae
-ogr Lepisosteus.oculatus,Amia.calva_Esox.lucius [-n 1] [-tmp path/tmp] [-sa n] [-br y]
scripts.trees.regraft_subtrees.add_nhx_tags_and_rm_sp(tree, cor_leaves, moved_leaves, tag, rm_species=True)

Adds nhx tags to corrected leaves and remove species name from gene names.

Parameters
  • tree (ete3.Tree) – the whole gene tree

  • cor_leaves (list) – list of list of leaves belonging to the same corrected subtree

  • moved_leaves (list) – list of list of leaves rearranged to correct a subtree

  • tag (str) – name of the wgd, to include in the tag name

  • rm_species (bool, optional) – Whether species names should be removed

Returns

names of all added .nhx tags

Return type

set

scripts.trees.regraft_subtrees.correct_wtrees(tree, to_cor, res, tree_id, outfiles, outgroup_sp, sp_below_wgd=None, sp_current_wgd=None, tag='')

Re-graft all subtrees corrected for one WGD into the initial gene tree.

Parameters
  • tree (ete3.Tree) – the initial gene tree

  • to_cor (dict) – for each corrected subtree (identified by the outgroup gene), the path to the corrected subtree (value)

  • res (dict) – Stores a summary of applied corrections for tree

  • tree_id (int) – Index of the gene tree in the forest, used as key in res

  • outfiles (str) – path to store the gene tree after subtree re-grafting

  • outgroup_sp (dict) – for each outgroup and WGD species, a list of its sister species

  • sp_below_wgd (dict, optional) – For each subsequent wgds, the duplicated species and their outgroups

  • sp_current_wgd (list, optional) – A list of duplicated species for the WGD for which the subtrees were corrected

  • tag (str, optional) – tag to add (WGD for instance) to document corrections in outputs

Note

The res dictionary is filled in-place with a correction summary for tree.

scripts.trees.regraft_subtrees.init_worker()
scripts.trees.regraft_subtrees.multiprocess_rec_brlgth(trees, alis, ncores, modified_trees, folder_cor, sptree, prefix='cor', brlengths=True, resume=False, raxml=False)

Reconciles with the species tree, and optionally computes branch lengths for, a subset of trees (modified_trees) of a gene tree forest, in parallel. Each output reconciled tree is written to folder_cor/prefix_treeid.nhx.

Parameters
  • trees (str) – path to gene trees forest

  • alis (str) – path to the corresponding multiple alignments

  • ncores (int) – number of cores to use for parallel execution

  • modified_trees (dict) – gene trees to reconcile

  • folder_cor (str) – path to store each output reconciled tree file

  • sptree (str) – name of the species tree file

  • prefix (str, optional) – prefix to add to output trees

  • brlengths (bool, optional) – Whether branch-lengths should be computed

  • raxml (bool, optional) – if true, use raxml instead of treebest phyml to compute br. lengths.

scripts.trees.regraft_subtrees.topo_changes(lca, stree, leaves_to_move, outgr, authorized_sp)

Makes the necessary topological changes in the initial tree to paste a corrected subtree. Genes in the corrected subtree are grouped as a clade in the new tree, while other branches are modified as little as possible.

Parameters
  • lca (ete3.TreeNode) – the original tree topology below the node containing all corrected leaves

  • stree (ete3.Tree) – the corrected subtree to re-graft

  • leaves_to_move (list of str) – genes that split genes of the corrected subtree into several clades in the original tree

  • outgr (str) – Name of the outgroup gene used to build the corrected subtree

  • authorized_sp (list of str) – list of species whose genes should remain grouped with the outgroup

Returns

The new gene tree with modified topology

Return type

ete3.Tree

scripts.trees.regraft_subtrees.worker_rec_brlgth(tree, outfolder, treeid, sptree, ali='', prefix='cor', brlengths=True, resume=False, raxml=False)

Reconciles a given gene tree with the species tree using treebest sdi, and optionally computes branch lengths using treebest phyml. Also adds .nhx tags to corrected leaves if the corrections_summary dict is provided. The output tree is written to outfolder/prefix_treeid.nhx.

Parameters
  • tree (ete3.Tree) – input tree to reconcile.

  • outfolder (str) – path to write the output

  • treeid (str) – identifier of the tree, used in the output .nhx file name.

  • sptree (str) – name of the species tree file

  • ali (str, optional) – the fasta multiple alignment, required if branch lengths have to be computed

  • prefix (str, optional) – string to add as prefix to the output file

  • brlengths (bool, optional) – Whether branch-lengths should be computed

  • raxml (bool, optional) – if true, use raxml instead of treebest phyml to compute br. lengths.

Returns

True if no Exception was raised.

Return type

bool

scripts.trees.speciestree module

Module with functions to work with a species tree.

scripts.trees.speciestree.get_anc_order(tree_file, ancestors=None, tips_to_root=False, prune=True)

Orders input ancestors with respect to their position in the species tree. Can be ordered from root to tips (default) or tips to root.

Parameters
  • tree_file (str) – Path to the input newick formatted tree.

  • ancestors (list of str, optional) – List of ancestor names. If unspecified, all the ancestors in the tree will be returned.

Returns

ancestor names in the requested order (keys) and list of ancestors in the input list that are below it (values).

Return type

OrderedDict

scripts.trees.speciestree.get_sister_species(species_tree, species, anc)

Extracts a list of species related to a given species: species branching between species and the ancestor anc.

Parameters
  • species_tree (ete3 Tree) – ete3 tree object

  • species (str) – name of the species

  • anc (str) – name of the ancestor

Returns

species branching between species and anc

Return type

list

scripts.trees.speciestree.get_species(species_tree, anc, other_wgd_anc='', lowcov_species='')

Extracts a list of species descending from a given ancestor in a species tree. Filters out species under particular ancestors (for instance, subsequent WGDs) given by other_wgd_anc, as well as low-coverage species given by lowcov_species.

Parameters
  • species_tree (str) – Path to the input newick tree.

  • anc (str) – Name of the ancestor whose descendant species are extracted.

  • other_wgd_anc (str, optional) – Comma-delimited names of ancestors with subsequent WGDs.

  • lowcov_species (str, optional) – Comma-delimited names of ‘lowcoverage’ species to exclude.

Returns

The list of species.

Return type

list

scripts.trees.speciestree.is_below(node1, node2)

Checks if node2 is below node1 in the tree topology.

Parameters
  • node1 (ete3 TreeNode) – node1

  • node2 (ete3 TreeNode) – node2

Returns

True if node2 is below node1, False otherwise.

Return type

bool

scripts.trees.speciestree.remove_anc(tree_file, out_file)

Removes any internal node name, such as ancestor names, in the input tree and writes it to a new file.

Parameters
  • tree_file (str) – Path to the input newick formatted tree.

  • out_file (str) – Path for the output file.

scripts.trees.speciestree.search_one_node(tree, node_name)

Searches for a node in the input tree given its name. Throws AssertionError if the node name is not found or is not unique.

Parameters
  • tree (ete3 Tree) – input tree

  • node_name (str) – node to search

Returns

the matched node

Return type

ete3 TreeNode

scripts.trees.utilities module

Module with functions to work with gene trees and gene alignments.

scripts.trees.utilities.delete_gaps_in_all(ali)

Removes columns of an alignment with gaps in all sequences (in-place).

Parameters

ali (dict) – dictionary storing the alignment with gene names as keys and aligned sequences as values.

Note

Throws an assertion error if the alignment is empty
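A minimal sketch of in-place removal of all-gap columns is given below. It assumes that gaps are encoded as '-' and that all sequences have the same length; drop_all_gap_columns is a hypothetical name and the module's own implementation may differ.

```python
def drop_all_gap_columns(ali):
    """Remove, in place, the columns of ali that contain '-' in every sequence."""
    assert ali, "empty alignment"
    names = list(ali)
    length = len(ali[names[0]])
    # Keep a column as soon as one sequence has a non-gap character there.
    keep = [i for i in range(length)
            if any(ali[name][i] != "-" for name in names)]
    for name in names:
        ali[name] = "".join(ali[name][i] for i in keep)


ali = {"gene1": "AT-G-", "gene2": "A--C-", "gene3": "-T-G-"}
drop_all_gap_columns(ali)
```

All-gap columns typically appear after extracting a subset of sequences from a larger alignment, which is why the same clean-up is also part of get_subali.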

scripts.trees.utilities.get_subali(ali_string, genes, d_names=None)

Extract a sub-alignment of genes of interest from a fasta alignment string and remove columns with gaps in all sequences.

Parameters
  • ali_string (str) – alignment string in fasta format.

  • genes (list) – list of genes to extract.

  • d_names (dict, optional) – dictionary to add suffix to gene names, for instance to add species. If used, all genes have to be in this mapping dictionary.

Returns

the sub-alignment, with gene names as keys and aligned sequences as values.

Return type

dict

scripts.trees.utilities.read_multiple_objects(file_object, sep='//')

Creates a generator to read a file with several entries (trees, alignments, or other…) one by one.

Parameters
  • file_object (file) – python file object of the input file

  • sep (str, optional) – the separator between entries

Yields

str – the next tree (or alignment).
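Reading entries one at a time keeps memory use low for large forests. The generator below is a sketch of that idea, assuming the separator sits alone on its own line; read_entries is a hypothetical name and the exact whitespace handling of the module's function is not known here.

```python
import io


def read_entries(file_object, sep="//"):
    """Yield the entries of file_object one by one, split on lines equal to sep."""
    entry = []
    for line in file_object:
        if line.strip() == sep:
            if entry:
                yield "".join(entry)
            entry = []
        else:
            entry.append(line)
    # A last entry without a trailing separator is still returned.
    if entry:
        yield "".join(entry)


forest = io.StringIO("(a,b);\n//\n(c,(d,e));\n//\n")
trees = list(read_entries(forest))
```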

scripts.trees.utilities.write_fasta(ali, outfile, d_names=None)

Writes a fasta alignment file.

Parameters
  • ali (dict) – dictionary storing the alignment {name1: seq1, name2: seq2}.

  • outfile (str) – name of the file to write.

  • d_names (dict, optional) – dictionary to transform names, for instance to add species. If used, all genes have to be in this dictionary.

scripts.trees.utilities.write_forest(inforest, outforest, corrections, current_wgd='', cor_treefiles='', save_single_treefile=False)

Writes a gene tree forest after applying corrections to some gene trees, as described in corrections. Browses the input gene tree forest and writes, for each gene tree, either its unmodified input version or its corrected version if listed in corrections.

Parameters
  • inforest (str) – Name of the .nhx file with the input gene tree forest

  • outforest (str) – Name of the output .nhx file for the corrected gene tree forest

  • corrections (dict) – For each corrected tree (described by its index in the forest), a list of 5-element tuples describing applied corrections: name of wgd, corrected tree file, and 3 correction descriptors. If current_wgd is not used, corrections can simply be the list of corrected trees, but cor_treefiles has to be specified.

  • current_wgd (str, optional) – If the forest is being corrected for one particular wgd, use corrections applied to this wgd. If not used, corrections apply to all trees in corrections but cor_treefiles has to be specified.

  • cor_treefiles (str, optional) – Path to corrected trees to use if current_wgd is not specified.

  • save_single_treefile (bool, optional) – Whether individual original input trees should be written to file and the individual corrected tree file kept.