Intermediary outputs

Beyond the descriptive statistics printed to standard output and the final corrected trees, you may want to inspect the step-by-step results of SCORPiOs for one or several specific gene families.

Important

A gene family in SCORPiOs consists of a non-duplicated outgroup gene and all potential orthologous gene copies in WGD-duplicated species, based on the uncorrected gene trees. For each family, SCORPiOs computes a synteny-derived orthology graph, then a constrained tree topology based on synteny, and finally, if necessary, a synteny-aware corrected tree. Through each of these steps, a gene family is identified by the outgroup gene name.

Tip

Several suffixes such as the name of the corrected WGD, the outgroup species and SCORPiOs iteration number are added to each output file, in order to precisely identify outputs, even in case of complex configurations.

Comprehensive list of orthologs

The orthology relationships between genes of duplicated species and the outgroup are stored in a single file, whose name starts with Homologs_, located in the Families/ sub-folder. This table retains all gene copies since the ingroup/outgroup speciation node, as well as any other homologs with a loosely similar syntenic context.

In the example run, one such file is:

SCORPiOs_example/Families/Homologs_Salmonidae_Esox.lucius_0.

The first three columns of the file describe the outgroup gene (chromosome, index of the gene on the chromosome, gene name). The other columns give the predicted orthologous genes in duplicated species, with the same information.
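As an illustration of how a single family could be looked up in this table, here is a minimal sketch. It assumes the columns are whitespace-separated and only interprets the three outgroup columns described above; the remaining fields are kept unparsed because their exact layout is not detailed here. The names `HomologsRow`, `read_homologs` and `find_family` are hypothetical helpers, not part of SCORPiOs.

```python
from typing import Iterator, List, NamedTuple, Optional


class HomologsRow(NamedTuple):
    chromosome: str
    index: str            # kept as text; assumed to be the gene index on the chromosome
    outgroup_gene: str
    other_fields: List[str]  # ortholog columns, left unparsed (layout is an assumption)


def read_homologs(path: str) -> Iterator[HomologsRow]:
    """Yield one row per line, assuming whitespace-separated columns."""
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or incomplete lines
            yield HomologsRow(fields[0], fields[1], fields[2], fields[3:])


def find_family(path: str, outgroup_gene: str) -> Optional[HomologsRow]:
    """Return the row whose outgroup gene name matches, or None if absent."""
    for row in read_homologs(path):
        if row.outgroup_gene == outgroup_gene:
            return row
    return None
```

For instance, `find_family("SCORPiOs_example/Families/Homologs_Salmonidae_Esox.lucius_0", gene_name)` would return the row of the family identified by `gene_name`, if that gene is present in the file.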

Pairwise synteny orthology predictions

Raw synteny-predicted orthologies amongst duplicated species are stored in a single file, whose name starts with Sorted_SyntenyOrthoPred_, located in Synteny/.

In the example run, one such file is:

SCORPiOs_example/Synteny/Sorted_SyntenyOrthoPred_Salmonidae_Esox.lucius_0.gz.

It is a 4-column gzip-compressed (.gz) file, giving the orthologous genes predicted between duplicated species after the pairwise synteny analysis. The first column shows a gene in duplicated species 1, the second gives its predicted ortholog in duplicated species 2, the third gives the associated \({\Delta}S\) synteny score and the fourth the outgroup gene name.

Note

In this file, species names are appended to the gene names.
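The following sketch shows one way to load this file and group the predicted pairs by gene family. It assumes whitespace-separated columns and a numeric \({\Delta}S\) score; the function name `read_synteny_predictions` is a hypothetical helper, not part of SCORPiOs.

```python
import gzip
from collections import defaultdict
from typing import Dict, List, Tuple


def read_synteny_predictions(path: str) -> Dict[str, List[Tuple[str, str, float]]]:
    """Group (gene1, gene2, score) triples by outgroup gene name.

    Assumes four whitespace-separated columns per line, in the order
    described above, with a score that parses as a float.
    """
    by_family = defaultdict(list)
    with gzip.open(path, "rt") as handle:  # "rt": read the .gz file as text
        for line in handle:
            fields = line.split()
            if len(fields) != 4:
                continue  # skip blank or unexpected lines
            gene1, gene2, score, outgroup_gene = fields
            by_family[outgroup_gene].append((gene1, gene2, float(score)))
    return dict(by_family)
```

Because families are identified by the outgroup gene name, the returned dictionary can be indexed directly with that name to list all pairwise predictions for one family.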

Orthogroups in synteny graphs

Predicted orthogroups based on community detection in synteny graphs are stored in a single file (GraphsOrthogroups_) in Graphs/, along with a summary of the community detection step (Summary_).

In the example run, the following file gives predicted orthogroups:

SCORPiOs_example/Graphs/GraphsOrthogroups_Clupeocephala_Lepisosteus.oculatus_0.

The first column gives the name of the outgroup gene, with an appended “a” or “b” letter to uniquely identify each of the two post-WGD orthogroups. The other columns give the duplicated species gene members.

In addition, SCORPiOs_example/Graphs/Summary_Clupeocephala_Lepisosteus.oculatus_0 is a simple 3-column table describing the community detection step. The outgroup gene is indicated in the first column, followed by the algorithm used for community detection and the number of graph edges removed in the second and third columns, respectively.
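To pair the two post-WGD orthogroups of each family, the orthogroups file can be regrouped by outgroup gene. The sketch below assumes whitespace-separated columns and that the “a”/“b” letter is the very last character of the first column, with nothing in between; both are assumptions. `read_orthogroups` and `read_summary` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def read_orthogroups(path: str) -> Dict[str, Dict[str, List[str]]]:
    """Map outgroup gene -> {"a": [members], "b": [members]}."""
    families = defaultdict(dict)
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue
            label, members = fields[0], fields[1:]
            # Assumption: the orthogroup letter is the last character of the label.
            outgroup_gene, letter = label[:-1], label[-1]
            if letter not in ("a", "b"):
                continue  # not an orthogroup line under this assumption
            families[outgroup_gene][letter] = members
    return dict(families)


def read_summary(path: str) -> Dict[str, Tuple[str, str]]:
    """Map outgroup gene -> (algorithm, number of removed edges as text)."""
    summary = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) == 3:
                summary[fields[0]] = (fields[1], fields[2])
    return summary
```

Both dictionaries are keyed by the outgroup gene name, so the community detection summary of a family can be looked up alongside its two orthogroups.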

Subtree corrections

Correction summary

The Corrections/ folder stores two files: one detailing the consistency between trees and synteny, and another listing the successfully corrected subtrees.

In the example run, the following file gives an inconsistency summary (with respect to the Clupeocephala WGD):

SCORPiOs_example/Corrections/Trees_summary_Clupeocephala_0.

In addition, the following file lists corresponding accepted corrections:

SCORPiOs_example/Corrections/Accepted_Trees_Clupeocephala_0.

Subtree corrections (additional)

Additional files can be saved if specified in the configuration file (see the configuration keyword save_subtrees_lktest).

Constrained tree topologies

Constrained tree topologies are stored in the Trees/ctrees_0/ folder (Trees/ctrees_i/ for each iteration i in iterative mode).

In the example, one constrained topology file is:

SCORPiOs_example/Trees/ctrees_0/Clupeocephala/C_102697250_Lepisosteus.oculatus.nh

This file gives the constrained tree topology for the gene family identified by the outgroup gene 102697250_Lepisosteus.oculatus, in Newick format.
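Since each constrained topology file is named after its outgroup gene, its path can be derived from the family identifier. The sketch below builds that path following the example above and lists the leaf names of a Newick string with a simple regular expression. The helpers `constrained_tree_path` and `newick_leaf_names` are hypothetical, and the regular expression is a simplification that assumes unquoted labels; a dedicated tree library would be more robust.

```python
import os
import re
from typing import List


def constrained_tree_path(output_dir: str, wgd: str, outgroup_gene: str,
                          iteration: int = 0) -> str:
    """Build <output_dir>/Trees/ctrees_<i>/<WGD>/C_<outgroup gene>.nh."""
    return os.path.join(output_dir, "Trees", f"ctrees_{iteration}", wgd,
                        f"C_{outgroup_gene}.nh")


def newick_leaf_names(newick: str) -> List[str]:
    """Return leaf labels: names that directly follow "(" or ",".

    Simplified parsing: assumes unquoted labels and more than one leaf.
    Internal node labels follow ")" and are therefore not captured.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)
```

For example, `constrained_tree_path("SCORPiOs_example", "Clupeocephala", "102697250_Lepisosteus.oculatus")` points to the file shown above, whose content can then be passed to `newick_leaf_names` to list the genes placed in the constrained topology.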

ProfileNJ and TreeBeST solutions

Synteny-aware trees built with ProfileNJ (an extension of the PolytomySolver package) and TreeBeST phyml, using the constrained tree topology, are stored in the Corrections/PolyS_0/ and Corrections/TreeB_0/ folders, respectively.

In the example, one ProfileNJ tree file is:

SCORPiOs_example/Corrections/PolyS_0/Clupeocephala/102697250_Lepisosteus.oculatus.nh.

Trees are in Newick format.

Note

SCORPiOs does not build a TreeBeST tree if the ProfileNJ solution is accepted. In this case, TreeBeST tree files will be empty.
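Given the behaviour described in the note above, checking which solution files are non-empty is a quick way to see which trees were actually built for a family. This is a minimal sketch: it assumes that TreeBeST tree files follow the same naming scheme as the ProfileNJ example above, and that the `_0` folder suffix is the iteration number. `available_solutions` is a hypothetical helper name.

```python
import os
from typing import Dict


def is_nonempty_file(path: str) -> bool:
    """True if the path is an existing file with at least one byte."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def available_solutions(corrections_dir: str, wgd: str, outgroup_gene: str,
                        iteration: int = 0) -> Dict[str, bool]:
    """Report, per method, whether a non-empty tree file exists.

    Assumption: both methods store <outgroup gene>.nh under
    <folder>_<iteration>/<WGD>/ inside the Corrections/ folder.
    """
    folders = {"ProfileNJ": f"PolyS_{iteration}", "TreeBeST": f"TreeB_{iteration}"}
    return {
        method: is_nonempty_file(
            os.path.join(corrections_dir, folder, wgd, f"{outgroup_gene}.nh"))
        for method, folder in folders.items()
    }
```

A result with a non-empty ProfileNJ file and an empty TreeBeST file is what the note above describes for an accepted ProfileNJ solution; the file sizes alone do not tell whether a correction was accepted, which is recorded in the correction summary files.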

Likelihood AU-tests

Outputs of the likelihood AU-tests are stored in the Corrections/Res_polylk_0/ and Corrections/Res_treeBlk_0/ folders. These are direct outputs from the CONSEL software.

In the example, the following file gives the AU-test results for the original subtree vs the corresponding synteny-aware tree resolved with ProfileNJ:

SCORPiOs_example/Corrections/Res_polylk_0/Clupeocephala/Res_102697250_Lepisosteus.oculatus.txt

Similarly, files in SCORPiOs_example/Corrections/Res_treeBlk_0/Clupeocephala/ store comparisons of the original subtree vs the TreeBeST phyml solution.

Note

AU-test result files for TreeBeST solutions will be empty if the ProfileNJ solution was accepted.