LORelEi usage instructions

SCORPiOs LORelEi is implemented as an extension to the main SCORPiOs pipeline, from which it depends as a snakemake subworkflow. LORelEi can be run after a previously completed SCORPiOs job or be directly invoked, in which case the main SCORPiOs job will be automatically run first (except for more advanced usage, see the Example 3 below).

In all cases, you will need to prepare two configuration files: one for the main SCORPiOs job and a second for LORelEi. Note, however, that the configuration file for LORelEi is a lot simpler than the one for SCORPiOs, and that it requires no additional input data.

Warning

LORelEi was introduced in SCORPiOs v2.0.0. The implementation of subworkflows required a more recent version of Snakemake, which was updated from version 5.5.4 to 6.6.1 in SCORPiOs conda environment. Thus, users having a previous version of the environment need to update it (as explained here). The update of Snakemake also implies that additional command-line arguments are required to run SCORPiOs, see the updated usage instructions.

Running SCORPiOs LORelEi on example data

An example configuration file for SCORPiOs LORelEi is provided: config_example2_lorelei.yaml. This configuration file executes SCORPiOs LORelEi on toy example data. We explain how to format your own configuration file in the next chapter (see LORelEi configuration file).

As introduced in the previous chapter, two modes are available to run SCORPiOs LORelEi: diagnostic (Example 1) and likelihood_tests (Example 2). We also show how to use SCORPiOs iterative correction in conjonction with LORelEi (Example 3).

Important

Remember that you should go to the SCORPiOs root folder and activate the conda environment with the command conda activate scorpios before running SCORPiOs and/or LORelEi.

Example 1: SCORPiOs LORelEi diagnostic mode

To run SCORPiOs LORelEi on example data in diagnostic mode:

snakemake -s scorpios_lorelei.smk --configfile config_example2_lorelei.yaml --use-conda --cores 4 --scheduler=greedy

Outputs for the main SCORPiOs job should be generated in SCORPiOs_example2/, while LORelEi results are stored in SCORPiOs_LORelEi_example2/.

The following LORelEi output figures should be generated: SCORPiOs-LORelEi_example2/diagnostic/seq_synteny_conflicts_by_homeologs.svg and SCORPiOs-LORelEi_example2/diagnostic/seq_synteny_conflicts_on_genome.svg.

Example 2: SCORPiOs LORelEi likelihood_tests mode

If you ran the example in diagnostic mode previously, the SCORPiOs main job will not be re-run and LORelEi will re-use pre-computed outputs from the SCORPiOs_example2/ folder.

The example configuration file contains all necessary arguments to also run LORelEi in likelihood_tests mode, you only need to update the mode parameter, either in the configuration file or directly in the command-line as follows:

snakemake -s scorpios_lorelei.smk --configfile config_example2_lorelei.yaml --config mode=likelihood_tests --use-conda --cores 4 --scheduler=greedy

Since the configuration is very simple here, you can even omit the configuration file and only provide the three only required arguments in the command-line:

snakemake -s scorpios_lorelei.smk --config scorpios_config=config_example2.yaml mode=likelihood_tests dup_genome=Oryzias.latipes --use-conda --cores 4 --scheduler=greedy

The following LORelEi outputs should be generated: SCORPiOs-LORelEi_example2/lktests/lore_aore_on_genome.svg (figure) and SCORPiOs-LORelEi_example2/lktests/lore_aore_summary.tsv (summary of LORe and AORe gene families).

Example 3: SCORPiOs iterative and LORelEi

To run LORelEi in conjonction with SCORPiOs iterative gene tree correction, you will need to run SCORPiOs iterative correction first and then LORelei, specifying the iteration you want to analyze sequence-synteny conflicts on. We recommend using iteration 1 (or 2) of an iterative run for LORelEi, since the number of gene trees considered for correction by SCORPiOs - and thus by LORelEi afterwards - typically decreases a lot in later iterations.

bash iterate_scorpios.sh --snake_args="--configfile config_example2.yaml --cores 4 --scheduler=greedy"
snakemake -s scorpios_lorelei.smk --configfile config_example2_lorelei.yaml --config iter=1 --use-conda --cores 4 --scheduler=greedy

The following LORelEi outputs should be generated: SCORPiOs-LORelEi_example2/diagnostic/seq_synteny_conflicts_by_homeologs.svg and SCORPiOs-LORelEi_example2/diagnostic/seq_synteny_conflicts_on_genome.svg. You can change the jname parameter to not overwrite previous results (see LORelEi configuration file).

Running SCORPiOs LORelEi on your data

Like for SCORPiOs, you have to create a new configuration file to run LORelEi on your own data. You can use the example configuration file as a guide to write your own (see LORelEi configuration file) and then run:

snakemake -s scorpios_lorelei.smk --configfile config_lorelei.yaml --use-conda --cores 4 --scheduler=greedy