Usage instructions

Important

Before using SCORPiOs, you should go to the SCORPiOs root folder and activate the conda environment with the command conda activate scorpios.

Running SCORPiOs on example data

We recommend running a test with our example data to ensure that installation was successful and to get familiar with the pipeline, inputs and outputs.

SCORPiOs uses a YAML configuration file to specify inputs and parameters for each run. An example configuration file is provided: config_example.yaml. This configuration file runs SCORPiOs on toy example data, which you can use as a reference for input formats. We explain how to format your own configuration file and input files in more detail in the next chapter (see Data file formats and Configuration file).

Here, we present the main commands to run SCORPiOs.

Example 1: Simple SCORPiOs run

To run SCORPiOs, Snakemake requires the --configfile argument, the --use-conda flag, the --scheduler=greedy option, and the number of threads, given via --cores. For more advanced options, see the Snakemake documentation.

To run SCORPiOs on example data, go to the SCORPiOs root folder and run:

snakemake --configfile config_example.yaml --use-conda --cores 4 --scheduler=greedy

The following output should be generated: SCORPiOs_example/SCORPiOs_output_0.nhx.

To write stdout and stderr to separate files (recommended, as SCORPiOs writes statistics on key steps of the workflow to the standard output):

snakemake --configfile config_example.yaml --use-conda --cores 4 --scheduler=greedy >out 2>err
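Once the run has finished, it can be handy to confirm that the expected output file was actually written. The snippet below is a minimal sketch: `check_output` is a hypothetical helper, not part of SCORPiOs, and it only tests that the file exists and is non-empty, not that its content is valid.

```shell
# Hypothetical helper (not shipped with SCORPiOs): succeed only if the
# given file exists and is non-empty.
check_output() {
    file=$1
    if [ -s "$file" ]; then
        echo "OK: $file"
    else
        echo "MISSING or empty: $file" >&2
        return 1
    fi
}

# Usage, from the SCORPiOs root folder after the example run:
#   check_output SCORPiOs_example/SCORPiOs_output_0.nhx
```

If the check fails, the err file written by the redirected command above is a reasonable first place to look.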

Example 2: Iterative SCORPiOs run

SCORPiOs can run in iterative mode: it corrects the gene trees a first time, then uses the corrected set of gene trees as input for a new correction run, and repeats until convergence. Correcting gene trees improves orthology accuracy, which in turn makes synteny conservation patterns more informative, so that gene tree reconstruction improves over successive runs. Usually, a small number of iterations (2-3) suffices to reach convergence.

To run SCORPiOs in iterative mode on example data, execute the wrapper bash script iterate_scorpios.sh as follows:

bash iterate_scorpios.sh --snake_args="--configfile config_example.yaml --cores 4 --scheduler=greedy" > out 2>err

The following output should be generated: SCORPiOs_example/SCORPiOs_output_2_with_tags.nhx.

Command-line arguments for `iterate_scorpios.sh`

Required:

--snake_args=snakemake_arguments: Snakemake arguments; these should contain at least --configfile, --cores and --scheduler=greedy.

Optional:

--max_iter=maxiter: Maximum number of iterations to run (default=5).
--min_corr=mincorr: Minimum number of corrected subtrees to continue to the next iteration (default=1).
--starting_iter=iter: Starting iteration, to resume a run at a given iteration (default=1).
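To illustrate how the optional arguments combine with --snake_args, here is a small sketch that assembles the iterative command from a configuration file, a core count and an iteration cap. The function `run_iterative` and the `DRY` variable are hypothetical conveniences introduced for this example; only the `iterate_scorpios.sh` flags themselves come from the list above.

```shell
# Hypothetical wrapper around iterate_scorpios.sh (illustration only).
# With DRY=1 it prints the command, one argument per line, instead of
# running it.
run_iterative() {
    config=$1
    cores=$2
    max_iter=$3
    # Build the argument list; --snake_args stays a single argument.
    set -- bash iterate_scorpios.sh \
        --snake_args="--configfile $config --cores $cores --scheduler=greedy" \
        --max_iter="$max_iter"
    if [ "${DRY:-0}" = 1 ]; then
        printf '%s\n' "$@"
    else
        "$@"
    fi
}

# Usage, from the SCORPiOs root folder (at most 3 iterations):
#   run_iterative config_example.yaml 4 3 > out 2>err
```

Printing the arguments first is a cheap way to check that the quoting around --snake_args is intact before starting a long run.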

Running SCORPiOs on your data

To run SCORPiOs on your data, you need to format your input data appropriately and write a new configuration file for your run, using the provided example config_example.yaml as a guide.

Copy the example config file: cp config_example.yaml config.yaml
Open and edit config.yaml to specify paths, files and parameters for your data.

To check your configuration, you can execute a dry-run with -n.

snakemake --configfile config.yaml --use-conda -n

Finally, you can run SCORPiOs as described above:

snakemake --configfile config.yaml --use-conda --cores 4 --scheduler=greedy

or in iterative mode:

bash iterate_scorpios.sh --snake_args="--configfile config.yaml --cores 4 --scheduler=greedy"
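The dry-run check and the simple run described above can also be chained so that the run only starts when the dry-run succeeds. The following is a sketch under assumptions: `run_checked` and the `SNAKEMAKE` variable are hypothetical names introduced here, and the snakemake options are exactly those used in the commands above.

```shell
# Command used to invoke Snakemake; can be overridden for testing.
SNAKEMAKE=${SNAKEMAKE:-snakemake}

# Hypothetical helper: dry-run first, then launch the real run only if
# the dry-run exits successfully.
run_checked() {
    config=$1
    cores=$2
    "$SNAKEMAKE" --configfile "$config" --use-conda -n >/dev/null || {
        echo "dry-run failed for $config" >&2
        return 1
    }
    "$SNAKEMAKE" --configfile "$config" --use-conda --cores "$cores" --scheduler=greedy
}

# Usage, from the SCORPiOs root folder with the conda environment active:
#   run_checked config.yaml 4 >out 2>err
```

The dry-run output is discarded here; run the dry-run command by hand if you want to inspect the planned jobs.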