Usage instructions
Important
Before using SCORPiOs, go to the SCORPiOs root folder and activate the conda environment with the command conda activate scorpios.
Running SCORPiOs on example data
We recommend running a test with our example data to ensure that installation was successful and to get familiar with the pipeline, inputs and outputs.
SCORPiOs uses a YAML configuration file to specify inputs and parameters for each run. An example configuration file is provided: config_example.yaml. This configuration file executes SCORPiOs on toy example data, which you can use as a reference for input formats. We explain how to format your own configuration file and input files in more detail in the next chapter (see Data file formats and Configuration file).
Here, we present the main commands to run SCORPiOs.
Example 1: Simple SCORPiOs run
The only required Snakemake arguments to run SCORPiOs are --configfile, the --use-conda flag and the --scheduler=greedy option. You also need to specify the number of threads via --cores. For more advanced options, you can look at the Snakemake documentation.
To run SCORPiOs on example data, go to the SCORPiOs root folder and run:
snakemake --configfile config_example.yaml --use-conda --cores 4 --scheduler=greedy
The following output should be generated: SCORPiOs_example/SCORPiOs_output_0.nhx.
To separate stdout and stderr (recommended, as SCORPiOs writes statistics on key steps of the workflow to the standard output):
snakemake --configfile config_example.yaml --use-conda --cores 4 --scheduler=greedy >out 2>err
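After the run finishes, it can be handy to confirm that the expected output file was actually written. The helper below is a minimal sketch, not part of SCORPiOs: the function name check_output is hypothetical, and it only tests that the file exists and is non-empty, not that its content is valid.

```shell
# Hypothetical helper (not shipped with SCORPiOs): report whether an
# expected output file exists and is non-empty.
check_output() {
    if [ -s "$1" ]; then
        echo "OK: $1"
    else
        echo "MISSING or empty: $1" >&2
        return 1
    fi
}

# Usage, from the SCORPiOs root folder after the example run:
#   check_output SCORPiOs_example/SCORPiOs_output_0.nhx
```

If the file is missing, the err file produced by the redirection above is the first place to look.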
Example 2: Iterative SCORPiOs run
SCORPiOs can run in iterative mode: SCORPiOs corrects the gene trees a first time, then uses the corrected set of gene trees as input for a new correction run, and repeats this until convergence. Correcting gene trees improves the accuracy of orthologies, which in turn makes synteny conservation patterns more informative and improves the gene tree reconstructions over successive runs. Usually, a small number of iterations (2-3) suffices to reach convergence.
To run SCORPiOs in iterative mode on example data, execute the wrapper bash script iterate_scorpios.sh as follows:
bash iterate_scorpios.sh --snake_args="--configfile config_example.yaml --cores 4 --scheduler=greedy" > out 2>err
The following output should be generated: SCORPiOs_example/SCORPiOs_output_2_with_tags.nhx.
Command-line arguments for iterate_scorpios.sh
Required:
- --snake_args=snakemake_arguments
Snakemake arguments, should at minimum contain --configfile, --cores and --scheduler=greedy.
Optional:
- --max_iter=maxiter
Maximum number of iterations to run (default=5).
- --min_corr=mincorr
Minimum number of corrected subtrees to continue to the next iteration (default=1).
- --starting_iter=iter
Starting iteration, to resume a run at a given iteration (default=1).
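To illustrate how the three optional arguments interact, the stopping rule they describe can be sketched as a small loop. This is an assumption-laden illustration of the documented behaviour, not the actual code of iterate_scorpios.sh: run_iteration is a hypothetical stand-in for one SCORPiOs run, and the numbers of corrected subtrees it returns are invented.

```shell
# Sketch of the stopping rule only; NOT the actual iterate_scorpios.sh code.
max_iter=5        # mirrors --max_iter
min_corr=1        # mirrors --min_corr
starting_iter=1   # mirrors --starting_iter

# Hypothetical stand-in for one SCORPiOs run: prints the number of
# corrected subtrees. The counts below are invented for illustration.
run_iteration() {
    case "$1" in
        1) echo 12 ;;
        2) echo 3 ;;
        *) echo 0 ;;
    esac
}

iterate() {
    iter=$starting_iter
    while [ "$iter" -le "$max_iter" ]; do
        corrected=$(run_iteration "$iter")
        echo "iteration $iter: $corrected corrected subtrees"
        # Stop when too few subtrees were corrected to justify another run.
        if [ "$corrected" -lt "$min_corr" ]; then
            break
        fi
        iter=$((iter + 1))
    done
}

iterate
```

With the invented counts, the loop stops after the third iteration, because that iteration corrects fewer than min_corr subtrees; lowering max_iter to 2 would stop it after the second iteration instead.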
Running SCORPiOs on your data
To run SCORPiOs on your data, you have to create a new configuration file for your SCORPiOs run. You will need to format your input data adequately and write your configuration file, using the provided example config_example.yaml as a guide.
Copy the example config file
cp config_example.yaml config.yaml
Open and edit config.yaml to specify paths, files and parameters for your data
To check your configuration, you can execute a dry-run with -n.
snakemake --configfile config.yaml --use-conda -n
Finally, you can run SCORPiOs as described above:
snakemake --configfile config.yaml --use-conda --cores 4 --scheduler=greedy
or in iterative mode:
bash iterate_scorpios.sh --snake_args="--configfile config.yaml --cores 4 --scheduler=greedy"
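The dry-run and the full run can also be chained, so that the full run only starts when the dry-run succeeds. The wrapper below is a minimal sketch using only the snakemake options shown above; the function name run_after_dry_run is hypothetical and not part of SCORPiOs.

```shell
# Hypothetical wrapper (not shipped with SCORPiOs): run a dry-run first and
# launch the full run only if the dry-run succeeds.
# Arguments: $1 = configuration file, $2 = number of cores.
run_after_dry_run() {
    config=$1
    cores=$2
    snakemake --configfile "$config" --use-conda -n || {
        echo "Dry-run failed; fix $config before launching the full run." >&2
        return 1
    }
    snakemake --configfile "$config" --use-conda --cores "$cores" --scheduler=greedy
}

# Usage, from the SCORPiOs root folder:
#   run_after_dry_run config.yaml 4 >out 2>err
```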