scripts.synteny package

scripts.synteny.duplicated_families module

Script to find all orthology relationships between a group of WGD-duplicated species and a non-duplicated outgroup. These ortholog groups define gene families.

Example:

$ python -m scripts.synteny.duplicated_families -t forest_v89.nhx -n Lepisosteus.oculatus
-d Clupeocephala -s Species_tree_v89.nwk -g genes89/genesST.%s.list.bed [-o out]
[-ow anc1,anc2] [-u ufile]
scripts.synteny.duplicated_families.get_genes_positions(genes, species, dict_genes)

Gets genomic position of given genes of a species.

Parameters
  • species (str) – input species name

  • genes (list of str) – list of the genes to search

  • dict_genes (dict of str to GeneSpeciesPosition tuples) – genes location

Returns

genes and their position as a list of GeneSpeciesPosition tuples

Return type

list

scripts.synteny.duplicated_families.orthologies_with_outgroup(forest, duplicated_sp, outgroup, dict_genes, out)

Browses a gene tree forest and searches for orthologs with the outgroup. Writes genes without phylogenetic orthologs to a file. Also writes files with high-confidence orthologs and paralogs, used to optimize the synteny support threshold for calling orthology.

Parameters
  • forest (str) – name of the gene trees forest file

  • duplicated_sp (list of str) – list of all duplicated species for the considered WGD

  • outgroup (str) – non-duplicated outgroup

  • dict_genes (dict of GeneSpeciesPosition tuples) – all gene positions for each species

  • out (str) – output file to write genes without phylogenetic orthologs

Returns

orthologs of outgroup genes in each duplicated species

Return type

dict

Note

#FIXME Written to work within SCORPiOs: ortholog and paralog file names are derived from the output file name pattern, which is assumed to contain an ‘_’.

scripts.synteny.duplicated_families.print_out_stats(stats_dict, wgd='')

Prints to stdout some statistics on the families in the phylogenetic Orthology Table.

Parameters
  • stats_dict (dict) – a dict counting number of families and genes in the families

  • wgd (str, optional) – the WGD for which the Orthology Table was built

scripts.synteny.duplicated_families.tag_duplicated_species(leaves, duplicated)

Adds a tag to genes of duplicated species in an ete3.Tree instance, in-place.

Parameters
  • leaves (list of ete3.TreeNode) – leaves of the tree

  • duplicated (list of str) – list of the names of all duplicated species

scripts.synteny.duplicated_families.write_orthologs(orthos, dicgenomes, dict_genes, outgroup, duplicated_sp, out, min_length=20)

Writes to a file gene orthologies between the non-duplicated species and all duplicated species (orthologytable), with all gene names and gene positions. All these gene families are ordered along the outgroup genome in the output.

Parameters
  • orthos (dict of str to str to GeneSpeciesPosition tuples) – orthologs of outgroup genes in each duplicated species

  • dicgenomes (dict of str to mygenome.Genome) – genomes

  • dict_genes (dict of str to GeneSpeciesPosition tuples) – genes location

  • outgroup (str) – non-duplicated outgroup

  • duplicated_sp (list of str) – list of duplicated species to include in the results

  • out (str) – output file name for genes without orthologs

  • min_length (int, optional) – minimum length for a chromosome in the outgroup, gene families mapping to smaller chromosomes won’t be included

scripts.synteny.f1_score_optimization module

This script loads two score distributions and finds the optimal threshold to separate them based on the F1-score, assuming that the true positives to recover lie in the distribution with higher scores.

Inputs are pickled Python lists. The output is written to a file with the --support prefix, so that the script missed_orthologies.py can be called in snakemake with the --support argument.

Example:

$ python -m scripts.synteny.f1_score_optimization -i1 scores_1.pkl -i2 scores_2.pkl
[-out out]
scripts.synteny.f1_score_optimization.compute_f1(scores1, scores2, threshold)

Computes the F1-score for a given threshold.

Parameters
  • scores1 (list) – list of scores 1

  • scores2 (list) – list of scores 2

  • threshold (float) – threshold value

Returns

F1-score

Return type

float
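
As a minimal sketch of the computation described above (not the module's actual code), assume scores1 holds the true positives, expected at higher values, and scores2 the negatives. The function name f1_at_threshold and the convention that a score greater than or equal to the threshold is called positive are both assumptions made for illustration.

```python
def f1_at_threshold(positives, negatives, threshold):
    """F1-score when every score >= threshold is called positive (assumed convention)."""
    tp = sum(1 for s in positives if s >= threshold)  # positives recovered
    fn = len(positives) - tp                          # positives missed
    fp = sum(1 for s in negatives if s >= threshold)  # negatives wrongly called
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented scores: four positives, four negatives, one negative above the threshold.
example_f1 = f1_at_threshold([5, 6, 7, 8], [0, 1, 2, 6], threshold=5)
```

Here precision is 4/5 and recall is 1, so the F1-score is 8/9.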

scripts.synteny.f1_score_optimization.get_discriminant_threshold(input1, input2, test_range=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

Finds the most discriminative threshold between the two distributions based on F1-score.

Parameters
  • input1, input2 (str) – paths to the pickled objects

  • test_range (list, optional) – list of thresholds to test

Returns

optimized threshold based on F1-score

Return type

int
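
The search can be pictured as a sweep over test_range that keeps the threshold with the highest F1-score. The sketch below works on already-loaded lists rather than file paths; the helper names f1_score and best_threshold, the >= convention, and the tie-breaking toward the lowest threshold are assumptions, not the module's documented behaviour.

```python
def f1_score(positives, negatives, threshold):
    """F1 = 2*TP / (2*TP + FP + FN), calling scores >= threshold positive."""
    tp = sum(s >= threshold for s in positives)
    fp = sum(s >= threshold for s in negatives)
    fn = len(positives) - tp
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(positives, negatives, test_range=range(30)):
    """Return the tested threshold with the highest F1-score."""
    # max() returns the first maximal element, so ties go to the lowest threshold.
    return max(test_range, key=lambda t: f1_score(positives, negatives, t))


# Invented score lists for illustration only.
high = [4, 5, 6, 7, 9]
low = [0, 1, 1, 2, 5]
chosen = best_threshold(high, low)
```

With these invented lists, thresholds 3 and 4 both give an F1-score of 10/11, and the sweep returns 3.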

scripts.synteny.f1_score_optimization.load_scores(input1, input2)

Unpickles the lists of scores.

Parameters
  • input1 (str) – path to pickled object 1

  • input2 (str) – path to pickled object 2

Returns

a tuple containing:

scores1, scores2: the unpickled lists

Return type

tuple
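
For readers unfamiliar with pickled inputs, the following round trip shows what such files contain and how two of them can be read back into a tuple of lists. The helper name load_pickled_lists, the file names and the score values are hypothetical.

```python
import os
import pickle
import tempfile


def load_pickled_lists(path1, path2):
    """Unpickle two lists of scores and return them as a tuple."""
    with open(path1, "rb") as handle1, open(path2, "rb") as handle2:
        return pickle.load(handle1), pickle.load(handle2)


# Write two throwaway pickle files, then read them back.
tmpdir = tempfile.mkdtemp()
path_1 = os.path.join(tmpdir, "scores_1.pkl")
path_2 = os.path.join(tmpdir, "scores_2.pkl")
for path, scores in ((path_1, [3, 8, 12]), (path_2, [0, 1, 4])):
    with open(path, "wb") as out:
        pickle.dump(scores, out)

scores1, scores2 = load_pickled_lists(path_1, path_2)
```

As with any use of pickle, only files from a trusted source should be loaded this way.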

scripts.synteny.filter_no_synteny_genes module

This script identifies genes in the orthology table that never, in any of their sliding windows, have genes on the same chromosome in the orthology table. A new orthology table is written as output, in which the genomic position of these genes is omitted, which forces the other SCORPiOs scripts to ignore them in the synteny analysis.

Example:

$ python -m scripts.synteny.filter_no_synteny_genes -i OrthoTable.txt -chr Chr_outgr_file
[-o out] [-w 15]
scripts.synteny.filter_no_synteny_genes.print_out_stats(stats_dict, wgd='')

Prints to stdout some statistics on the genes without synteny support that will be ignored in the SCORPiOs synteny analysis.

Parameters
  • stats_dict (dict) – a dict with the number of filtered genes per species

  • wgd (str, optional) – the wgd for which the filter was run

scripts.synteny.filter_regions module

Module with functions to extract gene families having updated synteny information in SCORPiOs iteration n versus iteration n-1.

scripts.synteny.filter_regions.get_genes_to_keep(orthotable, modified_fam, windowsize)

Extracts all families with updated synteny information after a SCORPiOs iteration, i.e. all families within the same window as a modified family.

Parameters
  • orthotable (dict) – gene families at iteration n

  • modified_fam (dict) – modified gene families

Returns

a tuple containing:

dict: for each chromosome, families with updated synteny information

list: flat list of updated families (outgroup gene name)

Return type

tuple
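
The idea of keeping every family that shares a window with a modified family can be sketched as follows. This is a simplified illustration: the input is assumed to be a dict giving, for each outgroup chromosome, the ordered list of family names, and two families are assumed to share a window when their indices differ by less than windowsize. The real data structures and window definition may differ, and families_near_modified is a hypothetical name.

```python
def families_near_modified(ordered_fams, modified, windowsize):
    """Keep families lying within `windowsize` positions of a modified family."""
    keep_by_chrom, flat = {}, []
    for chrom, fams in ordered_fams.items():
        # Indices of modified families on this chromosome.
        hits = [i for i, fam in enumerate(fams) if fam in modified]
        kept = [fam for i, fam in enumerate(fams)
                if any(abs(i - j) < windowsize for j in hits)]
        if kept:
            keep_by_chrom[chrom] = kept
            flat.extend(kept)
    return keep_by_chrom, flat


# Invented table: only g5 was modified, so its direct neighbours are kept too.
table = {"LG1": ["g1", "g2", "g3", "g4", "g5", "g6"], "LG2": ["h1", "h2"]}
by_chrom, flat_list = families_near_modified(table, {"g5"}, windowsize=2)
```

In this example only g4, g5 and g6 on LG1 are retained, and LG2 does not appear in the result.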

scripts.synteny.filter_regions.get_modified_families(orthotable, orthotable_prev, corrected_fam, mapping_fam=None)

For OrthologyTables of two successive SCORPiOs iterations, find families with updated homologies in iteration n compared to iteration n-1.

Updated families are either (i) a corrected tree or (ii) an outgroup gene in iteration n without duplicated species orthologs in iteration n-1.

Parameters
  • orthotable (dict) – gene families at iteration n

  • orthotable_prev (dict) – gene families at iteration n-1

  • corrected_fam (list) – list of corrected families

  • mapping_fam (dict, optional) – when multiple outgroups are used, a dictionary with the correspondence of family ids across outgroups, useful when a tree was corrected using another outgroup

Returns

modified gene families

Return type

dict

scripts.synteny.filter_regions.make_region_file(orthotable_file, orthotable_file_previous, corrections_file, outfile, win=15, file_fam_no_graph='', wgd='', file_combin_graphs='')

Builds and writes a file with gene families having updated synteny information in SCORPiOs iteration n versus iteration n-1.

Parameters
  • orthotable_file (str) – file with gene families at iteration n

  • orthotable_file_previous (str) – file with gene families at iteration n-1

  • corrections_file (str) – file with corrected families

  • outfile (str) – name of the output file.

  • win (int) – size of SCORPiOs sliding window for synteny orthology predictions

  • file_fam_no_graph (str, optional) – file with families that can’t result in a synteny graph

  • wgd (str, optional) – the WGD for which the Orthology Table was built

  • file_combin_graphs (str, optional) – summary file of graphs combination across outgroups

scripts.synteny.filter_regions.print_out_stats(fam_up, file_fam_no_graph='', wgd='')

Prints to stdout some statistics on the families with updated synteny.

Parameters
  • fam_up (dict) – a dict listing, for each outgroup chromosome, the updated families

  • file_fam_no_graph (str, optional) – file with families that can’t result in a synteny graph

  • wgd (str, optional) – the WGD for which the Orthology Table was built

scripts.synteny.filter_regions.read_authorized_regions(region_file, chrom, windowsize)

Reads a file with gene families having updated synteny information in SCORPiOs iteration n versus iteration n-1. Returns a list of regions, which are bounds for the windows to consider in iteration n, and a list of genes for which orthologies can be updated. These two lists differ because a gene can be in a considered window without having updated synteny information.

Parameters
  • region_file (str) – input file

  • chrom (str) – outgroup chromosome considered

  • windowsize (int) – size of the sliding window

Returns

a tuple containing:

regions (list of tuple): list of regions, as tuples (start_index, stop_index), corresponding to index in the OrthologyTable.

genes (list of str): list of genes with updated synteny information

Return type

tuple

Note

If the region_file is empty, regions is set to None (and we don’t filter regions in SCORPiOs main). If the regions file is not empty, but no family has an updated synteny context, regions is set to [(0, 0)] i.e. no window will be computed on this chromosome.

scripts.synteny.filter_regions.read_combin_file(file_combin_graphs)

Reads the summary of graphs combination across outgroups. Corrected subtrees with another outgroup should be marked as an updated family for all outgroups.

Parameters

file_combin_graphs (str) – input summary file

Returns

for each gene in the current outgroup, the corresponding selected graph if from another outgroup

Return type

dict

scripts.synteny.filter_regions.write_regions_file(fam_to_keep, outfile)

Writes a file with families that have updated synteny information after a SCORPiOs iteration.

Parameters
  • fam_to_keep (dict) – for each outgroup chromosome, families with updated synteny info.

  • outfile (str) – name of the output file.

scripts.synteny.missed_orthologies module

This script finds potential orthologs between an outgroup and duplicated species based on synteny, for genes without obvious orthologs in trees.

Example:

$ python -m scripts.synteny.missed_orthologies -i Orthotable -u UncertainGenes -c Chroms
[-o output] [-wgd ''] [-w 0] [-f out]
scripts.synteny.missed_orthologies.find_synteny_orthologs(input_file, optimize=False, threshold=2.0, opt_fam=None)

Browses ingroup genes without phylogenetic orthologs in the outgroup and attempts to find synteny-supported orthologs.

Parameters
  • input_file (str) – name of the input file storing genes without orthologs in ingroups

  • optimize (bool, optional) – option to use if the script is called to optimize the threshold

  • threshold (float, optional) – synteny support threshold

  • opt_fam (list, optional) – if defined, restricts the families used for optimization to the ones in this list

Returns

identified synteny-supported orthologies, stored in nested dict with, for each outgroup gene with newly identified ortholog(s) (GeneSpeciesPosition tuple, key1) and for each duplicated species with such ortholog(s) (str, key2), orthologous gene as GeneSpeciesPosition tuple(s).

Return type

dict

scripts.synteny.missed_orthologies.load_genes(genes, outgr=False)

Parses an entry in the “no phylogenetic ortholog” file and loads genes as GeneSpeciesPosition namedTuples.

Parameters
  • genes (str) – a line of the input file

  • outgr (bool) – Whether entry of ingroups (True) or outgroup should be parsed (False)

Returns

a tuple containing:

dict_genes (dict): for each species (key), genes in the entry (value) as a GeneSpeciesPosition namedtuple

unplaced_genes (dict): stores genes with no gene position entry in the .bed file in a dict of similar structure as dict_genes

Return type

tuple

scripts.synteny.missed_orthologies.neighbour_outgr_ortholog(ortho_neighbours, all_outgroup_candidates)

Searches for syntenic neighbours between ingroup and outgroup genes. Genes neighbouring ingroup genes have their orthologs in the outgroup stored in ortho_neighbours. This function checks whether these ortho_neighbours are in the neighbourhood of any outgroup gene in all_outgroup_candidates (outgroup genes in the same tree as the ingroup genes).

Parameters
  • ortho_neighbours (list) – list of orthologs of neighbours of ingroup genes, as tuples (chromosome, index)

  • all_outgroup_candidates (list) – all outgroup genes in the same tree, as a list of GeneSpeciesPosition tuples.

Returns

list of outgroup genes in the same tree with at least one syntenic neighbour, with repetitions. The number of repetitions indicates the number of syntenic neighbours. For instance, [gene_a, gene_a, gene_b, gene_a, gene_a] indicates that gene_a has four syntenic neighbours with ingroup genes and gene_b has one.

Return type

list
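
A caller could turn such a list with repetitions into per-gene support counts as sketched below. Using collections.Counter here is an illustrative choice, not necessarily what the script does, and plain strings stand in for GeneSpeciesPosition tuples.

```python
from collections import Counter

# The example list from the description above, with strings as stand-in genes.
hits = ["gene_a", "gene_a", "gene_b", "gene_a", "gene_a"]

# Number of syntenic neighbours supporting each outgroup candidate.
support = Counter(hits)

# Candidate with the highest synteny support and its count.
best_gene, best_count = support.most_common(1)[0]
```

The counts can then be compared against a synteny support threshold such as the one passed to find_synteny_orthologs.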

scripts.synteny.missed_orthologies.print_out_stats(stats_dict, wgd='', file_fam_nograph='out_nog')

Prints to stdout some statistics on the families in the final Orthology Table.

Parameters
  • stats_dict (dict) – a dict counting number of families and genes in the families

  • wgd (str, optional) – the WGD for which the Orthology Table was built

  • file_fam_nograph (str, optional) – file to write families that can’t result in a graph (won’t be in a large enough window or has too few genes)

scripts.synteny.missed_orthologies.search_closest_neighbours(ingroup_genes, dup_sp, all_genefam, all_outgroup_candidates)

Extracts orthologs, in the outgroup species, of genes in the neighbourhood of genes without phylogenetic orthologs in species dup_sp.

Parameters
  • ingroup_genes (dict) – a clade of ingroup genes without phylogenetic orthologs, as a dict, giving, for each species, a list of GeneSpeciesPosition tuples.

  • dup_sp (str) – name of the considered duplicated species.

  • all_genefam (nested dict) – Pre-computed orthology table based on phylogenetic orthologs used to search for syntenic neighbours, represented by a nested dict, giving for each outgroup chromosome (key1) and each duplicated species (key2), a list of GeneFamily objects.

  • all_outgroup_candidates (list) – all outgroup genes in the same tree, as a list of GeneSpeciesPosition tuples.

Returns

a tuple containing:

ortho_neighbours (list): a list of orthologs of ingroup genes in the outgroup, in a tuple (chromosome, gene index)

skip (bool): If True, dup_sp should not be used to search for syntenic neighbours, because one neighbour is orthologous to another outgroup gene in the same tree (i.e. a history of tandem duplication, which would artificially inflate the number of syntenic neighbours). Conservation of synteny in the case of tandem duplication is not proof of orthology.

Return type

tuple

scripts.synteny.mygenome module

Module with functions to load a genome from a .bed (or a .bz2 in DYOGEN format) gene file.

class scripts.synteny.mygenome.ContigType

Bases: enum.Enum

Enum grouping all possible values describing the type of a contig.

Chromosome = 'Chromosome'
Mitochondrial = 'Mitochondrial'
Random = 'Random'
Scaffold = 'Scaffold'
class scripts.synteny.mygenome.Gene(chromosome, beginning, end, names)

Bases: tuple

property beginning

Alias for field number 1

property chromosome

Alias for field number 0

property end

Alias for field number 2

property names

Alias for field number 3

class scripts.synteny.mygenome.GenePosition(chromosome, index)

Bases: tuple

property chromosome

Alias for field number 0

property index

Alias for field number 1
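
The two tuple classes above behave like standard named tuples, so fields can be read by name or by position. The stand-in definitions below mirror the documented field order; they are re-created here for illustration rather than imported from the package, and the values are invented.

```python
from collections import namedtuple

# Stand-ins mirroring the documented field order of Gene and GenePosition.
Gene = namedtuple("Gene", ["chromosome", "beginning", "end", "names"])
GenePosition = namedtuple("GenePosition", ["chromosome", "index"])

gene = Gene("LG1", 1200, 3400, ["geneA"])
position = GenePosition(chromosome="LG1", index=0)

# Access by name and by position is equivalent.
same_start = gene.beginning == gene[1]
```

Because they are tuples, both objects are immutable and can be unpacked directly, for instance chromosome, index = position.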

class scripts.synteny.mygenome.Genome(fichier, file_format)

Bases: object

Object representing genomic position of genes in a species, as loaded from a .bed (or in DYOGEN format) gene file. Can load bzipped (.bz2) files.

name

name of the input gene file

Type

str

genes_list

For each chromosome (key), a list of Gene namedtuples.

Type

dict

chr_list

For each ContigType (key), list of chromosomes with this type (value).

Type

dict

dict_genes

For each gene name (key), its position in a GenePosition namedtuple.

Type

dict

add_gene(names, chromosome, beg, end)

Adds a gene to the genes_list.

Parameters
  • names (list) – list of gene names

  • chromosome (str) – chromosome name

  • beg, end – start and end positions of the gene

init_other_attributes()

Inits the genes and chromosomes dictionaries.

scripts.synteny.mygenome.contig_type(chr_name)

Deduces the type of a contig from its name.

Parameters

chr_name (str) – Name of the contig

Returns

The type of the contig, either Chromosome, Mitochondrial, Scaffold or Random

Return type

ContigType object

scripts.synteny.mygenome.is_bz2(filename)

Checks if file extension is bz2 (looks at file extension only, not its encoding, could be improved).

Parameters

filename (str) – input file name

Returns

boolean: True if extension is bz2, False otherwise.
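
The description notes that only the extension is inspected. A sketch of both an extension check and a content-based alternative follows; bzip2 streams begin with the bytes BZh, so reading the first three bytes is one possible improvement. The function names has_bz2_extension and has_bz2_magic are hypothetical and not part of the module.

```python
import bz2
import os
import tempfile


def has_bz2_extension(filename):
    """Extension-only check, as described above."""
    return filename.endswith(".bz2")


def has_bz2_magic(path):
    """Content-based check: bzip2 streams start with the bytes b'BZh'."""
    with open(path, "rb") as handle:
        return handle.read(3) == b"BZh"


# A bzip2-compressed file saved under a misleading name.
misnamed = os.path.join(tempfile.mkdtemp(), "genes.bed")
with bz2.open(misnamed, "wb") as out:
    out.write(b"chr1\t0\t100\tgeneA\n")

by_extension = has_bz2_extension(misnamed)
by_content = has_bz2_magic(misnamed)
```

For this misnamed file the extension check fails while the content check succeeds, which is the situation the original remark alludes to.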

scripts.synteny.mygenome.toint(chr_name)

Converts the input to an integer, if possible. Otherwise leaves the name unchanged, as str.

Parameters

chr_name (str) – String to convert, for instance a chromosome name.

Returns

Converted input if possible, input otherwise

Return type

int or str
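
A minimal sketch of this kind of conversion, assuming string input, is shown below. The name to_int_if_possible is hypothetical. The sorting step illustrates one plausible reason for converting chromosome names (numeric rather than lexical ordering), which is an assumption about usage and not stated in the source.

```python
def to_int_if_possible(chr_name):
    """Return int(chr_name) when the string is an integer, else the string itself."""
    try:
        return int(chr_name)
    except ValueError:
        return chr_name


names = ["10", "2", "X", "1"]
converted = [to_int_if_possible(name) for name in names]

# Mixed int/str lists cannot be sorted directly in Python 3;
# this key puts integers first, then strings.
ordered = sorted(converted, key=lambda n: (isinstance(n, str), n))
```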

scripts.synteny.pairwise_orthology_synteny module

This script uses synteny conservation patterns to predict orthologous gene pairs in two WGD-duplicated species.

Example:

$ python -m scripts.synteny.pairwise_orthology_synteny -i OrthoTable.txt
-p Oryzias.latipes_Danio.rerio -chr LG1 -ortho TreesOrthologies/ [-o out] [-w 15]
[-cutoff 0] [-filter None]
scripts.synteny.pairwise_orthology_synteny.find_best_threading(dup_seg_sp1, dup_seg_sp2, tree_orthos)

For all threading possibilities for duplicated segments in sp1 and sp2, finds the most parsimonious scenario.

Parameters
  • dup_seg_sp1, dup_seg_sp2 (DupSegments) – duplicated segments in sp1 and sp2

  • tree_orthos (dict) – Orthologous gene pairs in sp1 and sp2, defined from molecular evolution

Returns

a tuple containing:

best (tuple): most parsimonious threading scenario for sp1 and for sp2

s_max (float): corresponding synteny similarity score (delta score)

Return type

tuple

scripts.synteny.pairwise_orthology_synteny.load_tree_orthologies(orthology_file, rev=False)

Loads orthologies from a tab-separated orthology file giving pre-computed orthologous gene pairs in species1 and species2, based on molecular sequence evolution.

Parameters
  • orthology_file (str) – name of the input orthology file

  • rev (bool, optional) – should species in column 1 and 2 be inverted (i.e use sp2 genes as dict keys)

Returns

for each gene in sp1 (keys), a list of orthologous genes in sp2 (values), resp. sp2 and sp1 if rev is True.

Return type

dict
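
The effect of rev can be illustrated with a small parser. The sketch assumes a two-column, tab-separated layout with one gene pair per line (sp1 gene, then sp2 gene); the actual file may carry extra columns, and parse_orthology_lines and the gene names are hypothetical.

```python
from collections import defaultdict


def parse_orthology_lines(lines, rev=False):
    """Map each gene of one species to the list of its orthologs in the other."""
    orthos = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        gene1, gene2 = fields[0], fields[1]
        if rev:
            gene1, gene2 = gene2, gene1  # use sp2 genes as keys
        orthos[gene1].append(gene2)
    return dict(orthos)


# Invented gene pairs: one sp1 gene has two orthologs in sp2.
lines = ["olaA\tdreA\n", "olaA\tdreB\n", "olaC\tdreC\n"]
forward = parse_orthology_lines(lines)
reverse = parse_orthology_lines(lines, rev=True)
```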

scripts.synteny.pairwise_orthology_synteny.synteny_orthology_prediction(orthotable, sp1, sp2, chrom, tree_orthos, res_orthologies, win_size=15, cutoff=0, regions=None)

Compares synteny similarity of duplicated segments stored in the Orthology Table, for sp1 and sp2, using a sliding window on chromosome chrom of the outgroup. Gene pairs in similar syntenic context are predicted orthologs.

Parameters
  • orthotable (str) – Name of the file with the Orthology Table

  • sp1, sp2 (str) – Names of the compared duplicated species

  • chrom (str) – Name of the outgroup chromosome

  • tree_orthos (dict) – Orthologous gene pairs in sp1 and sp2, defined from molecular evolution

  • res_orthologies (dict) – dict to store results

  • win_size (int, optional) – Size of the sliding window to browse the orthology table

  • cutoff (int, optional) – cutoff on synteny similarity delta scores to predict orthology

  • regions (list, optional) – List of regions on the outgroup chromosome to restrict the analysis on

Returns

Synteny-predicted orthologous gene pairs

Return type

dict

scripts.synteny.pairwise_orthology_synteny.write_orthologies(out, all_orthologies, sp1, sp2, filter_genes=None)

Writes synteny-predicted orthologies to file.

Parameters
  • out (str) – name of the output file

  • all_orthologies (dict) – Synteny-predicted orthologous gene pairs

  • sp1, sp2 (str) – Names of the compared duplicated species

  • filter_genes (list of str, optional) – Restricted list of gene families to write (restrict the orthology prediction to some families)

scripts.synteny.syntenycompare module

Module with functions to compare duplicated segments in 2 duplicated species.

class scripts.synteny.syntenycompare.DupSegments(family_ids, chromosomes, matrix, genes_dict)

Bases: object

Object to represent a list of GeneFamilies (i.e entries of a given duplicated species in a window of the OrthologyTable). This object is used to perform duplicated segment threading and synteny similarity comparisons in pairs of duplicated species.

family_ids

Names of gene families, given by the outgroup gene in the OrthologyTable

Type

list of str

chromosomes

Names of genomic segments with a gene copy in the duplicated species

Type

list of str

matrix

A binary matrix, representing absence/presence of a duplicated gene copy in each genomic segment. Columns are genomic segments, in the order given by chromosomes. Rows are gene families.

Type

numpy.array

genes_dict

For each ‘1’ in the matrix, the corresponding duplicated species gene name(s)

Type

dict

all_reduce_in_two_blocks()

Gives a list of all possible ways to thread duplicated segments together.

Returns

all possible threadings.

For instance, a threading given by [[0, 2], [1, 3]] means that segments 0 and 2 are threaded together to form an ancestrally duplicated region, track1, and segments 1 and 3 are threaded together to form track2.

Return type

nested list
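
To make the output format concrete, the sketch below naively enumerates every way to split n segments into two non-empty tracks. It is an illustration of the nested-list format only: the actual method may apply additional constraints or handle edge cases (such as a single segment) differently, and two_block_partitions is a hypothetical name.

```python
from itertools import combinations


def two_block_partitions(n_segments):
    """All splits of segments 0..n-1 into two non-empty, unordered tracks."""
    rest = list(range(1, n_segments))
    partitions = []
    # Segment 0 is fixed in the first track so each split is listed once.
    # Sizes stop before len(rest) so the second track is never empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            track1 = [0, *extra]
            track2 = [seg for seg in rest if seg not in extra]
            partitions.append([track1, track2])
    return partitions


threadings = two_block_partitions(4)
```

For four segments this yields seven candidate threadings, including [[0, 2], [1, 3]] from the example above.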

get_score(dup_seg_sp2, tree_orthos, threadingsp1, threadingsp2)

Computes the two delta scores between tracks of threaded duplicated segments in 2 species.

Parameters
  • dup_seg_sp2 (DupSegments) – Corresponding duplicated segments in species 2

  • tree_orthos (dict) – Orthologous gene pairs in sp1 and sp2, defined from molecular evolution

  • threadingsp1 (nested list) – duplicated segment threading for species 1

  • threadingsp2 (nested list) – duplicated segment threading for species 2

Returns

tuple of 2 floats, delta score based on the ‘pattern of retentions and losses’ and delta score based on ‘syntenic neighbours’

Return type

tuple

orthologs_score(dup_seg_sp2, tree_orthos, threadingsp1, threadingsp2)

Computes delta score based on ‘syntenic neighbours’ between tracks of threaded duplicated segments in 2 species.

Parameters
  • dup_seg_sp2 (DupSegments) – Corresponding duplicated segments in species 2

  • tree_orthos (dict) – Orthologous gene pairs in sp1 and sp2, defined from molecular evolution

  • threadingsp1, threadingsp2 (nested list) – duplicated segments threading for each species

Returns

delta score based on ‘syntenic neighbours’

Return type

float

retention_loss_score(dup_seg_sp2, threadingsp1, threadingsp2)

Computes delta score based on the ‘pattern of retentions and losses’ between tracks of threaded duplicated segments in 2 species.

Parameters
  • dup_seg_sp2 (DupSegments) – Corresponding duplicated segments in species 2

  • threadingsp1, threadingsp2 (nested list) – duplicated segments threading for each species

Returns

delta score based on the ‘pattern of retentions and losses’

Return type

float

sort()

Orders duplicated segments by descending number of genes.

update_discard(threading)

Updates the discard attribute.

Parameters

threading (nested list) – threading scenario.

update_orthologies(dup_seg_sp2, score, threading2sp, all_orthologies)

Stores genes in identified orthologous duplicated segment. Fills all_orthologies in-place.

Parameters
  • dup_seg_sp2 (DupSegments) – corresponding duplicated segments in species 2

  • score (float) – delta score of synteny similarity (diff. of the 2 orthology scenarios)

  • threading2sp (nested list) – duplicated segment threading for each species

  • all_orthologies (dict) – stores orthologies, for each family (key) gives a tuple (value) with the confidence score and predicted orthologs.

scripts.synteny.syntenycompare.check_orthology(orthologous_chroms, dup_seg_sp2, ortho_genes, loc)

Checks if there is a pre-computed orthology relation between genes of matched duplicated segments in 2 species for the family loc.

Parameters
  • orthologous_chroms (list) – list of orthologous segments in species 2

  • dup_seg_sp2 (DupSegments) – duplicated segments object for species 2

  • ortho_genes (list) – list of orthologous genes in species 2 for a gene in species 1

  • loc (int) – index of the gene family

Returns

True if there is a pre-computed gene orthology, False otherwise.

Return type

bool

scripts.synteny.syntenycompare.to_dup_segments(fams)

Transforms a list of GeneFamilies (i.e entries of a given duplicated species in a window of the OrthologyTable) into a DupSegments object.

A DupSegments object consists of:

  • (i): a list of names of each gene family, given by the corresponding outgroup gene in the OrthologyTable

  • (ii): a list of all genomic segments with a gene copy in the duplicated species

  • (iii): a binary matrix, representing absence/presence of a duplicated gene copy in each genomic segment. Columns are genomic segments, with order given in list (ii). Rows are gene families.

  • (iv): a dictionary, giving for each ‘1’ in the matrix, corresponding duplicated species gene names

  • (v): a list keeping track of discarded families segments threadings

Parameters

fams (list of GeneFamily objects) – object to transform

Returns

the transformed object

Return type

DupSegments

scripts.synteny.utilities module

Module with functions to load and write a duplicated ingroups-outgroup orthology table.

class scripts.synteny.utilities.GeneFamily(outgr_genename, outgr_chr, outgr_position, all_duplicate_genes, involved_chromosomes)

Bases: object

Stores an entry in the orthology table for one duplicated species.

outgr_genename

name of the outgroup gene, giving a unique ID to the family

Type

str

outgr_chr

name of the chromosome of the outgroup gene

Type

str

outgr_position

index of the outgroup gene on its chromosome

Type

int

all_duplicate_genes

gene copies in the duplicated species and their genomic location

Type

list of GeneSpeciesPosition

involved_chromosomes

list of chromosomes in the duplicated species with a gene copy

Type

list of str

Note

No public method, used as a structure to store data. GeneFamily objects are manipulated in lists, with functions on GeneFamily lists defined below for better readability of manipulations.

scripts.synteny.utilities.GeneSpeciesPosition

alias of scripts.synteny.utilities.GenePosition

scripts.synteny.utilities.add_gene(list_of_genefam, ind, gene)

Adds a gene copy member of the duplicated species in the orthology table (i.e in the corresponding GeneFamily).

Parameters
  • list_of_genefam (list of GeneFamily) – input list of GeneFamily

  • ind (int) – family index to add the gene in the list

  • gene (GeneSpeciesPosition namedtuple) – gene to add

scripts.synteny.utilities.complete_load_orthotable(table_file, chrom_outgr, species, load_no_position_genes=False)

Loads entries for one duplicated species, species, in the orthologytable, corresponding to chromosome chrom_outgr in the outgroup, as a list of GeneFamily objects.

Parameters
  • table_file (str) – Name of the orthologytable file

  • chrom_outgr (str) – Name of the considered outgroup chromosome

  • species (str) – Name of the considered duplicated species

Returns

list of GeneFamily objects

scripts.synteny.utilities.find_closest(number, number_list, index=False)

Finds, in a list of int number_list, the closest integer to number, or its index. Assumes the list is sorted. If two values are equally close to number, gives the smallest.

Parameters
  • number (int) – the input number to search

  • number_list (list) – the list of int to mine

  • index (bool, optional) – Whether the index of the closest element should be returned instead of its value.

Returns

closest number in list (or its index if index is True)

Return type

int
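
The documented behaviour (sorted input, ties resolved toward the smaller value) can be reproduced with a binary search, as in the sketch below. This is one possible implementation, not necessarily the module's own; closest_in_sorted is a hypothetical name and the list is assumed to be non-empty.

```python
from bisect import bisect_left


def closest_in_sorted(number, sorted_numbers, index=False):
    """Closest value to `number` in a sorted, non-empty list (or its index)."""
    pos = bisect_left(sorted_numbers, number)
    if pos == 0:
        best = 0  # number is below or equal to the first element
    elif pos == len(sorted_numbers):
        best = pos - 1  # number is above the last element
    else:
        before, after = sorted_numbers[pos - 1], sorted_numbers[pos]
        # '<=' keeps the smaller value when both are equally close.
        best = pos - 1 if number - before <= after - number else pos
    return best if index else sorted_numbers[best]


positions = [1, 4, 6, 10]
tie_value = closest_in_sorted(5, positions)
tie_index = closest_in_sorted(5, positions, index=True)
```

Here 4 and 6 are equally close to 5, so the smaller value 4 (index 1) is returned.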

scripts.synteny.utilities.get_all_chromosome_and_position(list_of_genefam)

Gets chromosome and chromosomal location index of all the duplicated species genes in a list of GeneFamily objects.

Parameters

list_of_genefam (list of GeneFamily) – input list of GeneFamily

Returns

for each chromosome (key), list of gene positions (value)

Return type

dict

scripts.synteny.utilities.get_all_chromosomes_involved(list_of_genefam)

Gets all chromosomes with a gene copy in a list of GeneFamily objects.

Parameters

list_of_genefam (list of GeneFamily) – input list of GeneFamily

Returns

list of chromosome names

Return type

list of str

scripts.synteny.utilities.get_all_outgr_names(list_of_genefam)

Gets gene names of all outgroup genes in a list of GeneFamily objects.

Parameters

list_of_genefam (list of GeneFamily) – input list of GeneFamily

Returns

list of gene names

Return type

list of str

scripts.synteny.utilities.get_all_outgr_pos(list_of_genefam)

Gets chromosomal location index of all outgroup genes in a list of GeneFamily objects.

Parameters

list_of_genefam (list of GeneFamily) – input list of GeneFamily

Returns

list of chromosomal indexes

Return type

list of int

scripts.synteny.utilities.insert_outgr_gene(list_of_genefam, ind, gene)

Inserts an outgroup gene in the orthology table (i.e in the list of GeneFamily).

Parameters
  • list_of_genefam (list of GeneFamily) – input list of GeneFamily

  • ind (int) – index to insert the gene in the list

  • gene (GeneSpeciesPosition namedtuple) – gene to add

scripts.synteny.utilities.light_load_orthotable(table_file)

Another simplified loading function for the orthologytable, in order to only get outgroup genes in the orthology table and their corresponding chromosomes.

Parameters

table_file (str) – Input file name.

Returns

Correspondence between each chromosome of the outgroup (key) and its genes in the orthology table (value). Genes are given in order along each chromosome.

Return type

dict

scripts.synteny.utilities.load_orthotable(table_file)

Simplified loading function for the orthologytable, in order to get outgroup genes in the orthology table and all duplicated species gene copies in the corresponding family.

Parameters

table_file (str) – Input file name.

Returns

Correspondence between genes in the outgroup (key) and duplicated species genes in its family (value).

Return type

dict

scripts.synteny.utilities.outgr_chromosomes(chr_file)

Reads a simple file with a single entry on each line (for instance chromosome names).

Parameters

chr_file (str) – input file name

Returns

entry on each line of the file

Return type

list
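
Reading such a one-entry-per-line file could look like the sketch below. Stripping whitespace and skipping blank lines are assumptions about the expected input, and read_entries and the file content are hypothetical.

```python
import os
import tempfile


def read_entries(path):
    """Return the non-empty lines of a file, stripped of surrounding whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Throwaway file with three chromosome names and one blank line.
chrom_file = os.path.join(tempfile.mkdtemp(), "chroms.txt")
with open(chrom_file, "w") as out:
    out.write("LG1\nLG2\n\nLG3\n")

chroms = read_entries(chrom_file)
```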

scripts.synteny.utilities.split_chr_with_ohnologs(list_of_genefam)

Splits the duplicated species chromosomes into two separate regions if there are two ohnologs on the same chromosome but more than 100 genes apart. This can potentially be the result of different duplicated chromosomes that fused together.

Parameters

list_of_genefam (list of GeneFamily) – input list of GeneFamily

Returns

a tuple containing:

store (list of tuples): history of ohnologs more than 100 genes apart on the same chromosome

splits (dict of dict): for each split chromosome (key1) and each gene copy on it (key2), the corresponding after-split region (‘a’ or ‘b’)

Return type

tuple

scripts.synteny.utilities.update_orthologytable(all_genefam, res_dict, sp_list)

Updates the orthology table by adding all newly found orthologies between ingroups and the outgroup. Inserts the new family in the orthology table (i.e in the list of GeneFamily).

Parameters
  • all_genefam (nested dict) – Stores the full orthology table. For each chromosome of the outgroup (key1), for each duplicated species (key2), a list of GeneFamily objects (value).

  • res_dict (nested dict) – Stores new orthologies. For each gene family (key1; represented by its family id, the outgroup gene name), for each duplicated species (key2), the corresponding gene copies in the duplicated species (value).

  • sp_list (list of str) – list of duplicated species

scripts.synteny.utilities.write_updated_orthotable(all_genefam, outgr, sp_list, chr_outgr, out, wsize=0, filt_genes=None)

Writes an orthology table file from data stored in all_genefam.

Parameters
  • all_genefam (nested dict) – Stores the full orthology table. For each chromosome of the outgroup (key1), for each duplicated species (key2), a list of GeneFamily objects (value).

  • outgr (str) – name of the outgroup species

  • sp_list (list of str) – list of duplicated species

  • chr_outgr (str) – chromosome of the outgroup

  • out (str) – name of the output file