scripts.graphs package

scripts.graphs.combine_outgroups module

Module with functions to combine SCORPiOs synteny predictions (orthogroups in graphs) across multiple outgroups.

scripts.graphs.combine_outgroups.choose_best_graph(m_fam, mapped_ids, all_graphs, final_graphs, summ)

Chooses the orthogroup prediction in which the families are the least aggregated (an outgroup can over-aggregate families if one ortholog was lost) and for which community detection was easiest.

Parameters
  • m_fam (list) – a group of graphs matched across outgroups (same gene family)

  • mapped_ids (dict) – For each graph_id, store its position in m_fam and the name of the outgroup species.

  • all_graphs (OrderedDict of dict) – For each outgroup (key level 1), orthogroups in the graphs of each family (key level 2), represented by a FamilyOrthologies object.

  • final_graphs (dict) – Dict to store results

  • summ (dict) – For each graph (key), the number of cuts (value).

Returns

a tuple containing:

graphs (dict): for each outgroup, the gene id(s) of the combined family

selected_graph (list): list of selected gene id(s) (outgroup with best prediction)

Return type

tuple

scripts.graphs.combine_outgroups.combine_outgroups(all_graphs, summ_files, out='out')

Combines orthogroup predictions across all outgroups, using the best graph for each family.

Parameters
  • all_graphs (OrderedDict of dict) – for each outgroup (key level 1), orthogroups in the graphs of each family (key level 2), represented by a FamilyOrthologies object.

  • summ_files (str) – comma-separated names of the files summarizing graph cuts for each outgroup; outgroups should be in the same order as in all_graphs.

  • out (str, optional) – file to write a summary of the outgroup graphs selected

Returns

for each family, orthogroup predictions of chosen graph represented by a FamilyOrthologies object.

Return type

dict

scripts.graphs.combine_outgroups.map_families_across_outgr(graphs)

Extracts corresponding graphs across outgroups.

Parameters

graphs (OrderedDict of dict) – For each outgroup (key level 1), orthogroups in the graphs of each family (key level 2), represented by a FamilyOrthologies object.

Returns

a tuple containing:

combin (list of list): for each graph, the list of matching graph_ids in all outgroups.

mapped_ids (dict): for each graph_id, its position in the combin list and the name of the outgroup species.

Return type

tuple

scripts.graphs.combine_outgroups.read_summary(input_file)

Loads a tab-delimited file summarizing the number of edges that were cut in each graph to find the orthogroups.

Parameters

input_file (str) – input file name.

Returns

for each graph (key), the number of cuts (value).

Return type

dict
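
A minimal sketch of such a loader is given below. It assumes a two-column layout (graph id, then number of cuts) with no header line; the actual column layout of the summary file is not documented here, and the name read_summary_sketch is hypothetical.

```python
import csv
import io


def read_summary_sketch(handle):
    """Return {graph_id: number_of_cuts} from a tab-delimited file handle.

    Assumption: two columns per line, graph id then an integer cut count.
    """
    cuts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        cuts[row[0]] = int(row[1])
    return cuts


# In-memory stand-in for a summary file (made-up ids and counts).
example = io.StringIO("graph_1\t0\ngraph_2\t3\n")
summary = read_summary_sketch(example)
```

Taking a file handle rather than a file name keeps the sketch easy to try without touching the disk.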

scripts.graphs.orthogroups module

Script to build orthology graphs and detect communities in the graphs. For each gene family, the script searches for two orthologous gene communities that split from a whole genome duplication.

Example:

$ python -m scripts.graphs.orthogroups -i orthology_file.gz [-o out] [-w n] [-n 1] [-s Summary] [-ignSg y] [-wgd ''] [--spectral] [--verbose]

scripts.graphs.orthogroups.are_species_sep(partitions)

Checks whether the genes of each species (or the placeholders marking the loss of a duplicate) are split between the two communities.

Parameters

partitions (tuple) – genes in each graph community in tuples of tuple

Returns

True if genes of the same species are in two communities, False otherwise.

Return type

bool
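
The check can be illustrated with a short sketch. It assumes that "separated" means at least one species contributes genes to both communities, and that the species name follows the last '_' of a gene name; the function names and gene names below are hypothetical.

```python
def species_of(gene):
    # Assumption: the species name follows the last underscore.
    return gene.rsplit("_", 1)[-1]


def species_in_both(partitions):
    """True if at least one species has genes in both communities."""
    first, second = partitions
    species_first = {species_of(g) for g in first}
    species_second = {species_of(g) for g in second}
    return bool(species_first & species_second)


# Danio has one gene in each community, so the species is split.
split = species_in_both((("g1_Danio", "g2_Oryzias"), ("g3_Danio",)))
# Each species sits in a single community here.
not_split = species_in_both((("g1_Danio",), ("g2_Oryzias",)))
```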

scripts.graphs.orthogroups.collapse_nodes(graph)

Collapses tandem duplicates into a single node. Tandem duplicates are genes of a species having the same edges in the graph.

Parameters

graph (networkx.Graph) – orthology graph
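
The definition of tandem duplicates above can be sketched on a plain adjacency dict (a stand-in for the networkx graph). The helper name collapse_sketch and the gene names are hypothetical, and the species name is assumed to follow the last '_'.

```python
from collections import defaultdict


def collapse_sketch(adjacency):
    """Merge same-species nodes that share an identical neighbour set.

    adjacency: dict mapping each gene to the set of its neighbours.
    Returns a new adjacency dict keeping one representative per group.
    """
    groups = defaultdict(list)
    for gene, neighbours in adjacency.items():
        species = gene.rsplit("_", 1)[-1]  # assumption: species after last '_'
        groups[(species, frozenset(neighbours))].append(gene)

    # Keep the first gene of every group, drop the others.
    dropped = {g for members in groups.values() for g in members[1:]}
    return {
        gene: {n for n in neighbours if n not in dropped}
        for gene, neighbours in adjacency.items()
        if gene not in dropped
    }


graph = {
    "a1_Danio": {"b_Oryzias"},
    "a2_Danio": {"b_Oryzias"},  # same species and same edges as a1_Danio
    "b_Oryzias": {"a1_Danio", "a2_Danio"},
}
collapsed = collapse_sketch(graph)
```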

scripts.graphs.orthogroups.contracted_nodes(graph, u, v)

Modifies the graph by contracting u and v. Node contraction identifies the two nodes as a single node incident to any edge that was incident to the original two nodes. The right node v will be merged into the node u, so only u will appear in the returned graph.

Parameters
  • graph (networkx.Graph) – orthology graph

  • u, v – names of the nodes to contract; both must be in graph.

Note

Adapted from https://www.bountysource.com/issues/46183711-contracted_nodes-with-weights-giving-different-answers-according-to-order-of-inputs
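
Node contraction as described above can be sketched on an unweighted adjacency dict. This is only an illustration of the operation, not the module's code: it ignores edge weights, assumes there are no self-loops, and contract_sketch is a hypothetical name.

```python
def contract_sketch(adjacency, u, v):
    """Merge node v into node u in place (undirected, unweighted graph).

    adjacency: dict mapping each node to the set of its neighbours.
    """
    for neighbour in adjacency.pop(v):
        adjacency[neighbour].discard(v)
        if neighbour != u:  # do not create a self-loop on u
            adjacency[neighbour].add(u)
            adjacency[u].add(neighbour)
    return adjacency


graph = {"u": {"v", "x"}, "v": {"u", "y"}, "x": {"u"}, "y": {"v"}}
contracted = contract_sketch(graph, "u", "v")
```

After the call only u remains, and it carries the edges that were incident to either of the original nodes.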

scripts.graphs.orthogroups.init_worker()

scripts.graphs.orthogroups.lazy_load_pairwise_file(file_object, use_weights=False)

Loads orthologies for a gene family and builds the graph, one gene family at a time. The input should be a tab-delimited file, with the following columns: ortho_gene1, ortho_gene2, orthology confidence, gene family ID.

Parameters
  • file_object (File Object) – python file object for the input file

  • use_weights (bool, optional) – whether weights should be used in the graphs

Yields

tuple

a tuple containing:

fam (networkx graph): orthology graph of the gene family

prev_id (str): unique id of the gene family
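
The one-family-at-a-time loading pattern can be sketched with itertools.groupby. The sketch assumes exactly four tab-separated columns and that the lines of a family are consecutive in the file; it yields plain edge lists rather than networkx graphs, and iter_families is a hypothetical name.

```python
import io
from itertools import groupby


def iter_families(handle):
    """Yield (edges, family_id) for each run of consecutive lines sharing a family id."""
    rows = (line.rstrip("\n").split("\t") for line in handle if line.strip())
    for fam_id, fam_rows in groupby(rows, key=lambda row: row[3]):
        edges = [(gene1, gene2) for gene1, gene2, _confidence, _fam in fam_rows]
        yield edges, fam_id


# Made-up input: two families, three orthology pairs.
text = (
    "a_Sp1\tb_Sp2\t0.9\tfam1\n"
    "c_Sp1\tb_Sp2\t0.8\tfam1\n"
    "d_Sp1\te_Sp2\t1.0\tfam2\n"
)
families = list(iter_families(io.StringIO(text)))
```

Because the generator only holds one family in memory at a time, the same pattern scales to large orthology files.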

scripts.graphs.orthogroups.load_line(line, use_weights=False)

Loads a single orthology, which will be used to update the graphs.

Parameters

line (str) – a single line of the orthology file.

Returns

a tuple containing:

edges (list of tuples): list of orthology pairs

weights (list): list of corresponding weights

fam_id (str): Unique id of the gene family

Return type

tuple

scripts.graphs.orthogroups.min_cut(graph, spectral=False)

Detects two orthologous communities in the graph.

Parameters
  • graph (networkx.Graph) – orthology graph

  • spectral (bool, optional) – use spectral clustering, which is faster, instead of the default Girvan-Newman algorithm
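
As an illustration of the default strategy, the sketch below splits a toy graph into two communities with networkx's stock girvan_newman function. It is not the module's own min_cut, which may use a custom edge-selection rule; the graph and gene names are made up.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

graph = nx.Graph()
graph.add_edges_from([
    ("a_Sp1", "a_Sp2"), ("a_Sp2", "a_Sp3"), ("a_Sp1", "a_Sp3"),  # first triangle
    ("b_Sp1", "b_Sp2"), ("b_Sp2", "b_Sp3"), ("b_Sp1", "b_Sp3"),  # second triangle
    ("a_Sp1", "b_Sp1"),  # single bridge between the two triangles
])

# girvan_newman yields successively finer partitions;
# the first one contains exactly two communities.
communities = next(girvan_newman(graph))
partition = tuple(sorted(tuple(sorted(c)) for c in communities))
```

Here the bridge carries the highest edge betweenness, so removing it separates the two triangles.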

scripts.graphs.orthogroups.most_central_edge(graph)

Extracts the most central edge of a graph, taking weights into account. If all weights are equal, the most central edge is the edge with the highest unweighted betweenness centrality.

Parameters

graph (networkx.Graph) – input graph

Returns

The most central edge.

Return type

networkx.edge

scripts.graphs.orthogroups.print_out_stats(stats_dict, wgd='')

Prints to stdout some statistics about community detection in graphs.

Parameters
  • stats_dict (dict) – a dict counting the number of graphs that were either multigenic and discarded, or processed and cut with each algorithm

  • wgd (str, optional) – the wgd for which the synteny graphs were computed

scripts.graphs.orthogroups.species_name(gene)

Parses gene name to extract species name. Expects species name after the last ‘_’.

Parameters

gene (str) – Gene name.

Returns

Species name

Return type

str
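
The naming convention can be sketched in one line. The function name and the gene name below are hypothetical; note that under this convention the species name itself cannot contain an underscore.

```python
def species_from_gene(gene):
    """Return the text after the last underscore of a gene name."""
    return gene.rsplit("_", 1)[-1]


# Underscores inside the gene identifier are fine; only the last one matters.
name = species_from_gene("gene_42_Danio.rerio")
```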

scripts.graphs.orthogroups.worker_cut_graph(family, fam, res, spectral=False, g_id=0, verbose=False)

Worker for parallel graph cutting. Collapses tandem duplicates, detects the two communities in the graph, and stores the results and statistics about the cuts in the res dictionary.

Parameters
  • family (networkx.Graph) – orthology graph of the gene family

  • fam (str) – unique id of the gene family

  • res (dict) – dictionary storing the results, shared between processes

  • spectral (bool, optional) – use spectral clustering, which is faster, instead of the default Girvan-Newman algorithm

  • g_id (int, optional) – unique id for the cut graph

  • verbose (bool, optional) – print progress

Returns

True if no Exception was raised.

Return type

bool