scripts.graphs package
scripts.graphs.combine_outgroups module
Module with functions to combine SCORPiOs synteny predictions (orthogroups in graphs) across multiple outgroups.
-
scripts.graphs.combine_outgroups.choose_best_graph(m_fam, mapped_ids, all_graphs, final_graphs, summ)
Chooses the orthogroup prediction in which the families were the least aggregated (an outgroup can over-aggregate families if one ortholog was lost) and for which community detection was easiest.
- Parameters
m_fam (list) – a group of graphs matched across outgroups (same gene family)
mapped_ids (dict) – For each graph_id, store its position in m_fam and the name of the outgroup species.
all_graphs (OrderedDict of dict) – For each outgroup (key level 1), orthogroups in the graphs of each family (key level 2), represented by a FamilyOrthologies object.
final_graphs (dict) – Dict to store results
summ (dict) – For each graph (key), the number of cuts (value).
- Returns
- a tuple containing:
graphs (dict): for each outgroup, the gene id(s) of the combined family
selected_graph (list): list of selected gene id(s) (outgroup with best prediction)
- Return type
tuple
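As a rough illustration of the selection idea described above, the sketch below ranks the candidate graphs of one matched family by how many families each aggregates, then by the number of cuts. The function name `pick_least_aggregated`, the layout of `candidates`, the outgroup names, and the tie-breaking rule (fewer cuts taken to mean easier community detection) are assumptions made for this example; they are not the actual logic of choose_best_graph.

```python
def pick_least_aggregated(candidates, summ):
    """Return the outgroup whose graph aggregates the fewest families.

    candidates: dict mapping outgroup name -> (graph_id, n_families_aggregated)
                (hypothetical layout, for illustration only)
    summ: dict mapping graph_id -> number of cuts
    Ties on aggregation are broken by the smaller number of cuts (assumed).
    """
    def rank(outgroup):
        graph_id, n_aggregated = candidates[outgroup]
        return (n_aggregated, summ[graph_id])

    return min(candidates, key=rank)


# Hypothetical outgroups and graph ids.
candidates = {"Gar": ("g1", 2), "Bowfin": ("g7", 1), "Chicken": ("g3", 1)}
summ = {"g1": 0, "g7": 4, "g3": 2}
best = pick_least_aggregated(candidates, summ)
```

Here "Gar" is ranked last because its graph merges two families, and the tie between the other two outgroups is resolved by the cut counts in `summ`.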
-
scripts.graphs.combine_outgroups.combine_outgroups(all_graphs, summ_files, out='out')
Combines orthogroup predictions across all outgroups, using the best graphs.
- Parameters
all_graphs (OrderedDict of dict) – for each outgroup (key level 1), orthogroups in the graphs of each family (key level 2), represented by a FamilyOrthologies object.
summ_files (str) – comma-separated names of the files summarizing graph cuts for each outgroup; outgroups must be listed in the same order as in all_graphs.
out (str, optional) – file in which to write a summary of the selected outgroup graphs
- Returns
for each family, orthogroup predictions of chosen graph represented by a FamilyOrthologies object.
- Return type
dict
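Because summ_files relies on ordering to associate each summary file with an outgroup, a small sketch of that pairing may help. The helper name `pair_summaries`, the outgroup names, and the file names below are hypothetical; the length check is an added safeguard for the example and is not documented behaviour of combine_outgroups.

```python
from collections import OrderedDict


def pair_summaries(all_graphs, summ_files):
    """Pair each outgroup (in all_graphs order) with its summary file name."""
    names = summ_files.split(",")
    if len(names) != len(all_graphs):
        raise ValueError("expected one summary file per outgroup")
    # zip follows the insertion order of the OrderedDict keys.
    return OrderedDict(zip(all_graphs, names))


# Hypothetical outgroups; the inner dicts would hold FamilyOrthologies objects.
all_graphs = OrderedDict([("Gar", {}), ("Bowfin", {})])
paired = pair_summaries(all_graphs, "gar_summary.txt,bowfin_summary.txt")
```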
-
scripts.graphs.combine_outgroups.map_families_across_outgr(graphs)
Extracts corresponding graphs across outgroups.
- Parameters
graphs (OrderedDict of dict) – For each outgroup (key level 1), orthogroups in the graphs of each family (key level 2), represented by a FamilyOrthologies object.
- Returns
- a tuple containing:
combin (list of list): for all graphs, lists of all graph ids in all outgroups.
mapped_ids (dict): for each graph_id, its position in the combin list and the name of the outgroup species.
- Return type
tuple
-
scripts.graphs.combine_outgroups.read_summary(input_file)
Loads a tab-delimited file summarizing the number of edges that were cut in each graph to find the orthogroups.
- Parameters
input_file (str) – input file name.
- Returns
for each graph (key), the number of cuts (value).
- Return type
dict
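A minimal sketch of loading such a summary into a dict is shown below. The exact column layout of the file is not specified here, so the sketch assumes two tab-separated columns (graph id, then number of cuts) and skips blank or `#`-prefixed lines; `read_cut_summary` is a hypothetical name, and it takes an open file handle rather than a file name to keep the example self-contained.

```python
import csv
import io


def read_cut_summary(handle):
    """Map each graph id (column 1) to its number of cuts (column 2)."""
    cuts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, short rows and comment lines (assumed convention).
        if len(row) < 2 or row[0].startswith("#"):
            continue
        cuts[row[0]] = int(row[1])
    return cuts


# In-memory stand-in for a summary file, with made-up graph ids.
example = io.StringIO("fam_1\t0\nfam_2\t3\n")
cuts = read_cut_summary(example)
```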
scripts.graphs.orthogroups module
Script to build orthology graphs and detect communities in the graphs. For each gene family, the script searches for two orthologous gene communities that split from a whole genome duplication.
Example:
$ python -m scripts.graphs.orthogroups -i orthology_file.gz [-o out] [-w n] [-n 1]
[-s Summary] [-ignSg y] [-wgd ''] [--spectral] [--verbose]
-
scripts.graphs.orthogroups.are_species_sep(partitions)
Checks whether genes (or placeholders marking the loss of a duplicate) of each species are split between the two communities.
- Parameters
partitions (tuple) – genes of each graph community, as a tuple of tuples
- Returns
True if genes of the same species are in two communities, False otherwise.
- Return type
bool
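The check can be pictured with the sketch below, which lists the species that have genes in both communities. It assumes gene names end with the species name after the last `_`, as species_name expects. Whether the real function requires every species or only one species to be split is not stated here, so the sketch returns the set and leaves that decision to the caller; `species_in_both` and the gene names are hypothetical.

```python
def species_in_both(partitions):
    """Return the species that have at least one gene in each community."""
    first, second = partitions
    species_first = {gene.split("_")[-1] for gene in first}
    species_second = {gene.split("_")[-1] for gene in second}
    return species_first & species_second


# Two communities with made-up gene names of the form <gene>_<species>.
partitions = (
    ("geneA_Danio", "geneB_Oryzias"),
    ("geneC_Danio", "geneD_Gasterosteus"),
)
shared = species_in_both(partitions)
any_species_split = bool(shared)
```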
-
scripts.graphs.orthogroups.collapse_nodes(graph)
Collapses tandem duplicates into a single node. Tandem duplicates are genes of a species having the same edges in the graph.
- Parameters
graph (networkx.Graph) – orthology graph
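To make the definition of tandem duplicates concrete, the sketch below groups nodes that belong to the same species and share exactly the same neighbours. It uses a plain dict of neighbour sets instead of a networkx.Graph so that it runs without dependencies, and it only identifies the groups without merging them; `find_tandem_groups` and the gene names are hypothetical.

```python
from collections import defaultdict


def find_tandem_groups(adjacency):
    """Group nodes of the same species that have identical neighbour sets.

    adjacency: dict mapping node name -> set of neighbouring node names.
    Returns a sorted list of sorted groups containing more than one node.
    """
    groups = defaultdict(list)
    for node, neighbours in adjacency.items():
        species = node.split("_")[-1]  # species name after the last '_'
        groups[(species, frozenset(neighbours))].append(node)
    return sorted(sorted(nodes) for nodes in groups.values() if len(nodes) > 1)


adjacency = {
    "a1_Danio": {"x_Gar", "y_Gar"},
    "a2_Danio": {"x_Gar", "y_Gar"},
    "b_Oryzias": {"x_Gar"},
    "x_Gar": {"a1_Danio", "a2_Danio", "b_Oryzias"},
    "y_Gar": {"a1_Danio", "a2_Danio"},
}
tandems = find_tandem_groups(adjacency)
```

Each group found this way is a candidate for collapsing into a single node.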
-
scripts.graphs.orthogroups.contracted_nodes(graph, u, v)
Modifies the graph by contracting u and v. Node contraction identifies the two nodes as a single node incident to any edge that was incident to the original two nodes. The right node v will be merged into the node u, so only u will appear in the returned graph.
- Parameters
graph (networkx.Graph) – orthology graph
u, v – names of the nodes to contract; both must be in graph.
Note
Adapted from https://www.bountysource.com/issues/46183711-contracted_nodes-with-weights-giving-different-answers-according-to-order-of-inputs
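Node contraction can be sketched on a plain weighted adjacency dict as follows. This is an illustration only: it does not use networkx, the name `contract` is hypothetical, and two choices are assumptions rather than documented behaviour, namely that the weights of parallel edges are summed and that the u-v edge is dropped instead of becoming a self-loop.

```python
def contract(adjacency, u, v):
    """Merge node v into node u, in place.

    adjacency: dict mapping node -> dict of neighbour -> edge weight
               (undirected, so each edge is stored in both directions).
    """
    v_edges = adjacency.pop(v)
    for neighbour, weight in v_edges.items():
        if neighbour == v:
            continue  # ignore a self-loop on v
        del adjacency[neighbour][v]
        if neighbour == u:
            continue  # drop the u-v edge instead of creating a self-loop
        # Sum weights when u already had an edge to this neighbour (assumed).
        merged = adjacency[u].get(neighbour, 0) + weight
        adjacency[u][neighbour] = merged
        adjacency[neighbour][u] = merged


adjacency = {
    "u": {"v": 1, "a": 2},
    "v": {"u": 1, "a": 3, "b": 1},
    "a": {"u": 2, "v": 3},
    "b": {"v": 1},
}
contract(adjacency, "u", "v")
```

After the call only u remains, and it carries every edge that was previously incident to either u or v.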
-
scripts.graphs.orthogroups.init_worker()
-
scripts.graphs.orthogroups.lazy_load_pairwise_file(file_object, use_weights=False)
Loads orthologies for a gene family and builds the graph, one gene family at a time. The input should be a tab-delimited file, with the following columns: ortho_gene1, ortho_gene2, orthology confidence, gene family ID.
- Parameters
file_object (File Object) – Python file object for the input file
use_weights (bool, optional) – whether weights should be used in the graphs
- Yields
tuple –
a tuple containing:
fam (networkx graph): orthology graph of the gene family
prev_id (str): unique id of the gene family
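The one-family-at-a-time loading pattern can be sketched with a generator over the four documented columns. The sketch assumes that all lines of a family are contiguous in the file, and it yields plain edge lists rather than networkx graphs to stay dependency-free; `iter_families` and the gene and family ids are hypothetical.

```python
import io
from itertools import groupby


def iter_families(file_object):
    """Yield (edges, family_id) for each run of lines sharing a family id."""
    rows = (
        line.rstrip("\n").split("\t") for line in file_object if line.strip()
    )
    # groupby only merges consecutive rows, hence the contiguity assumption.
    for fam_id, fam_rows in groupby(rows, key=lambda row: row[3]):
        edges = [(gene1, gene2, conf) for gene1, gene2, conf, _ in fam_rows]
        yield edges, fam_id


# In-memory stand-in for an orthology file with two gene families.
data = io.StringIO(
    "g1_Danio\tg2_Oryzias\t1\tfam_1\n"
    "g1_Danio\tg3_Gar\t1\tfam_1\n"
    "g4_Danio\tg5_Gar\t1\tfam_2\n"
)
families = list(iter_families(data))
```

Only the lines of the current family are held in memory at any time, which is the point of loading lazily.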
-
scripts.graphs.orthogroups.load_line(line, use_weights=False)
Loads a single orthology, which will be used to update the graphs.
- Parameters
line (str) – a single line of the orthology file.
- Returns
- a tuple containing:
edges (list of tuples): list of orthology pairs
weights (list): list of corresponding weights
fam_id (str): Unique id of the gene family
- Return type
tuple
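A single line could be parsed along these lines. The sketch assumes the four tab-separated columns described for the input file, a numeric confidence value, and a default weight of 1.0 when weights are not used; none of these details are confirmed by the documentation above, and `parse_orthology_line` is a hypothetical name.

```python
def parse_orthology_line(line, use_weights=False):
    """Split one orthology line into (edges, weights, fam_id)."""
    gene1, gene2, confidence, fam_id = line.rstrip("\n").split("\t")
    edges = [(gene1, gene2)]
    # Assumed: the confidence column is numeric and serves as the edge weight.
    weights = [float(confidence)] if use_weights else [1.0]
    return edges, weights, fam_id


parsed = parse_orthology_line("a_Danio\tb_Gar\t0.5\tfam_1\n", use_weights=True)
```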
-
scripts.graphs.orthogroups.min_cut(graph, spectral=False)
Detects two orthologous communities in the graph.
- Parameters
graph (networkx.Graph) – orthology graph
spectral (bool, optional) – use spectral clustering instead of default Girvan-Newman (faster)
-
scripts.graphs.orthogroups.most_central_edge(graph)
Extracts the most central edge of a graph, taking weights into account. If all weights are equal, the most central edge is the edge with the highest unweighted betweenness centrality.
- Parameters
graph (networkx.Graph) – input graph
- Returns
The most central edge.
- Return type
networkx.edge
-
scripts.graphs.orthogroups.print_out_stats(stats_dict, wgd='')
Prints to stdout some statistics about community detection in graphs.
- Parameters
stats_dict (dict) – a dict counting the number of graphs that were either multigenic and discarded, or processed and cut with the different algorithms
wgd (str, optional) – the wgd for which the synteny graphs were computed
-
scripts.graphs.orthogroups.species_name(gene)
Parses gene name to extract species name. Expects species name after the last ‘_’.
- Parameters
gene (str) – Gene name.
- Returns
Species name
- Return type
str
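The naming convention can be illustrated with a one-line sketch that keeps whatever follows the last underscore. `species_of` is a hypothetical stand-in, not the function's actual body, and the gene names are made up.

```python
def species_of(gene):
    """Return the part of the gene name after the last '_'."""
    return gene.rsplit("_", 1)[-1]


name = species_of("GENE_1_Danio")
```

Because only the last `_` is used as the separator, underscores inside the gene identifier are harmless, but a species name that itself contains `_` would be truncated.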
-
scripts.graphs.orthogroups.worker_cut_graph(family, fam, res, spectral=False, g_id=0, verbose=False)
Worker for parallel graph cutting. Collapses tandem duplicates, detects the two communities in the graph, and stores the results and statistics about the cuts in the res dictionary.
- Parameters
family (networkx.Graph) – orthology graph of the gene family
fam (str) – unique id of the gene family
res (dict) – dictionary storing the results, shared between processes
spectral (bool, optional) – use spectral clustering instead of default Girvan-Newman (faster)
g_id (int, optional) – unique id for the cut graph
verbose (bool, optional) – print progress
- Returns
True if no Exception was raised.
- Return type
bool