A global analysis of genetic interactions in Caenorhabditis elegans
© Byrne et al. 2007
Received: 4 June 2007
Accepted: 17 August 2007
Published: 26 September 2007
Understanding gene function and genetic relationships is fundamental to our efforts to better understand biological systems. Previous studies systematically describing genetic interactions on a global scale have either focused on core biological processes in protozoans or surveyed catastrophic interactions in metazoans. Here, we describe a reliable high-throughput approach capable of revealing both weak and strong genetic interactions in the nematode Caenorhabditis elegans.
We investigated interactions between 11 'query' mutants in conserved signal transduction pathways and hundreds of 'target' genes compromised by RNA interference (RNAi). Mutant-RNAi combinations that grew more slowly than controls were identified, and genetic interactions inferred through an unbiased global analysis of the interaction matrix. A network of 1,246 interactions was uncovered, establishing the largest metazoan genetic-interaction network to date. We refer to this approach as systematic genetic interaction analysis (SGI). To investigate how genetic interactions connect genes on a global scale, we superimposed the SGI network on existing networks of physical, genetic, phenotypic and coexpression interactions. We identified 56 putative functional modules within the superimposed network, one of which regulates fat accumulation and is coordinated by interactions with bar-1(ga80), which encodes a homolog of β-catenin. We also discovered that SGI interactions link distinct subnetworks on a global scale. Finally, we showed that the properties of genetic networks are conserved between C. elegans and Saccharomyces cerevisiae, but that the connectivity of interactions within the current networks is not.
Synthetic genetic interactions may reveal redundancy among functional modules on a global scale, which is a previously unappreciated level of organization within metazoan systems. Although the buffering between functional modules may differ between species, studying these differences may provide insight into the evolution of divergent form and function.
A basic premise of genetics is that the biological role of a gene can be inferred from the consequence of its disruption. For many genes, however, genetic disruption yields no detectable phenotype in a laboratory setting. For example, approximately 66% of genes deleted in Saccharomyces cerevisiae have no obvious phenotype . A similar fraction of genes in Caenorhabditis elegans is also expected to be phenotypically wild type [2–4]. Elucidating the function of these genes therefore requires an alternative approach to single gene disruption.
One way to uncover biological roles for phenotypically silent genes is through genetic modifier screens. Genetic modifiers are traditionally identified through a random mutagenesis of individuals harboring one mutant gene followed by a screen for second-site mutations that either enhance or suppress the primary phenotype (reviewed in ). Modifying genes identified in this way clearly participate in the regulation of the process of interest, yet often have no detectable phenotype on their own [6–10]. Thus, forward genetic modifier screens are a useful but indirect approach to ascribe function to genes that otherwise have no phenotype.
An elegant approach called synthetic genetic array (SGA) analysis was devised to systematically analyze the phenotypic consequences of double mutant combinations in S. cerevisiae . With SGA, a 'query' deletion strain is mated to a comprehensive library of the nonessential deletion strains  through a mechanical pinning process. Resulting double-mutant combinations typically have growth rates indistinguishable from single-mutant controls. However, some deletion pairs produce a 'synthetic' sick or lethal phenotype not shared by either single mutant, indicating a genetic interaction. The revelation that most nonessential genes synthetically interact with several partners from different pathways [11, 12] was a major biological insight, as it suggests that many genes have multiple redundant functions and provides a satisfying explanation for the apparent lack of phenotype for the majority of gene disruptions. Other SGA-related techniques have been devised to investigate interactions with essential genes  and to mine the consequences of interactions in great detail . An alternative approach to SGA has been developed to create double mutants en masse by transforming the entire deletion library in liquid with a transgene that targets a query gene for deletion .
Synthetic interactions can reveal several classes of genetic relationships. First, disrupting a pair of genes that belong to parallel pathways that regulate the same essential process may reveal a 'between-pathway' interaction. Second, compromising a pair of genes that act either at the same level of the pathway or are ancillary components at different levels of the pathway may reveal a 'within-pathway' interaction. Finally, each gene of an interacting pair may act in unrelated processes that collapse the system when compromised together through poorly understood mechanisms, revealing an 'indirect' interaction . We note that as the cell may function by coordinating collections of gene products that work together as discrete units, called molecular machines or functional modules [17, 18], these 'indirect interactions' may actually reveal redundancy between previously unrecognized functional modules. To investigate which model best describes an interaction in yeast, physical-interaction data have been mapped onto synthetic genetic-interaction networks [11, 12, 16, 19]. This type of analysis suggests that between-pathway models account for roughly three and a half times as many synthetic genetic interactions compared with 'within-pathway' models.
Although the tools that accompany S. cerevisiae as a model system make it ideal for genome-wide analyses of genetic interactions in a single-celled organism, we wanted to apply a similar systematic approach towards a global understanding of genetic interactions in an animal. There is, however, no comprehensive collection of mutants, null or otherwise, in any animal model system. Notwithstanding this, several features make the nematode worm Caenorhabditis elegans uniquely suited among animal model systems to systematically investigate genetic interactions in a high-throughput manner. First, the worm has only a three-day life cycle. Second, animals can be easily cultured in multiwell-plate format, making the preparation of large numbers of samples economical. Third, around 99.8% of the individuals within a population are hermaphrodites. Strains therefore propagate during an experiment without the need for human intervention. Fourth, genes can be specifically targeted for reduction-of-function through RNA interference (RNAi) by feeding . A library of Escherichia coli strains has been generated in which each strain expresses double-stranded (ds) RNA whose sequence corresponds to a particular worm gene. Upon ingesting the E. coli, the dsRNAs are systemically distributed and target a particular gene for a reduction-of-function by RNAi . RNAi-inducing bacterial strains targeting over 80% of the 20,604 protein-coding genes of C. elegans have been generated [3, 22]. Another useful feature of the worm is the large collection of publicly available mutants representing most of the conserved pathways that control development in all animals . Together, these features make C. elegans a unique whole-animal model to systematically probe genetic interactions in a high-throughput fashion.
Here, we describe a novel approach towards a global analysis of genetic interactions in C. elegans. Our approach is called systematic genetic interaction analysis (SGI) and relies on targeting one gene by RNAi in a strain that carries a mutation in a second gene of interest. The SGI approach is similar in principle to that used by Fraser and colleagues (Lehner et al.), but with four key differences. First, Lehner et al. investigated interactions in liquid culture, whereas we carried out all experiments on the solid agar substrate commonly used by C. elegans geneticists. Second, rather than score population growth in a binary manner, we used a graded scoring scheme to measure population growth. Third, rather than test all potential interactions in side-by-side duplicates, we performed all experiments in at least three independent replicates in a blind fashion. Finally, we used a global analysis of our data to identify interacting gene pairs in an unbiased fashion. Using SGI analysis, we identified 1,246 interactions between 461 genes, which is the largest metazoan genetic-interaction network reported to date.
We present several lines of evidence showing that the SGI network meets or exceeds the quality of other large-scale interaction datasets. Analysis of the SGI network reveals new functions for both uncharacterized and previously characterized genes, as well as new links between well-studied signal transduction pathways. We integrated the SGI network with other networks and found that synthetic genetic interactions typically bridge different subnetworks, revealing redundancy between functional modules . Finally, we provide evidence that the properties of the C. elegans synthetic genetic network are conserved with S. cerevisiae, but the network connectivity of the interactions differs between the two systems. Thus, SGI analysis not only reveals novel gene function, but also contributes to our understanding of genetic-interaction networks in an animal model system.
A summary of the query genes
Null/strong loss-of-function phenotype(s)
Early larval arrest (s2887)
scrawny, Slo (s2613)**
FGF receptor (FGF)
Early larval arrest (n1456)
scrawny, Egl (n1477)**
EGF receptor (EGF)
L1 arrest (mn23)
ts Vul, pleiotropic (n1045)**
Insulin growth factor receptor (insulin)
ts Daf-c (e1370)**
GRB-2 (EGF, FGF, insulin)
L1 arrest (leaky) (n1619)
Egl, Vul (n2019)*
Guanine-nucleotide exchange factor (EGF, FGF)
ts Egl, Vul (cs41)*
RAS (EGF, FGF, insulin, Wingless/Wnt)
Mid-larval lethal (leaky) (s1124)
Egl, Vul (n2021)*
Notch receptor (Notch)
ts Emb (gp60)
ts Emb, Glp, Muv (or178)*
β-catenin (Wingless (Wnt))
Mig, Vul, Pvl (ga80)**
Mig, Vul, Pvl (mu63)
Type I TGF-β receptor (TGF-β)
Sma, Mab (wk7)
Tel-2p (DNA-damage response)
Slo, Ste, ts Emb (mn159)**
To identify the network variant that maximized the number of likely true positives but minimized the number of likely false positives, we first identified those interacting pairs that share the same Gene Ontology (GO) biological process (see Materials and methods). We calculated 'recall' for each variant by dividing the number of co-classified interacting pairs by the number of all possible co-classified pairs within the variant. Similarly, we calculated 'precision' by dividing the number of co-classified interacting pairs by the total number of interacting pairs in the variant. A variant with high recall and low precision is likely to have good recovery of all possible co-classified genetic interactions, but its low stringency will result in a high number of false positives. On the other hand, a network with low recall and high precision will have a low number of false positives, but may have a greater number of false negatives. As is evident from the recall and precision plot (see Figure 2a), there are several network variants with high recall and precision values. We estimated the significance of the extent to which each variant network links genes in the same GO biological process using the hypergeometric distribution (see Materials and methods). Henceforth, we denote p values calculated using the hypergeometric distribution with 'hg'. The most significant variant contains 656 unique interactions among 253 genes (p < 10^-22)hg and has a precision and recall of 42% and 16%, respectively. The next best variant (p < 10^-21)hg contains nearly twice as many interactions (1,246) among 461 genes, and has 10% higher recall. We chose to restrict all further analysis to the latter network in order to capture more previously uncharacterized interactions. We refer to this variant as the SGI network (Figure 2b, and Additional data file 3).
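The recall, precision, and hypergeometric calculations described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the toy gene pairs, and the reading of "all possible co-classified pairs" as the co-classified pairs among those tested are assumptions made for the example.

```python
from math import comb


def precision_recall(interacting, coclassified, tested):
    """Return (precision, recall) for one network variant.

    Each argument is a set of frozenset({gene_a, gene_b}) pairs:
    interacting  - pairs called as interactions in the variant
    coclassified - pairs that share a GO biological process
    tested       - every pair that was tested (assumed denominator for recall)
    """
    hits = interacting & coclassified
    possible = tested & coclassified
    precision = len(hits) / len(interacting) if interacting else 0.0
    recall = len(hits) / len(possible) if possible else 0.0
    return precision, recall


def hypergeom_tail(k, n_tested, n_coclassified, n_interacting):
    """P(X >= k): chance of drawing at least k co-classified pairs when
    n_interacting pairs are drawn without replacement from n_tested pairs,
    n_coclassified of which are co-classified."""
    upper = min(n_coclassified, n_interacting)
    favourable = sum(
        comb(n_coclassified, i) * comb(n_tested - n_coclassified, n_interacting - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_tested, n_interacting)


# Toy data with hypothetical gene names: 10 tested pairs, 4 co-classified,
# 3 called as interactions, 2 of which are co-classified.
pairs = [frozenset((f"q{i}", f"t{i}")) for i in range(10)]
tested = set(pairs)
coclassified = set(pairs[:4])
interacting = {pairs[0], pairs[1], pairs[7]}

precision, recall = precision_recall(interacting, coclassified, tested)
p_hg = hypergeom_tail(2, len(tested), len(coclassified), len(interacting))
print(precision, recall, p_hg)
```

In this toy case precision is 2/3, recall is 1/2, and the hypergeometric tail probability is 1/3, so the example variant would not be considered significantly enriched.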
All 656 interactions within the smaller variant are contained within the SGI network and are hereafter referred to as 'high confidence SGI interactions'. The SGI network contains 833 interactions between query genes and signaling targets (67%), and another 413 between query genes and LGIII targets (33%). These 1,246 interactions range in strength from weak to very strong (Additional data file 4). Each of the 1,246 gene pairs within the SGI network interacts synthetically by a conservative estimate, as the double gene perturbation phenotype is greater than the product of the two single gene perturbations (see Additional data file 5) [14, 27]. All of the interactions fell within one interconnected component because each query gene shared interaction targets with at least one other query gene.
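The multiplicative criterion mentioned above can be illustrated with a short sketch. It assumes, purely for illustration, that each phenotype is expressed as a relative fitness between 0 and 1 (population growth relative to wild type), so that a double perturbation "worse than the product" means an observed fitness below the product of the single-perturbation fitnesses. The study itself used graded growth scores; the function names and the `margin` parameter are hypothetical.

```python
def multiplicative_epsilon(w_a, w_b, w_ab):
    """Deviation of the observed double-perturbation fitness (w_ab) from the
    multiplicative expectation w_a * w_b. Negative values mean the double
    perturbation is sicker than expected."""
    return w_ab - w_a * w_b


def is_synthetic(w_a, w_b, w_ab, margin=0.0):
    """Call a synthetic interaction when the observed fitness falls below the
    multiplicative expectation by more than `margin` (an assumed tolerance)."""
    return multiplicative_epsilon(w_a, w_b, w_ab) < -margin


# Hypothetical values: each single perturbation is mild, the combination is severe.
w_mutant, w_rnai, w_double = 0.9, 0.8, 0.4
print(multiplicative_epsilon(w_mutant, w_rnai, w_double))
print(is_synthetic(w_mutant, w_rnai, w_double, margin=0.05))
```

Here the expected double-perturbation fitness is 0.72, so an observed value of 0.4 lies well below the multiplicative expectation and is called synthetic.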
We assessed the reproducibility of SGI interactions by analyzing reciprocal and technical replicates. Reciprocal reproducibility was measured by interchanging the method used to downregulate each member of selected query-target gene pairs. Interacting query-target pairs were retested by targeting the query gene by RNAi in the background of a mutated 'target' gene. Six of the queries in our matrix were also included as RNAi targets, providing 15 gene pairs to test for reciprocity. All of the 15 gene pairs interacted in one test, and six (40%) also interacted in the reciprocal test (Additional data file 6). Reciprocity of 100% is not expected because mutations and RNAi experiments often differ in their effects on gene function [3, 22, 28]. We also measured the technical reproducibility of the assay. For technical replicates, 15 of the target genes and six of the query genes were included in both the signaling and LGIII matrices, providing replicates for 90 query-target pairs. Of these, eight are positive and 67 are negative in both sets, yielding a technical reproducibility of 83% (75/90). Together, these results demonstrate that SGI interactions are reproducible.
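The reproducibility figures above follow from simple counts, as the sketch below shows. The counts are taken from the text; the helper name `agreement_rate` is hypothetical, and deriving the 15 reciprocal pairs as all pairwise combinations of the six shared queries is our reading of the text rather than something it states.

```python
from math import comb


def agreement_rate(n_agree, n_total):
    """Fraction of tested pairs on which two experiments agree."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_agree / n_total


# Reciprocal test: six queries also used as RNAi targets give C(6, 2) = 15 pairs,
# six of which also interacted in the reciprocal orientation.
n_reciprocal_pairs = comb(6, 2)
reciprocal = agreement_rate(6, n_reciprocal_pairs)

# Technical replicates: 15 targets x 6 queries = 90 pairs present in both matrices;
# 8 were positive in both and 67 were negative in both.
n_technical_pairs = 15 * 6
technical = agreement_rate(8 + 67, n_technical_pairs)

print(f"reciprocal: {reciprocal:.0%}, technical: {technical:.0%}")
# prints "reciprocal: 40%, technical: 83%"
```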
All of the query genes included in this study, except clk-2, are required in signal transduction from the plasma membrane. clk-2 was included as a query gene in our screen to gauge the specificity of SGI interactions on a global scale. We expected that clk-2 would interact with fewer 'signaling' targets than would the signaling queries. In addition, we expected that clk-2 would interact with a similar number of signaling targets compared to LGIII targets, whereas the signaling queries would preferentially interact with other signaling genes. Indeed, we found that clk-2 interacts with half as many signaling genes compared with the average signaling query (11.0% versus 21.5%, respectively) and interacts with the fewest signaling targets overall (Figure 2c). By contrast, let-60, which encodes the C. elegans ortholog of the small GTPase Ras, interacts with the greatest number of signaling targets (29.2%), probably because of the pleiotropic function of Ras in signal transduction . The fraction of LGIII targets that interact with signaling queries is 32% less than the fraction of signaling targets that interact with signaling queries (14.7% versus 21.5%). By contrast, the fraction of clk-2 interactions with signaling or LGIII targets is nearly identical (11.0% versus 10.6%, respectively). These results further support the validity of the SGI approach.
To explore the connectivity between the EGF, FGF, Notch, insulin, Wnt, and TGF-β signaling pathways, we analyzed the SGI data in three ways. First, we examined the clusters of query genes on the clustergram and found some expected patterns, including the grouping of the genes for the FGF receptor (egl-15), its ligand (let-756), and their downstream mediator (let-60/RAS) (Figure 3a). As expected, clk-2 and glp-1 do not cluster with the receptor tyrosine kinases or their downstream mediators. By contrast, sma-6 and bar-1/β-catenin are closely linked, suggesting cooperation between TGF-β and the Wnt/β-catenin pathways, as previously reported in other organisms. Second, we investigated the connectivity between the signaling pathways by creating a network of query genes (Figure 3b, and Additional data file 3). Because six of the query mutants were also included as RNAi targets within the SGI matrix, we tested query pairs directly for interactions and found 24 interactions among 45 pairs. In addition, we examined the pattern of interactions between each query gene and the entire set of RNAi targets. Functionally related query genes are expected to interact with an overlapping set of target genes [11, 12, 32]. We therefore connected queries within the query network with a 'congruent' link if they shared interactions with the same targets more frequently than expected by chance (p < 10^-9)hg (see Materials and methods). As expected, the proximity of query genes to each other in the clustergram is reflected in the congruent links. Finally, we added links to the query network derived from other datasets considered throughout this study. These included protein-protein interactions, coexpression links, phenotype links, and other genetic data, all of which are described in detail below. The resulting query network contains 11 nodes and 33 query-query interactions, 16 of which are supported by multiple sources.
Of the 24 SGI links within the query network, eight are supported by other lines of evidence that include previously described genetic interactions between genes within defined pathways. Therefore, 16 of the SGI links represent previously unreported interactions, seven of which are also supported by congruent links.
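The 'congruent' link test described above, in which two queries are connected when they share more targets than expected by chance, can be sketched with a hypergeometric overlap test. This is an illustrative sketch only: the query names, target sets, and target-pool size below are hypothetical, and the exact procedure used in the study is given in its Materials and methods. Only the 10^-9 threshold is taken from the text.

```python
from itertools import combinations
from math import comb


def overlap_pvalue(shared, n_targets, n_a, n_b):
    """P(overlap >= shared) when query B's n_b targets are drawn at random
    from n_targets, n_a of which interact with query A."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, i) * comb(n_targets - n_a, n_b - i)
        for i in range(shared, upper + 1)
    )
    return favourable / comb(n_targets, n_b)


def congruent_links(profiles, n_targets, alpha=1e-9):
    """Return (query_a, query_b, shared, p) for every query pair whose shared
    targets are more numerous than expected by chance at threshold alpha."""
    links = []
    for a, b in combinations(sorted(profiles), 2):
        shared = len(profiles[a] & profiles[b])
        p = overlap_pvalue(shared, n_targets, len(profiles[a]), len(profiles[b]))
        if p < alpha:
            links.append((a, b, shared, p))
    return links


# Hypothetical interaction profiles over a pool of 50 targets (numbered 0-49).
profiles = {
    "qA": set(range(0, 10)),
    "qB": set(range(0, 10)),   # identical profile to qA
    "qC": set(range(40, 50)),  # no targets shared with qA or qB
}
links = congruent_links(profiles, n_targets=50)
print(links)
```

With these toy profiles only the qA-qB pair passes the threshold, since sharing all ten targets out of a pool of 50 has a chance probability of about 10^-10.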
Many of the interaction patterns within the query network are expected. For example, the downstream mediators of receptor tyrosine kinase signaling (let-60, sem-5 (homologous to the human gene encoding the adaptor protein GRB2), and sos-1 (encoding a homolog of the SOS2 adaptor protein)) have the highest number of links within the query network (21, 21, and 18 respectively). This pattern is expected given that almost half of the pathways analyzed involve receptor tyrosine kinase signaling. Interestingly, let-60 and sem-5 each interact with every query gene except clk-2, suggesting that they are common mediators of signal transduction. As expected, clk-2 has the fewest links. We also identified many multiply supported links between let-23, let-60, sem-5, and sos-1, which are previously characterized components of the EGF pathway [29, 33]. Furthermore, previously characterized cross-talk between let-60 and bar-1, and between daf-2 (encoding the insulin receptor) and sem-5, is supported. The query network provides the first evidence of genetic interactions between the FGF gene let-756 and downstream mediators of the FGF pathway, including the FGF receptor gene egl-15, let-60, sem-5, and sos-1, affirming several previous lines of evidence. Furthermore, let-756 and egl-15 each interact with six query genes, five of which are shared between the two. Finally, the query network reveals novel interactions between bar-1 and glp-1, between bar-1 and sma-6, and between bar-1 and multiple components of the FGF and EGF pathways. Further investigation will be required to elucidate the precise role of these interactions during development.
We next examined how the recall and precision of the SGI network compared with other large eukaryotic interaction networks, including a previously described C. elegans genetic-interaction network (Lehner et al. ), a C. elegans protein-interaction network (Li et al. ), a eukaryotic protein-interaction network that augments the C. elegans protein-interaction network with orthologous interactions from S. cerevisiae, Drosophila melanogaster, and human protein interactions contained in BioGRID , an mRNA coexpression network constructed from C. elegans, S. cerevisiae, D. melanogaster, and human expression data [38, 40], an S. cerevisiae synthetic genetic-interaction network (Tong et al. ), and a network we created based on the similarity of C. elegans RNAi-induced phenotypes [3, 4, 22, 42] (Figure 4c, and Materials and methods). We refer to these networks as the Lehner, Li, interolog, coexpression, Tong, and co-phenotype networks, respectively. In addition, we examined a network of fine genetic interactions, which consists of genetic interactions identified from low-throughput experiments that were collected from the literature by WormBase . The fine genetic network excludes interactions identified solely through high-throughput analysis. The SGI network has an average precision, but a higher recall than all other datasets examined. We investigated whether the SGI network has a higher recall because of a preselection of signaling target genes, but found this not to be true: the recall of the SGI network remains the highest of all networks examined when only the LGIII target genes are considered (recall = 0.23). Together, our analyses suggest that the SGI approach is at least as proficient as other efforts that describe interactions on a large scale.
Comparison of SGI and Lehner genetic interactions
Type of link
Number of links*
Tested in SGI and Lehner analyses
Negative in SGI and Lehner analyses
Positive in SGI and Lehner analyses
Positive only in SGI analysis
Positive only in Lehner analysis
We extended the comparison between the SGI and Lehner networks by using previously computed prediction scores for C. elegans genetic interactions based on characterized physical interactions, gene expression, phenotypes, and functional annotation from C. elegans, D. melanogaster, and S. cerevisiae (Zhong and Sternberg). The probability scores assigned by Zhong and Sternberg for all pairs of genes in the SGI network were divided into three categories: low probability of interaction; intermediate probability of interaction; and high probability of interaction. We found roughly twice as many SGI interactions as expected in the high-probability category and fewer gene pairs than expected in the low probability of interaction category (p < 10^-25) (Figure 4d). The 'high confidence' SGI interactions have more high probability scores than expected compared with the whole SGI network (see Figure 2a), and the SGI interactions with the greatest interaction strengths (greater than 4.4) have more still. The Lehner genetic interactions have the greatest number of high-probability interactions relative to that expected by chance. As Lehner et al. exclusively scored catastrophic interactions, this analysis suggests that the Zhong and Sternberg probability score not only reflects the likelihood of interaction, but also the strength of that interaction. Together, our comparison of SGI interactions to other observed and predicted networks further supports confidence in SGI interactions.
We next asked how worm genetic interactions relate to other interaction datasets and how this adds to our understanding of systems in animals. To do so, we first created a superimposed network by combining published interaction data from numerous sources using a method similar to that used in . We then investigated the patterns of SGI interactions within it. The superimposed network was constructed from several large-scale interaction datasets, including the Li, interolog, Lehner, coexpression, co-phenotype, and fine genetic-interaction networks (see above). In addition, the SGA network  was mapped onto C. elegans orthologs and is referred to as the 'transposed SGA network' (see Materials and methods). The links from all of these networks were combined with the SGI network to form a single superimposed network.
Composition of the C. elegans superimposed network
Genetically supported links (A)
Genetically supported links (B)
Physically supported links
Coexpression supported links
Co-phenotype supported links
Fine genetic interactions
C. elegans protein interaction
C. elegans co-phenotype
Genetic interactions within the bar-1 module
bar-1-linked (in SGI network)
Genes within the bar-1 module linked by co-phenotype exhibit a pale and scrawny phenotype when targeted by RNAi . We also found that RNAi-targeted lin-35 and T20B12.7 exhibit the same pale and scrawny phenotype in a bar-1(ga80) background. We hypothesized that the pale phenotype is due to decreased fat production or storage. A common method for examining fat accumulation in C. elegans is to incubate worms in Nile Red vital dye, which stains lipids and readily accumulates within the triglyceride deposits in the intestine . We therefore targeted each gene within the subnetwork by RNAi in the presence of Nile Red and measured the accumulation of Nile Red microscopically (see Materials and methods). Fifteen of the 20 genes targeted gave a phenotype of significant decrease in Nile Red accumulation in an N2 background (Figure 7b,c). Five of the nine genes that present the pale and scrawny phenotype also showed the decrease in Nile Red staining, suggesting that defects in fat metabolism and/or accumulation may account for the phenotypes observed with the transmitted light dissection microscope. Moreover, 10 of the 11 genes that did not present the pale phenotype also retained less Nile Red than controls. Together, these results suggest that the bar-1 module may regulate fat production or storage. Furthermore, the analysis of the bar-1 module illustrates how SGI interactions can reveal coordinated activity between otherwise disparate genes within the superimposed network.
To further investigate the propensity of SGI interactions to bridge subnetworks, we relaxed the stringency with which we identified subnetworks to create 'broad' subnetworks that contain up to hundreds of genes (see Materials and methods and Additional data file 9). We reasoned that broad subnetworks are likely to contain genes that belong to common pathways, complexes, and functional modules. Interactions that bridge broad subnetworks are therefore likely to reveal functional redundancy among these components. Consistent with the idea that broad subnetworks are enriched for functional modules, the protein (p < 10^-4)hg, coexpression (p ≈ 0)hg, and co-phenotype (p < 10^-26)hg networks are each significantly enriched for interactions within broad subnetworks (Additional data file 11). By contrast, we found that SGI interactions significantly bridge broad subnetworks (p < 10^-6)hg (Figure 8c). Six hundred and twelve SGI interactions bridge subnetworks, compared to an expected 569.6 based on chance. These results further demonstrate that SGI interactions have the propensity to bridge distinct functional modules. Together, these results provide the first evidence that functional redundancy may extend beyond individual gene pairs to a higher level of organization within the system: the functional module.
We developed systematic genetic interaction analysis (SGI) to identify biologically relevant genetic interactions in a systematic and high-throughput manner. Through our unique approach, we were able to extract 3.5-fold more interactions than a previous study, despite testing 9.2-fold fewer gene pairs for interaction. The resulting SGI network of 1,246 interactions is the largest metazoan genetic network reported to date. Four lines of evidence support the validity of SGI interactions. First, replicates of 90 query-target pairs were included in both the signaling and the LGIII matrix, yielding a technical reproducibility of 83%. Second, six of the query genes were also included as RNAi targets, yielding a reciprocal reproducibility of 40%. Full reciprocity is not expected because of the varying degree of gene inactivation in the background of different alleles and RNAi conditions. Third, of the 1,165 gene pairs examined in both this study and by Lehner et al., SGI identified 64% of the 28 interactions found by Lehner et al., and there is 98.9% agreement between the negative calls. Fourth, an independent method of assessing the likelihood of genetic interactions between gene pairs determined that the SGI network is enriched for interactions that are predicted to be true (p < 10^-25).
Four lines of evidence suggest that the interactions uncovered by SGI are also biologically meaningful. First, query genes involved in signal transduction interact with a larger fraction of signaling targets than of random LGIII targets (21.5% versus 14.7%). By contrast, a query gene involved in an unrelated process (DNA-damage response) interacts with signaling and random targets with nearly equal frequency (11.0% versus 10.6%). Second, the SGI network contains 26% of all gene pairs within the interaction test matrix that have similar GO annotation, suggesting that our network is significantly enriched for interactions between functionally related genes (p < 10^-21)hg. Third, a cluster analysis reveals many expected patterns within the query gene network, and between query and target genes. For example, a glp-1-interacting cluster is enriched for 'Notch-receptor processing' activity [47, 48], a sem-5-interacting cluster is enriched for 'muscle-development' activity [49, 50], and a bar-1-interacting cluster is enriched for 'establishment of cell polarity' activity. Finally, genetic interactions between genes within the bar-1 module predict a common function: the regulation of fat storage or metabolism. Thus, the dataset contains biologically meaningful relationships that can be mined for further insights.
The SGI approach facilitates the discovery of interactions with a wide range of strength and reveals many network variants from which the most biologically relevant network can be extracted. Although our chosen SGI network is significantly enriched with known functional categories, a number of criteria can be modified to mine SGI data for more or less stringent interactions. For example, the SGI variant with the most significant precision and recall (see Figure 2a) had greater overlap with predicted interactions than did the larger SGI network (see Figure 4d). With the SGI approach, tailored sets of genetic interactions can be revealed that either facilitate detailed biological analysis by limiting false positives at the expense of some true positives, or facilitate global network analyses by increasing the capture rate of true positives at the expense of including more false positives.
Our chosen SGI network has good recall and precision when compared to other interaction datasets. As a quality benchmark of precision, we considered the network of fine genetic interactions, which is assembled from low-throughput biological analyses and probably contains few false-positive interactions. The SGI network has a precision similar to the network of fine genetic interactions, which suggests that SGI interactions do not simply represent the additive perturbation of functionally unrelated genes. Although much of the precision score of the SGI network is due to interactions among known signaling components, the precision of the LGIII network remains significant, suggesting that more uncharacterized interactions are uncovered within the LGIII network than within the signaling network, as expected.
Surprisingly, the SGI network has a higher recall than all of the other datasets examined. This is not due to the preselection of signaling targets, as a network created with random LGIII targets also has a higher recall than the other datasets. By comparison, the Lehner network, which is similar to our signaling network in that it derives from a matrix of preselected signaling genes, has much lower recall than all SGI-related networks. We suspect that the difference lies in the methodology of identifying interactions: the SGI approach detects interactions ranging from weak to strong, whereas Lehner et al. report only strong interactions. Restricting analyses to strong interactions may therefore neglect a large proportion of meaningful interactions between genes known to function within the same biological process, and probably misses interactions between genes with no previously shared annotation as well.
To explore how genetic interactions integrate into the biological system, we integrated the SGI interactions with other genetic interactions and with data from the C. elegans interactome, transcriptome, and phenome into a superimposed network. An investigation of the overlap between SGI and other contributing interactions within the superimposed network revealed little overlap. Given that only approximately 1% of the links in the superimposed network are multiply supported, this is not surprising. The lack of overlap cannot be attributed solely to the sparseness of available data in the superimposed network, as both the coexpression and co-phenotype networks were created from nearly genome-scale datasets. In addition, the lack of overlap is unlikely to reflect poor-quality data, as we have demonstrated that the interactions within the SGI network and other datasets contain significant numbers of functionally related gene pairs. This paradox may suggest that most high-throughput datasets generated so far have many false negatives. Alternatively, different interaction modes may have little real correspondence with one another, and instead yield complementary information about the system. In either case, a better understanding of biological systems may be achieved by investigating the entirety of superimposed networks and not just multiply supported links.
Three lines of evidence suggest that multiply supported subnetworks can help predict the function of uncharacterized genes. First, the subnetworks are significantly enriched for GO biological processes, suggesting that uncharacterized genes within the subnetworks may have similar functions. Second, a detailed examination of the bar-1 module revealed new genetic interactions that were not tested within the SGI matrix. Third, a shared role in fat accumulation was discovered among the genes of the bar-1 module. Of note, the gene prx-5 of the bar-1 module is required for import into peroxisomes, which carry out β-oxidation of long-chain fatty acids, and has previously been identified in a genome-wide screen for fat-regulatory mutants [51, 52]. In humans, peroxisomal misregulation results in defective lipid metabolism and is associated with diseases such as Zellweger syndrome. How other components of the bar-1 module regulate fat will be an interesting avenue for further investigation. Our data therefore show that the addition of SGI interactions to other datasets enhances the ability to predict gene function.
The general lack of overlap between contributing datasets of the superimposed network, along with the topology of the bar-1 module, led us to the finding that SGI interactions bridge different subnetworks. Subnetworks enriched for particular functions probably work towards a common goal and may define a higher level of organization within the cell, such as molecular machines or functional modules. In one example, SGI interactions with sma-6 bridge a subnetwork enriched for 'regulation of body size' genes and a subnetwork enriched for 'germline development' genes. SMA-6 is an ortholog of type I TGF-β receptors [53, 54]. While sma-6 regulates body size, TGF-β signaling can also regulate germline proliferation in both C. elegans and Drosophila [55–57]. Thus, interactions with sma-6 revealed a putative novel redundant function for the two modules. By overlaying SGI interactions onto a superimposed network, we have discovered significant redundancy between functional modules and revealed a new layer of interactions within a biological system.
Approximately 18% of the 7,008 gene pairs that we tested interact genetically. We rationalize this large fraction of interacting gene pairs uncovered by SGI in four ways. First, genes within the same local neighborhood on a network graph are more likely to interact with each other than with randomly selected targets. For example, in S. cerevisiae, 18–24% of genes linked to the same query gene interact with each other, compared to the interaction rate of 1% for the average query [11, 12]. Similarly, a majority of the SGI genetic tests are between genes known or predicted to be involved in signal transduction; a relatively high number of interactions may therefore be expected. Second, essential genes genetically interact with more genes than do nonessential genes. For example, when conditional alleles of essential yeast genes are used as queries in SGA screens, the fraction of interactions identified is 5.5-fold higher than the fraction identified with nonessential queries (0.6%). Of the 11 query genes investigated in this study, nine are essential. Thus, by using hypomorphic alleles of genes that probably teeter on the brink of collapse, and designing an approach that can reliably detect both strong and weak interactions, we have created a very sensitive system to detect genetic interactions. Third, multicellular organisms may have more vulnerabilities than unicellular organisms. Each cell type within an animal is likely to be governed by a system with a distinct set of genetic vulnerabilities that is different from other cell types. Because compromising the development or physiology of any one of the major tissue types will probably kill the animal, the vulnerability of the entire system is greater than that of any one cell type. This effect may be further compounded by a complex developmental program. Finally, the total number of anticipated genetic interactions in C. elegans as revealed by SGI is in the realm of expectation when compared to that of S. cerevisiae.
On the basis of the fraction of genes that interacted in the LGIII network (14%), which represents a nearly random set of genes, we estimate there to be approximately 61 million genetic interactions in C. elegans that involve an essential gene. The number of expected genetic interactions in C. elegans as revealed by SGI analysis is therefore around 120 times that of S. cerevisiae [11–13]. By comparison, the number of all possible gene pairs in C. elegans is around 11-fold more than the number of all gene pairs in S. cerevisiae. Thus, the ratio of expected genetic interactions in worms compared to yeast is only around 11-fold greater than the corresponding ratio of all possible gene pairs (120/11 ≈ 11). This difference probably reflects the increase in complexity of nematodes compared to yeast. By contrast, Lehner et al. reported an interaction rate of 0.5%. This fraction would suggest that the ratio of expected genetic interactions in worms compared to yeast is only around 0.4 times the ratio of all possible gene pairs in worms compared to yeast, which is inconsistent with expectations. We therefore conclude that the number of interactions revealed by SGI is not unexpectedly high.
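The arithmetic behind this comparison can be checked with a short sketch. It uses only the figures quoted above and assumes (this is our simplification, not a statement from the analysis) that the expected number of interactions scales linearly with the observed interaction rate; the variable names are illustrative.

```python
# Figures quoted in the text.
sgi_rate = 0.14            # fraction of interacting pairs in the LGIII network
lehner_rate = 0.005        # interaction rate reported by Lehner et al.
interaction_ratio = 120.0  # expected worm/yeast interaction ratio (SGI estimate)
pair_ratio = 11.0          # worm/yeast ratio of all possible gene pairs

# Excess of interactions in worm beyond what the number of gene pairs predicts.
sgi_excess = interaction_ratio / pair_ratio

# Assumption: the expected interaction count scales linearly with the rate,
# so the SGI-based ratio is rescaled by lehner_rate / sgi_rate.
lehner_excess = interaction_ratio * (lehner_rate / sgi_rate) / pair_ratio

print(f"SGI-based excess:    {sgi_excess:.1f}-fold")
print(f"Lehner-based excess: {lehner_excess:.2f}-fold")
```

Under this assumption the SGI rate gives a roughly 11-fold excess, whereas the 0.5% rate gives a value below 1, that is, fewer interactions per gene pair in worm than in yeast.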
Whether the connectivity of genetic interactions is conserved, rather than just the principles of network biology, remains an open question. A comparison between the only two organisms in which genetic interactions have been systematically investigated, S. cerevisiae and C. elegans, suggests not. We have evidence against the conservation of genetic interactions at both the level of individual gene pairs and the level of subnetwork connectivity. Our observations are consistent with a previous report that less than 1% of around 1,000 yeast interactions are conserved in C. elegans. How can this be, given that individual genes, homologous physical interactions (interologs), the essentiality of hubs, and network principles are all clearly conserved [3, 24, 37, 44, 59, 60]? There are at least three trivial explanations for the apparent lack of conservation in the connectivity of synthetic genetic networks. First, the different approaches used to uncover interactions may have led to an artificial difference in the genetic network connectivity within the two systems. Second, synthetic genetic-interaction analysis in C. elegans has focused on signaling pathways that are largely absent from S. cerevisiae, hindering direct comparisons. Third, only a tiny fraction of the synthetic genetic network has been probed in either system. An expanded investigation of the networks may yield more commonalities. Finally, a nontrivial explanation for the apparent lack of conservation may lie in the nature of synthetic genetic networks, which overwhelmingly reveal redundancy between pathways and functional modules as we show here (see also [16, 19]). Thus, the connectivity between modules may change through random mutation of genes without phenotypic consequence. Over an evolutionary time scale, synthetic genetic relationships may therefore drift and/or be selected for or against to satisfy new constraints during speciation [18, 61].
If one mode of evolution is the shuffling of relationships between functional modules, then there may be no reason to expect that the connectivity of genetic networks will be conserved. Whereas model systems have repeatedly proven their utility for discovering and understanding basic biological processes and monogenic diseases, our results suggest that understanding the complex network of interactions that underlie polygenic diseases may require network analysis of systems more closely related to humans. Regardless of this, a study of the connectivity of synthetic genetic networks from different species may provide insight into the evolution of divergent form and function.
We have developed a novel, sensitive, and reproducible approach called SGI for systematically investigating genetic interactions in C. elegans. Using this approach, we identified a network of 1,246 interactions among 461 genes, providing functional annotation for many poorly characterized signal transduction genes. When integrated with other interaction data into a superimposed network, the SGI interactions help reveal new putative functional modules. Because genetic links are largely orthogonal to other interaction modes, SGI data make a significant contribution to connectivity within the superimposed network. Furthermore, SGI interactions link distinct functional modules on a global scale, revealing a new level of organization within the system. Finally, we find that genetic network properties are conserved between yeast and worms, but the connectivity may not be. Together, our results indicate that a comprehensive investigation of genetic interactions is critical to our understanding of the metazoan biological system.
Query-target gene pairs were tested for interaction by feeding target gene RNAi to worms with a mutation in the query gene. RNAi bacterial cultures were grown overnight at 37°C in LB containing 100 μg/ml ampicillin. Then, 40 μl of culture was placed in each well of 12-well plates containing 3.5 ml NGM supplemented with 105.6 μg/ml carbenicillin and 1 mM isopropyl-beta-D-thiogalactopyranoside (IPTG). Plates seeded with bacteria were dried overnight at room temperature and for 40 min in a flow hood. Two L3–L4 stage worms (N2, egl-15(n1477), let-756(s2613), sos-1(cs41), sem-5(n2019), let-23(n1045), let-60(n2021), clk-2(mn159), daf-2(e1370), glp-1(or178), sma-6(e1482), bar-1(ga80)) were placed in each well of a 12-well plate using a COPAS BioSort worm sorter (Union Biometrica, Holliston, MA). Worms were grown at 20°C (egl-15(n1477), let-756(s2613), sos-1(cs41), sem-5(n2019), let-60(n2021), sma-6(e1482), bar-1(ga80)) or at 16°C (glp-1(or178), let-23(n1045), clk-2(mn159), daf-2(e1370)). The following controls were included in each experiment. As a positive control for RNAi efficiency, wild-type (N2) worms and the query mutants were fed pop-1(RNAi). As negative controls for background growth levels, N2 worms were fed target RNAi and query mutants were fed L4440 mock-RNAi.
Typically, one person can prepare and process experiments with four worm strains fed 384 RNAi-inducing bacterial strains in triplicate over the course of two weeks. Overlapping sets of experiments of similar size can be prepared while the worms in the first experiment are growing, resulting in an average throughput of 1,920 genetic tests per week per person.
Within the LGIII set of genes, there are 203 genes annotated with at least one GO biological process. These genes represent 280 unique GO Process 1000 categories. One thousand random samples of 203 genes with at least one GO biological process were then drawn from the C. elegans genome. The random samples have a mean of 322.5 unique GO Process 1000 categories with a standard deviation of 32.8. Compared to the random samples, there is no significant difference in the number of unique GO processes in the LGIII set (z-score = -1.298; p = 0.097 after Bonferroni correction). Furthermore, of the 280 unique GO biological processes in the LGIII set, only 18 are significantly enriched (p < 0.01) in the LGIII set, and all of these are represented by only one (12 processes), two (four processes) or three (two processes) genes (see Additional data file 2).
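The z-score comparison above can be reproduced approximately from the quoted summary statistics. The sketch below assumes a normal approximation and a one-sided lower-tail probability; the variable names are illustrative, and the small difference from the reported z-score presumably reflects rounding of the quoted mean and standard deviation.

```python
import math

observed = 280      # unique GO categories in the LGIII set
null_mean = 322.5   # mean over the 1,000 random gene sets
null_sd = 32.8      # standard deviation over the random gene sets

# Standard z-score of the observed count against the random samples.
z = (observed - null_mean) / null_sd

# One-sided lower-tail probability under a normal approximation (assumption).
p_lower = 0.5 * math.erfc(-z / math.sqrt(2))

print(f"z = {z:.3f}, one-sided p = {p_lower:.3f}")
```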
The progeny in each well resulting from each query-target pair and control combination were counted and recorded as growth scores. A well with no progeny was given a growth score of zero, whereas a well overgrown with progeny was given a growth score of six. Growth scores of 1 to 5 were assigned to wells with increasing numbers of worms (1, 1–10 progeny; 2, 11–50 progeny; 3, 51–100 progeny; 4, 101–200 progeny; 5, more than 200 progeny). From pilot experiments performed by two independent investigators, we found that worm populations can be quickly and reliably binned into these categories. We took several counts of the same maturing population over the course of several days. Each query-target pair and its two controls were tested in at least three rounds. Experiments suspected of contamination were flagged as suspect and repeated. Counts obtained in a round were annotated with confidence scores of 0, 1, or 2, reflecting whether they were suspect, not suspect, or resulted from a second attempt, respectively. A large fraction of all experiments was digitally archived using a high-throughput digital imager [63, 64].
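The binning scheme can be written as a small function. This is a minimal sketch: the function name and the `overgrown` flag are hypothetical, since the text does not give a numeric boundary between a score of 5 and an overgrown well.

```python
def growth_score(progeny: int, overgrown: bool = False) -> int:
    """Map a progeny count to a 0-6 growth score using the bins in the text."""
    if overgrown:
        # Overgrown wells are scored 6; the flag is a hypothetical stand-in
        # for the investigator's visual judgment.
        return 6
    if progeny < 0:
        raise ValueError("progeny count cannot be negative")
    if progeny == 0:
        return 0
    # Upper bounds of bins 1-4; anything above 200 falls into bin 5.
    for score, upper in enumerate((10, 50, 100, 200), start=1):
        if progeny <= upper:
            return score
    return 5
```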
Let G(Q,T,i,j) be the growth score for the (Q,T) query-target pair on the jth day of round i. For each query-target pair, two growth score differences were calculated: (1) D_null(i,j) = G(Q,null,i,j) - G(Q,T,i,j), the difference between the mock RNAi vector control (query mutant; L4440 RNAi) and the experimental population (query mutant; target RNAi); and (2) D_wt(i,j) = G(wt,T,i,j) - G(Q,T,i,j), the difference between the wild-type control (N2; target RNAi) and the experimental population. The following sequential rules were used to call a (Q,T) pair an interaction:
For round i, the counts on the jth day were called 'deviant' if both D_wt(i,j) and D_null(i,j) were at least d.
A round's set of counts was labeled 'positive' if at least e of its days were found to be deviant (e = 1 or 2) or a majority of its days were deviant (e = 0).
A (Q,T) pair was then called an interaction if at least s of its rounds were positive (s = 1 or 2) or a majority of its rounds were positive (s = 0).
Three additional criteria were used to determine how counts from suspect rounds were treated:
Suspect rounds were excluded from the analysis if the confidence score was less than a threshold c (c = 0, 1, or 2).
Counts derived from suspect rounds were removed if a second attempt was conducted as long as the parameter r was set; if r was not set, all counts were retained.
Suspect rounds were included to bring the total number of rounds to a minimum of m (m = 1 or 2).
We applied all combinations of the above criteria to generate 51 unique network variants. All interacting pairs within a network variant were query-target pairs that had satisfied all of the criteria imposed by the variant. For example, in a variant with the following criteria: d = 3, e = 1, s = 2, r = 1, c = 0, and m = 2, all query-target pairs that were called interacting were found in at least two (s = 2) positive rounds that had at least one deviant day (e = 1), for which the difference between the growth scores of the experimental population and the control populations was at least three (d = 3). If any round was considered suspect and the experiment for that round had been repeated, only growth scores from the second attempt were used (r = 1). Otherwise, rounds with all levels of confidence were used (c = 0). If fewer than two rounds of data were available for a specific query-target pair, data from additional rounds were included, so that at least two rounds of data were available, starting from the most confident rounds (m = 2).
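The three sequential calling rules can be sketched as follows. This is an illustration only: it covers the d, e, and s parameters, omits the handling of suspect rounds (c, r, and m), and assumes that 'majority' means strictly more than half. The function names and data layout are hypothetical.

```python
from typing import List, Tuple

# One day of one round: (D_wt, D_null) growth score differences.
Day = Tuple[int, int]

def is_deviant(day: Day, d: int) -> bool:
    """A day is deviant if both differences are at least d."""
    d_wt, d_null = day
    return d_wt >= d and d_null >= d

def is_positive(round_days: List[Day], d: int, e: int) -> bool:
    """A round is positive if at least e days are deviant (majority if e == 0)."""
    n_deviant = sum(is_deviant(day, d) for day in round_days)
    if e == 0:
        return n_deviant > len(round_days) / 2
    return n_deviant >= e

def is_interaction(rounds: List[List[Day]], d: int, e: int, s: int) -> bool:
    """A pair interacts if at least s rounds are positive (majority if s == 0)."""
    n_positive = sum(is_positive(r, d, e) for r in rounds)
    if s == 0:
        return n_positive > len(rounds) / 2
    return n_positive >= s
```

For instance, with d = 3, e = 1, and s = 2, a pair whose first two rounds each contain one day with both differences of at least 3 would be called an interaction regardless of the third round.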
To compare network variants, we identified pairs of genes within each variant that share a GO biological process classification. Only categories with fewer than 1,000 genes were considered. We calculated 'recall' and 'precision' for each variant, V, as:
Recall (V) = (number of co-classified interacting pairs in V)/(number of possible co-classified pairs) and
Precision (V) = (number of co-classified interacting pairs in V)/(number of interacting pairs in V)
We estimated the significance of the degree to which each network linked genes in the same GO biological process category using the hypergeometric distribution. The hyper-geometric distribution takes into account the number of co-classified interacting pairs in each variant relative to the size of the variant, the total number of all possible co-classified gene pairs, and the total number of gene pairs tested, and is thus a measure of the significance of both the recall and precision of a variant.
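The recall, precision, and hypergeometric significance described above can be sketched as follows. The function and argument names are illustrative, and the upper-tail form of the hypergeometric test is an assumption about how the distribution was applied.

```python
from math import comb

def recall(co_interacting: int, co_possible: int) -> float:
    """Co-classified interacting pairs divided by all possible co-classified pairs."""
    return co_interacting / co_possible

def precision(co_interacting: int, n_interacting: int) -> float:
    """Co-classified interacting pairs divided by all interacting pairs."""
    return co_interacting / n_interacting

def hypergeom_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) for n interacting pairs drawn from N tested pairs,
    K of which are co-classified (upper tail; assumed form of the test)."""
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(K, n) + 1))
    return favorable / total

# Toy numbers for illustration only (not data from the study).
print(precision(2, 3), recall(2, 4), hypergeom_tail(2, 4, 3, 10))
```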
where 1(i) is 1 if round i passed the above criteria and 0 otherwise, h is the total number of rounds that passed the criteria, and n_i is the number of days in round i. IS represents the average growth score for a query-target pair calculated over its valid data.
Target and query genes were clustered on the basis of their interaction strengths. Hierarchical agglomerative clustering was run using Cluster 3.0 [65, 66] on both the target and query dimensions using average linkage as the cluster similarity metric and uncentered Pearson correlation as the IS profile similarity metric, respectively. Individual target gene clusters were defined by cutting the hierarchical tree at a height of 0.4. The degree to which each cluster contained genes assigned to the same gene functional category was measured using the hypergeometric distribution and a significance cutoff of P < 0.01.
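For reference, the uncentered Pearson correlation differs from the ordinary Pearson correlation in that the means are not subtracted, which makes it equivalent to the cosine similarity of the two profiles. A minimal sketch follows; the function name is illustrative, and returning 0 for an all-zero profile is our assumption.

```python
import math
from typing import Sequence

def uncentered_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Uncentered Pearson correlation between two interaction-strength profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an all-zero profile is treated as uncorrelated
    return dot / (norm_x * norm_y)
```

Because the means are not removed, two profiles with the same shape but different offsets are scored as less similar than they would be under the centered coefficient.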
We searched for common functional annotation present in clusters of genes generated by the hierarchical clustering. To do so, we collected several datasets of gene functional categories described for C. elegans genes specifically as well as for predicted C. elegans orthologs from other organisms. We collected C. elegans gene categories from GO (downloaded on 17 January, 2007) and KEGG (downloaded on 13 June, 2005). We restricted GO process categories to those containing 1,000 genes or fewer. Annotations implied by the 'is-a' or 'part-of' subsumption GO hierarchies were automatically added. We also collected S. cerevisiae gene pathways from MIPS (downloaded on 12 May, 2002) and H. sapiens gene pathways from BioCarta (downloaded on 13 June, 2005). For the MIPS and BioCarta datasets, we found the predicted C. elegans ortholog for each gene in a pathway by identifying the reciprocal best match protein using the BLASTP program. All of the categories with their associated genes can be found in Additional data file 12.
where K is the number of target genes linked to query gene A, n is the number of target genes linked to query gene B, and N is the number of tested target genes. A P value cutoff of P < 10^-9 yielded a total of 16 congruent links.
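These definitions match the parameters of a hypergeometric tail probability. A standard form consistent with them is sketched below, assuming that k (a symbol introduced here for illustration) denotes the number of target genes linked to both A and B; it should be read as a plausible form under that assumption, not as the exact expression used.

```latex
P \;=\; \sum_{i=k}^{\min(K,\,n)}
  \frac{\binom{K}{i}\,\binom{N-K}{\,n-i\,}}{\binom{N}{n}}
```

Here P is the probability that two query genes with K and n linked targets share at least k of them by chance, out of N tested targets.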
We tested whether targets with high degree (those linked to many query genes) have an increased tendency to produce a strong phenotype when targeted by RNAi compared to targets with low degree (those linked to few query genes). The phenotype data of Kamath et al.  were used. We define a strong phenotype as any of the following: Emb (embryonic lethal), Ste (sterile), Let (lethal), Lva (larval arrest), Lvl (larval lethal), or Adl (adult lethal). Our null hypothesis is that the degree of a target gene is not correlated with strong RNAi phenotypes. Under the null hypothesis, we expect to find an equal proportion of strong RNAi phenotypes among targets with any degree. We quantified the difference between the observed and expected number of target genes with a strong RNAi phenotype for each degree using a chi-square test with 10 degrees of freedom (one less than the number of query genes).
We used the program tYNA to measure the topological properties of the SGI and yeast SGA genetic-interaction networks and to analyze their variance. The resulting standard errors of the mean for the SGI network parameters are reported in the text.
A co-phenotype network was created by linking genes with similar loss-of-function phenotypes detected in recently published high-throughput RNAi screens [3, 4, 42]. An RNAi phenotype compendium was assembled by compiling the results of three genome-wide RNAi studies: 31 phenotypes scored for 1,472 RNAi from the Kamath et al. dataset; 25 phenotypes scored for 1,486 RNAi from the Simmer et al. dataset; and 26 phenotypes scored for 1,066 RNAi from the Rual et al. dataset. Several phenotypic annotations in the datasets were converted to provide a uniform terminology that allowed the three datasets to be integrated. These conversions included labeling brood counts scored as '1–5' and '6–10' as 'Ste'; relabeling 'Prz' as 'Prl'; relabeling 'Lvl' as 'Let'; and labeling any embryonic lethal percentages over 10% as 'Emb'. In total, 37 phenotypes scored across 2,327 unique RNAi experiments were collected from the three studies and recorded in a 2,327 × 37 RNAi phenotype matrix, K. Each entry in the matrix, K_iv, was set to 1 if RNAi against gene i produced phenotype v in one of the three studies and was set to 0 otherwise. Each row in the matrix is referred to as a gene's RNAi phenotype profile.
We devised a measure of phenotypic similarity motivated by the uncentered Pearson correlation coefficient (phenotypic PCC) approach of Gunsalus et al. However, we chose not to use the phenotypic PCC itself, as it can produce false-positive links between genes: a high correlation can arise from a single shared phenotype (or a few shared phenotypes) when the two genes fail to produce all (or many) of the other phenotypes. Inspection of the compiled RNAi phenotype dataset reveals thousands of gene pairs that result in such spurious, yet perfect, correlation. In addition, phenotypic PCC will result in false negatives due to low correlations between genes that share several rare phenotypes but that differ in only a few others.
where f_v is the frequency of phenotype v across the genome and K_iv is the (i,v)th entry from the RNAi phenotype compendium matrix as described above. If RNAi produces phenotype v in two genes, the LOFA score is increased by -log(f_v), where the logarithm is taken to base 10. The boost is larger for more infrequent phenotypes. For example, a phenotype that occurs in 1 out of 100 genes will increase the score by 2 units, whereas a phenotype that occurs in 1 out of 10 genes will contribute only 1 unit of score. The second term of the LOFA score gives an analogous bonus to two genes when neither of them produces a given phenotype.
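The scoring logic described here can be sketched as follows. The first term follows the text directly (a base-10 logarithm, as implied by the worked example). The second term is written as -log10(1 - f_v) for a shared absence, which is our assumption about what 'analogous' means. The function name and argument layout are illustrative.

```python
import math
from typing import Sequence

def lofa(profile_i: Sequence[int], profile_j: Sequence[int],
         freqs: Sequence[float]) -> float:
    """Loss-of-function agreement between two binary RNAi phenotype profiles.

    freqs[v] is the genome-wide frequency of phenotype v, with 0 < f < 1.
    """
    score = 0.0
    for k_i, k_j, f in zip(profile_i, profile_j, freqs):
        if k_i == 1 and k_j == 1:
            score += -math.log10(f)        # shared phenotype (from the text)
        elif k_i == 0 and k_j == 0:
            score += -math.log10(1.0 - f)  # shared absence (assumed form)
    return score
```

With this form, agreement on a rare phenotype contributes much more than agreement on the absence of that same rare phenotype, since 1 - f_v is then close to 1.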
The LOFA and phenotypic PCC measures of similarity were compared by measuring their ability to predict genes of related function. For each score, we constructed networks induced by using a cutoff above which genes were considered to be functionally related. We first varied the LOFA score cutoff from high to low, producing 51 networks of increasing size. Similarly, 51 networks of increasing size were produced for phenotypic PCC by lowering the phenotypic PCC cutoff. The precision of each network was measured by calculating the fraction of linked genes found to be annotated with a common GO category. Precision levels were then plotted against the network size. LOFA was found to be superior to phenotypic PCC for connecting genes of related function as it produced substantially higher precision levels than phenotypic PCC for every network size (Additional data file 13).
A final co-phenotype network was constructed by linking genes exhibiting significant levels of agreement. The significance of the LOFA score was assessed by generating 3 million random LOFA values. We first constructed a random dataset in which the genes associated with loss-of-function phenotype v in the RNAi phenotype compendium were permuted. This was repeated for each phenotype to produce one permuted dataset from which 100,000 random pairs were then picked and LOFA was calculated. We repeated this procedure for 30 different permuted datasets. We found that a cutoff of 7.0 was equivalent to an estimated significance level of 0.001, as approximately 100 LOFAs computed from random datasets exceeded this value on average in each of the 30 permuted trials.
We constructed the transposed SGA network of synthetic genetic interactions from previously reported yeast interactions by mapping each yeast gene to its predicted worm ortholog(s). Maps were created containing all gene pairs with BLASTP significance values of P < 10^-30 or better. For interactions between yeast genes with multiple predicted worm orthologs, transposed interactions were created for all combinations of predicted orthologs.
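The all-combinations transposition can be sketched as follows. The function name, the dictionary-based ortholog map, and the gene identifiers in the usage example are hypothetical, and dropping self-links is our assumption.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

def transpose_interactions(
    yeast_links: Iterable[Tuple[str, str]],
    orthologs: Dict[str, List[str]],
) -> Set[FrozenSet[str]]:
    """Map yeast interactions onto worm genes for every ortholog combination."""
    worm_links: Set[FrozenSet[str]] = set()
    for gene_a, gene_b in yeast_links:
        # Yeast genes without a predicted worm ortholog contribute no links.
        for worm_a, worm_b in product(orthologs.get(gene_a, ()),
                                      orthologs.get(gene_b, ())):
            if worm_a != worm_b:  # assumption: self-links are not meaningful
                worm_links.add(frozenset((worm_a, worm_b)))
    return worm_links

# Hypothetical identifiers for illustration only.
links = transpose_interactions(
    [("YA", "YB"), ("YA", "YC")],
    {"YA": ["w1", "w2"], "YB": ["w3"]},
)
print(sorted(sorted(pair) for pair in links))
```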
The interolog network was created from eukaryotic protein-protein interactions reported in BioGRID. All interactions assembled from organisms other than C. elegans were mapped to predicted worm ortholog pairs using BLASTP with a significance cutoff of P < 10^-30.
To gauge the significance of various network properties, 1,000 randomly permuted networks were constructed for each data type. Permuted SGI networks were created by combining permuted signaling and LGIII networks. A link in each of these networks associates one query gene with one RNAi target gene. The permuted SGI networks link each query gene to a random set of target genes by randomly picking genes from the entire set of target genes tested in the screen. The number of target genes linked to each query was held fixed in the permuted networks to preserve the degree distribution across query genes. We also created permuted Lehner et al.  networks, yeast SGA networks, and protein-interaction networks using this method. Permuted coexpression, co-phenotype, and fine genetic networks were created by randomly linking genes present in each network. Random superimposed networks were created by taking the union of all links from the permuted networks obtained from the separate data types.
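The degree-preserving permutation of query-target links can be sketched as follows. This is an illustration under the assumption that every query draws from one shared pool of tested targets; the function and argument names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set

def permute_query_links(
    links: Dict[str, Set[str]],
    tested_targets: List[str],
    rng: Optional[random.Random] = None,
) -> Dict[str, Set[str]]:
    """Relink each query to a random target set of the same size as observed."""
    rng = rng or random.Random()
    return {
        # Sampling without replacement keeps each query's degree fixed.
        query: set(rng.sample(tested_targets, len(targets)))
        for query, targets in links.items()
    }
```

Repeating this call 1,000 times with different random states would give an ensemble of permuted networks of the kind described above.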
The significance of the number of supported links (gene pairs linked by more than one data type) in the superimposed network was estimated by comparing the observed number of supported links to the number of supported links in 1,000 randomly permuted superimposed networks. Significance was calculated with a standard Z-score transformation using the mean and standard deviation of the number of supported links across the random networks. The significance of the overlap of two data types was estimated in a similar manner.
We identified subnetworks, defined as small- to medium-sized groups of possibly overlapping genes, by searching for densely connected sets of genes in individual networks and in the superimposed network using MODES. We used MODES parameter settings such that a subnetwork must have at least 50% connectivity, cannot overlap any other subnetwork by more than half of its genes, and must contain a minimum of four genes.
A connectivity significance score was assigned to each subnetwork based on the number of links connecting each of its members. The connectivity significance score for a subnetwork containing n genes was calculated as a standard Z-score (l - m)/s where l is the observed number of links in the subnetwork, and m and s are the mean and standard deviation of the number of links across 1,000 random collections of n genes.
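The connectivity Z-score can be sketched as follows. The sketch assumes that the random collections are drawn uniformly from all genes in the network, which the text does not specify; the function names are illustrative.

```python
import random
from itertools import combinations
from statistics import mean, stdev
from typing import FrozenSet, Iterable, List, Set

def count_links(genes: Iterable[str], links: Set[FrozenSet[str]]) -> int:
    """Number of links joining two members of the given gene collection."""
    return sum(1 for a, b in combinations(sorted(genes), 2)
               if frozenset((a, b)) in links)

def connectivity_z(subnetwork: Set[str], all_genes: List[str],
                   links: Set[FrozenSet[str]],
                   n_random: int = 1000, seed: int = 0) -> float:
    """Z-score (l - m)/s of the subnetwork's link count against random gene sets."""
    rng = random.Random(seed)
    observed = count_links(subnetwork, links)
    # Assumption: random collections are sampled from all genes in the network.
    null = [count_links(rng.sample(all_genes, len(subnetwork)), links)
            for _ in range(n_random)]
    sd = stdev(null)
    if sd == 0:
        raise ValueError("random collections show no variation in link count")
    return (observed - mean(null)) / sd
```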
As a post-processing step, any gene that was not grouped into a subnetwork by MODES was iteratively considered for addition to each subnetwork. To achieve this, a hierarchical clustering merge step was performed on all such genes across all subnetworks, using the connectivity score as the basis for a similarity metric. At each step in the clustering, the gene/subnetwork pair with the largest increase in connectivity score was combined. The connectivity score increase was calculated as the subnetwork's connectivity score upon addition of the gene minus its connectivity score before the addition of the gene.
Broad subnetworks were identified in single-data-type networks using the VxOrd algorithm. VxOrd clusters a network of genes on a two-dimensional surface using multidimensional scaling. The links between genes are treated as spring constants and a configuration of the springs is sought that minimizes the total free energy of the system. The result is a collection of genes arranged on the X-Y plane. We partitioned the genes into clusters using the dense subregions obtained from two-dimensional density estimation over a grid superimposed on the X-Y plane. We formed clusters of genes in contiguous regions whose densities were at least 10% of the maximum density and matched a minimum area cutoff.
Each subnetwork identified in the superimposed network was inspected to determine which types of data significantly link its gene members. For each subnetwork, the significance of the number of links of a specific data type that connected two genes within the subnetwork was calculated using the connectivity significance score (see previous section). Subnetworks were annotated as enriched for a data source if the connectivity score had an associated P value of 0.01 or less.
The bar-1 module was identified in a search for multiply-supported subnetworks within an earlier version of the superimposed network. The links within the subnetwork were updated using the same data as reported in the current subnetwork. This resulted in the addition of two links to the module: an interolog interaction between efl-1 and lin-35 and a Lehner interaction between ubc-18 and lin-35.
L4 parental worms were placed on NGM plates seeded with RNAi or mock-RNAi bacteria and 0.015 μg/ml Nile Red. L4 F1 and F2 progeny were analyzed by fluorescence microscopy for Nile Red intensity. To quantify Nile Red intensity, Openlab software (Improvision, Lexington, MA) was used to calculate mean fluorescence within a measured area as well as the length of the worm. Nile Red intensity was calculated as: mean fluorescence × area/length of worm.
All pairs of subnetworks derived from the coexpression, co-phenotype, and interolog networks were inspected for significant bridging by SGI links. An SGI link is considered to bridge a pair of subnetworks if it connects a gene in one subnetwork to a gene in another subnetwork. The total number of bridges was counted for each pair of subnetworks. The significance of the number of bridges for each subnetwork pair was then determined with a standard Z-score transformation using the mean and standard deviation of the number of bridges between that subnetwork pair in 1,000 randomly permuted SGI networks (see Additional data file 14 for evidence that a normal approximation in the Z-score transformation is valid). In addition to a cutoff of P < 0.01, a subnetwork pair was required to have at least three bridges to be considered significantly bridged.
We estimated the significance of the number of significantly bridged subnetwork pairs by comparing to the number of pairs significantly bridged by permuted SGI networks. Each of the 1,000 randomly permuted SGI networks was used to search for significantly bridged subnetwork pairs using the same method described above for the true SGI network. The mean and standard deviation of the number of significantly bridged subnetwork pairs were then calculated across all permuted networks. The number of subnetwork pairs significantly bridged by the SGI network was then compared to these values using a standard Z-score transformation to obtain a single significance value.
To measure the propensity for a given data type to bridge subnetworks more than expected by chance, we restricted our analysis to all subnetwork-to-subnetwork links (SSLs). We defined an SSL as a linked gene pair (A,B) in which both A and B were included in at least one broad subnetwork of any data type. Over all SSLs we counted the number of 'supports', those links in which genes A and B occurred in the same subnetwork, as well as 'bridges', those links in which A and B occurred in separate subnetworks. Links that both bridge and support were counted as supports. The 'bridging fraction' was then calculated as the total number of bridges divided by the total number of SSLs. The observed bridging fraction was calculated using all SSLs in the network. The expected bridging fraction was calculated using all SSLs tested in the dataset. To measure the tendency for a given data type to link across versus within broad subnetworks, we calculated the 'bridging propensity' as the observed bridging fraction divided by the expected bridging fraction, minus 1. Positive bridging propensities are indicative of a link type tending to bridge (as opposed to fall within) broad subnetworks more than expected by chance.
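The definitions above translate into a short calculation. The sketch below is illustrative: the `membership` mapping (gene to the set of broad subnetworks containing it) and all function names are hypothetical, and it assumes that at least one tested pair is a bridge so that the expected fraction is non-zero.

```python
def classify_ssl(gene_a, gene_b, membership):
    """Return 'support', 'bridge', or None if the pair is not an SSL.

    membership maps each gene to the set of broad-subnetwork ids containing it.
    """
    subs_a = membership.get(gene_a, set())
    subs_b = membership.get(gene_b, set())
    if not subs_a or not subs_b:
        return None  # at least one gene lies in no broad subnetwork
    if subs_a & subs_b:
        return "support"  # a shared subnetwork counts as support, even if the pair also bridges
    return "bridge"


def bridging_fraction(pairs, membership):
    """Number of bridges divided by the number of SSLs among the given pairs."""
    labels = [classify_ssl(a, b, membership) for a, b in pairs]
    ssls = [label for label in labels if label is not None]
    if not ssls:
        raise ValueError("no subnetwork-to-subnetwork links among the pairs")
    return ssls.count("bridge") / len(ssls)


def bridging_propensity(linked_pairs, tested_pairs, membership):
    """Observed bridging fraction divided by expected bridging fraction, minus 1."""
    observed = bridging_fraction(linked_pairs, membership)
    expected = bridging_fraction(tested_pairs, membership)
    return observed / expected - 1
```

Here `linked_pairs` stands for the gene pairs linked in the network and `tested_pairs` for all gene pairs tested in the dataset; a result above zero indicates more bridging than expected by chance.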
To determine if the same subnetwork pairs were bridged in worm and yeast, we identified significantly bridged subnetwork pairs separately in each species. We used a compendium of SGI and Lehner et al. interactions for worm, and transposed SGA links for yeast. We examined all pairs of subnetworks and broad subnetworks separately. We calculated the expected number of bridges as the number of possible (tested) gene pairs between the subnetworks times the probability of linking a gene pair for that data type. An estimate of the probability of a data type linking a gene pair was calculated as the number of links in its network divided by the number of possible (tested) links. This yielded an estimated background probability of 0.039 for worm, and 0.034 for yeast.
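As a small worked example of the expected-bridge calculation, the sketch below uses the worm background probability of 0.039 reported above. The function names and the count of 200 tested gene pairs are hypothetical and chosen only for illustration.

```python
def background_link_probability(n_links: int, n_tested_pairs: int) -> float:
    """Links in the network divided by the number of possible (tested) links."""
    return n_links / n_tested_pairs


def expected_bridges(n_tested_pairs_between: int, link_probability: float) -> float:
    """Tested gene pairs between two subnetworks times the background probability."""
    return n_tested_pairs_between * link_probability


# Hypothetical subnetwork pair with 200 tested gene pairs between them,
# using the worm background probability of 0.039 from the text.
worm_expected = expected_bridges(200, 0.039)
```

With these illustrative numbers, roughly eight bridges would be expected by chance between the two subnetworks.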
To determine the degree of subnetwork bridging conservation among all possible pairs of subnetworks, we created contingency tables containing the observed and expected number of subnetwork pairs significantly bridged only in worm, only in yeast, in both, and in neither. The expected number of pairs for each of these four categories was then calculated, assuming independence of worm and yeast bridging. We first calculated the worm bridging probability, $P_w$ ($P_y$ for yeast), as the number of bridged subnetwork pairs divided by the total number of pairs, $N$. The expected number of subnetwork pairs bridged only in worm was then calculated as $N P_w (1 - P_y)$. Likewise, the expected number of bridged pairs only in yeast was calculated as $N (1 - P_w) P_y$. The expected number of bridged pairs in both species was calculated as $N P_w P_y$. Finally, the expected number of pairs bridged by neither was $N (1 - P_w)(1 - P_y)$. We used a chi-square test with 3 degrees of freedom to determine if the observed and expected counts for each of these categories were significantly different.
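The expected counts and the chi-square comparison can be sketched as below. This is a minimal illustration with hypothetical function names and no external statistics library; the upper-tail probability uses the closed-form expression for a chi-square distribution with 3 degrees of freedom, matching the test described above. It assumes that each species' bridged count includes the pairs bridged in both species and that every expected count is non-zero.

```python
import math


def expected_counts(n_pairs, n_worm_bridged, n_yeast_bridged):
    """Expected category counts assuming worm and yeast bridging are independent."""
    p_w = n_worm_bridged / n_pairs
    p_y = n_yeast_bridged / n_pairs
    return {
        "worm_only": n_pairs * p_w * (1 - p_y),
        "yeast_only": n_pairs * (1 - p_w) * p_y,
        "both": n_pairs * p_w * p_y,
        "neither": n_pairs * (1 - p_w) * (1 - p_y),
    }


def chi_square_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in expected)


def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
```

A call such as `chi_square_sf_df3(chi_square_statistic(observed, expected_counts(N, n_w, n_y)))` would then give the P value for the observed table, where `observed` is a dictionary with the same four keys.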
Additional data file 1 is a table listing average growth scores for each query-target pair tested in the SGI analysis. Additional data file 2 is a table listing the distribution of functional categories within the LGIII set. Additional data file 3 is a table listing gene interactions in networks created for this study. Additional data file 4 is a table with a sorted list of average interaction strengths for each query-target pair tested. Additional data file 5 contains a detailed assessment of the nature of the SGI interactions. Additional data file 6 is a table listing reciprocal query-query interactions. Additional data file 7 is a clustered table of growth scores. Gene function descriptions are from WormBase version 170. Additional data file 8 is a table listing multiply supported subnetworks enriched for genes with similar GO annotations. Additional data file 9 is a table listing genes and functional annotations for all subnetworks. Additional data file 10 is a table listing 33 focused subnetwork pairs along with the corresponding enrichment of SGI links that bridge them. Additional data file 11 is a table comparing bridging propensities among high-throughput datasets. Additional data file 12 is a table listing all functional categories and their associated genes. Additional data file 13 is a figure plotting precision levels of networks created using various cutoffs of the LOFA and PCC scores against network size. All files are also accessible at . Additional data file 14 presents evidence supporting the validity of using a normal approximation in the Z-score transformation to estimate bridging significance.
We thank Andrew Spence, Charlie Boone, Gary Bader, Jeff Wrana, and Brenda Andrews for helpful comments on the work and the manuscript. We thank Jason Moffat for efforts at the proof-of-principle stage and thank Theresa Stiernagle and the C. elegans Genetic Center, which is funded by the NIH National Center for Research Resources, for several worm strains used in this work. This work was supported by a Canadian Institute of Health Research operating grant and infrastructure awards from the Canadian Foundation for Innovation and Genome Canada to P.J.R. J.M.S. was supported by a grant from the National Science Foundation's Division of Biological Infrastructure DBI-0543197 and by a grant from the Alfred P. Sloan foundation. M.T.W. was supported by a National Institutes of Health training grant 1 T32 GM070386-01. M.K. was supported by a training grant from the California Institute of Regenerative Medicine.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.