Network motifs have previously been sought in simple networks [5–7, 10, 11] and recently in an integrated network of transcriptional regulation and protein-protein interaction. In this study, we sought network motifs in an integrated S. cerevisiae network with five types of biological interaction. We identified many significantly enriched motifs, which fall into several classes with distinct biological implications, revealing the interplay of different types of biological interaction in local network neighborhoods. Previously, motifs have been described as elementary building blocks of complex networks [5–7, 9, 11]. Here, we describe network themes – recurring higher-order interconnection patterns that encompass multiple occurrences of network motifs. We show that the abundance of most motifs in the integrated S. cerevisiae network can be explained in terms of a network theme.
Network themes represent a more fundamental level of abstraction that may often be preferable to network motifs for several reasons. Network motifs have been defined with artificial restrictions on the number of nodes and the specific interconnection patterns, and gene triads or tetrads corresponding to these motifs often do not exist in isolation in the network. Rather, they often overlap extensively with one another to form higher-order structures corresponding in many cases to known biological phenomena; this is supported by observations from other studies [9, 10]. This phenomenon suggests that motifs are often not 'atomic' elements of the network, but are instead signatures or symptoms of more fundamental higher-order structures, or network themes. Although many motifs can be explained in terms of higher-order themes, some network motifs have an elemental function that is preserved even when that motif is embedded within a larger theme. This was demonstrated, for example, by Alon and colleagues for the coherent feed-forward loop.
In addition to the network themes and motifs depicted in Figure 1a–g, there are five motifs that we did not categorize (Figure 1h). Each of these motifs contains a transcriptional regulation link, with a third node connecting to the transcription factor and its target via two stable physical interactions (motif H1); two sequence homology links (motif H2); one correlated expression link and one homology link, respectively (motif H3); one homology link and one correlated expression link, respectively (motif H4); or two correlated expression links (motif H5). Because physical interaction links are mostly transitive, motif H1 indicates that transcription factors often form complexes with the target proteins they regulate, and suggests a mechanism of feedback regulation of transcription through protein-protein interaction. Given the near transitivity of homology links, motif H2 implies sequence homology between a transcription factor and its target. Such homology may seem unexpected, but it can be explained if one transcription factor frequently regulates another in series, since transcription factors often share homology, for example in their DNA-binding domains. Motif H5 may simply reflect the overlap between transcriptional regulation links and correlated expression links, together with the near transitivity of correlated expression links. The implications of motifs H3 and H4 are unclear to us; they might represent currently unknown trends in transcriptional regulatory mechanisms. We hope to address some of these questions in the future by investigating the roles of genes in the subnetworks corresponding to the motifs (for example, whether the target gene in motif H2 is often a transcription factor).
Both network motifs and themes represent network characteristics that can be exploited to predict individual interactions from sometimes-uncertain experimental evidence. As has recently been shown, integration of multiple evidence types [22, 36–38] can be used successfully to predict protein-protein interactions and synthetic genetic interactions, or to stratify them by confidence. In addition, the tendency of the protein-protein interaction network to form dense local neighborhoods can be exploited to predict protein-protein interactions [39–42]. Extending this idea to multi-color network motifs allows us to make predictions based on topological relationships involving multiple types of links. In particular, we may predict a certain type of link between a given pair of nodes if its addition would complete a structure matching an enriched network motif. For example, two genes with a common SSL interaction partner may have an increased probability of protein-protein interaction, because adding a protein-protein interaction link between these two genes yields a match to motif G1 (Figure 1g). Similarly, an SSL link between two genes can complete a match to motif G1 if the two genes are connected to a third gene by a protein-protein interaction link and an SSL link, respectively (Figure 1g). Such a 'two-hop physical-SSL' relationship has recently been shown to be a strong predictor of SSL interaction. An interaction can also be predicted if its addition fits into a recurring network theme. For instance, SSL interactions between the ER protein-translocation subcomplex and the Gim complex are significantly enriched (Figure 2). However, no SSL interactions have been observed between either Sec62 or Sec63, two members of the ER protein-translocation subcomplex, and any protein in the Gim complex, because Sec62 and Sec63 were not used as queries in the SGA analysis. We therefore hypothesize that Sec62, Sec63, or both have SSL interactions with many members of the Gim complex.
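The motif-completion rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this study: it assumes motif G1 is a triangle made of one protein-protein interaction (PPI) link and two SSL links, treats all links as undirected, and uses hypothetical function names (`adjacency`, `predict_g1_completions`). It simply lists every candidate link that would close such a triangle, without any scoring or enrichment test.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map {gene: set of neighbours} from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def predict_g1_completions(ppi_edges, ssl_edges):
    """Return (candidate PPI links, candidate SSL links) whose addition would
    close a PPI-SSL-SSL triangle, the assumed shape of motif G1."""
    ppi = adjacency(ppi_edges)
    ssl = adjacency(ssl_edges)
    empty = frozenset()
    ppi_pred, ssl_pred = set(), set()

    # Two genes sharing an SSL partner: a PPI link between them closes the triangle.
    for partners in ssl.values():
        for a, b in combinations(sorted(partners), 2):
            if b not in ppi.get(a, empty):
                ppi_pred.add((a, b))

    # Gene a has a PPI partner c that is SSL with gene b: an SSL link a-b closes it
    # (the 'two-hop physical-SSL' relationship).
    for a, partners in ppi.items():
        for c in partners:
            for b in ssl.get(c, empty):
                if b != a and b not in ssl.get(a, empty):
                    ssl_pred.add(tuple(sorted((a, b))))

    return ppi_pred, ssl_pred
```

Each candidate is reported once as a sorted pair; in practice one would rank candidates, for example by how many distinct triangles each would complete.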
In addition, because themes represent network organization at the functional level, they can also be used to predict functions for genes involved in a specific theme. For example, in the feed-forward theme depicted in Figure 1a, most of the genes regulated by both Mcm1 and Swi4 are involved in control or execution of the cell cycle. We therefore hypothesize that Yor315w, a protein of unknown function, is involved in the cell cycle. More refined hypotheses can be formulated by incorporating other information, such as sequence data and expression profiles. Predictions based on network themes may be robust to errors in the input data, since they depend on connectivity patterns in extended network neighborhoods rather than on one or a very few links.
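The inference used for Yor315w amounts to a majority vote among the annotated members of a theme. A minimal sketch, assuming each gene carries a single functional label (real annotations such as GO terms are many-to-many) and a hypothetical `min_fraction` cut-off; the function name and the gene names in the example are placeholders, not data from this study:

```python
from collections import Counter


def predict_function(theme_genes, annotations, min_fraction=0.5):
    """Assign the majority label of a theme's annotated genes to its unannotated
    members, provided that label covers at least min_fraction of annotated genes.

    theme_genes: iterable of gene names in one theme (e.g. shared targets)
    annotations: dict gene -> single functional label (a simplifying assumption)
    """
    labels = [annotations[g] for g in theme_genes if g in annotations]
    if not labels:
        return {}
    label, count = Counter(labels).most_common(1)[0]
    if count / len(labels) < min_fraction:
        return {}  # no sufficiently dominant function in this theme
    return {g: label for g in theme_genes if g not in annotations}


# Hypothetical theme: three annotated targets and one uncharacterized ORF.
theme = ["geneA", "geneB", "geneC", "orfX"]
known = {"geneA": "cell cycle", "geneB": "cell cycle", "geneC": "transport"}
print(predict_function(theme, known))  # {'orfX': 'cell cycle'}
```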
To assess whether SSL interactions involving essential genes are enriched in subgraphs matching the motifs, we counted, for each motif containing an SSL link, the fraction of matching subgraphs with at least one SSL interaction involving an essential gene. The results are summarized in Additional data file 2. In the SGA analysis, 11 of the 132 query genes are essential, and 322 of the 3,060 SSL interactions (10.5%) involve an essential gene. Results for the network motifs are mostly consistent with this frequency of essentiality: for most motifs (E1, E2, E3, G1, G4 and G5), approximately 10% of the matching subgraphs contain SSL interactions involving an essential gene (see Additional data file 2). Interestingly, however, subgraphs matching motifs F1 and F3 are particularly enriched for SSL interactions involving essential genes (36.4% and 24.4%, respectively). This suggests that SSL interactions within a protein complex may often involve essential genes.
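The per-motif statistic can be computed as below. This sketch assumes that each matching subgraph has already been enumerated as a list of `(u, v, edge_type)` triples and that essential genes are supplied as a set; the function name and the `"SSL"` edge label are hypothetical. The final line reproduces the 10.5% baseline quoted above (322 of 3,060).

```python
def essential_ssl_fraction(subgraphs, essential):
    """Fraction of motif-matching subgraphs that contain at least one SSL edge
    with an essential endpoint.

    subgraphs: list of subgraphs, each a list of (u, v, edge_type) triples
    essential: set of essential gene names
    """
    if not subgraphs:
        return 0.0
    hits = sum(
        any(t == "SSL" and (u in essential or v in essential) for u, v, t in sg)
        for sg in subgraphs
    )
    return hits / len(subgraphs)


# Baseline: share of all SSL interactions that involve an essential gene.
print(round(322 / 3060, 3))  # 0.105
```

A motif whose fraction clearly exceeds this baseline, as for F1 and F3, is the pattern described in the text.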
Each network theme has a different biological implication, and each permits a natural simplification of the integrated network. To demonstrate this, we produced thematic maps of compensatory complexes and of regulonic complexes. The map of compensatory complexes identifies specific protein complexes with overlapping or compensatory function. Many of its links connect functionally related complexes, as supported by previous experimental evidence. For example, the replication complex is 'genetically connected' to the Mre11/Rad50/Xrs2 complex, the Rad54-Rad51 complex, and the Rad17/Mec3/Ddc1 complex. The first two function in the repair of double-strand DNA breaks [44, 46] and the third is required for cell-cycle checkpoint control after DNA damage; both processes are associated with DNA replication. The histone deacetylase B (HDB) complex [48, 49] is linked to the SAGA complex; both affect histone acetylation and are important components of transcriptional regulation. There are also some unverified but intriguing links, such as the one between the Gim complex and the CCAAT-binding factor, which connects two seemingly unrelated complexes (Figure 3). The potential functional relationship between these complexes awaits further experimental validation.
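One way to decide whether two complexes should be linked in such a map is a one-sided hypergeometric test on the SSL interactions observed between them. The sketch below assumes that the background is the set of all gene pairs tested in the screen, that every SSL pair is among the tested pairs, and that no multiple-testing correction is applied; it illustrates the idea rather than reproducing the statistics used to build Figure 3. Names such as `compensatory_links` and `alpha` are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N tested pairs, K of them SSL, n drawn)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def compensatory_links(complexes, tested, ssl, alpha=0.01):
    """Return complex pairs whose inter-complex SSL count is unexpectedly high.

    complexes: dict mapping complex name -> set of gene names
    tested, ssl: sets of frozenset({gene1, gene2}); ssl assumed to be a subset of tested
    """
    N, K = len(tested), len(ssl)
    names = sorted(complexes)
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            between = {frozenset((x, y)) for x in complexes[a]
                       for y in complexes[b] if x != y}
            n = len(between & tested)  # inter-complex pairs actually screened
            k = len(between & ssl)     # of those, pairs with an SSL interaction
            if k and hypergeom_sf(k, N, K, n) < alpha:
                links.append((a, b))
    return links
```

Because untested pairs never enter `n`, complex pairs with few screened inter-complex pairs cannot reach significance, which is the limitation on statistical power discussed below.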
Novel predictions of synthetic sick or lethal interactions can be made from the thematic map of compensatory complexes. Specifically, we can predict that any two proteins have an SSL interaction if they are members of two separate complexes bridged by a link in the map. There were 1,134 such protein pairs that had not been tested in the SGA study used to derive the compensatory complex map. We sought independent validation of these predictions among published smaller-scale studies of genetic interaction. We conservatively estimate that 10% of these pairs have been examined for genetic interaction (note that Tong et al., the largest systematic study to date, examined only approximately 4% of all gene pairs). Therefore, we could hope to validate at most approximately 113 pairs (10% of 1,134 predictions). Tong et al. observed a baseline SSL interaction rate of 0.5%, so by chance we would expect to find fewer than one SSL interaction (0.5% of 10% of 1,134, or about 0.57). Our literature search revealed ten gene pairs with known SSL interactions among the predictions: Arp2-Myo1, Vrp1-Myo1, Las17-Myo1, Bem1-Myo1, Rvs167-Myo1, Rvs167-Myo2, Smy1-Pfy1, Rad50-Cdc2 [57, 58], Rad54-Cdc2, and Rad51-Cdc2. From this we conservatively estimate a success rate of around 9% (10 of approximately 113 examined pairs), demonstrating the value of the thematic map in predicting new SSL interactions. Our use of the thematic map to predict genetic interactions differs from the previous prediction approach based on two-hop physical-SSL interactions in that here we required a greater abundance of SSL interactions between two protein complexes than would be expected by chance, whereas previous work did not exploit the number of observed two-hop physical-SSL interactions. Furthermore, the thematic map approach can potentially predict a genetic interaction between two genes even if neither gene has any previously known SSL interactions.
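Given a list of linked complex pairs, both the prediction rule and the back-of-the-envelope validation estimate above can be sketched in Python. The helper names and their default arguments are illustrative assumptions; the defaults simply restate the figures used in the text (10% of pairs examined, 0.5% baseline SSL rate, ten confirmed pairs).

```python
def predict_ssl_pairs(links, complexes, tested):
    """All untested cross-complex gene pairs for complex pairs bridged in the map.

    links: iterable of (complex_a, complex_b) names linked in the thematic map
    complexes: dict mapping complex name -> set of gene names
    tested: set of frozenset({gene1, gene2}) already screened for SSL
    """
    preds = set()
    for a, b in links:
        for x in complexes[a]:
            for y in complexes[b]:
                pair = frozenset((x, y))
                if x != y and pair not in tested:
                    preds.add(pair)
    return preds


def validation_estimate(n_pred, frac_examined=0.10, baseline=0.005, n_found=10):
    """Return (pairs likely examined, SSL hits expected by chance, success rate)."""
    examined = n_pred * frac_examined  # predictions we could hope to find in the literature
    by_chance = examined * baseline    # SSL hits expected at the baseline rate
    success = n_found / examined       # confirmed hits per examined prediction
    return examined, by_chance, success


examined, chance, rate = validation_estimate(1134)
print(round(examined), round(chance, 2), round(rate, 2))  # 113 0.57 0.09
```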
In producing the thematic map of compensatory complexes, statistical power was limited because only 4% of yeast gene pairs have been examined for synthetic genetic interactions. Many compensatory complex pairs have escaped detection because too few inter-complex protein pairs have been tested for SSL to achieve statistical significance. We expect this map to grow substantially as large-scale studies of genetic interaction proceed. In higher organisms, for which exhaustive determination of genetic interaction is a more distant goal, we may advance our understanding more rapidly by choosing a 'scaffold' set of genes such that each known or hypothesized protein complex or pathway is represented by at least one query gene in an SSL screen.
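Choosing such a scaffold is a hitting-set problem: pick a small set of genes so that every complex or pathway contains at least one of them. The greedy heuristic below is an illustrative choice, not a method from this study; it repeatedly selects the gene that appears in the most not-yet-represented complexes, breaking ties alphabetically. Complexes are assumed to be given as a dictionary of gene sets, empty complexes are ignored, and the function name is hypothetical.

```python
from collections import Counter


def greedy_scaffold(complexes):
    """Return a list of query genes such that every non-empty complex
    contains at least one of them (greedy hitting-set heuristic)."""
    uncovered = {name for name, genes in complexes.items() if genes}
    scaffold = []
    while uncovered:
        # How many still-unrepresented complexes each gene would cover.
        counts = Counter(g for name in uncovered for g in complexes[name])
        best = max(sorted(counts), key=counts.__getitem__)  # ties -> alphabetical
        scaffold.append(best)
        uncovered = {name for name in uncovered if best not in complexes[name]}
    return scaffold


print(greedy_scaffold({"C1": {"a", "b"}, "C2": {"b", "c"}, "C3": {"d"}}))  # ['b', 'd']
```

Greedy selection does not guarantee a minimum scaffold, but for set-cover-type problems it stays within a logarithmic factor of the optimum, and it favors genes shared by several complexes, which may be informative query genes in their own right.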