- Research article
- Open Access
Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae
- Teresa Reguly†1,
- Ashton Breitkreutz†1,
- Lorrie Boucher†1, 2,
- Bobby-Joe Breitkreutz†1,
- Gary C Hon3,
- Chad L Myers4, 5,
- Ainslie Parsons2, 6,
- Helena Friesen6,
- Rose Oughtred4,
- Amy Tong2, 6,
- Chris Stark1,
- Yuen Ho6,
- David Botstein4,
- Brenda Andrews2, 6,
- Charles Boone2, 6,
- Olga G Troyanskaya4, 5,
- Trey Ideker3,
- Kara Dolinski4,
- Nizar N Batada†1 and
- Mike Tyers1, 2
© Reguly et al.; licensee BioMed Central Ltd. 2006
- Received: 18 October 2005
- Accepted: 30 March 2006
- Published: 8 June 2006
The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference.
We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID (http://www.thebiogrid.org) and SGD (http://www.yeastgenome.org/) databases.
Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks.
- Gene Ontology
- Fluorescence Resonance Energy Transfer
- Additional Data File
- Cluster Coefficient
- Biological Network
The molecular biology, biochemistry and genetics of the budding yeast Saccharomyces cerevisiae have been intensively studied for decades; it remains the best-understood eukaryote at the molecular genetic level. Completion of the S. cerevisiae genome sequence nearly a decade ago spawned a host of functional genomic tools for interrogation of gene and protein function, including DNA microarrays for global gene-expression profiling and location of DNA-binding factors, and a comprehensive set of gene deletion strains for phenotypic analysis [1, 2]. In the post-genome sequence era, high-throughput (HTP) screening techniques aimed at identifying novel protein complexes and gene networks have begun to complement conventional biochemical and genetic approaches [3, 4]. Systematic elucidation of protein interactions in S. cerevisiae has been carried out by the two-hybrid method, which detects pair-wise interactions [5–7], and by mass spectrometric (MS) analysis of purified protein complexes [8, 9]. In parallel, the synthetic genetic array (SGA) and synthetic lethal analysis by microarray (dSLAM) methods have been used to systematically uncover synthetic lethal genetic interactions, in which non-lethal gene mutations combine to cause inviability [10–13]. In addition to HTP analyses of yeast protein-interaction networks, initial yeast two-hybrid maps have been generated for the nematode worm Caenorhabditis elegans, the fruit fly Drosophila melanogaster and, most recently, for humans [14–17]. The various datasets generated by these techniques have begun to unveil the global network that underlies cellular complexity.
The networks implicit in HTP datasets from yeast, and to a limited extent from other organisms, have been analyzed using graph theory. A primary attribute of biological interaction networks is a scale-free distribution of connections, as described by an apparent power-law formulation. Most nodes (that is, genes or proteins) in biological networks are sparsely connected, whereas a few nodes, called hubs, are highly connected. This class of network is robust to the random disruption of individual nodes, but sensitive to an attack on specific highly connected hubs. Whether this property has actually been selected for in biological networks or is a simple consequence of multilayered regulatory control is open to debate. Biological networks also appear to exhibit small-world organization, namely locally dense regions that are sparsely connected to other regions but with a short average path length [21–23]. Recurrent patterns of regulatory interactions, termed motifs, have also recently been discerned [24, 25]. In conjunction with global profiles of gene expression, HTP datasets have been used in a variety of schemes to predict biological function for characterized and uncharacterized proteins [3, 26–32]. These initial network approaches to system-level understanding hold considerable promise.
Despite these successes, all network analyses undertaken so far have relied exclusively on HTP datasets that are burdened with false-positive and false-negative interactions [33, 34]. The inherent noise in these datasets has compromised attempts to build a comprehensive view of cellular architecture. For example, yeast two-hybrid datasets in general exhibit poor concordance. The unreliability of such datasets, together with the still sparse coverage of known biological interaction space, clearly limits studies of biological networks, and may well bias conclusions obtained to date.
A vast resource of previously discovered physical and genetic interactions is recorded in the primary literature for many species, including yeast. In general, interactions reported in the literature are reliable: many have been verified by multiple experimental methods and/or more than one research group; most are based on methods of known sensitivity and reproducibility in well controlled experiments; most are reported in the context of supporting cell biological information; and all have been subjected to the scrutiny of peer review. But while publications on individual genes are readily accessed through public databases such as PubMed, the embedded interaction data have not been systematically compiled in a searchable relational database. The Yeast Proteome Database (YPD) represented the first systematic effort to compile protein-interaction and other data from the literature; but although originally free of charge to academic users, YPD is now available only on a subscription basis. A number of important databases that curate protein and genetic interactions from the literature have been developed, including the Munich Information Center for Protein Sequences (MIPS) database, the Molecular Interactions (MINT) database, the IntAct database, the Database of Interacting Proteins (DIP), the Biomolecular Interaction Network Database (BIND), the Human Protein Reference Database (HPRD), and the BioGRID database [43, 44]. At present, however, interactions recorded in these databases represent only partial coverage of the primary literature. The efforts of these databases will be facilitated by a recently established consortium of interaction databases, termed the International Molecular Exchange Consortium (IMEx), which aims both to implement a structured vocabulary to describe interaction data (the Proteomics Standards Initiative-Molecular Interaction, PSI-MI) and to openly disseminate interaction records.
A systematic international effort to codify gene function by the Gene Ontology (GO) Consortium also records protein and genetic interactions as functional evidence codes, which can therefore be used to infer interaction networks.
Despite the fact that many interactions are clearly documented in the literature, these data are not yet in a form that can be readily applied to network or system-level analysis. Manual curation of the literature specifically for gene and protein interactions poses a number of problems, including curation consistency, the myriad possible levels of annotation detail, and the sheer volume of text that must be distilled. Moreover, because structured vocabularies have not been implemented in biological publications, automated machine-learning methods are unable to reliably extract most interaction information from full-text sources. Budding yeast represents an ideal test case for systematic literature curation, both because the genome is annotated to an unparalleled degree of accuracy and because a large fraction of genes are characterized. Approximately 4,200 budding yeast open reading frames (ORFs) have been functionally interrogated by one means or another. At the same time, because some 1,500 are currently classified by the GO term 'biological process unknown', a substantial number of gene functions remain to be assigned or inferred.
Here we report a literature-curated (LC) dataset of 33,311 protein and genetic interactions, representing 19,499 non-redundant interactions, from a total of 6,148 publications in the primary literature. The low overlap between the LC dataset and existing HTP datasets suggests that known physical and genetic interaction space may be far from saturating. Analysis of the network properties of the LC dataset supports some conclusions based on HTP data but refutes others. The systematic LC dataset improves prediction of gene function and provides a resource for future endeavors in network biology.
A search of the available online literature in PubMed yielded 53,117 publications as of November 1, 2005 that potentially contain interaction data on one or more budding yeast genes and/or proteins. A total of 5,434 of the 5,726 currently predicted proteins  are referred to at least once in the primary literature. All abstracts associated with yeast gene names or registered aliases were retrieved from PubMed and then examined by curators for evidence of interaction data. Where available, the full text of papers, including figures and tables, was read to capture all potential protein and genetic interactions. A curation database was constructed to house protein-protein, protein-RNA and gene-gene interactions associated with all known or predicted proteins in S. cerevisiae, analogous in structure to the BioGRID interaction database [43, 53]. Each interaction was assigned a unique identifier that tracked the source, date of entry, and curator name. To expedite curation, we recorded the direct experimental evidence for interactions but not other potentially useful information such as strain background, mutant alleles, specific interaction domains or subcellular localization. Interactions reported in reviews or as unpublished data were not considered sufficiently validated. Protein-RNA and protein-DNA associations detected by genome-wide microarray methods were also not included in the dataset. Finally, we did not record interactions between S. cerevisiae genes/proteins and those of another species, even when such interactions were detected in yeast.
Abstracts were inspected with efficient web-based tools for candidate interaction data. Of the initial set of 53,117 abstracts, 21,324 were immediately designated as 'wrong organism', usually because of a direct reference to a yeast homolog or to a yeast two-hybrid screen carried out with a non-yeast bait (that is, the capturing protein) and library. This class of incorrect assignment is not easily recognized by text-mining algorithms but is readily discerned by curators. Of the remaining 31,793 yeast-specific abstracts, 9,145 were associated with accessible electronic versions of the full paper, which were then manually curated for protein and genetic interactions by directly examining data figures and tables.
We defined a minimal set of experimental method categories to describe the evidence for each recorded interaction (see Materials and methods for definitions). Physical interactions were divided into eight in vivo categories (affinity capture-mass spectrometry, affinity capture-western, affinity capture-RNA, co-fractionation, co-localization, co-purification, fluorescence resonance energy transfer (FRET), two-hybrid) and six in vitro categories (biochemical activity, co-crystal structure, far western, protein-peptide, protein-RNA, reconstituted complex). In each of these categories, except co-purification, the protein-interaction pair corresponded to that described in the experiment, typically as the bait and prey (that is, the capturing protein and the captured protein(s), respectively). The co-purification category was used to indicate a purified intact protein complex isolated by conventional chromatography or other means; for these complexes, a virtual bait was assigned (see Materials and methods). Genetic interactions were divided into eight categories (dosage growth defect, dosage lethality, dosage rescue, phenotypic enhancement, phenotypic suppression, synthetic growth defect, synthetic lethality, synthetic rescue). Genetic interactions with RNA-encoding ORFs were not scored separately from protein-coding genes. In rare instances in which an interaction could not be readily assigned a protein or genetic interaction category, the closest substitute was chosen and an explanation of the exact experimental context was noted in a free-text qualification box.
[Table 1; numeric values not recovered. Columns: number of total nodes, number of edges, number of baits, number of sources. Row groups: total interactions (includes self edges, multiple sources/experimental systems, RNA genes): Total LC (LC-PI+LC-GI); filtered interactions (excludes self edges, redundant edges, RNA genes in LC-PI): Total PI (HTP-PI+LC-PI), Total GI (HTP-GI+LC-GI), Total (Total PI + Total GI), Total LC (LC-PI+LC-GI).]
Two genetic interaction (GI) datasets were constructed as follows. All data derived from systematic SGA and dSLAM approaches were grouped into a single dataset termed HTP-GI that contained 6,103 nonredundant interactions. This designation was possible because each SGA or dSLAM screen is carried out on a genome-wide scale using the same set of deletion strains [10, 12, 13]. We note that most SGA and dSLAM genetic interactions reported to date have been independently validated by either tetrad or random spore analysis. All other genetic interactions determined by conventional means were combined to form a dataset termed LC-GI that contained 8,165 nonredundant interactions. The combined LC-GI and HTP-GI datasets contain 13,963 unique interactions (Table 1).
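The counts in this paragraph fix the size of the overlap between the two genetic datasets by inclusion-exclusion. The following is a minimal Python sketch of how nonredundant interactions might be merged, assuming each interaction is treated as an unordered gene pair; the helper names and the toy gene labels are hypothetical and are not taken from the curation pipeline.

```python
def edge(a, b):
    """Represent an undirected interaction as an order-independent pair."""
    return frozenset((a, b))


def merge_nonredundant(*datasets):
    """Union several edge collections, counting each unordered pair once."""
    merged = set()
    for dataset in datasets:
        merged.update(dataset)
    return merged


# Toy example with hypothetical gene labels: one interaction is shared.
htp_gi = {edge("A", "B"), edge("B", "C")}
lc_gi = {edge("B", "A"), edge("C", "D")}
combined = merge_nonredundant(htp_gi, lc_gi)  # three unique interactions

# Inclusion-exclusion with the counts reported in the text:
# |HTP-GI| + |LC-GI| - |combined| gives the number of shared interactions.
implied_overlap = 6103 + 8165 - 13963
```

Storing each pair as a `frozenset` makes A-B and B-A collapse to a single entry, which is one simple way to obtain a nonredundant count.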
To benchmark our curation effort, we assessed the overlap between the LC interaction dataset and interactions housed in the MIPS, BIND, and DIP databases [37, 40, 41]. Interactions attributed to 1,773 publications that were shared between at least one of these databases and the LC dataset were reinvestigated in detail. Depending on the particular comparison dataset, the false-negative rate for the LC dataset ranged from 5% to 20%, whereas the false-negative rates for other datasets varied from 36% to 50% (see Additional data files 2 and 3). To estimate our curation fidelity more precisely, 4,111 LC interactions between 1,203 nodes in a recently defined network termed the filtered yeast interactome (FYI)  were re-examined interaction-by-interaction and found to contain curation errors at an overall rate of around 4% (see Additional data file 3). All errors and missing interactions detected in these comparative analyses were corrected in the final dataset. Discordances between the different datasets underscore the need for parallel curation efforts in order to maximize curation coverage and accuracy.
Overview of the LC dataset
The final LC dataset contains 33,311 physical and genetic interactions, representing 19,499 nonredundant entries derived from 6,148 different publications. The total size of the LC dataset exceeds that of all combined HTP datasets published before 1 November, 2005 (Figure 1a). The rate of growth of publications that document interactions in budding yeast has seemingly reached a plateau of about 600 publications per year, while the total number of interactions documented per year has on average continued to increase (Figure 1b). Protein interactions were supported mainly by three experimental methods: affinity capture with mass spectrometric detection, affinity capture with western blot detection, and two-hybrid assays (Figure 1c). In addition, 258 protein complexes were biochemically purified, minimally representing 1,104 interactions (see Additional data file 1 for a list of purified complexes). More arduous techniques such as FRET and structure determination of protein complexes accounted for far fewer interactions. Genetic interactions were documented by a spectrum of techniques, with some propensity towards synthetic lethal and dosage rescue interactions (Figure 1c). The numbers of interactions in each experimental method category are listed in Additional data file 1.
The distinction between HTP surveys and meticulous focused studies cannot be made by a simple cutoff in the number of interactions. Genetic interactions are usually robust, so the distinction by interaction number is less critical. Protein interactions on the other hand are inherently more variable, and as a consequence are usually validated by well controlled experiments in most focused studies. Approximately 50% of the LC-PI dataset derives from recent publications that report 50 or more protein interactions (Figure 1d). In many of these publications, interactions are interrogated via multiple bait proteins, typically by mass spectrometric or two-hybrid analysis. While not all of these interactions are individually validated in replicate experiments, in most cases there is sufficient experimental signal (for example, peptide coverage by mass spectrometry or different interacting fragments by two-hybrid) and overlap between different experiments that reasonable confidence is warranted. We designated these publications as systematic interrogation (SI) to indicate that most interactions are verified and of reasonable confidence. Five other publications designated as HTP surveys (HS) reported single broad screens that contained a total of 870 interactions, including interactions inferred from covalent modifications such as phosphorylation and conjugation of ubiquitin-like modifiers (ULMs). Systematic interrogation and HTP survey data were included in the LC-PI dataset for the purposes of network analysis below. For future applications of the dataset, publications that contain SI or HS interactions, as well as any posttranslational modifications associated with interactions, are listed in Additional data file 1. Because all interactions are documented both by PubMed identifiers and by a structured vocabulary of experimental evidence, these potentially less well substantiated interactions or data types can be readily removed from the dataset if desired.
Replication and bias of interactions
It has been noted that persistently cited genes are not more connected than average, based on HTP networks. To reveal potential bias in the extent of investigation of any given node in the LC datasets, we determined the number of total interactions (that is, including redundant interactions) in excess of connectivity for each node (see Materials and methods). Within the LC-PI and LC-GI datasets, it is evident that the more a protein or gene is studied, the more connections it is likely to exhibit (Figure 2c). A modest study bias of 23% towards essential genes was evident in the LC-PI dataset (Figure 2d). Whether these effects are due to increased coverage upon further study or the tendency of highly connected proteins to be studied in more detail is unclear.
Finally, we determined the extent to which evolutionarily conserved proteins are studied in each dataset. Each dataset was binned according to conservation of yeast proteins across seven species using the Clusters of Orthologous Groups (COG) database. The HTP datasets were enriched towards nonconserved proteins, whereas the LC datasets were enriched for proteins conserved across the seven eukaryotic test species (Figure 2e). This bias probably reflects the tendency to study conserved proteins, which are more likely to be essential [57, 58].
GO coverage and coherence
To assess the coherence of each interaction dataset, we then determined the fraction of interactions that contained the same high level GO terms for each interaction partner across each of the GO categories (Figure 3b). By this criterion, the LC datasets were more coherent than the HTP datasets. This result reflects the higher false-positive rates in the HTP datasets, the higher incidence of uncharacterized genes in HTP datasets and also the potential for genome-wide approaches to identify new connections between previously unrelated pathways.
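One way to picture the coherence measure is as the fraction of interaction pairs whose two partners share an annotation. The sketch below is illustrative only: it assumes that "the same high level GO terms" means at least one shared term, and that pairs with an unannotated partner count as non-coherent. The function name, gene labels, and terms are hypothetical.

```python
def coherence(interactions, annotations):
    """Fraction of interaction pairs whose partners share at least one term.

    Assumption: a pair with an unannotated partner counts as non-coherent.
    """
    if not interactions:
        return 0.0
    shared = 0
    for a, b in interactions:
        terms_a = annotations.get(a, set())
        terms_b = annotations.get(b, set())
        if terms_a & terms_b:
            shared += 1
    return shared / len(interactions)


# Toy data with hypothetical gene labels and high-level terms.
toy_annotations = {
    "G1": {"cell cycle"},
    "G2": {"cell cycle", "transport"},
    "G3": {"transport"},
}
toy_interactions = [("G1", "G2"), ("G1", "G3"), ("G3", "G4")]
toy_score = coherence(toy_interactions, toy_annotations)  # one pair in three
```

Under these assumptions, a dataset rich in uncharacterized genes scores lower simply because fewer of its pairs can share a term, which is consistent with one of the explanations offered in the text.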
Size estimate of the global protein-interaction network
On the basis of analysis of both two-hybrid HTP datasets and combined HTP and MIPS datasets, it has been estimated that there are on average five interaction partners per protein in the yeast proteome, and that by extrapolation the entire proteome contains 16,000–26,000 interactions. Similar estimates of 20,000–30,000 interactions have been obtained by scaling the power-law connectivity distribution of an integrated dataset of HTP interactions and by the overlap of the HTP and MIPS datasets. To reassess these estimates based on our LC-PI dataset, we began with the observation that the current LC-PI network contains roughly half of all predicted yeast proteins. We partitioned nodes into two sets, namely those nodes present in the LC-PI network (called S = seen; S × S defines the LC-PI dataset) and those nodes absent from the LC-PI network (called U = unseen). As U is about the same size as S, if the density of U × U is no more than that of S × S, then U × U will at most contain around 10,000 interactions. Similarly, because U × S is twice the size of U × U or S × S, it will contain 20,000 interactions. The sum total of all interactions predicted from LC-PI is thus 40,000 (around 10,000 each in S × S and U × U, plus 20,000 in U × S). This estimate is subject to two countervailing reservations: the density of U × U may in fact be lower than for S regions (see below), while conversely, the current density of S × S may be an underestimate. The observations that well studied proteins are more highly connected and that the HTP-PI datasets undoubtedly contain bona fide interactions not present in S × S suggest that the density of S × S will certainly increase with further investigation. Extrapolations based on either mean node degree or degree distribution of LC-PI yielded values in the range of 21,000 to 40,000 interactions, again assuming that the density of S × S is saturating (data not shown).
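The extrapolation can be checked with a few lines of arithmetic. This Python sketch assumes a uniform interaction density across the S × S, U × U and U × S blocks, an even split of the 5,726 predicted proteins between S and U, and around 10,000 interactions in S × S (a figure implied by the totals in the text rather than stated directly). The function name is hypothetical.

```python
def estimate_total_interactions(seen_interactions, n_seen, n_unseen):
    """Extrapolate network size, assuming every block has the S x S density."""
    pairs_ss = n_seen * (n_seen - 1) / 2      # unordered pairs within S
    pairs_uu = n_unseen * (n_unseen - 1) / 2  # unordered pairs within U
    pairs_us = n_seen * n_unseen              # pairs with one node in each set
    density = seen_interactions / pairs_ss
    return {
        "SxS": float(seen_interactions),
        "UxU": density * pairs_uu,
        "UxS": density * pairs_us,
        "total": density * (pairs_ss + pairs_uu + pairs_us),
    }


# Assumed inputs: 5,726 proteins split evenly, about 10,000 S x S interactions.
estimate = estimate_total_interactions(10_000, 2863, 2863)
```

With equal set sizes the U × S block holds almost exactly twice as many candidate pairs as either square block, so the three blocks contribute roughly 10,000, 10,000 and 20,000 interactions, in line with the estimate of 40,000.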
Coverage in HTP datasets
To examine the false-negative rate in HTP-PI datasets, we directly compared the LC-PI dataset to four extant HTP-PI datasets, two from large-scale two-hybrid analysis [5, 7] and two from large-scale mass spectrometric identification of affinity-purified protein complexes [8, 9]. Two-hybrid datasets tend to have a high rate of false-positive hits [33–35]; consistently, only 2–3% of interactions reported in two-hybrid screens have been substantiated elsewhere in the literature to date (Figure 4d). Because affinity-purification methods directly capture interaction partners in a physiological context, HTP mass spectrometric datasets fared somewhat better: around 9% of the 3,402 interactions reported by Gavin et al. and around 4% of the 3,683 interactions reported by Ho et al. have been documented elsewhere in the literature (Figure 4d).
Given that the HTP mass spectrometric studies were initiated with largely nonoverlapping sets of baits that represented only around 10% of the yeast proteome [8, 9], we also assessed the extent to which these datasets captured known interactions for successful bait proteins. By this criterion, the Gavin datasets recapitulated around 30% of literature interactions, while the Ho dataset recapitulated around 20% of literature interactions. It was not possible to compare overall success rates for all HTP datasets because unsuccessful baits were not unambiguously identified in three of the studies [5, 7, 9]. We note that simple benchmark comparisons of HTP datasets may be confounded by bias in each dataset. For example, the average clustering coefficient in the LC-PI network was significantly higher for the set of baits used in the Gavin versus the Ho datasets (0.43 versus 0.39, P = 0.01) and so a higher rate of recovery is expected in the former.
The overlap between the LC-GI and HTP-GI datasets was also minimal at 305 interactions, or less than 5% of either dataset (Figure 4a,d). In part, this minimal overlap was due to the different nature of query genes in each dataset. In the primary literature, genetic interactions have traditionally been sought with conditional alleles of essential genes, whereas most HTP screens to date have used nonessential genes to query the haploid genome-wide deletion set, which by definition lacks all essential genes [10, 12, 13]. Consistently, essential nodes account for less than 6% of the overlap dataset (see Additional data file 1). In addition, because the HTP-GI dataset is composed almost entirely of synthetic lethal interactions (see Additional data file 1), whereas the LC-GI dataset contains all types of genetic interactions, the potential for overlap is further minimized. Indeed, about 80% of the overlap was accounted for by LC-GI synthetic lethal interactions (see Additional data file 1). As synthetic lethal interaction space is estimated at 200,000 interactions [12, 60], both the LC-GI and HTP-GI datasets still only sparsely sample the global network.
Finally, various methods have been used to combine and refine HTP data. These methods substantially improved overlap with literature-derived interactions. For example, of about 2,500 interactions in a high-confidence distillation of HTP datasets, termed the FYI dataset, 60% were present in the LC-PI dataset, while of the 2,455 interactions in another high-confidence dataset, 32% were present in the LC-PI dataset. While combined datasets ameliorate the problem of false-positive interactions, such combinations are by definition still prone to false-negative interactions.
Degree distribution of the LC network
It has been argued recently that the power-law distribution observed for some biological networks is an effect of frequency-degree plots and not an intrinsic network property. To assess this possibility, we reanalyzed each network as a rank-degree plot and determined goodness of fit for both linear and exponential curves. In all cases except LC-GI, a linear fit was better than an exponential fit, as judged by the coefficient of determination (Figure 5b). Even for the LC-GI network, a linear fit was nearly as good as an exponential fit. By the more stringent rank-degree plot criterion, we thus conclude that the LC and HTP networks obey a power-law distribution. Finally, it has also recently been noted that essential nodes form an exponential distribution in an HTP protein-interaction network. We consistently find that the essential subnetwork of the LC-PI dataset is best fitted by an exponential distribution, whereas the residual nonessential network follows a power law (N.N.B., unpublished data).
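A rank-degree comparison of this kind can be sketched in a few lines. The sketch assumes that the "linear" fit is a straight line on log-log axes (a power law) and the "exponential" fit is a straight line of log degree against rank; the text does not spell out the fitting procedure, so the details here, including the function names and the synthetic degree sequences, are hypothetical.

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination for a least-squares straight-line fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def compare_rank_degree_fits(degrees):
    """Return R^2 for power-law and exponential fits to a rank-degree plot."""
    ranked = sorted((d for d in degrees if d > 0), reverse=True)
    ranks = list(range(1, len(ranked) + 1))
    log_degree = [math.log(d) for d in ranked]
    log_rank = [math.log(r) for r in ranks]
    return {
        # Power law: log(degree) is linear in log(rank).
        "power_law": r_squared(log_rank, log_degree),
        # Exponential: log(degree) is linear in rank.
        "exponential": r_squared(ranks, log_degree),
    }


# Synthetic degree sequences used only to exercise the comparison.
power_like = [1000.0 * r ** -0.8 for r in range(1, 201)]
exp_like = [100.0 * math.exp(-0.05 * r) for r in range(1, 201)]
fits_power = compare_rank_degree_fits(power_like)
fits_exp = compare_rank_degree_fits(exp_like)
```

Because both fits use log degree as the response variable, the two coefficients of determination are directly comparable; whichever is larger indicates the better-fitting form for that network.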
Essentiality, connectivity, and local density
The propensity of essential proteins to connect more frequently than nonessential proteins prompted us to reexamine the issue of essential-essential connections. From the analysis of HTP datasets, it has previously been reported that interactions between highly connected proteins appear to be suppressed. In both the LC-PI and HTP-PI datasets, however, there was in fact a fourfold enrichment for essential-essential interactions (Figure 6b). The neighborhoods of essential proteins in both networks were significantly enriched in essential proteins when compared with the neighborhoods of nonessential proteins (for essentials <LC-PI> = 0.64 and <HTP-PI> = 0.48; for nonessentials <LC-PI> = 0.36 and <HTP-PI> = 0.27; P < 0.01 in each case). This effect has also recently been adduced from HTP data. The LC-PI network exhibited a higher local density of essential interactions than the HTP-PI network as the fraction of essential neighbors in LC-PI was 35% greater than in HTP-PI and the fraction of essential proteins that were surrounded by only essential proteins in LC-PI was twice that in HTP-PI (Figure 6c). Significantly, comparison of an LC-PI subnetwork constructed of only essential proteins to an LC-PI subnetwork of nonessential proteins revealed that the former was fourfold more dense, more fully connected (91% versus 74% of nodes in the largest component), and more tightly connected (average clustering coefficient of 0.5 versus 0.3, see below). These essential-essential interactions were likely to be of functional relevance because the LC-GI dataset exhibited twice as many essential-essential interactions as expected (Figure 6b).
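The neighborhood comparison can be illustrated with a small sketch. It assumes that the bracketed values are the mean fraction of a node's neighbors that are essential, computed separately for essential and nonessential nodes; that reading, the function names, and the toy network are assumptions made for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self edges."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def mean_essential_neighbor_fraction(adjacency, essential, among_essential):
    """Average, over the chosen node class, of the essential-neighbor fraction."""
    fractions = []
    for node, neighbors in adjacency.items():
        if (node in essential) != among_essential or not neighbors:
            continue
        hits = sum(1 for n in neighbors if n in essential)
        fractions.append(hits / len(neighbors))
    return sum(fractions) / len(fractions) if fractions else float("nan")


# Toy network with hypothetical labels: E* essential, N* nonessential.
toy_edges = [("E1", "E2"), ("E1", "E3"), ("E2", "E3"), ("E3", "N1"), ("N1", "N2")]
toy_essential = {"E1", "E2", "E3"}
toy_adjacency = build_adjacency(toy_edges)
around_essential = mean_essential_neighbor_fraction(toy_adjacency, toy_essential, True)
around_nonessential = mean_essential_neighbor_fraction(toy_adjacency, toy_essential, False)
```

In the toy network the essential nodes sit in a triangle, so their neighborhoods are mostly essential, while the nonessential nodes see few essential neighbors; the same contrast is what the reported values for LC-PI and HTP-PI describe.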
A primary attribute of each node is its clustering coefficient, which is a measure of local interaction density, defined as the percentage of node neighbors that also interact with each other. A clustering coefficient near 0 occurs when almost none of the neighbors is connected to each other, whereas a clustering coefficient near 1 occurs when many neighbors are connected to each other. Accordingly, proteins that are part of a multiprotein complex should have a high clustering coefficient. For all values of clustering coefficient (except 0), the mean clustering coefficient for the LC-PI network was greater than that of the HTP-PI network, often by more than one order of magnitude (Figure 6d, top). The mean clustering coefficient of the LC-PI network was 34% larger in magnitude than for the HTP-PI network. Ignoring the trivial case for nodes of degree 1, which by definition have the maximal clustering coefficient of 1 (that is, 26% of all nodes in LC-PI and 32% of all nodes in HTP-PI), 8% of all LC-PI nodes with degree higher than 2 were fully connected (that is, clustering coefficient of 1), compared with only 2% of all HTP-PI nodes. In contrast, the distributions of clustering coefficients for the LC-GI and HTP-GI networks were very similar, as was the average clustering coefficient (Figure 6d, bottom). For all four networks, the clustering coefficients were negatively correlated with connectivity, suggesting that locally dense interactions may limit the overall number of interaction partners that can access nodes within these regions.
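The definition of the clustering coefficient translates directly into code. The sketch below is a minimal Python version operating on an adjacency map; it returns `None` for nodes with fewer than two neighbors, because conventions for that case differ (the text treats degree-1 nodes as trivially having a coefficient of 1, whereas other treatments assign 0). The function name and the toy network are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(adjacency, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return None  # no neighbor pairs exist; conventions for this case vary
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return linked / (k * (k - 1) / 2)


# Toy network: a triangle A-B-C with a pendant node D attached to C.
toy_network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
cc_a = clustering_coefficient(toy_network, "A")  # both neighbors are linked
cc_c = clustering_coefficient(toy_network, "C")  # one of three pairs is linked
cc_d = clustering_coefficient(toy_network, "D")  # undefined under this sketch
```

Members of a fully connected complex behave like node A here, which is why proteins in multiprotein complexes are expected to show high clustering coefficients.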
Overlap between protein and genetic networks
Correlations with protein abundance, localization, and expression
The localization of a large fraction of predicted proteins in yeast has also recently been determined. Proteins that interact must at least partially overlap in subcellular location, and indeed, co-localization may be essential to drive interaction equilibrium for low-abundance proteins. This expectation is borne out, as protein co-localization in the same compartment was significantly enriched for physical interaction pairs in the LC-PI dataset, whereas potential inter-compartment interactions were significantly under-represented (Figure 8b). Similar conclusions have been drawn previously for HTP datasets. Although less pronounced, the correlation with subcellular localization also extended to genetic-interaction pairs (Figure 8b).
Analysis of HTP datasets in conjunction with genome-wide expression profiles across many experimental conditions has demonstrated that physical interaction partners are encoded by genes that tend to be co-regulated [26, 70]. As judged by the Pearson correlation coefficients (PCC) for a compendium of 304 different genome-wide expression profiles, this propensity for co-regulation holds in the LC dataset, for both physical and genetic interactions (see Additional data file 3). Although highly statistically significant, the enrichment for positive over negative expression correlation was only around 5% for either dataset, such that this parameter only weakly predicts interactions. We also assessed the fraction of interaction partners that shared at least one transcription factor, as defined in genome-wide location studies. For interaction pairs where each respective gene is bound by one or more transcription factors, 24% (397/1,637) of pairs in the LC-PI dataset had at least one shared transcription factor, compared with 15% (229/1,422) of pairs in the HTP-PI dataset. This significant difference (Fisher's exact test, P < 2 × 10⁻⁸, two-tailed) suggested that LC-PI was enriched for interactions between co-regulated proteins. For the LC-GI and HTP-GI datasets, shared transcription factors were found in 16% and 17% of pairs (229/1,422 and 117/672, respectively), a nonsignificant difference (Fisher's exact test, P = 0.45, two-tailed). For all datasets, these transcription factor co-location values were at least seven standard deviations from the mean calculated for a similar number of random pairs, consistent with the tendency of interacting proteins and genes to be coexpressed.
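The comparison of shared-transcription-factor fractions can be sketched as a two-tailed Fisher's exact test on a 2 × 2 table. The Python sketch below is a generic standard-library implementation, not the software used in this study; the function names are hypothetical, and the two-tailed rule assumed here sums the probabilities of all tables that are no more likely than the observed one. The example uses the LC-GI and HTP-GI counts quoted above (229 of 1,422 and 117 of 672 pairs with a shared factor).

```python
from math import lgamma, exp

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def log_p(x):
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = log_p(a)
    total = sum(exp(log_p(x)) for x in range(lowest, highest + 1)
                if log_p(x) <= observed + 1e-9)
    return min(1.0, total)

# Pairs with / without a shared transcription factor, from the counts in the text.
lc_gi_shared, lc_gi_total = 229, 1422
htp_gi_shared, htp_gi_total = 117, 672
p_gi = fisher_exact_two_tailed(lc_gi_shared, lc_gi_total - lc_gi_shared,
                               htp_gi_shared, htp_gi_total - htp_gi_shared)
```

The resulting `p_gi` can be compared with the nonsignificant value reported above for the genetic-interaction datasets.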
Predictive power of the LC dataset
Recent developments in methods for gene or protein function prediction suggest that probabilistic integration of diverse genomic data is a powerful approach to the annotation of uncharacterized genes. Given its precision and substantial coverage, the LC dataset should augment these approaches. We have recently constructed a Bayesian network that integrates affinity precipitation, two-hybrid, synthetic lethality, and microarray correlation data . The performance of this network was dramatically improved by the LC dataset: for a recall of 2% of a standard constructed from GO terms (about 11,000 pairs), the LC dataset improved prediction precision from 50% to 68% (Figure 9a).
Another important characteristic of any biological dataset is the diversity of functional groups covered. While precision-recall curves estimate the total number of true-positive pairs in the LC dataset, they do not specifically report the number of distinct biological processes captured by the data. To measure this diversity, we computed precision-recall statistics separately on the 146 largest GO terms under the 300-gene threshold for each data type, and counted the number of terms that meet a minimum combined precision-recall score, as measured by the commonly used F-score or harmonic mean. The diversity of coverage in the LC dataset was clearly superior to that in any of the HTP datasets (Figure 9b). For example, the LC dataset covered eight distinct biological processes at a minimum F-score threshold of 0.32, whereas the next best data type, HTP affinity precipitation, covered eight GO terms only when the F-score threshold was relaxed to 0.15. This increased diversity is an important consideration in functional prediction because the limiting factor in such analyses is often incomplete data.
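The diversity measure described above can be sketched as follows: compute an F-score (the harmonic mean of precision and recall) for each GO term, then count the terms that reach a chosen threshold. The function names and the per-term values in this Python sketch are hypothetical placeholders, not results from this study.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def terms_covered(per_term_scores, threshold):
    """Number of GO terms whose F-score is at least `threshold`.

    `per_term_scores` maps a GO term to a (precision, recall) pair.
    """
    return sum(1 for p, r in per_term_scores.values() if f_score(p, r) >= threshold)

# Hypothetical per-term precision and recall for one data type.
example_scores = {
    "term_a": (0.6, 0.3),
    "term_b": (0.2, 0.1),
    "term_c": (0.4, 0.4),
}
covered_strict = terms_covered(example_scores, 0.32)
covered_relaxed = terms_covered(example_scores, 0.10)
```

Sweeping the threshold and plotting the count for each data type gives a curve of the kind summarized in Figure 9b.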
Prediction and coverage of protein complexes
The predicted core proteome is substantially conserved across eukaryotes. For example, 37% of yeast proteins have identifiable orthologs in humans . This concept has been recently extended to identify conserved protein pathways . We assessed the ability of the LC-PI dataset to augment these pathway predictions, based on the current fly protein-interaction network of 20,720 unique interactions between 7,038 proteins in FlyBase . We again searched the combined LC-PI and HTP-PI yeast networks for densely connected subnetworks suggestive of protein complexes, but in addition we made the requirement that the set of proteins in each complex has putative orthologs in fly that were also densely connected in the fly network. This process identified 1,412 putative conserved complexes between yeast and fly (see Additional data file 1). Like the single-species yeast complexes identified above (Figure 10a), the LC-PI dataset contributed the majority of interactions in the complexes conserved between yeast and fly (Figure 10c). As an example of such predicted complexes, a dense cytoskeletal control network in yeast corresponded to a partial network detected in the fly HTP dataset (Figure 10d). This orthologous network both buttresses known yeast interactions and suggests possible experiments to probe the cytoskeletal regulation in the fly. Finally, again based on the principle that interactions among orthologous genes are more likely to be true than those among nonorthologous genes, we used the LC-PI dataset to predict a set of 338 novel human protein interactions (see Additional data file 1).
The proteins grouped in a predicted complex are likely to share a common function. As with individual protein interactions, such co-association can be exploited to make high-quality protein functional predictions. We identified complexes that were already enriched for a particular GO function and transferred this function to all proteins in that complex (see Materials and methods). This process yielded between a hundred and a thousand new GO biological process annotations over all complexes, depending on whether HTP-PI or LC-PI data were used to identify complexes, and whether yeast-only or conserved yeast/fly complexes were specified (Figure 10e; see also Additional data file 1). LC-PI interactions resulted in substantially larger numbers of predictions than did HTP-PI interactions, at a percent accuracy that was roughly equivalent between the two (slightly higher for yeast-only complexes, slightly lower for yeast/fly complexes). Overall, the predictive power of complexes derived from the LC-PI dataset exceeds that of complexes derived from the HTP interactions.
Systematic curation of the S. cerevisiae primary literature enabled the creation of a comprehensive database that currently houses a total of 22,250 protein interactions and 11,061 genetic interactions, corresponding to 11,334 and 8,165 nonredundant interactions in the LC-PI and LC-GI datasets, respectively. This resource represents the distillation of more than three decades of yeast molecular genetics and biochemistry, as acquired by individual investigators. Because of the thorough coverage of the LC dataset, it will serve as a look-up table for gene and protein interactions and as a basis for interrogating the properties of biological networks. As shown above, the LC dataset improves the prediction of gene function and protein complexes, both within and between species. The sophisticated molecular genetics of budding yeast will facilitate definitive tests of hypotheses generated from analysis of the LC dataset.
Interaction space: overlap between LC and HTP data
Simple comparison of the LC dataset reveals key differences between experimental data embedded in the literature as a whole and HTP data. The well known high rate of false-positive interactions in HTP physical interaction datasets is an inevitable consequence of nonspecific interactions inherent to different methods [33, 34]. A more unexpected feature of the HTP datasets perhaps is the high rate of false-negative interactions in the original HTP datasets, a parameter that has not been possible to estimate until now. Thus, the overall overlap between HTP-PI and LC-PI datasets is only 14%, whereas even the most robust HTP interaction dataset contains less than 30% of known interactions for the particular baits studied. In conjunction with the observation that the better studied proteins or genes exhibit more interactions, the high false-negative rate in the HTP data suggests that interaction space may be far from saturated and that there are many more interactions to be discovered. The false-negative problem will undoubtedly be ameliorated by recent dramatic increases in mass spectrometer sensitivity  and application of more rigorous HTP approaches . A second unexpected feature of the HTP datasets is the inherent bias towards previously studied interactions. This bias appears to derive in part from bait selection in nonsaturating studies. A final notable difference between the LC and HTP datasets is the dearth of genetic interactions in HTP screens that correspond to physical interactions. The apparent orthogonal relationship between HTP-PI and HTP-GI networks has been noted previously and explained on the basis of inter-pathway genetic interactions [12, 64]. The substantial overlap between genetic and physical interactions observed in the LC datasets, although perhaps driven by investigator bias, belies a simple relationship between genetic and biochemical networks.
Similar network properties of LC data and HTP data
The sparse coverage of true interactions in HTP datasets has numerous implications for previous network analyses, which of necessity have been based solely on HTP data. Importantly, four network properties deduced from HTP studies appear to hold in the LC-derived networks. First, the overall scale-free topology of biological networks deduced from HTP studies is supported by the LC dataset, albeit with regions of less ideal fit. This lack of fit may either reflect the bias in the LC-PI dataset, which results in enrichment of proteins with higher connectivity, or may reflect the fact that biological networks do not perfectly fit a power-law relationship [61, 80]. Although there are relatively fewer hubs compared with non-hubs in the LC-PI network, this network nevertheless has significantly more highly connected hubs than other scale-free networks, such as the HTP-PI dataset. Second, the relationship between essentiality and connectivity also holds in the LC dataset. The large cohort of connections maintained by essential proteins may be a consequence of the fact that essential proteins tend to be more ancient, and have simply gained more interactions by chance. Third, protein-interaction partners tend to co-localize in the same subcellular compartment. Fourth, the modest propensity of protein-interaction partners to be coexpressed under different conditions is an attribute of both LC-PI and HTP-PI datasets.
Essential-essential interactions unify the cellular network
The fourfold enrichment for essential-essential protein interactions observed in both the LC-PI and HTP-PI networks suggests that the global network may be unified by interactions between essential nodes. Indeed, a highly connected core of essential proteins with an exponential degree distribution has recently been noticed in HTP data. This finding is buttressed by our observations that the LC-PI essential-essential interaction network is not only exponentially distributed, but is more dense, more complete and more connected than its nonessential counterpart. Although previous analysis of an HTP two-hybrid network revealed that hub-hub connections are suppressed, implying that the cellular network is modular, this property appears to be a consequence of the HTP dataset (N.N.B. and M.T., unpublished data). Our finding that genetic interactions between essential genes are also twofold enriched in the LC-GI dataset strongly suggests that essential-essential interactions are functionally significant. Consistent with this, a recent analysis indicates that essential genes may exhibit up to fivefold more synthetic lethal interactions than nonessential genes. The preponderance of essential-essential interactions has a critical bearing on the evolution of protein networks. Because essential proteins evolve more slowly than nonessential proteins, it seems likely that essentials are constrained to slowly coevolve with other essentials to which they are physically connected [82, 83]. The properties of the global network may thus be dominated by a phalanx of interlinked essential hubs that have been co-selected by evolutionary pressure. This interconnectivity appears to be supported by the substantial overlap we observe between the LC-PI and LC-GI networks, a feature that is not evident in the HTP-GI network. Unlike metabolic networks, which do exhibit modularity, this centralized architecture may not be readily amenable to interpretation through discrete categorization of gene and protein function.
Network representation and bias
Static two-dimensional representations of biological networks are obviously an abstraction that artificially compresses temporally and spatially distinct regions of the network. Although the current LC dataset captures basic data about physical and genetic interactions, much other information remains to be extracted and compiled, including quantitative measures of protein and genetic interactions [67, 85], spatio-temporal aspects of network organization [54, 68], protein-DNA interactions  and the posttranslational modifications that modulate many protein interactions . In addition, more complex attributes such as the directionality of interactions and functional dependencies must also be captured in a systematic manner. Much of this information is contextual in nature and depends on multiple lines of supporting evidence that is not easily codified. This information will, however, be crucial for modeling the dynamics of genetic and protein networks. For example, relationships extracted from the literature have recently been used to demonstrate that the budding yeast cell cycle behaves as a dynamic attractor  and to deduce patterns of information flow in a mammalian neuronal network . Pathway databases such as Reactome  and the Kyoto Encyclopedia of Genes and Genomes (KEGG)  have begun to compile this type of information. The LC dataset will serve as a guidepost for curation of more complex features, from which more sophisticated global models can be built.
As noted above and elsewhere, inherent biases in methods and approaches can compromise any given dataset, whether it be in limits of detection, a propensity to recover certain classes of interaction, or study bias in the primary literature [16, 17, 75, 91]. Comparison of various datasets can reveal biases, which can then be taken into account in interpretation of network properties. With the advent of systems-biology approaches, such integrated datasets within the same study are rapidly becoming the norm and will provide much needed internal consistency between different methods . Moreover, as the sensitivity and reliability of HTP approaches continues to improve, interactions detected by these methods will dominate biological networks. The LC dataset will guide such approaches and facilitate the interpretation of new data.
To maximize portability and integration, systematic curation efforts will require a universal agreed upon structured vocabulary to describe interactions and associated features. The Protein Standards Initiative, a work group of the Human Proteome Organization (HUPO), has recently developed a molecular interaction record structure, called PSI-MI, for protein and genetic interaction data . The PSI-MI format has been adopted by the IMEx consortium of interaction databases , which aims to freely distribute interaction data. The open exchange of interaction records between different databases will enable the necessary comparisons to achieve a curated dataset that is largely error free. In accord with IMEx guidelines, we are in the process of mapping our experimental evidence codes to the PSI-MI format, so that our ongoing curation efforts will conform to the PSI-MI standard.
Apart from applications in the benchmarking of HTP datasets, prediction of protein function and biological network modeling, systematic curation efforts will prove useful in other contexts. In particular, interactions curated from the literature provide a valuable independent means to assess the coherence of GO annotation. Validated interaction partners that bear discrepant GO annotations may indicate either novel biological connections, the need for harmonization of GO terms, or simply outright inconsistencies in the literature. Comprehensive LC interaction datasets allow these discrepancies to be readily found and re-evaluated. Given the considerable efforts involved in the Model Organism Database (MOD) and GO curation, a strong case can be made for linked curation of full interaction records, which already partially overlap with GO evidence codes [47, 48]. We also endorse the concept of author-directed curation at the time of submission or publication; the capture of interaction data in simplified records would greatly augment systematic curation of the literature. Finally, large manually curated datasets will provide a critical benchmark for machine-based learning approaches to automate the curation of the literature . Machine-assisted approaches, such as the Textpresso literature-search algorithm , will undoubtedly improve curation accuracy and efficiency.
Comprehensive curation of reliable protein and genetic interactions from the primary biomedical literature establishes a critical benchmark for HTP datasets, augments prediction of gene or protein function and allows inference of system-level properties of biological networks. The systematic compilation of publicly available LC interaction datasets for other model organisms, including humans , will enable further insight into both individual gene functions and biological network features.
Literature search and definition of datasets
The PubMed database was searched for relevant publications using the following criteria: (all yeast ORFs) + (Gene Name (all aliases)) AND + (Yeast + OR + Saccharomyces + cerevisiae). We also read an additional 6,543 abstracts/papers curated by SGD that were missed in the original search, usually because a gene name was not present in the abstract. A total of 53,117 abstracts/papers as of 1 November, 2005 were manually curated using custom web-based tools. The curation system automatically tracked abstracts and/or full text read by each curator. Abstracts that contained 'Saccharomyces cerevisiae' or 'yeast' and a gene name but that were not true S. cerevisiae publications, typically because the publication described a yeast homolog or two-hybrid interaction for another species, were designated 'Wrong Organism'.
The LC-PI dataset does not include interactions from the two extant HTP mass spectrometry studies in S. cerevisiae [8, 9] or from the three extant HTP two-hybrid studies [5–7]. These five studies combined are referred to as the HTP-PI dataset. A number of recent publications report what might be considered HTP data that has been cross-validated to various extents. These publications, designated either systematic interrogation (SI) or HTP survey (HS), were included in the LC-PI dataset for the purpose of analyses reported here, but may be readily segregated for future analysis (see Additional data file 1).
The LC-GI dataset is defined as all interactions derived from conventional genetic approaches, that is, those not based on systematic SGA and dSLAM screens of the yeast deletion set [10, 12, 13]. All genetic interactions from systematic screens comprise the HTP-GI dataset (see Additional data file 1 for the list of publications that document HTP-GI data).
The experimental methods for physical interactions were classified as follows:
Affinity capture-MS
The bait protein is affinity captured from cell extracts by either polyclonal antibody or epitope tag and the associated interaction partner is identified by MS methods.
Affinity capture-western
The bait protein is affinity captured from cell extracts by either polyclonal antibody or epitope tag and the associated interaction partner is identified by western blot with a specific polyclonal antibody or a second epitope tag. This category was also used if an interacting protein was visualized directly by dye stain or radioactivity.
Biochemical activity
Interaction is inferred from a biochemical effect of one protein upon another, for example, GTP-GDP exchange activity or phosphorylation of a substrate by a kinase.
Co-crystal structure
Interaction is directly demonstrated at the atomic level by X-ray crystallography.
Co-fractionation
Interaction is inferred from the presence of two or more protein subunits in a partially purified protein preparation.
Co-localization
Interaction is inferred from two proteins that co-localize in the cell by indirect immunofluorescence, usually in a co-dependent manner. This category also includes co-dependent association of proteins with promoter DNA in chromatin immunoprecipitation experiments.
Co-purification
Interaction is inferred from the identification of two or more protein subunits in a purified protein complex, as obtained by classical biochemical fractionation or by affinity purification and one or more additional fractionation steps. Because the bait-prey relationship does not exist for conventional purification, in those cases where an experimentally tagged bait protein was not present, a virtual bait was defined as the most highly connected protein according to other types of experimental evidence in the dataset. Co-purified complexes are listed in Additional data file 1.
Far western
Interaction is detected between a protein immobilized on a membrane and a purified protein probe.
FRET
The close proximity of interaction partners is detected by fluorescence resonance energy transfer (FRET) between cyan fluorescent protein (CFP) and yellow fluorescent protein (YFP) fusion proteins in vivo.
Protein-peptide
Interaction is detected between a protein and a peptide derived from an interaction partner. This category includes phage-display experiments.
Protein-RNA
Interaction is detected between a purified protein and associated RNA(s) as detected by northern blot or reverse transcription-PCR. Genome-wide experiments based on microarray detection were classified as HTP, and not recorded, unless supporting documentation for specific interactions was provided.
Reconstituted complex
Interaction is directly detected between purified proteins in vitro, usually in recombinant form.
Two-hybrid
The bait protein is expressed as a DNA-binding domain fusion and the prey protein is expressed as a transcriptional activation domain fusion and interaction is measured by reporter gene activation. This category was also used for two-hybrid variations such as the split-ubiquitin assay.
The experimental methods for genetic interactions were classified as follows:
Dosage growth defect
The overexpression or increased dosage of one gene causes a growth defect in a strain that is mutated or deleted for another gene.
Dosage lethality
The overexpression or increased dosage of one gene causes lethality in a strain that is mutated or deleted for another gene.
Dosage rescue
The overexpression or increased dosage of one gene rescues the lethality or growth defect of a strain that is mutated or deleted for another gene.
Synthetic growth defect
Mutations or deletions in separate genes, each of which alone causes a minimal phenotype, but which when combined in the same cell result in a significant growth defect under a given condition.
Synthetic lethality
Mutations or deletions in separate genes, each of which alone causes a minimal phenotype, but which when combined in the same cell result in lethality under a given condition.
Synthetic rescue
A mutation or deletion of one gene rescues the lethality or growth defect of a strain mutated or deleted for another gene.
Phenotypic enhancement
The mutation, deletion, or overexpression of one gene results in enhancement of any phenotype associated with the mutation, deletion, or overexpression of another gene.
Phenotypic suppression
The mutation, deletion, or overexpression of one gene results in the suppression of any phenotype associated with the mutation, deletion, or overexpression of another gene.
At this stage of curation, multiple genetic dependencies and strain background context were not routinely recorded, nor was the possible directionality of genetic interactions inferred.
To estimate excess publication bias in the literature dataset, a bias for a protein or gene ν was defined as the number of interactions ν is part of, minus the connectivity of ν. Thus, if the connectivity of ν is k and ν is seen in k interactions, then the bias is 0; however, if ν is seen in, for example, 2k interactions, the bias is k. Bias was computed for nodes in each dataset. Fits to power-law curves, expression correlation analyses [26, 70], clustering coefficients, and hierarchical clustering were computed essentially as described. Standard statistical tests were used throughout.
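The bias measure defined above can be sketched in a few lines of Python. In this sketch, each record is one reported interaction (so the same pair may appear several times), the connectivity of a node is its number of distinct partners, and the bias is the difference between the two. The function name, the record format and the example identifiers are assumptions for illustration.

```python
from collections import defaultdict

def publication_bias(records):
    """Bias per node: number of reported interactions minus distinct partners.

    `records` is an iterable of (node_a, node_b) pairs, one per reported
    interaction; repeated reports of the same pair are kept as repeats.
    """
    reports = defaultdict(int)
    partners = defaultdict(set)
    for a, b in records:
        # A self-interaction is counted once for its single participant.
        sides = ((a, b),) if a == b else ((a, b), (b, a))
        for node, other in sides:
            reports[node] += 1
            partners[node].add(other)
    return {node: reports[node] - len(partners[node]) for node in reports}

# Hypothetical records: the X-Y pair is reported three times, X-Z once.
records = [("X", "Y"), ("X", "Y"), ("X", "Z"), ("Y", "X")]
bias = publication_bias(records)
```

A node whose every pairing is reported exactly once has a bias of 0, whereas repeated reports of the same pairing raise the bias of both participants.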
We evaluated the enrichment of known functional relationships in the curated literature and other HTP data using GO biological process terms as a benchmark. Specifically, we compared protein pairs identified in curation or HTP data to those annotated to the same nodes in GO. We propagated each biological process annotation up to its ancestors to ensure a general evaluation based on the full GO hierarchy. To prevent proteins co-annotated to very general terms (such as 'metabolism') from being considered true positives, the number of unique annotations per GO term was counted. Because the biological specificity of each term roughly corresponds to the number of total annotations, we chose two thresholds to define the set of positive and negative protein pairs. Protein pairs whose most specific co-annotation occurs in GO terms of 300 total annotations or fewer are considered positives, while pairs whose most specific co-annotation occurs in GO terms of 1,000 total annotations or more are considered negatives. The positive set spans around 1,600 terms, totaling some 500,000 pairs, and the negative set spans 10 nodes, totaling around 6 million pairs. The exact choice of GO term size threshold is not critical. Evaluation results are consistent for any choices between 150 and 400 genes when the negative co-annotation term size threshold is fixed at 1,000. Details of predictive methods are provided in Additional data file 2.
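The two-threshold definition of the benchmark can be illustrated with a short Python sketch. It assumes that, for a given protein pair, the sizes (total annotation counts after propagation) of all GO terms shared by the two proteins are already available; the function name and the input format are hypothetical. Pairs whose most specific shared term falls between the two thresholds are left unlabeled and excluded from evaluation.

```python
def label_pair(shared_term_sizes, positive_max=300, negative_min=1000):
    """Label a protein pair from the sizes of the GO terms both proteins share.

    Returns "positive" if the most specific (smallest) shared term has at most
    `positive_max` annotations, "negative" if it has at least `negative_min`,
    and None otherwise (including pairs with no shared annotation).
    """
    if not shared_term_sizes:
        return None
    most_specific = min(shared_term_sizes)
    if most_specific <= positive_max:
        return "positive"
    if most_specific >= negative_min:
        return "negative"
    return None

# Hypothetical pairs: each list holds the sizes of the co-annotated terms.
labels = [
    label_pair([2500, 120]),   # shares a specific term
    label_pair([2500, 1200]),  # shares only very general terms
    label_pair([2500, 450]),   # intermediate specificity, left unlabeled
]
```

Changing `positive_max` within the range of 150 to 400 mentioned above is a direct way to check the sensitivity of an evaluation to this threshold.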
Protein complex and pathway prediction
Identification of protein complexes was performed using the PathBLAST network alignment tools, as previously described. Briefly, these methods integrate protein-interaction data from two species with protein sequence homology to generate an aligned network, in which each node represents a pair of homologous proteins (one from each species; BLAST E-value < 10⁻⁷) and each link represents a conserved interaction. We note that representation of the network as either a spoke or matrix model does not affect the outcome of PathBLAST predictions because computations for conserved complexes include both direct and indirect interactions. That is, proteins that are bridged by a third protein are automatically linked in the PathBLAST network and assigned only a slight penalty. PathBLAST is thus robust to possible incomplete coverage in one network versus another. Given this design, spoke versus matrix representation models yield very similar complex predictions and network topologies.
The PathBLAST network alignment was searched to identify high-scoring subnetworks, for which the score is based on the density of interactions within the subnetwork as well as the confidence estimates for each protein interaction (see below). The search was then repeated over 100 random trials, in which the interactions of both networks are reassigned while maintaining the same number of interactions per protein, resulting in a distribution of random subnetwork scores pooled over all trials. Dense subnetworks that score in the top 1% of this random score distribution are considered significant and retained as conserved complexes. To minimize redundancy, complexes are filtered against each other such that if the sets of proteins from any two complexes overlap by more than 80%, the lower-scoring complex is discarded. The search for single-species complexes is identical to the search for conserved complexes except that an individual protein network is substituted for the network alignment. This process identifies dense subnetworks constrained by the interactions of one species rather than two. In the fly, confidence estimates for each interaction were derived using a logistic regression model similar to that previously described; in yeast, so as not to bias one set of interactions over the other, interactions were assigned a uniform confidence of 0.99. Given a set of significant protein complexes, these complexes are used to predict new protein functional annotations, as follows. A GO functional term f is assigned to protein P of complex c if: (1) at least five proteins in c are already annotated with f; (2) at least 50% of the proteins in c are annotated with f; (3) c is enriched for f by a hypergeometric P-value < 0.01; and (4) f is a sufficiently specific term at level 4 or deeper in the GO ontology. To assess the predictive power of significant complexes, we use tenfold cross-validation. In this procedure, the set of known GO annotations is partitioned into ten equal subsets, and each of these is hidden in turn. The fraction of hidden annotations that is recapitulated using the prediction algorithm is determined.
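The four criteria for transferring a GO term to the members of a complex can be sketched as below. This Python sketch is an illustration under stated assumptions rather than the PathBLAST-based implementation: the function names, the argument layout and the protein identifiers are hypothetical, the enrichment P-value is taken as the upper tail of a hypergeometric distribution over an assumed background of annotated proteins, and the GO level of the term is supplied by the caller.

```python
from math import comb

def hypergeom_tail(x, n, K, N):
    """P(X >= x) when n proteins are drawn from N, of which K carry the term."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(x, upper + 1)) / comb(N, n)

def transfer_term(complex_members, annotated, background_size, term_level,
                  min_count=5, min_fraction=0.5, alpha=0.01, min_level=4):
    """Return the complex members that would newly receive the term.

    `annotated` is the set of all background proteins already carrying the
    term; `term_level` is the depth of the term in the GO hierarchy.
    """
    members = set(complex_members)
    carriers = members & annotated
    x, n = len(carriers), len(members)
    if x < min_count or x / n < min_fraction or term_level < min_level:
        return set()
    if hypergeom_tail(x, n, len(annotated), background_size) >= alpha:
        return set()
    return members - carriers

# Hypothetical complex of eight proteins, six of which already carry the term.
complex_members = [f"p{i}" for i in range(1, 9)]
annotated = {f"p{i}" for i in range(1, 7)} | {f"q{i}" for i in range(1, 15)}
new_annotations = transfer_term(complex_members, annotated,
                                background_size=1000, term_level=5)
```

For cross-validation as described above, a subset of known annotations would be removed from `annotated` before calling `transfer_term`, and the hidden annotations that reappear in the returned set would be counted.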
For predicted interactions between human proteins, yeast-human orthologs were stringently identified by reciprocal best-hit BLAST scores of E-value < 10⁻¹⁰ and sequence identity of > 50%. Human protein interactions were obtained from HPRD and human protein sequences from the National Center for Biotechnology Information (NCBI). For each interaction in the LC-PI dataset, if both proteins had a human ortholog and the interaction between these orthologs was not reported in HPRD, a predicted interaction was scored.
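The final mapping step described above can be sketched as follows. The sketch assumes that the ortholog table has already been built (for example, by the reciprocal best-hit criteria given in the text) and that known human interactions are available as unordered pairs; the function name and all identifiers are hypothetical.

```python
def predict_human_interactions(yeast_pairs, ortholog, known_human_pairs):
    """Map yeast interactions onto human orthologs and keep the novel ones.

    `ortholog` maps a yeast protein to its human ortholog. A pair is predicted
    when both yeast partners have an ortholog and the resulting human pair is
    not already among `known_human_pairs`.
    """
    known = {frozenset(pair) for pair in known_human_pairs}
    predicted = set()
    for a, b in yeast_pairs:
        if a in ortholog and b in ortholog:
            human_pair = frozenset((ortholog[a], ortholog[b]))
            if human_pair not in known:
                predicted.add(human_pair)
    return predicted

# Hypothetical identifiers for illustration only.
yeast_pairs = [("yA", "yB"), ("yB", "yC"), ("yC", "yD")]
ortholog = {"yA": "hA", "yB": "hB", "yC": "hC"}
known_human_pairs = [("hB", "hA")]
predicted = predict_human_interactions(yeast_pairs, ortholog, known_human_pairs)
```

Pairs are stored as unordered sets so that an interaction reported as A-B in one source and B-A in another is treated as the same interaction.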
Distribution, updates and maintenance
The complete LC dataset is freely available at the BioGRID interaction database and at the Saccharomyces Genome Database. The LC dataset will be kept current through monthly updates and refined through re-curation and community-directed corrections. In future curation updates, all the above protein- and genetic-interaction evidence categories will be mapped to PSI-MI terms.
Note added in proof
Two comprehensive surveys of protein interactions, as determined by mass spectrometric analysis of affinity purified protein complexes, have recently been reported [109, 110]. The raw dataset in Gavin et al.  overlaps with 21% of the LC-PI dataset and 29% of the HTP-PI dataset, while the raw dataset in Krogan et al.  overlaps with 22% of the LC-PI dataset and 14% of the HTP-PI dataset. The sum total of all HTP-PI data, including recent data [109, 110], overlaps with 34% of the LC-PI dataset. These comparisons suggest that protein interaction space is far from saturated in extant datasets.
The following additional data files are available with this article. Additional data file 1 contains Supplementary Tables 1–11: Supplementary Table 1, LC and HTP dataset statistics; Supplementary Table 2, Co-purified complexes in the LC dataset; Supplementary Table 3, SI/HTP publications; Supplementary Table 4, Post-translational modifications associated with interactions; Supplementary Table 5, Overlap of physical and genetic interaction datasets; Supplementary Table 6, Predicted yeast complexes from yeast interaction datasets; Supplementary Table 7, Predicted yeast complexes from yeast and fly interaction datasets; Supplementary Table 8, Novel predicted human protein interactions; Supplementary Table 9, Novel GO functional predictions for yeast proteins; Supplementary Table 10, Novel GO functional predictions for fly proteins; Supplementary Table 11, Publications documented in the HTP-GI dataset. Additional data file 2 contains a comparison of the LC dataset with other curated datasets and details of functional predictions. Additional data file 3 contains Supplementary Figures 1–6: Supplementary Figure 1, Curation benchmarks for the LC dataset; Supplementary Figure 2, Distribution of terms in GO categories in LC-PI and LC-GI dataset; Supplementary Figure 3, Relative coverage and overlap of interaction datasets; Supplementary Figure 4, Raw distributions of interactions for each indicated dataset as a function of protein abundance; Supplementary Figure 5, Expression correlation for interaction pairs in LC versus HTP datasets; Supplementary Figure 6, Dense regions in the physical interaction network. Additional data file 4 contains flat files of the main datasets.
We thank V. Wood, L. Harrington, R. Apweiler, H. Hermjakob, T. Hughes and G. Bader for thoughtful discussion and J. Grigull for assistance with curation. L.B. is supported by a National Cancer Institute of Canada Doctoral Award with funds from the Terry Fox Foundation; C.L.M. is supported by an NIH Quantitative and Computational Biology Program Grant; N.N.B. is supported by a Canadian Institutes of Health Research (CIHR) Postdoctoral Fellowship; O.G.T. is an Alfred P. Sloan Research Fellow; C.B. and M.T. are Canada Research Chairs. This work was supported by grants from the NIH to O.G.T., T.I., K.D. and D.B. and by grants from the CIHR to B.A., C.B. and M.T.
- Chua G, Robinson MD, Morris Q, Hughes TR: Transcriptional networks: reverse-engineering gene regulation on a global scale. Curr Opin Microbiol. 2004, 7: 638-646. 10.1016/j.mib.2004.10.009.
- Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B, et al: Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002, 418: 387-391. 10.1038/nature00935.
- Bader GD, Heilbut A, Andrews B, Tyers M, Hughes T, Boone C: Functional genomics and proteomics: charting a multidimensional map of the yeast cell. Trends Cell Biol. 2003, 13: 344-356. 10.1016/S0962-8924(03)00127-2.
- Jorgensen P, Breitkreutz BJ, Breitkreutz K, Stark C, Liu G, Cook M, Sharom J, Nishikawa JL, Ketela T, Bellows D, et al: Harvesting the genome's bounty: integrative genomics. Cold Spring Harb Symp Quant Biol. 2003, 68: 431-443. 10.1101/sqb.2003.68.431.
- Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, et al: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000, 403: 623-627. 10.1038/35001009.
- Ito T, Tashiro K, Muta S, Ozawa R, Chiba T, Nishizawa M, Yamamoto K, Kuhara S, Sakaki Y: Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc Natl Acad Sci USA. 2000, 97: 1143-1147. 10.1073/pnas.97.3.1143.
- Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001, 98: 4569-4574. 10.1073/pnas.061034498.
- Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, et al: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002, 415: 180-183. 10.1038/415180a.
- Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, et al: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002, 415: 141-147. 10.1038/415141a.
- Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al: Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science. 2001, 294: 2364-2368. 10.1126/science.1065810.
- Ooi SL, Shoemaker DD, Boeke JD: DNA helicase gene interaction network defined using synthetic lethality analyzed by microarray. Nat Genet. 2003, 35: 277-286. 10.1038/ng1258.
- Tong AH, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M, et al: Global mapping of the yeast genetic interaction network. Science. 2004, 303: 808-813. 10.1126/science.1091317.
- Pan X, Yuan DS, Xiang D, Wang X, Sookhai-Mahadeo S, Bader JS, Hieter P, Spencer F, Boeke JD: A robust toolkit for functional profiling of the yeast genome. Mol Cell. 2004, 16: 487-496. 10.1016/j.molcel.2004.09.035.
- Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, et al: A protein interaction map of Drosophila melanogaster. Science. 2003, 302: 1727-1736. 10.1126/science.1090289.
- Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T, et al: A map of the interactome network of the metazoan C. elegans. Science. 2004, 303: 540-543. 10.1126/science.1091403.
- Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, et al: A human protein-protein interaction network: a resource for annotating the proteome. Cell. 2005, 122: 957-968. 10.1016/j.cell.2005.08.029.
- Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al: Towards a proteome-scale map of the human protein-protein interaction network. Nature. 2005, 437: 1173-1178. 10.1038/nature04209.
- Barabasi AL, Albert R: Emergence of scaling in random networks. Science. 1999, 286: 509-512. 10.1126/science.286.5439.509.
- Albert R, Jeong H, Barabasi AL: Error and attack tolerance of complex networks. Nature. 2000, 406: 378-382. 10.1038/35019019.
- Wagner A: Does selection mold molecular networks?. Sci STKE. 2003, 2003: PE41.
- Watts DJ, Strogatz SH: Collective dynamics of 'small-world' networks. Nature. 1998, 393: 440-442. 10.1038/30918.
- Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature. 2001, 411: 41-42. 10.1038/35075138.
- Wagner A: The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol Biol Evol. 2001, 18: 1283-1292.
- Shen-Orr SS, Milo R, Mangan S, Alon U: Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 2002, 31: 64-68. 10.1038/ng881.
- Zhang LV, King OD, Wong SL, Goldberg DS, Tong AH, Lesage G, Andrews B, Bussey H, Boone C, Roth FP: Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J Biol. 2005, 4: 6. 10.1186/jbiol23.
- Ge H, Liu Z, Church GM, Vidal M: Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat Genet. 2001, 29: 482-486. 10.1038/ng776.
- Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M: A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science. 2003, 302: 449-453. 10.1126/science.1087361.
- Troyanskaya OG, Dolinski K, Owen AB, Altman RB, Botstein D: A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae). Proc Natl Acad Sci USA. 2003, 100: 8348-8353. 10.1073/pnas.0832373100.
- Lee I, Date SV, Adai AT, Marcotte EM: A probabilistic functional network of yeast genes. Science. 2004, 306: 1555-1558. 10.1126/science.1099511.
- Wong SL, Zhang LV, Tong AH, Li Z, Goldberg DS, King OD, Lesage G, Vidal M, Andrews B, Bussey H, et al: Combining biological networks to predict genetic interactions. Proc Natl Acad Sci USA. 2004, 101: 15682-15687. 10.1073/pnas.0406614101.
- Sharan R, Suthram S, Kelley RM, Kuhn T, McCuine S, Uetz P, Sittler T, Karp RM, Ideker T: Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA. 2005, 102: 1974-1979. 10.1073/pnas.0409522102.
- Myers CL, Robson D, Wible A, Hibbs MA, Chiriac C, Theesfeld CL, Dolinski K, Troyanskaya OG: Discovery of biological networks from diverse functional genomic data. Genome Biol. 2005, 6: R114. 10.1186/gb-2005-6-13-r114.
- von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002, 417: 399-403. 10.1038/nature750.
- Bader GD, Hogue CW: Analyzing yeast protein-protein interaction data obtained from different sources. Nat Biotechnol. 2002, 20: 991-997. 10.1038/nbt1002-991.
- Mrowka R, Patzak A, Herzel H: Is there a bias in proteome research?. Genome Res. 2001, 11: 1971-1973. 10.1101/gr.206701.
- Hodges PE, Payne WE, Garrels JI: The Yeast Protein Database (YPD): a curated proteome database for Saccharomyces cerevisiae. Nucleic Acids Res. 1998, 26: 68-72. 10.1093/nar/26.1.68.
- Mewes HW, Frishman D, Guldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Munsterkotter M, Rudd S, Weil B: MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 2002, 30: 31-34. 10.1093/nar/30.1.31.
- Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G: MINT: a Molecular INTeraction database. FEBS Lett. 2002, 513: 135-140. 10.1016/S0014-5793(01)03293-8.
- Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, et al: IntAct: an open source molecular interaction database. Nucleic Acids Res. 2004, 32 (Database issue): D452-D455. 10.1093/nar/gkh052.
- Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D: DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 2002, 30: 303-305. 10.1093/nar/30.1.303.
- Bader GD, Betel D, Hogue CW: BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 2003, 31: 248-250. 10.1093/nar/gkg056.
- Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, et al: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 2003, 13: 2363-2371. 10.1101/gr.1680803.
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006, 34 (Database issue): D535-D539. 10.1093/nar/gkj109.
- BioGRID. [http://www.thebiogrid.org]
- International Molecular Exchange Consortium. [http://imex.sourceforge.net]
- Hermjakob H, Montecchi-Palazzi L, Bader G, Wojcik J, Salwinski L, Ceol A, Moore S, Orchard S, Sarkans U, von Mering C, et al: The HUPO PSI's molecular interaction format: a community standard for the representation of protein interaction data. Nat Biotechnol. 2004, 22: 177-183. 10.1038/nbt926.
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32 (Database issue): D258-D261.
- Drabkin HJ, Hollenbeck C, Hill DP, Blake JA: Ontological visualization of protein-protein interactions. BMC Bioinformatics. 2005, 6: 29. 10.1186/1471-2105-6-29.
- Krallinger M, Valencia A: Text-mining and information-retrieval services for molecular biology. Genome Biol. 2005, 6: 224. 10.1186/gb-2005-6-7-224.
- Christie KR, Weng S, Balakrishnan R, Costanzo MC, Dolinski K, Dwight SS, Engel SR, Feierbach B, Fisk DG, Hirschman JE, et al: Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Res. 2004, 32 (Database issue): D311-D314. 10.1093/nar/gkh033.
- Saccharomyces Genome Database. [http://www.yeastgenome.org]
- Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003, 423: 241-254. 10.1038/nature01644.
- Breitkreutz BJ, Stark C, Tyers M: The GRID: the General Repository for Interaction Datasets. Genome Biol. 2003, 4: R23. 10.1186/gb-2003-4-3-r23.
- Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP, Vidal M: Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature. 2004, 430: 88-93. 10.1038/nature02555.
- Hoffmann R, Valencia A: Life cycles of successful genes. Trends Genet. 2003, 19: 79-81. 10.1016/S0168-9525(02)00014-8.
- Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, et al: The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003, 4: 41. 10.1186/1471-2105-4-41.
- Decottignies A, Sanchez-Perez I, Nurse P: Schizosaccharomyces pombe essential genes: a pilot study. Genome Res. 2003, 13: 399-406. 10.1101/gr.636103.
- Koonin EV, Fedorova ND, Jackson JD, Jacobs AR, Krylov DM, Makarova KS, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, et al: A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol. 2004, 5: R7. 10.1186/gb-2004-5-2-r7.
- Grigoriev A: On the number of protein-protein interactions in the yeast proteome. Nucleic Acids Res. 2003, 31: 4157-4161. 10.1093/nar/gkg466.
- Davierwala AP, Haynes J, Li Z, Brost RL, Robinson MD, Yu L, Mnaimneh S, Ding H, Zhu H, Chen Y, et al: The synthetic genetic interaction spectrum of essential genes. Nat Genet. 2005, 37: 1147-1152. 10.1038/ng1640.
- Tanaka R, Yi TM, Doyle J: Some protein interaction data do not exhibit power law statistics. FEBS Lett. 2005, 579: 5140-5144. 10.1016/j.febslet.2005.08.024.
- Pereira-Leal JB, Audit B, Peregrin-Alvarez JM, Ouzounis CA: An exponential core in the heart of the yeast protein interaction network. Mol Biol Evol. 2005, 22: 421-425. 10.1093/molbev/msi024.
- Maslov S, Sneppen K: Specificity and stability in topology of protein networks. Science. 2002, 296: 910-913. 10.1126/science.1065103.
- Kelley R, Ideker T: Systematic interpretation of genetic interactions using protein networks. Nat Biotechnol. 2005, 23: 561-566. 10.1038/nbt1096.
- Breitkreutz BJ, Stark C, Tyers M: Osprey: a network visualization system. Genome Biol. 2003, 4: R22. 10.1186/gb-2003-4-3-r22.
- Ozier O, Amin N, Ideker T: Global architecture of genetic interactions on the protein network. Nat Biotechnol. 2003, 21: 490-491. 10.1038/nbt0503-490.
- Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, Dephoure N, O'Shea EK, Weissman JS: Global analysis of protein expression in yeast. Nature. 2003, 425: 737-741. 10.1038/nature02046.
- Huh WK, Falvo JV, Gerke LC, Carroll AS, Howson RW, Weissman JS, O'Shea EK: Global analysis of protein localization in budding yeast. Nature. 2003, 425: 686-691. 10.1038/nature02026.
- Batada NN, Shepp LA, Siegmund DO: Stochastic model of protein-protein interaction: why signaling proteins need to be colocalized. Proc Natl Acad Sci USA. 2004, 101: 6445-6449. 10.1073/pnas.0401314101.
- Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science. 2001, 292: 929-934. 10.1126/science.292.5518.929.
- Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, et al: Functional discovery via a compendium of expression profiles. Cell. 2000, 102: 109-126. 10.1016/S0092-8674(00)00015-5.
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, et al: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431: 99-104. 10.1038/nature02800.
- Bader GD, Hogue CW: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003, 4: 2. 10.1186/1471-2105-4-2.
- Rives AW, Galitski T: Modular organization of cellular networks. Proc Natl Acad Sci USA. 2003, 100: 1128-1133. 10.1073/pnas.0237338100.
- Spirin V, Mirny LA: Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci USA. 2003, 100: 12123-12128. 10.1073/pnas.2032324100.
- O'Brien KP, Remm M, Sonnhammer EL: Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. 2005, 33 (Database issue): D476-D480. 10.1093/nar/gki107.
- FlyBase Consortium: The FlyBase database of the Drosophila genome projects and community literature. Nucleic Acids Res. 2003, 31: 172-175. 10.1093/nar/gkg094.
- Page JS, Masselon CD, Smith RD: FTICR mass spectrometry for qualitative and quantitative bioanalyses. Curr Opin Biotechnol. 2004, 15: 3-11. 10.1016/j.copbio.2004.01.002.
- Vidalain PO, Boxem M, Ge H, Li S, Vidal M: Increasing specificity in high-throughput yeast two-hybrid experiments. Methods. 2004, 32: 363-370. 10.1016/j.ymeth.2003.10.001.
- Przulj N, Corneil DG, Jurisica I: Modeling interactome: scale-free or geometric?. Bioinformatics. 2004, 20: 3508-3515. 10.1093/bioinformatics/btg415.
- Jordan IK, Rogozin IB, Wolf YI, Koonin EV: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 2002, 12: 962-968. 10.1101/gr.87702. Article published online before print in May 2002.
- Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW: Evolutionary rate in the protein interaction network. Science. 2002, 296: 750-752. 10.1126/science.1068696.
- Jordan IK, Wolf YI, Koonin EV: No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol Biol. 2003, 3: 1. 10.1186/1471-2148-3-1.
- Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL: Hierarchical organization of modularity in metabolic networks. Science. 2002, 297: 1551-1555. 10.1126/science.1073374.
- Schuldiner M, Collins SR, Thompson NJ, Denic V, Bhamidipati A, Punna T, Ihmels J, Andrews B, Boone C, Greenblatt JF, et al: Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell. 2005, 123: 507-519. 10.1016/j.cell.2005.08.031.
- Ptacek J, Devgan G, Michaud G, Zhu H, Zhu X, Fasolo J, Guo H, Jona G, Breitkreutz A, Sopko R, et al: Global analysis of protein phosphorylation in yeast. Nature. 2005, 438: 679-684. 10.1038/nature04187.
- Li F, Long T, Lu Y, Ouyang Q, Tang C: The yeast cell-cycle network is robustly designed. Proc Natl Acad Sci USA. 2004, 101: 4781-4786. 10.1073/pnas.0305937101.
- Ma'ayan A, Jenkins SL, Neves S, Hasseldine A, Grace E, Dubin-Thaler B, Eungdamrong NJ, Weng G, Ram PT, Rice JJ, et al: Formation of regulatory patterns during signal propagation in a mammalian cellular network. Science. 2005, 309: 1078-1083. 10.1126/science.1108876.
- Joshi-Tope G, Gillespie M, Vastrik I, D'Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L, et al: Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 2005, 33 (Database issue): D428-D432. 10.1093/nar/gki072.
- Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000, 28: 27-30. 10.1093/nar/28.1.27.
- Ramani AK, Bunescu RC, Mooney RJ, Marcotte EM: Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome. Genome Biol. 2005, 6: R40. 10.1186/gb-2005-6-5-r40.
- Ideker T, Galitski T, Hood L: A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet. 2001, 2: 343-372. 10.1146/annurev.genom.2.1.343.
- Muller HM, Kenny EE, Sternberg PW: Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol. 2004, 2: e309. 10.1371/journal.pbio.0020309.
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998, 95: 14863-14868. 10.1073/pnas.95.25.14863.
- Bader JS, Chaudhuri A, Rothberg JM, Chant J: Gaining confidence in high-throughput protein interaction networks. Nat Biotechnol. 2004, 22: 78-85. 10.1038/nbt924.
- COG database. [http://www.ncbi.nlm.nih.gov/COG]
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.
- Donaldson I, Martin J, de Bruijn B, Wolting C, Lay V, Tuekam B, Zhang S, Baskin B, Bader GD, Michalickova K, et al: PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine. BMC Bioinformatics. 2003, 4: 11. 10.1186/1471-2105-4-11.
- Zhu G, Spellman PT, Volpe T, Brown PO, Botstein D, Davis TN, Futcher B: Two yeast forkhead genes regulate the cell cycle and pseudohyphal growth. Nature. 2000, 406: 90-94. 10.1038/35021046.
- Yoshimoto H, Saltsman K, Gasch AP, Li HX, Ogawa N, Botstein D, Brown PO, Cyert MS: Genome-wide analysis of gene expression regulated by the calcineurin/Crz1p signaling pathway in Saccharomyces cerevisiae. J Biol Chem. 2002, 277: 31079-31088. 10.1074/jbc.M202718200.
- Gasch AP, Huang M, Metzner S, Botstein D, Elledge SJ, Brown PO: Genomic expression responses to DNA-damaging agents and the regulatory role of the yeast ATR homolog Mec1p. Mol Biol Cell. 2001, 12: 2987-3003.
- Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO: Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000, 11: 4241-4257.
- Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998, 9: 3273-3297.
- Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO, Herskowitz I: The transcriptional program of sporulation in budding yeast. Science. 1998, 282: 699-705. 10.1126/science.282.5389.699.
- DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science. 1997, 278: 680-686. 10.1126/science.278.5338.680.
- Sudarsanam P, Iyer VR, Brown PO, Winston F: Whole-genome expression analysis of snf/swi mutants of Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 2000, 97: 3364-3369. 10.1073/pnas.050407197.
- Shakoury-Elizeh M, Tiedeman J, Rashford J, Ferea T, Demeter J, Garcia E, Rolfes R, Brown PO, Botstein D, Philpott CC: Transcriptional remodeling in response to iron deprivation in Saccharomyces cerevisiae. Mol Biol Cell. 2004, 15: 1233-1243. 10.1091/mbc.E03-09-0642.
- Ogawa N, DeRisi J, Brown PO: New components of a system for phosphate accumulation and polyphosphate metabolism in Saccharomyces cerevisiae revealed by genomic expression analysis. Mol Biol Cell. 2000, 11: 4309-4321.
- Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, et al: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440: 631-636.
- Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, et al: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440: 637-643.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.