Promoter architecture and the evolvability of gene expression
© BioMed Central Ltd 2009
Published: 14 December 2009
Evolutionary changes in gene expression are a main driver of phenotypic evolution. In yeast, genes that have rapidly diverged in expression are associated with particular promoter features, including the presence of a TATA box, a nucleosome-covered promoter and unstable tracts of tandem repeats. Here, we discuss how these promoter properties may confer an inherent capacity for flexibility of expression.
Early in research on the molecular basis of phenotypic variation, the focus was primarily on mutations in the coding regions (exons) of genes. But as first noted by King and Wilson, substantial physiological differences can be seen between closely related species despite almost identical sets of proteins, and it is now generally accepted that distinctions between species are defined not only by their ensemble of genes but, critically, by how those genes are regulated.
For example, dramatic differences in the body plan of related insects have been traced to differences in the expression of developmentally regulated genes [2–4], and the classic example of variation in beak shape among Darwin's finches appears to be controlled by variation in expression levels of the gene encoding Bmp4. Surveying 331 previously reported mutations underlying phenotypic changes, Stern and Orgogozo found that approximately 22% were regulatory changes, and the proportion of documented regulatory changes is increasing annually and is even larger for inter-species differences.
More recent studies using advanced technologies, including microarrays or high-throughput sequencing, have compared the genome-wide expression programs of related species [7–16] or strains [17–29] and revealed thousands of differences in the expression of orthologous genes. Identifying the regulatory changes underlying specific expression differences has, however, been more difficult: little progress has been made in connecting expression divergence with regulatory sequence divergence, and the degree of sequence conservation at individual promoters and regulatory elements cannot predict the degree of expression divergence of the associated genes [30–34]. What has emerged is a more general distinction: some genes have a much greater propensity to diverge in their expression than others. Here we discuss recent studies in yeast on the promoter architectures underlying these differences, and how they may contribute to the evolvability of gene expression. Yeast is an excellent model for studying the evolution of gene expression because of its simplicity as a unicellular organism with short and well-defined promoter regions, ease of genetic manipulation and a wealth of functional genomics data.
The notion that there are two kinds of promoters in yeast, with different functional and architectural properties, was developed long ago by Struhl and colleagues, who extensively studied the regulation of the adjacent yeast genes his3 and pet56 and suggested the presence of distinct core promoters that control constitutive versus inducible gene expression. More recent studies have shown that these distinctions correspond to distinct evolutionary properties: whereas the expression of some genes has diverged between related yeasts, the expression of others has remained stable. Notably, this gene-specific tendency is maintained in multiple studies comparing the genomic expression patterns of different yeasts. Although these studies examined different sets of yeast strains or species grown in different environments, measured different quantities (expression levels or ratios) and used different computational and experimental methods, their results show significant correlations: genes whose expression diverged according to one study were often found to diverge in the other studies.
Moreover, these genes also preferentially diverged in expression in 'mutation accumulation' experiments, where cells were allowed to accumulate mutations in conditions in which the effects of natural selection were minimized. Thus, we believe that expression divergence of these genes in multiple datasets is not due to increased positive selection (or relaxation of purifying selection), but instead reflects an inherent capacity for expression divergence. This capacity of a gene to evolve in expression can be quantified by measuring its 'expression divergence' - that is, a mathematical quantification of how much the expression of a gene differs among evolutionarily related yeast species or strains.
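As an illustration of what such a quantification might look like, the sketch below scores a gene by the standard deviation of its log2 expression levels across strains or species. This is an assumed, simplified metric chosen for clarity, not the measure used in the cited studies; the function name `expression_divergence` and the toy values are hypothetical.

```python
import math
from statistics import pstdev


def expression_divergence(levels):
    """Population standard deviation of log2 expression across strains.

    `levels` holds one positive expression value per strain or species.
    Assumed, simplified metric; not the one used in the cited studies.
    """
    levels = list(levels)
    if len(levels) < 2:
        raise ValueError("need at least two strains or species")
    if any(x <= 0 for x in levels):
        raise ValueError("expression levels must be positive")
    # Work on a log scale so that a 2-fold change counts the same
    # whether the gene is weakly or strongly expressed.
    return pstdev([math.log2(x) for x in levels])


# Hypothetical toy data: four strains per gene.
stable_gene = [100.0, 100.0, 100.0, 100.0]    # identical in every strain
flexible_gene = [50.0, 100.0, 200.0, 400.0]   # 2-fold steps between strains

stable_score = expression_divergence(stable_gene)
flexible_score = expression_divergence(flexible_gene)
```

With these toy values the stable gene scores zero and the flexible gene scores higher, which is the ranking such a measure is meant to capture. Real comparisons would also need normalization between experiments and replicates to separate divergence from measurement noise.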
Expression divergence correlates strongly with gene responsiveness, namely the extent to which a gene's expression is altered by the environment, and with expression noise [39, 40], namely the extent to which a gene's expression differs among genetically identical cells [7, 37]. That is, genes whose expression is strongly regulated between different conditions display noisy expression and evolve rapidly between related strains or species. Thus, it is possible that genes differ in their capacity for expression flexibility, which is manifested at various timescales: during evolution in response to mutations; during physiological responses to environmental changes; and within a population of cells as a result of stochastic fluctuations.
The capacity for expression divergence (or flexibility) has been linked to several characteristics of gene promoters. The simplest association is with the number of binding sites for transcriptional regulators: promoters of flexible genes are characterized by a relatively large number of binding sites [36, 37]. This is perhaps not surprising, since the expression of genes with many regulators (and binding sites) can be affected by mutations in any one of these regulators (or promoter binding sites), thus increasing their mutational target size - that is, the number of possible mutations that would affect the expression of these genes.
One particular promoter binding site stands out for its large influence on expression divergence: promoters that contain a TATA box show a remarkable increase in expression divergence, as well as in responsiveness and in noise [7, 36, 37]. The distinction between genes with promoters containing a TATA box and those without holds when the number of transcriptional regulators or of promoter binding sites is controlled; it is also consistent among genes from different functional classes - for example, genes encoding membrane proteins, genes encoding metabolic proteins, and genes encoding ribosomal proteins (although these different groups also differ widely in the proportion of genes with promoters containing TATA boxes). Strikingly, increased expression divergence of TATA-containing genes has been observed in species ranging from yeast to mammals, and also in mutation-accumulation lines of yeasts, flies and worms [7, 37], suggesting that it reflects a general phenomenon. Interestingly, the promoters of TATA-containing genes are not associated with more mutations but only with increased expression divergence. Thus, we propose that promoters carrying a TATA box are inherently more sensitive to genetic perturbations than TATA-less promoters. This is also consistent with the distinction between constitutive and inducible genes and with previous studies that demonstrated that a canonical TATA box is important for dynamic regulation of gene expression whereas other sequence elements are important for maintaining constitutive expression levels [35, 41].
The TATA box is a ubiquitous core promoter element that is bound by the transcription pre-initiation complex (PIC). What could cause increased expression divergence of TATA promoters? Transcription can be considered as a two-step process: first the PIC is recruited by transcription factors and assembles at the core promoter together with RNA polymerase; and second, the polymerase is released from the PIC and transcribes the gene. The second step can be repeated multiple times (re-initiation) if the PIC remains bound to the core promoter, and this is believed to be facilitated by the TATA box [42–44]. Thus, a TATA box could increase the extent of re-initiation, thereby amplifying gene expression. Notably, the binding of the PIC to the TATA box and the binding of transcription factors to other sites could be cooperative. This would make the effect of the TATA box on gene expression nonlinear, as any amplification of transcription factor binding would stabilize PIC binding and cause a further increase in re-initiation. In this way, TATA-containing genes could be more sensitive to regulatory mutations than TATA-less genes.
So far we have discussed the role of promoter architecture in the sensitivity to mutations, namely whether a mutation influences gene expression and to what extent. However, expression divergence could also be directly facilitated by mechanisms that increase the mutation rate (that is, the number of mutation events per unit of time) at particular promoters. Although the determinants of local mutation rates are still poorly understood, one property that has been shown to increase mutation rates is the presence of unstable tandem repeats.
A recent study revealed that about 25% of all yeast promoters contain unstable tandem repeats: short (1 to 150 nucleotide) stretches of DNA that are repeated head to tail. For example, TAG-TAG-TAG-TAG-TAG-TAG-TAG is a trinucleotide repeat, with the unit TAG repeated seven times. Tandem repeats most often consist of short (2 to 6 nucleotide), AT-rich units that are repeated 10 to 30 times, and occur frequently about 20 to 100 nucleotides upstream of the transcriptional start site.
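The definition above can be made concrete with a minimal sketch that scans a DNA string for perfect head-to-tail repeats. It is an illustration only: it assumes exact repeats with no mismatches, uses a plain regular expression rather than any dedicated repeat-finding tool, and the function name `find_tandem_repeats`, its default thresholds and the example sequence are all hypothetical.

```python
import re


def find_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in `seq`.

    Only exact head-to-tail copies of a unit of min_unit..max_unit
    nucleotides are reported. Real repeat tracts are often imperfect,
    so this is a simplification.
    """
    if min_copies < 2:
        raise ValueError("a tandem repeat needs at least two copies")
    # Lazy quantifier on the unit: prefer the shortest unit that repeats,
    # then greedily consume as many further copies (\1) as possible.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical promoter fragment containing the TAG x 7 repeat from the text.
fragment = "CCGA" + "TAG" * 7 + "GGC"
repeats = find_tandem_repeats(fragment)
```

For this fragment the scan reports a single tract: the unit TAG, seven copies, starting at position 4 (0-based). Comparing the copy count of the same tract between strains would then give the kind of repeat-number variation discussed in the text.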
The number of repeat units changes at frequencies that are typically 10- to 10,000-fold higher than average point mutation frequencies. Changes in the number of repeat units may cause gradual changes in transcription, with a certain number of units yielding maximal transcription. Thus, when tandem repeats occur within promoters, their inherent instability may give rise to variants displaying altered levels of transcription, generating a pool of phenotypic diversity that allows rapid divergence. The mechanism underlying repeat-based expression divergence has been proposed to have its origins in chromatin structure. AT-rich promoter repeats are known to influence local nucleosome positioning, and changes in the number of repeats affect the density and positioning of nucleosomes in the critical part of the promoter.
In contrast to divergence of coding regions, divergence of gene expression can originate both from mutations in local DNA sequence (cis mutations) - for example, a mutation that affects a promoter binding site or nucleosome position - and from mutations in other genes (trans mutations), such as those encoding transcription factors or chromatin regulators. Thus, increased divergence in the expression of genes could be due to their sensitivity to cis mutations or trans mutations or both. In some cases, such as variable repeat tracts, it is clear that the effect depends on cis changes. However, in other cases, the relative contribution of cis and trans mutations is unclear. For example, an increased dependence on nucleosome positioning could be due to cis mutations affecting nucleosome binding or to trans mutations affecting chromatin regulators.
Two approaches have been used to distinguish the effects of cis and trans mutations on gene expression on a genomic scale: genetical genomics [51, 53] and analysis of hybrid species [15, 54]. Results from both kinds of study suggest that divergence in the expression of flexible genes is due chiefly to trans mutations [15, 51]. For example, genes that diverged between Saccharomyces cerevisiae and Saccharomyces paradoxus as a result of trans mutations displayed high divergence in seven different studies comparing expression of different S. cerevisiae strains or species. In contrast, expression of genes that diverged by cis mutations displayed less divergence in the other seven studies. Furthermore, the presence of a TATA box or of an occupied pattern of nucleosomes (Figure 1) was primarily associated with increased effects of trans mutations rather than cis mutations [15, 51].
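The logic of the hybrid approach can be sketched in a few lines. In a hybrid, both parental alleles share the same trans-acting environment, so any difference between the two alleles reflects cis divergence. The part of the between-parent difference that is not reproduced in the hybrid is attributed to trans divergence. The code below is a simplified illustration of that decomposition on a log2 scale; it ignores measurement noise and statistical testing, and the function name `cis_trans_components` and the example values are hypothetical rather than taken from the cited studies.

```python
import math


def cis_trans_components(parent_a, parent_b, hybrid_allele_a, hybrid_allele_b):
    """Split a between-species expression difference into cis and trans parts.

    All inputs are positive expression levels for one gene:
    the two parental species, and the two alleles measured in the hybrid.
    Returns (total, cis, trans) as log2 ratios with total = cis + trans.
    """
    total = math.log2(parent_a / parent_b)               # divergence between parents
    cis = math.log2(hybrid_allele_a / hybrid_allele_b)   # persists in a shared trans environment
    trans = total - cis                                  # remainder not explained by cis
    return total, cis, trans


# Hypothetical gene: 4-fold difference between the parents,
# but equal expression of the two alleles in the hybrid.
total, cis, trans = cis_trans_components(400.0, 100.0, 150.0, 150.0)
```

In this toy case the whole 4-fold (log2 ratio of 2) difference is assigned to trans, which is the pattern the text describes as typical of flexible genes.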
Expression divergence is a major driver of evolutionary change and seems to be enriched at particular genes. As described above, expression divergence in yeast correlates with several promoter features, including a large number of binding sites, a TATA box, an occupied pattern of promoter nucleosomes, increased dependence on chromatin regulators and unstable tandem repeats. Notably, controlling for one of these factors does not remove the effect of the others, suggesting that each of these factors has an independent effect on expression divergence. Many of these factors seem to exert their influence on expression divergence predominantly through trans effects, although others (for example, unstable repeats) involve cis effects.
As noted above, expression divergence (the extent to which expression of a gene evolves) correlates with expression responsiveness (the extent to which expression of a gene is changed in response to the environment). We believe that the promoter elements discussed above underlie expression flexibility of these genes on short timescales (responsiveness and noise), which are instrumental in the immediate response of a cell to the environment, as well as on longer timescales (expression divergence), which may allow evolutionary adaptation to novel conditions. In other words, the correlation between responsiveness and expression divergence may be due to their dependence on the same promoter properties.
The notion that responsive, inducible promoters differ from stable 'housekeeping' promoters, established by Struhl and colleagues [43, 55–59], has now been extended and linked to the evolvability of gene expression. However, much is still unknown. For example, the protein-DNA and protein-protein interactions that underlie the differential requirement of genes for general transcription factors, as well as the implications of these interactions for the dynamics of gene regulation, remain poorly understood.
The fact that promoter architecture correlates with expression evolvability (that is, the readiness with which gene expression evolves) raises the possibility that expression evolvability may be subject to selection. This could make it possible for the expression of some genes to remain robust to mutation, whereas other genes are inherently able to change rapidly in expression under evolutionary pressure. Consistent with this, we find that different promoter elements that are independently linked to expression evolvability preferentially coincide at the same genes, as if evolvability were selected in these genes. In this context, it is interesting to note that the group of rapidly diverging genes is enriched with plasma membrane genes and, in general, genes that interact with the cell environment (Figure 2). These genes are needed to cope with changes in the environment and their flexibility may allow for rapid adaptation to new environments. Further studies will be required to examine this hypothesis.
We apologize for omission of relevant references due to space restrictions. Research in the lab of KJV is supported by the Human Frontier Science Program Award HFSP RGY79/2007, FP7 ERC Starting Grant 241426, VIB, the KU Leuven Research Fund and the FWO-Odysseus program. Research in the lab of NB is supported by the Helen and Martin Kimmel Award for Innovative Investigations, the EU (FunSysB), the Israeli Ministry of Science and the European Research Council (Ideas).