Promoter architecture and the evolvability of gene expression

Evolutionary changes in gene expression are a main driver of phenotypic evolution. In yeast, genes that have rapidly diverged in expression are associated with particular promoter features, including the presence of a TATA box, a nucleosome-covered promoter and unstable tracts of tandem repeats. Here, we discuss how these promoter properties may confer an inherent capacity for flexibility of expression.

For example, dramatic differences in the body plan of related insects have been traced to differences in the expression of developmentally regulated genes [2][3][4], and the classic example of variation in beak shape among Darwin's finches appears to be controlled by variation in expression levels of the gene encoding Bmp4 [5]. Surveying 331 previously reported mutations underlying phenotypic changes, Stern and Orgogozo [6] found that approximately 22% were regulatory changes, and the proportion of documented regulatory changes is increasing annually and is even larger for inter-species differences.
More recent studies using advanced technologies, including microarrays or high-throughput sequencing, have compared the genome-wide expression programs of related species [7][8][9][10][11][12][13][14][15][16] or strains [17][18][19][20][21][22][23][24][25][26][27][28][29] and revealed thousands of differences in the expression of orthologous genes. Identifying the regulatory changes underlying specific expression differences has, however, been more difficult: little progress has been made in connecting expression divergence with regulatory sequence divergence, and the degree of sequence conservation at individual promoters and regulatory elements cannot predict the degree of expression divergence of the associated genes [30][31][32][33][34]. What has emerged is a more general distinction: some genes have a much greater propensity to diverge in their expression than others. Here we discuss recent studies in yeast on the promoter architectures underlying these differences, and how they may contribute to the evolvability of gene expression. Yeast is an excellent model for studying the evolution of gene expression because of its simplicity as a unicellular organism with short and well-defined promoter regions, ease of genetic manipulation and a wealth of functional genomics data.

The inherent capacity of genes for expression divergence
The notion that there are two kinds of promoters in yeast, with different functional and architectural properties, was developed long ago by Struhl and colleagues, who extensively studied the regulation of the adjacent yeast genes his3 and pet56 and suggested the presence of distinct core promoters that control constitutive versus inducible gene expression [35]. More recent studies have shown that these distinctions correspond to distinct evolutionary properties: whereas the expression of some genes has diverged between related yeasts, the expression of others has remained stable. Notably, this gene-specific tendency is maintained across multiple studies comparing the genomic expression patterns of different yeasts. Although these studies examined different sets of yeast strains or species grown in different environments, measured different quantities (expression levels or ratios) and used different computational and experimental methods, their results are significantly correlated: genes whose expression diverged according to one study were often found to diverge in the other studies [36].
Moreover, these genes also preferentially diverged in expression in 'mutation accumulation' experiments, where cells were allowed to accumulate mutations in conditions in which the effects of natural selection were minimized [37]. Thus, we believe that expression divergence of these genes in multiple datasets is not due to increased positive selection (or relaxation of purifying selection) [38], but instead reflects an inherent capacity for expression divergence. This capacity of a gene to evolve in expression can be quantified by measuring its 'expression divergence', that is, a mathematical quantification of how much the expression of a gene differs among evolutionarily related yeast species or strains [36].
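As a rough illustration of what such a quantification could look like, the sketch below scores a gene by the variance of its log2 expression level across strains. This is an assumed, simplified measure and not necessarily the one used in [36]; the function name and the expression values are hypothetical.

```python
import math
from statistics import pvariance


def expression_divergence(levels):
    """Population variance of log2 expression across strains for one gene.

    Assumption: 'levels' holds one positive expression value per strain
    or species. This is an illustrative score, not the measure of [36].
    """
    logs = [math.log2(x) for x in levels]
    return pvariance(logs)


# Hypothetical expression levels (arbitrary units) in four strains.
stable_gene = [100.0, 100.0, 100.0, 100.0]
flexible_gene = [50.0, 100.0, 200.0, 400.0]

print(expression_divergence(stable_gene))
print(expression_divergence(flexible_gene))
```

Working on a log scale means that a twofold change counts equally for weakly and strongly expressed genes, which is one reason a log-based score is a natural choice here.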
Expression divergence correlates strongly with gene responsiveness, namely the extent to which a gene's expression is altered by the environment, and with expression noise [39,40], namely the extent to which a gene's expression differs among genetically identical cells [7,37]. That is, genes whose expression is strongly regulated between different conditions display noisy expression and evolve rapidly between related strains or species. Thus, it is possible that genes differ in their capacity for expression flexibility, which is manifested at various timescales: during evolution in response to mutations; during physiological responses to environmental changes; and within a population of cells as a result of stochastic fluctuations.

TATA boxes, nucleosome-free regions and expression flexibility
The capacity for expression divergence (or flexibility) has been linked to several characteristics of gene promoters. The simplest association is with the number of binding sites for transcriptional regulators: promoters of flexible genes are characterized by a relatively large number of binding sites [36,37]. This is perhaps not surprising, since the expression of genes with many regulators (and binding sites) can be affected by mutations in any one of these regulators (or promoter binding sites), thus increasing their mutational target size, that is, the number of possible mutations that would affect the expression of these genes.
One particular promoter binding site stands out for its large influence on expression divergence: promoters that contain a TATA box show a remarkable increase in expression divergence, as well as in responsiveness and in noise [7,36,37]. The distinction between genes with promoters containing a TATA box and those without holds when the number of transcriptional regulators or of promoter binding sites is controlled for; it is also consistent among genes from different functional classes, for example, genes encoding membrane proteins, genes encoding metabolic proteins, and genes encoding ribosomal proteins (although these different groups also differ widely in the proportion of genes with promoters containing TATA boxes) [7]. Strikingly, increased expression divergence of TATA-containing genes has been observed in species ranging from yeast to mammals, including also mutation-accumulation lines of yeasts, flies and worms [7,37], suggesting that it reflects a general phenomenon. Interestingly, the promoters of TATA-containing genes are not associated with more mutations but only with increased expression divergence [7]. Thus, we propose that promoters carrying a TATA box are inherently more sensitive to genetic perturbations than TATA-less promoters. This is also consistent with the distinction between constitutive and inducible genes and with previous studies that demonstrated that a canonical TATA box is important for dynamic regulation of gene expression whereas other sequence elements are important for maintaining constitutive expression levels [35,41].
The TATA box is a ubiquitous core promoter element that is bound by the transcription pre-initiation complex (PIC). What could cause increased expression divergence of TATA promoters? Transcription can be considered as a two-step process: first the PIC is recruited by transcription factors and assembles at the core promoter together with RNA polymerase; and second, the polymerase is released from the PIC and transcribes the gene. The second step can be repeated multiple times (re-initiation) if the PIC remains bound to the core promoter, and this is believed to be facilitated by the TATA box [42][43][44]. Thus, a TATA box could increase the extent of re-initiation, thereby amplifying gene expression. Notably, the binding of the PIC to the TATA box and the binding of transcription factors to other sites could be cooperative [44]. This would make the effect of the TATA box on gene expression nonlinear, as any amplification of transcription factor binding would stabilize PIC binding and cause a further increase in re-initiation. In this way, TATA-containing genes could be more sensitive to regulatory mutations than TATA-less genes.
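The nonlinearity argued for above can be made concrete with a toy calculation. The sketch below is purely illustrative and rests on assumptions that are not in the studies cited: output is taken to be proportional to transcription factor occupancy times the expected number of initiation rounds per recruited PIC; a TATA-less promoter gets one round; and at a TATA promoter the re-initiation probability rises in proportion to occupancy (the hypothetical parameter coop).

```python
def relative_output(tf_occupancy, has_tata, coop=0.75):
    """Toy model of transcripts produced per unit time (arbitrary units).

    Assumptions (illustrative only):
    - PIC recruitment is proportional to transcription factor occupancy.
    - Without a TATA box the PIC supports a single round of initiation.
    - With a TATA box the PIC re-initiates with probability
      p = coop * tf_occupancy, so the expected number of rounds is the
      mean of a geometric series, 1 / (1 - p).
    """
    p_reinit = coop * tf_occupancy if has_tata else 0.0
    expected_rounds = 1.0 / (1.0 - p_reinit)
    return tf_occupancy * expected_rounds


def fold_change(low, high, has_tata):
    """Fold change in output when occupancy shifts from low to high."""
    return relative_output(high, has_tata) / relative_output(low, has_tata)


# A hypothetical regulatory change that doubles occupancy (0.4 -> 0.8).
print(fold_change(0.4, 0.8, has_tata=False))
print(fold_change(0.4, 0.8, has_tata=True))
```

Under these assumptions the same twofold change in occupancy produces a twofold change in output at the TATA-less promoter but a larger change at the TATA promoter, which is the sense in which cooperative PIC binding could amplify the effect of a regulatory mutation.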
Importantly, TATA-containing promoters differ from other promoters not only in their expression flexibility but also in other properties [45], and so it is possible that these secondary characteristics underlie their increased expression flexibility. Perhaps the most notable feature of TATA promoters is their atypical chromatin structure [46][47][48]. At most yeast promoters, the region directly upstream of the transcription start site contains transcription factor binding sites and is nucleosome-free, increasing the accessibility of the binding sites to transcriptional regulators [49] (Figure 1). By contrast, at promoters with high expression flexibility, and at those containing a TATA box, this region tends to be more occupied by nucleosomes (Figure 1). We and others have proposed that because nucleosomes are thought to interfere with the binding of regulatory proteins, the regulation of nucleosome states might fine-tune the expression of these genes [46][47][48][50]. Such increased dependence on the regulation of chromatin structure is indeed observed: promoters that are relatively more occupied by nucleosomes show relatively large changes in expression when genes encoding chromatin regulators are mutated or deleted [48,51]. As with the effect of the number of transcription factors, an increased dependence on chromatin regulators increases the mutational target size, affecting expression of these genes. Any mutation in a gene encoding a relevant chromatin regulator, or an upstream gene regulating the activity of the chromatin regulator, could affect transcription of the downstream target gene.

Unstable tandem repeats
So far we have discussed the role of promoter architecture in the sensitivity to mutations, namely whether a mutation influences gene expression and to what extent. However, expression divergence could also be directly facilitated by mechanisms that increase the mutation rate (that is, the number of mutation events per unit of time) at particular promoters. Although the determinants of local mutation rates are still poorly understood, one property that has been shown to increase mutation rates is the presence of unstable tandem repeats.
A recent study revealed that about 25% of all yeast promoters contain unstable tandem repeats: short (1 to 150 nucleotide) stretches of DNA that are repeated head to tail [52]. For example, TAG-TAG-TAG-TAG-TAG-TAG-TAG is a trinucleotide repeat, with the unit TAG repeated seven times. Tandem repeats most often consist of short (2 to 6 nucleotide), AT-rich units that are repeated 10 to 30 times, and occur frequently about 20 to 100 nucleotides upstream of the transcriptional start site.
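To make the definition concrete, the following sketch scans a DNA string for perfect tandem repeats with a regular expression. It is a minimal illustration, not the detection method used in [52]: the function name, the default unit lengths (1 to 6 nucleotides) and the minimum of three copies are assumptions, and imperfect repeats are ignored.

```python
import re


def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    The lazy quantifier makes the regex try the shortest unit first, and
    the backreference \\1 requires that unit to recur head to tail.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical promoter fragment containing the TAG x 7 example from the text.
fragment = "CCGA" + "TAG" * 7 + "GGCT"
print(find_tandem_repeats(fragment))
```

Matches are reported left to right and do not overlap, so a tract is attributed to the first unit phase encountered (here TAG rather than AGT or GTA).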
The number of repeat units changes at frequencies that are typically 10- to 10,000-fold higher than average point mutation frequencies. Changes in the number of repeat units may cause gradual changes in transcription, with a certain number of units yielding maximal transcription [52]. Thus, when tandem repeats occur within promoters, their inherent instability may give rise to variants displaying altered levels of transcription, generating a pool of phenotypic diversity that allows rapid divergence. The mechanism underlying repeat-based expression divergence has been proposed to have its origins in chromatin structure. AT-rich promoter repeats are known to influence local nucleosome positioning, and changes in the number of repeats affect the density and positioning of nucleosomes in the critical part of the promoter [52].

Expression divergence by cis and trans mutations
In contrast to divergence of coding regions, divergence of gene expression can originate both from mutations in local DNA sequence (cis mutations), for example, a mutation that affects a promoter binding site or nucleosome position, and from mutations in other genes (trans mutations), such as those encoding transcription factors or chromatin regulators. Thus, increased divergence in the expression of genes could be due to their sensitivity to cis mutations or trans mutations or both. In some cases, such as variable repeat tracts, it is clear that the effect depends on cis changes. However, in other cases, the relative contribution of cis and trans mutations is unclear. For example, an increased dependence on nucleosome positioning could be due to cis mutations affecting nucleosome binding or to trans mutations affecting chromatin regulators.
Two approaches have been used to distinguish the effects of cis and trans mutations on gene expression on a genomic scale: genetical genomics [51,53] and analysis of hybrid species [15,54]. Results from both kinds of study suggest that divergence in the expression of flexible genes is due chiefly to trans mutations [15,51]. For example, genes that diverged between Saccharomyces cerevisiae and Saccharomyces paradoxus as a result of trans mutations displayed high divergence in seven different studies comparing expression of different S. cerevisiae strains or species [15]. In contrast, expression of genes that diverged by cis mutations displayed less divergence in the other seven studies. Furthermore, the presence of a TATA box or of an occupied pattern of nucleosomes (Figure 1) was primarily associated with increased effects of trans mutations rather than cis mutations [15,51].
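The logic of the hybrid approach can be sketched as follows. In a hybrid, both parental alleles share the same trans environment, so any difference between them is attributed to cis effects; whatever remains of the difference between the parents is attributed to trans effects. The code below is a minimal sketch of this commonly used decomposition, assumed here rather than taken from [15,54]; the function name and the expression values are hypothetical.

```python
import math


def cis_trans_split(parent_a, parent_b, hybrid_a, hybrid_b):
    """Split a log2 expression difference into cis and trans components.

    parent_a, parent_b: expression of the gene in each parental species.
    hybrid_a, hybrid_b: allele-specific expression of each allele in the
    hybrid, where both alleles experience the same trans factors.
    """
    total = math.log2(parent_a / parent_b)
    cis = math.log2(hybrid_a / hybrid_b)
    trans = total - cis
    return cis, trans


# Hypothetical gene 1: fourfold parental difference that vanishes in the hybrid.
print(cis_trans_split(400.0, 100.0, 100.0, 100.0))
# Hypothetical gene 2: fourfold parental difference preserved between alleles.
print(cis_trans_split(400.0, 100.0, 200.0, 50.0))
```

In this toy example the first gene would be classified as diverging in trans and the second as diverging in cis; real data additionally require replicates and a statistical test before such a call is made.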
These results are consistent with a model in which increased flexibility of promoters is due to increased dependence on trans factors (Figure 2). This could include both the number of factors that influence the expression of a given gene (for example, a promoter occupied by nucleosomes is influenced by many chromatin regulators) or the extent to which these factors influence expression (TATA promoters, as well as occupied promoters, could be more sensitive to the binding of transcriptional regulators). Accordingly, promoters with particular architectures could be more tuned to the activity of various regulatory factors and thus more sensitive to evolutionary changes in their activity. Notably, such promoters would also become more sensitive to variation in the activity of these regulators through physiological changes or stochastic fluctuations, which could explain the connection between expression divergence, responsiveness and noise.

Figure 1
Promoter architecture associated with expression flexibility [46][47][48]. Top: the architecture of a typical promoter in which nucleosomes are regularly positioned but are excluded from a particular region upstream of the transcription start site. This nucleosome-free region (NFR) contains accessible binding sites for (few) transcriptional regulators (TF). Bottom: the architecture of promoters with high expression flexibility. These promoters tend to have a TATA box and multiple other binding sites for transcriptional regulators. Nucleosome positions are more dynamic (double-headed arrows) and nucleosomes are not strongly excluded from any particular region, and therefore compete with transcriptional regulators at their binding sites. These promoters are thus dependent on the activity of multiple transcriptional regulators and chromatin regulators (CR), which increases their mutational target size.

Promoter architecture and expression evolvability
Expression divergence is a major driver of evolutionary change and seems to be enriched at particular genes. As described above, expression divergence in yeast correlates with several promoter features, including a large number of binding sites, a TATA box, an occupied pattern of promoter nucleosomes, increased dependence on chromatin regulators and unstable tandem repeats. Notably, controlling for one of these factors does not remove the effect of the others, suggesting that each of these factors has an independent effect on expression divergence. Many of these factors seem to exert their influence on expression divergence predominantly through trans effects, although others (for example, unstable repeats) involve cis effects.
As noted above, expression divergence (the extent to which expression of a gene evolves) correlates with expression responsiveness (the extent to which expression of a gene is changed in response to the environment). We believe that the promoter elements discussed above underlie expression flexibility of these genes on short timescales (responsiveness and noise), which is instrumental in the immediate response of a cell to the environment, as well as on longer timescales (expression divergence), which may allow evolutionary adaptation to novel conditions. In other words, the correlation between responsiveness and expression divergence may be due to their dependence on the same promoter properties.
The notion that responsive, inducible promoters differ from stable 'housekeeping' promoters, established by Struhl and colleagues [43][55][56][57][58][59], has now been extended and linked to the evolvability of gene expression. However, much is still unknown. For example, the protein-DNA and protein-protein interactions that underlie the differential requirement of genes for general transcription factors, as well as the implications of these interactions for the dynamics of gene regulation, remain poorly understood.
The fact that promoter architecture correlates with expression evolvability (that is, the readiness with which gene expression evolves) raises the possibility that expression evolvability may be subject to selection. This could make it possible for the expression of some genes to remain robust to mutation, whereas other genes are inherently able to change rapidly in expression under evolutionary pressure. Consistent with this, we find that different promoter elements that are independently linked to expression evolvability preferentially coincide at the same genes, as if evolvability were selected in these genes. In this context, it is interesting to note that the group of rapidly diverging genes is enriched with plasma membrane genes and, in general, genes that interact with the cell environment [7] (Figure 2). These genes are needed to cope with changes in the environment and their flexibility may allow for rapid adaptation to new environments. Further studies will be required to examine this hypothesis.

Figure 2
Expression flexibility, mediated by promoter architecture, may be due to increased dependence on trans regulation and environmental changes. Genes with a TATA box, a promoter occupied by nucleosomes and many binding sites are regulated more extensively by regulatory factors. These factors respond to extracellular signals, thus making the target genes responsive to environmental changes both on short timescales (responsiveness and noise) and on longer timescales (evolutionary changes). These flexible genes preferentially code for proteins that interact with the environment and mediate the response to environmental changes (curved arrow), and this may allow for rapid adaptation to new environments.

[Figure 2 labels: Environmental signals; Signal transduction; TF 1; TF 2; CR 1; TATA; Low flexibility; High flexibility]