Reconstructing prokaryotic transcriptional regulatory networks: lessons from actinobacteria
© BioMed Central Ltd 2009
Published: 15 April 2009
Reconstruction of transcriptional regulatory networks of uncharacterized bacteria is a major challenge of the post-genomic era. Recent studies, including one in BMC Systems Biology, address this problem in the relatively underexplored actinobacteria clade, which includes major pathogenic and economically relevant taxa.
Accumulation of such data in model organisms on a genomic scale has recently allowed representation of these regulatory interactions between transcription factors and their target genes as an ordered graph or a network. This transcription regulatory network provides a powerful theoretical framework to analyze the complete regulatory system of model organisms such as E. coli or B. subtilis. Topological studies on such networks have revealed fundamental features that are common to other biological and non-biological networks, such as an approximation of the power-law degree distribution of regulatory interactions (few transcription factors regulate many genes, and most transcription factors regulate a low number of genes) and the presence of certain stereotypical recurring patterns of connections called motifs (Figure 1b,c). These features are important for deciphering the responses of organisms to the environment, as well as for biochemical engineering of pathways. Three recent papers [7–9] have now reconstructed transcription regulatory networks for several species of actinobacteria.
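The two topological features mentioned above can be made concrete with a small sketch. The edge list and gene names below are hypothetical and exist only for illustration. The feed-forward loop is used here as one commonly described motif; the text does not say which motifs Figure 1 depicts.

```python
from collections import Counter, defaultdict

# Hypothetical toy network: (regulator, target) pairs, not real data.
edges = [
    ("tfA", "tfB"), ("tfA", "g1"), ("tfA", "g2"), ("tfA", "g3"),
    ("tfB", "g1"), ("tfB", "g4"),
    ("tfC", "g5"),
]

def out_degrees(edges):
    """Number of targets regulated by each transcription factor."""
    return dict(Counter(regulator for regulator, _ in edges))

def degree_histogram(edges):
    """How many regulators have each out-degree (the degree distribution)."""
    return dict(Counter(out_degrees(edges).values()))

def feed_forward_loops(edges):
    """Find triples (x, y, z) where x regulates y, and both regulate z."""
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)
    loops = []
    for x in list(targets):
        for y in targets[x]:
            if y == x or y not in targets:
                continue  # y must itself be a regulator
            for z in targets[x] & targets[y]:
                if z not in (x, y):
                    loops.append((x, y, z))
    return sorted(loops)

degrees = out_degrees(edges)
histogram = degree_histogram(edges)
loops = feed_forward_loops(edges)
```

In a real network the histogram would be inspected on a log-log scale to judge how closely it approximates a power law; a seven-edge toy example only shows the bookkeeping.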
However, both these approaches are fraught with difficulties, including the fundamental problem of correctly identifying orthologous transcription factors. For example, the transcription factor birA, which regulates biotin synthesis, combines an amino-terminal winged-helix-turn-helix DNA-binding domain with a carboxy-terminal biotin ligase domain. Orthologs of birA in certain bacteria lack the DNA-binding domain and thus cannot function as transcriptional regulators of biotin regulons in those organisms. Therefore, mere identification of an ortholog might not predict transcription regulation. The binding sites are usually unknown for a significant fraction of transcription factors in an organism. Even when they are known, it is observed that orthologous transcription factors can regulate orthologous targets using divergent binding sites, indicating the limitations of the binding-site-based approach. Furthermore, earlier studies on the relative conservation of transcription factors and targets suggest that transcription factors are more frequently displaced or lost than targets. It has also been observed that the number of transcription factors encoded by a prokaryotic organism scales as a power law with respect to total gene number: larger genomes tend to have more transcriptional regulators per gene than would be expected from a linear increase with genome size. Taken together, these observations limit the scope of traditional transcription regulatory network reconstructions to well-conserved transcription factors and targets, and such reconstructions probably work best with organisms that are phylogenetically related or are of similar genome size with a similar lifestyle.
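The scaling observation can be written compactly. The symbols used here (the transcription factor count, the total gene count, a constant c and an exponent α) are notation assumed for illustration and are not taken from the studies discussed; the text implies only that the exponent exceeds one.

```latex
N_{\mathrm{TF}} = c \, N_{\mathrm{genes}}^{\alpha}, \qquad \alpha > 1
\quad\Longrightarrow\quad
\frac{N_{\mathrm{TF}}}{N_{\mathrm{genes}}} = c \, N_{\mathrm{genes}}^{\alpha - 1}
```

Because α − 1 is positive, the number of regulators per gene grows with genome size. This is one reason why a template network from a smaller genome is likely to leave many transcription factors of a larger genome without a counterpart.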
A set of recent studies [7–9] offers new ways to tackle the challenges of reconstruction of transcription regulatory networks in uncharacterized organisms, in terms of both methodology and data. These studies focus primarily on members of the previously underexplored actinobacterial clade, including pathogens such as Mycobacterium tuberculosis and Corynebacterium diphtheriae and industrially relevant organisms such as Corynebacterium glutamicum. The first of these, reported by Baumbach et al. in BMC Systems Biology (Figure 2c), is a culmination of a series of studies on Corynebacterium and presents the assembly of a preliminary network for C. glutamicum derived from experimental results. It covers 72 of the predicted 182 transcription factors in this organism (our unpublished results). The study combined the conventional technique (detection of orthologous transcription factors and targets based on the C. glutamicum template) with binding site prediction to reconstruct networks in closely related uncharacterized corynebacteria: C. diphtheriae, C. efficiens and C. jeikeium. A key advance in this work was the adjustment of the initially inaccurately determined binding sites by shifting them by one or more positions, followed by motif searches to identify a more likely binding site. These adjusted binding sites were then used in conjunction with target gene conservation to predict actual interactions. From the results presented in this work it seems that such a dual approach, while conservative, might indeed delineate high-confidence interactions.
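The logic of the dual approach can be sketched as a filter that keeps a template interaction only when both partners have orthologs in the query organism and a binding site is predicted upstream of the orthologous target. This is a minimal illustration under stated assumptions, not the procedure of Baumbach et al.: the function name, the data structures and all gene identifiers are hypothetical, and the binding-site adjustment and motif search are assumed to have been done already, with only their outcome supplied as input.

```python
def transfer_interactions(template_edges, orthologs, predicted_sites):
    """Transfer (tf, target) pairs from a template organism to a query organism.

    template_edges:  iterable of (tf, target) pairs in the template organism
    orthologs:       dict mapping a template gene to its query ortholog
    predicted_sites: set of (query_tf, query_target) pairs for which a
                     binding site is predicted upstream of the target
    """
    transferred = []
    for tf, target in template_edges:
        query_tf = orthologs.get(tf)
        query_target = orthologs.get(target)
        # Criterion 1: both the regulator and the target must be conserved.
        if query_tf is None or query_target is None:
            continue
        # Criterion 2: a predicted binding site must support the interaction.
        if (query_tf, query_target) in predicted_sites:
            transferred.append((query_tf, query_target))
    return transferred

# Hypothetical example data.
template_edges = [("tf1", "gA"), ("tf1", "gB"), ("tf2", "gC")]
orthologs = {"tf1": "q_tf1", "gA": "q_gA", "gB": "q_gB", "tf2": "q_tf2"}
predicted_sites = {("q_tf1", "q_gA"), ("q_tf2", "q_gC")}

result = transfer_interactions(template_edges, orthologs, predicted_sites)
```

Here the tf1–gB interaction is dropped because no site is predicted, and tf2–gC is dropped because gC has no ortholog. Requiring both criteria is what makes the approach conservative.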
The second study reconstructed the network of M. tuberculosis using a combination of experimentally documented interactions and orthology-based linkages, with an extension of these two sets of interactions using predicted operons (Figure 2d). Using this network, covering 43 of the approximately 235 transcription factors of this organism (after accounting for incorrect annotations; see below), together with microarray data, the authors were able to explore the shift in gene regulatory processes accompanying dormancy, which is a major pathogenic feature of M. tuberculosis.
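Extension by predicted operons can be illustrated as follows. The rule assumed here, that a regulator of any gene in an operon is taken to regulate every gene in that operon, is a simplification chosen for the sketch; the text does not specify the exact rule used in the study, and the function and gene names are hypothetical.

```python
def extend_by_operons(edges, operons):
    """Propagate each (tf, target) interaction to all genes sharing the
    target's predicted operon. Targets outside any operon are kept as-is."""
    gene_to_operon = {gene: tuple(operon) for operon in operons for gene in operon}
    extended = set()
    for tf, target in edges:
        for gene in gene_to_operon.get(target, (target,)):
            extended.add((tf, gene))
    return sorted(extended)

# Hypothetical example data.
edges = [("tf1", "a"), ("tf2", "x")]
operons = [["a", "b", "c"], ["d", "e"]]

extended = extend_by_operons(edges, operons)
```

Because the added interactions inherit any error in the operon prediction, it is useful to track them separately from directly documented ones.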
The third study represents a major development in terms of identification of new transcription factor-target interactions using a novel bacterial one-hybrid system. In this system, hybrid transcription factors are generated by fusing them to the α subunit of the RNA polymerase and tested for interaction with different bait DNA sequences by checking for activation of reporter genes adjacent to the bait sequence (Figure 2e). By this method the authors were able to describe several novel transcription factor-target interactions related to responses to stress and redox and fatty acid metabolism in M. tuberculosis. Consequently, this study goes a long way toward extending the network in this organism by increasing the coverage to 58 transcription factors.
A comparison of the networks from the two M. tuberculosis studies [8, 9] showed that only ten transcription factors and nine interactions are shared. We have also assembled a transcription regulatory network for M. tuberculosis, using the C. glutamicum network reported in the Baumbach et al. study as a template and the conventional ortholog-based transfer of interactions (our unpublished results). This inferred network had 397 interactions, of which 49 (12.3%) were detected by either of the two studies on M. tuberculosis [8, 9], and it included hubs that were present in both organisms, such as LexA and Crp (hubs are genes that regulate a large number of targets; LexA represses SOS-response genes and Crp is a cyclic AMP-dependent activator of gene expression). These observations strongly suggest that we are indeed far from the complete transcription regulatory network in either of these organisms. However, the independent support for about 12% of the M. tuberculosis interactions inferred using orthology-based techniques, even with these very incomplete networks, implies that this method has some value despite the known problems with it.
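The overlap figure is a simple set computation: the share of inferred interactions found in the union of the two experimentally grounded networks. The sketch below uses hypothetical toy interactions, since the underlying interaction lists are not given in the text, and then checks the arithmetic for the reported counts.

```python
def supported_fraction(inferred, *reference_sets):
    """Return (count, fraction) of inferred interactions that appear in
    at least one of the reference interaction sets."""
    inferred = set(inferred)
    if not inferred:
        return 0, 0.0
    reference = set().union(*reference_sets)
    n_supported = len(inferred & reference)
    return n_supported, n_supported / len(inferred)

# Hypothetical toy interactions, not taken from any of the studies.
inferred = {("tf1", "gA"), ("tf1", "gB"), ("tf2", "gC"), ("tf3", "gD")}
study_a = {("tf1", "gA"), ("tf9", "gZ")}
study_b = {("tf2", "gC"), ("tf1", "gA")}

count, fraction = supported_fraction(inferred, study_a, study_b)

# Arithmetic for the counts reported in the text: 49 of 397 interactions.
reported_percentage = 100 * 49 / 397  # roughly 12.3
```

An interaction found in both reference studies is counted once, which matches the "either of the two studies" wording.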
It is sobering that these studies [7–9] still cover a relatively small fraction of the complete networks of the respective organisms. It should also be kept in mind that all of them are influenced by the state of annotation of the gene and protein databases. We noticed that in each of the studies [7–9] there are instances of false positives generated as a result of incorrect annotation of non-DNA-binding proteins as transcription factors. We further observed that most organism-specific databases do not successfully identify all potential transcription factors encoded by a particular organism. For example, most studies report the number of transcription factors in M. tuberculosis as ranging from 150 to 194 [8, 9]. However, careful profile-based searches suggest that the actual number of transcription factors in this organism is closer to 235 (our unpublished results). Such underestimates are also observed in the case of C. glutamicum, suggesting that greater care needs to be applied to the detection and annotation of transcription factors.
Nevertheless, the studies [7–9] highlight some procedures that could result in improved reconstruction of transcription regulatory networks. Firstly, the success of the one-hybrid method in detecting entirely new interactions confirms that there is no substitute for an effective high-throughput experimental method in such endeavors. This is especially true because of the presence of lineage-specific transcription factors in most bacterial clades (such as the differentiation and sporulation factor WhiB in actinobacteria), displacement of regulatory hubs (evolutionary replacement of a highly connected transcription factor in the network by another phylogenetically distinct transcription factor) and the non-linear scaling of transcription factor counts with gene number [9, 10]. The C. glutamicum and M. tuberculosis network assembly efforts bring home the fact that there are already numerous individual studies in the literature that can be combined to provide a base for reconstructing a network for certain organisms. However, despite the recent progress in automatic text-mining tools, analysis of datasets such as those assembled in these studies [7–9] requires considerable human intervention to generate reliable transcription-factor-target connections. Finally, the novel approach of combining transcription factor-target orthology with adjusted transcription factor binding site predictions presented in the corynebacteria study serves as a plausible model for making reliable predictions of interactions, at least for closely related taxa. This, in conjunction with high-throughput experimental studies targeting representatives across the prokaryotic tree, might indeed prove useful in future efforts towards accurate transcription regulatory network reconstruction.
This research was supported by the Intramural Research Program of the National Institutes of Health, National Library of Medicine. We thank Jan Baumbach for assistance in obtaining the C. glutamicum transcription network.