About Bioinformatics

This page contains an archive of all entries posted to The Seven Stones in the Bioinformatics category. They are listed from oldest to newest.


Bioinformatics Archives

February 15, 2008

Transcription paused and poised for regulation

Research highlight by Frank C.P. Holstege, Department of Physiological Chemistry, University Medical Center Utrecht, the Netherlands.

For eukaryotes, it is widely thought that transcription is primarily regulated through recruitment of the essential machinery to transcription start-sites. Previous hints challenging this paradigm have been confirmed by recent analyses showing that transcription regulation of a large number of genes actually occurs after recruitment. Mechanistically, such studies have gone furthest in Drosophila melanogaster (Muse et al, 2007; Zeitlinger et al, 2007). Here, conservative estimates indicate that more than 10% of genes are regulated through promoter-proximal pausing. On such genes, RNA polymerase II is recruited and initiates transcription, but then pauses around 50 bp downstream of the transcription start-site, where it awaits further signals to resume elongation and complete transcription proper. These findings tie in with observations made in yeast (Radonjic et al, 2005), embryonic stem cells (Bernstein et al, 2006; Lee et al, 2006) and differentiated mammalian cells (Guenther et al, 2007).

There are numerous implications to these findings. For example, the widely assumed link between the presence of gene-specific transcription activators and full-length transcription appears to be much looser than expected. These results also underscore the importance of testing established models on a genome-wide scale. Indeed, other such surveys (Birney et al, 2007) indicate that, to understand transcription, we may need to take into account even more surprises than the post-recruitment mechanisms demonstrated in Drosophila, such as the presence of ten times more start-sites than protein-coding genes and overlapping transcription units.

Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, Fry B, Meissner A, Wernig M, Plath K, et al. (2006) A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125: 315-326

Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447: 799-816

Guenther MG, Levine SS, Boyer LA, Jaenisch R, and Young RA (2007) A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130: 77-88

Lee TI, Jenner RG, Boyer LA, Guenther MG, Levine SS, Kumar RM, Chevalier B, Johnstone SE, Cole MF, Isono K, et al. (2006) Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 125: 301-313

Muse GW, Gilchrist DA, Nechaev S, Shah R, Parker JS, Grissom SF, Zeitlinger J, and Adelman K (2007) RNA polymerase is poised for activation across the genome. Nat Genet 39: 1507-1511

Radonjic M, Andrau JC, Lijnzaad P, Kemmeren P, Kockelkorn TT, van Leenen D, van Berkum NL, and Holstege FC (2005) Genome-wide analyses reveal RNA polymerase II located upstream of genes poised for rapid response upon S. cerevisiae stationary phase exit. Mol Cell 18: 171-183

Zeitlinger J, Stark A, Kellis M, Hong JW, Nechaev S, Adelman K, Levine M, and Young RA (2007) RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat Genet 39: 1512-1516

July 5, 2007

What would you do with a petaflop?

The Blue Brain Project of the Ecole Polytechnique Fédérale de Lausanne (EPFL) currently uses an IBM Blue Gene/L supercomputer, reaching peaks of 22.4 teraflops (flops = floating-point operations per second), to simulate a model of a mammalian neocortical column composed of 10'000 neurons (Markram, 2006). IBM recently announced the release of the new Blue Gene/P, which, in its largest configuration, would be more than 100 times more powerful than the EPFL Blue Gene/L, reaching peaks of 3000 teraflops (3 petaflops, or 3×10^15 flops, 100'000 times more powerful than a home PC).
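The figures quoted above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the variable names are ours, and the "home PC" speed is not stated in the post but derived from the quoted factor of 100'000.

```python
# Peak figures quoted in the post, in floating-point operations per second.
TERA = 1e12
blue_gene_l_flops = 22.4 * TERA   # EPFL Blue Gene/L peak
blue_gene_p_flops = 3000 * TERA   # largest Blue Gene/P configuration (3 petaflops)

# Speed-up of Blue Gene/P over the EPFL machine ("more than 100 times").
speedup = blue_gene_p_flops / blue_gene_l_flops

# Home PC speed implied by "100'000 times more powerful than a home PC".
# This is derived from the quoted ratio, not a measured value.
implied_home_pc_flops = blue_gene_p_flops / 100_000

print(f"Blue Gene/P vs Blue Gene/L: about {speedup:.0f}x")
print(f"Implied home PC: about {implied_home_pc_flops / 1e9:.0f} gigaflops")
```

The ratio comes out at roughly 134, consistent with "more than 100 times", and the implied home PC runs at about 30 gigaflops.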

Apparently, this leap in computing power represents a challenge even for software developers, in particular due to the increase in parallelism (The petaflop challenge, Nature News). Would this open up fundamentally new avenues in systems biology? What are the applications in systems biology that would benefit the most from such supercomputing power? What would you do if you had a petaflop computer...?

June 19, 2007

E. coli counts in base 117

Finding general laws governing the organizing principles of living organisms is a particularly difficult task in biology, but certainly a central one in systems biology. Part of the difficulty in this endeavor is probably linked to the fact that "by its very nature, life is both contingent and particular, each organism the product of eons of tinkering, of building on what had accumulated over the course of a particular evolutionary trajectory" (Keller, 2007, see also our post). Such laws are thus particularly significant when they emerge from evolutionary constraints alone. In a recent paper published in PNAS, Matthew Wright and colleagues may well provide such an example by looking at the "chromosomal periodicity of evolutionarily conserved gene pairs" (Wright et al, 2007).

Using a comparative genomic approach, Wright and colleagues selected pairs of genes based on two simple criteria: 1) the genes of a pair must tend to be close together on the chromosome; 2) one gene of the pair should tend to be present only if the other gene is also present. Searching more than 100 bacterial genomes, they identified 22'500 statistically conserved gene pairs. Looking at the distribution of distances between the genes in a pair and at the density of conserved pairs along the E. coli chromosome, a strikingly regular pattern emerged: conserved pairs appear to be localized in clusters that are regularly spaced over the entire chromosome, with a regular inter-cluster interval of 117kb. In addition, this regular positional pattern correlates with the pattern of log-phase transcriptional activity along the chromosome: both positional and transcriptional grids are almost perfectly aligned, with the same 117kb periodicity (see figure below, from Figure 3b in Wright et al, Copyright 2007 The National Academy of Sciences of the USA).
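To give a feel for how a spacing like this can be detected from positions alone, here is a minimal sketch. It is not necessarily the analysis used by Wright and colleagues: it folds each position onto a circle of a candidate period and measures how well the resulting phases line up (a Rayleigh-type coherence score). The positions are synthetic and made up for illustration, and the function names are ours.

```python
import cmath
import math

def phase_coherence(positions_kb, period_kb):
    """Fold positions onto a circle of the given period and return the
    mean resultant length (1.0 = perfectly periodic, near 0 = no periodicity)."""
    total = sum(cmath.exp(2j * math.pi * x / period_kb) for x in positions_kb)
    return abs(total) / len(positions_kb)

def best_period(positions_kb, candidate_periods_kb):
    """Return the candidate period with the highest phase coherence."""
    return max(candidate_periods_kb,
               key=lambda p: phase_coherence(positions_kb, p))

# Synthetic, made-up data (NOT from the paper): 39 clusters spaced 117 kb
# apart, with three gene positions per cluster at small offsets.
positions = [k * 117 + offset for k in range(39) for offset in (0, 4, 8)]

# Scan integer periods from 50 kb to 200 kb.
period = best_period(positions, range(50, 201))
print(f"Most coherent period: {period} kb")
```

On this toy input the scan recovers the spacing that was built in. Real data would also need a significance test against shuffled positions, which this sketch omits.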

thumb070618.jpg

The interpretation offered to explain these findings is that the regular spacing of conserved gene pairs may reveal underlying regularities in the spatial organization of the E. coli chromosome. Specifically, a solenoid-like model with regular 117kb loops would imply that conserved pairs are preferentially located on one face of the chromosome. The correlation between the positional grid and the longitudinal profile of transcriptional activity suggests that this arrangement is coupled to functionally important characteristics (e.g. diffusion properties of the RNA polymerase or the existence of transcription factories).

A patterned structure with a similar periodicity has been suggested by previous studies based on the analysis of sequence features or on the profile of transcriptional activity along the bacterial chromosome (Jeong et al, 2004; Carpentier et al, 2005; Allen et al, 2006). What is remarkable in the study by Wright and colleagues is that the 117kb periodicity emerges so clearly using solely evolutionary conservation criteria: chromosomal proximity and phylogenetic co-occurrence. The evolutionary forces that operate on a wide variety of genomes are thus able to reveal constraints on the overall structural organization of an entire bacterial chromosome. In turn, this finding implies that strong evolutionary selective pressures operate to shape the long-range organization of chromosomes. How general is this 117kb-periodicity law? Wright and colleagues were able to find a similar arrangement in C. crescentus, and it will be interesting to see if a similar organization is observed in other genomes. Direct investigation of chromosomal conformation in vivo may also shed more light on the physical and functional mechanisms that explain the deep link between evolutionary conservation of local properties and a global architectural principle of a bacterial genome.

February 14, 2007

Connecting disease state to genetic modules

Diseases such as cancer are often related to collaborative effects involving interactions of multiple genes within complex pathways, or to combinations of multiple SNPs. To understand the structure of such mechanisms, it is helpful to analyze genes in terms of the purely cooperative, as opposed to independent, nature of their contributions towards a phenotype (Anastassiou, 2007).
Two papers currently published in Molecular Systems Biology address this question:
  • Using an information-theoretic definition of synergy, Dimitris Anastassiou presents a computational approach to identify ab initio sets of interacting genes linked to a given disease state or phenotype (Anastassiou, 2007). This definition of synergy, derived from a generalization of the concept of mutual information, can connect two levels of organization (for example: genes and disease phenotype) and reveal the structure of the cooperative effects underlying a phenotypic state.
  • Jim Collins and colleagues apply network inference techniques to identify key pathways involved in prostate cancer progression (Ergün et al, 2007). A compendium of 1144 expression profiles spanning multiple cancer types is used to train the "mode-of-action by network identification" (MNI) algorithm. When applied to the test set of prostate cancer profiles, the algorithm identifies the androgen receptor and several of its cognate target genes as top genetic mediators. This signaling pathway would not have been detected by expression change alone or by pathway analysis using Gene Set Enrichment Analysis (GSEA).
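To make the idea of information-theoretic synergy concrete, here is a minimal sketch for the two-gene case, assuming the commonly used form Syn(G1, G2; C) = I(G1, G2; C) − [I(G1; C) + I(G2; C)], where I is mutual information and C is the phenotype. The toy data and function names are ours, and the general multi-gene definition in the paper is not reproduced here.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def synergy(g1, g2, phenotype):
    """Two-gene synergy: information the pair carries about the phenotype
    beyond the sum of what each gene carries on its own."""
    pair = list(zip(g1, g2))
    return (mutual_information(pair, phenotype)
            - mutual_information(g1, phenotype)
            - mutual_information(g2, phenotype))

# Toy, made-up example: binarized expression of two genes, and a phenotype
# that is "on" only when exactly one of them is expressed (XOR).
gene1 = [0, 0, 1, 1]
gene2 = [0, 1, 0, 1]
disease = [0, 1, 1, 0]

xor_synergy = synergy(gene1, gene2, disease)        # purely cooperative
redundant_synergy = synergy(gene1, gene1, gene1)    # fully redundant
print(xor_synergy, redundant_synergy)
```

In the XOR case neither gene alone says anything about the phenotype, yet together they determine it, so the synergy is one full bit. When both genes simply duplicate the phenotype, the score is negative, reflecting redundancy.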

January 19, 2007

Analyzing time-series expression data

Ziv Bar-Joseph and colleagues describe their new method, Dynamic Regulatory Events Miner (DREM), to analyze time-series gene expression data and combine them with static ChIP-chip experiments. The expression profiles are modeled using an extension of a hidden Markov model that enforces a tree structure onto the expression profiles. The technique makes it possible to deduce the condition-specific or time-dependent activity of the transcription factors that explain the observed expression profiles.

In their analysis of developmental time-series of gene expression in Drosophila, Peer Bork and colleagues apply a more drastic principle to identify robust groups of genes that correlate with major development phases. They required "four points of low expression and four subsequent points of high expression (or vice versa) even if the amplitude change was relatively low (see Materials and methods). This type of convolution not only requires a sharp increase or decrease of expression, but also that the change in transcript level is consistent over a period of time, thereby reducing the rate of false positives owing to individual outliers."
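The "four low, then four high" criterion can be illustrated with a simple sliding-window filter. This is a sketch under our own assumptions, not the authors' implementation: their exact convolution is described in their Materials and methods, whereas here a transition is called when every point in one run of four lies strictly below (or above) every point in the next run of four. The function name and toy profiles are hypothetical.

```python
def sharp_transitions(profile, run=4):
    """Return (index, direction) for each time point where `run` consecutive
    low values are followed by `run` consecutive high values, or vice versa.
    `index` is the first time point after the switch."""
    hits = []
    for start in range(len(profile) - 2 * run + 1):
        before = profile[start:start + run]
        after = profile[start + run:start + 2 * run]
        if max(before) < min(after):
            hits.append((start + run, "up"))
        elif min(before) > max(after):
            hits.append((start + run, "down"))
    return hits

# Toy, made-up profiles (arbitrary expression units).
step_up = [1, 1, 1, 1, 5, 5, 5, 5, 5]        # sustained increase
single_outlier = [1, 1, 1, 9, 1, 1, 1, 1]    # one spike, no sustained change
small_but_consistent = [2.0, 2.1, 2.0, 2.1, 2.4, 2.5, 2.4, 2.5]

print(sharp_transitions(step_up))
print(sharp_transitions(single_outlier))
print(sharp_transitions(small_but_consistent))
```

The sustained step is reported once, the isolated spike is ignored, and the low-amplitude but consistent change is still detected, which matches the behavior the quoted passage describes.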