
About Computational approaches

This page contains an archive of all entries posted to The Seven Stones in the Computational approaches category. They are listed from newest to oldest.


This weblog is licensed under a Creative Commons License.

Computational approaches Archives

March 3, 2008

Less papers to read, more data to use...

In a nice post at bbgm, Deepak writes:

...historical online literature lacks the relevant structure and metadata to make our task easier, but it is time that publishers thought ahead about some of the advantages of online publishing.

I couldn't agree more. I have sometimes heard the claim that within 5-10 years, more than 95% of the scientific literature will be read only by computers. Possible. However, the converse alternative might be interesting to consider: what if 95% of scientific papers could be 'written' by computers? Even if this formulation is obviously provocative and unrealistic, the point is that harnessing the 'network effect' of the web may have two complementary components, one community-driven, the other computer-driven. On the one hand, web 2.0 functionalities enable community-driven commenting, rating and even writing of scientific publications. On the other hand, semantic web technologies are expected to facilitate computer-driven integration of scientific data from multiple sources, which is likely to play an increasingly important role in science. Rather than mining thousands of unread papers, the scientist of the future may instead search the web for relevant data first and integrate them to generate (or 'write') novel insight. In fact, integration of large datasets already represents a major field of research in systems biology (see Chuang et al 2007, Xue et al 2007 or Mani et al 2008 as recent examples published in Mol Syst Biol).

It thus seems that, in addition to being web 2.0 enabled, new publishing models should 'embed' more structured data into online publications. In short, 'papers' could progressively turn into hybrid online objects that more closely resemble database records (see Timo Hannay's post on this topic) or highly structured documents. At the extreme, one could even imagine publishing 'naked' datasets, without any 'stories' around them. Of course, efficient data integration will require the data to be in a standard, structured format, and their quality will have to be well characterized. None of these requirements is trivial.
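As a purely hypothetical illustration of the computer-driven integration discussed above, the Python sketch below assumes two 'data-rich' publications that share one minimal record format. The field names (`study_id`, `quantity`, `unit`, `values`) and the helpers `make_record` and `integrate` are invented for this example and do not correspond to any existing standard. Once data share identifiers and explicit units, a program can join them without reading any prose:

```python
# Hypothetical sketch: two "data-rich" publications sharing one structured record
# format can be merged by a program, without anyone reading the accompanying text.
from typing import Dict, List


def make_record(study_id: str, quantity: str, unit: str, values: Dict[str, float]) -> dict:
    """Build a minimal structured record (field names are illustrative, not a standard)."""
    return {"study_id": study_id, "quantity": quantity, "unit": unit, "values": values}


def integrate(records: List[dict]) -> Dict[str, dict]:
    """Join records on shared identifiers, keeping only identifiers present in every record.

    Each value is labelled "study_id:quantity (unit)" so that provenance is preserved.
    """
    shared = set.intersection(*(set(r["values"]) for r in records))
    merged = {}
    for key in sorted(shared):
        merged[key] = {
            f'{r["study_id"]}:{r["quantity"]} ({r["unit"]})': r["values"][key]
            for r in records
        }
    return merged


if __name__ == "__main__":
    # Invented example values, for illustration only.
    expression = make_record("study-A", "mRNA change", "log2 ratio",
                             {"GENE1": 1.2, "GENE2": -0.4, "GENE3": 0.1})
    interactions = make_record("study-B", "interaction degree", "count",
                               {"GENE1": 14, "GENE3": 3, "GENE4": 7})
    for gene, row in integrate([expression, interactions]).items():
        print(gene, row)
```

The join keeps only identifiers present in every record and labels each value with its study and unit, so provenance is not lost. A real format would also need controlled identifier vocabularies and quality annotations, which is exactly the non-trivial part.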

The good old-fashioned paper is probably not going to disappear as a publication unit, in particular for high-impact studies reporting novel and deep insights. Nor is the point here to propose dumping every scientist's hard drive onto the web: data-rich publications would be published only when the authors felt it appropriate. There might thus be an equilibrium to find between papers that will never be read except by a text-mining engine and pure datasets, published as a resource, that are easier to search, mine and integrate. This dialectic may ultimately boil down to how well text-mining and data-integration technologies will perform in the future.

In any case, in the context of the current debate about the saturation of the peer-review system, I wonder whether a data-centric form of scientific publishing could help relieve some of the pressure. Reviewing datasets might be quicker and could rely more on standardized evaluation parameters. If coupled with proper credit-attribution mechanisms and impact metrics, data-rich (or even data-only) publications may represent an alternative model complementing the traditional 'paper' format. This would prevent the loss of useful data otherwise buried in verbal descriptions and, most importantly, would hopefully stimulate web-wide integration of disparate datasets.

February 26, 2008

A refreshing model: peppermint terpenoids

Research highlight by Doron Lancet, Crown Human Genome Center, Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel

Living cells are typically asymmetric, having tens of thousands of different biopolymers (proteins and polynucleotides) but fewer than 1,000 types of small molecules, such as amino acids and lipids. Exceptions are certain plant cells that harbor members of a ~40,000-strong group of low-molecular-weight terpenoids, often displaying a complex compositional balance essential for plant growth and survival (Aharoni et al, 2005). Understanding the intricacies of biosynthesis and interconversion of such unusual cellular components appears to require the full power of systems biology. In a recent paper, Rios-Estepa et al (2008) harness a systems approach, including iterative cycles of mathematical modeling and experimental testing, to help elucidate the metabolic dynamics of the terpenoid universe.

Specifically, they ask how plants vary their monoterpene profiles in response to environmental stress (changing levels of illumination). A highlight of their results is that the variation of terpene metabolic fluxes is mediated by specific events in which members of the terpenoid repertoire exert a regulatory effect on terpene biosynthesis enzymes. Rewardingly, this was predicted by a computer simulation and subsequently verified by experiment. The broader conclusion, applicable to all living organisms, is that as computing power grows, it will become possible to make increasingly specific and accurate predictions, allowing both a better global understanding and the successful engineering of cellular networks.
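The kind of regulation described here, where a pathway metabolite modulates a biosynthetic enzyme, can be illustrated with a generic competitive-inhibition rate law. This is a minimal sketch, not the model of Rios-Estepa et al (2008): the function name and all parameter values are hypothetical, and the actual mechanism and kinetics in their paper may differ.

```python
def competitive_rate(s: float, i: float, vmax: float, km: float, ki: float) -> float:
    """Michaelis-Menten rate with a competitive inhibitor.

    v = Vmax * S / (Km * (1 + I / Ki) + S): the inhibitor raises the apparent Km
    of the enzyme without changing its Vmax.
    """
    return vmax * s / (km * (1.0 + i / ki) + s)


if __name__ == "__main__":
    # Hypothetical parameters (arbitrary units): substrate held at S = Km while a
    # pathway metabolite acting as inhibitor accumulates (e.g. under changed light).
    for inhibitor in (0.0, 0.5, 1.0, 2.0):
        v = competitive_rate(s=1.0, i=inhibitor, vmax=1.0, km=1.0, ki=1.0)
        print(f"I = {inhibitor:.1f}  ->  relative flux {v:.3f}")
```

With the substrate held at S = Km, the relative flux falls from 0.5 to 0.25 of Vmax as the inhibitor rises from 0 to 2 Ki. Because the inhibition is competitive, a large excess of substrate restores the flux, so flux partitioning can shift without any change in enzyme levels.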


Aharoni A, Jongsma MA, Bouwmeester HJ (2005) Volatile science? Metabolic engineering of terpenoids in plants. Trends Plant Sci 10: 594-602

Rios-Estepa R, Turner GW, Lee JM, Croteau RB, Lange BM (2008) A systems biology approach identifies the biochemical mechanisms regulating monoterpenoid essential oil composition in peppermint. Proc Natl Acad Sci USA 105: 2818-2823

February 21, 2008

Top-down mapping of gene regulatory pathways

In a very recent lecture (see the full video from NIH VideoCasting) given for the NIH Systems Biology Special Interest Group, Trey Ideker presents a great overview of the various strategies his group has developed in recent years to integrate multiple types of large-scale datasets. While one of the most pervasive 'memes' about high-throughput measurements is that they are "notoriously unreliable" (see Hakes et al, 2008, for a recent example), Trey beautifully illustrates how predictive computational models and novel biological insights can be generated by sophisticated data-integration strategies. Three types of applications are presented in his talk:

  1. mapping of transcriptional response pathways
  2. functional mapping of protein complexes
  3. disease diagnosis and stratification

In the last section, Trey presents the study recently published in Molecular Systems Biology (Chuang et al, 2007; video: 00hr:39min:15sec), in which microarray expression profiles are superimposed on a protein-protein physical interaction network to identify 'subnetwork' biomarkers that classify metastatic vs non-metastatic breast tumors.
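The general idea of subnetwork markers can be sketched in a few lines of Python. This is a simplified sketch under stated assumptions, not the published algorithm of Chuang et al (2007). Subnetwork 'activity' is taken as the mean z-scored expression of its member genes, and it is scored by the mutual information between binned activity and tumor class. A seed protein is then grown greedily through its interaction neighbours while the score improves by more than a relative threshold. The bin count, the threshold `min_gain`, the size cap and all gene names and values are hypothetical, and any statistical significance testing is omitted.

```python
import numpy as np


def zscore(values):
    """Standardize one gene's expression across samples (mean 0, sd 1)."""
    values = np.asarray(values, dtype=float)
    sd = values.std()
    return (values - values.mean()) / sd if sd > 0 else np.zeros_like(values)


def mutual_information(activity, labels, n_bins=3):
    """Mutual information (bits) between equal-width-binned activity and class labels."""
    activity = np.asarray(activity, dtype=float)
    labels = np.asarray(labels)
    if activity.max() == activity.min():
        return 0.0  # a constant activity carries no information
    edges = np.linspace(activity.min(), activity.max(), n_bins + 1)
    bins = np.digitize(activity, edges[1:-1])  # bin index in 0 .. n_bins - 1
    mi = 0.0
    for b in np.unique(bins):
        for c in np.unique(labels):
            p_joint = np.mean((bins == b) & (labels == c))
            if p_joint > 0:
                p_b, p_c = np.mean(bins == b), np.mean(labels == c)
                mi += p_joint * np.log2(p_joint / (p_b * p_c))
    return float(mi)


def grow_subnetwork(seed, network, expression, labels, min_gain=0.05, max_size=10):
    """Greedily extend `seed` with the interaction neighbour that most increases the
    mutual information between subnetwork activity (mean z-score) and class labels."""
    z = {gene: zscore(v) for gene, v in expression.items()}

    def score(genes):
        return mutual_information(np.mean([z[g] for g in genes], axis=0), labels)

    members = {seed}
    best = score(members)
    while len(members) < max_size:
        # sorted() makes tie-breaking deterministic
        frontier = sorted({n for g in members for n in network.get(g, ())
                           if n not in members and n in z})
        if not frontier:
            break
        candidate = max(frontier, key=lambda n: score(members | {n}))
        new_score = score(members | {candidate})
        if new_score <= best * (1.0 + min_gain):  # stop when the relative gain is too small
            break
        members.add(candidate)
        best = new_score
    return members, best


if __name__ == "__main__":
    network = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"], "g4": ["g3"]}
    labels = [0, 0, 0, 1, 1, 1]  # e.g. non-metastatic (0) vs metastatic (1)
    expression = {  # invented values
        "g1": [0.1, 0.3, 0.2, 0.4, 0.2, 0.5],
        "g2": [0.0, 0.1, 0.2, 0.9, 1.1, 1.0],
        "g3": [0.2, 0.1, 0.3, 0.8, 0.9, 0.7],
        "g4": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],
    }
    print(grow_subnetwork("g1", network, expression, labels))
```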

February 15, 2008

Transcription paused and poised for regulation

Research highlight by Frank C.P. Holstege, Department of Physiological Chemistry, University Medical Center Utrecht, the Netherlands.

For eukaryotes, it is widely thought that transcription is primarily regulated through recruitment of the essential machinery to transcription start-sites. Earlier hints challenging this paradigm have been confirmed by recent analyses showing that transcription of a large number of genes is actually regulated after recruitment. Mechanistically, such studies have gone furthest in Drosophila melanogaster (Muse et al, 2007; Zeitlinger et al, 2007), where conservative estimates indicate that more than 10% of genes are regulated through promoter-proximal pausing. On such genes, RNA polymerase II is recruited and initiates transcription, but then pauses around 50 bp downstream of the transcription start-site, where it awaits further signals to resume elongation and complete transcription proper. These results tie in with observations made in yeast (Radonjic et al, 2005), embryonic stem cells (Bernstein et al, 2006; Lee et al, 2006) and differentiated mammalian cells (Guenther et al, 2007). These findings have numerous implications. For example, the widely assumed link between the presence of gene-specific transcription activators and full-length transcription appears to be much looser than expected. These results also underscore the importance of testing established models on a genome-wide scale. Indeed, other such surveys (Birney et al, 2007) indicate that to understand transcription, we may need to take into account even more surprises than the post-recruitment mechanisms demonstrated in Drosophila, such as the presence of ten times more start-sites than protein-coding genes, overlapping transcription units, etc.

Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, Fry B, Meissner A, Wernig M, Plath K, et al. (2006) A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125: 315-326

Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447: 799-816

Guenther MG, Levine SS, Boyer LA, Jaenisch R, and Young RA (2007) A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130: 77-88

Lee TI, Jenner RG, Boyer LA, Guenther MG, Levine SS, Kumar RM, Chevalier B, Johnstone SE, Cole MF, Isono K, et al. (2006) Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 125: 301-313

Muse GW, Gilchrist DA, Nechaev S, Shah R, Parker JS, Grissom SF, Zeitlinger J, and Adelman K (2007) RNA polymerase is poised for activation across the genome. Nat Genet 39: 1507-1511

Radonjic M, Andrau JC, Lijnzaad P, Kemmeren P, Kockelkorn TT, van Leenen D, van Berkum NL, and Holstege FC (2005) Genome-wide analyses reveal RNA polymerase II located upstream of genes poised for rapid response upon S. cerevisiae stationary phase exit. Mol Cell 18: 171-183

Zeitlinger J, Stark A, Kellis M, Hong JW, Nechaev S, Adelman K, Levine M, and Young RA (2007) RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat Genet 39: 1512-1516

February 12, 2008

Information processing in signaling networks

Research highlight by Charles Auffray, Functional Genomics and Systems Biology for Health, UMR7091, CNRS and Pierre & Marie Curie University—Paris VI, Villejuif, France

The work presented by Helikar et al (2008) in a paper recently published in PNAS represents a promising new step in the development of computational cellular physiology in eukaryotes. From curated cellular and biochemical data available in the literature, the authors assembled a discrete Boolean model of signal transduction comprising 130 nodes, and examined systematically, in a controlled manner, how varying combinations of external inputs translate into a range of cellular responses. The qualitative model not only reproduces known input-output relationships representative of major transduction pathways, but also provides evidence for the emergence of information-processing functions from the complex cellular network of molecular interactions. This is strikingly demonstrated by the fact that a large sample of randomly selected input combinations results in a very limited fraction of the possible outputs, which correspond to well-characterized global biological responses; this result holds irrespective of the level of noise introduced in the inputs of the model. Moreover, the model neatly clusters similar input combinations into equivalence classes of global outputs, reflecting the ability of the cell to integrate complex environmental signals and translate them into robust, specific responses and behaviours through common intracellular pathways. While discrete Boolean modelling makes it possible to highlight emergent properties of transduction networks and overcomes the hurdle of parameter estimation, it provides, very much as in classical physiology, only high-level views in the form of black boxes with limited predictive and explanatory power. Integration with continuous models will be essential to unravel and engineer the underlying mechanisms.
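To make the notions of 'few outputs' and 'equivalence classes' concrete, here is a hypothetical four-input Boolean toy network in Python. It is not the 130-node model of Helikar et al: node names, rules and the synchronous update scheme are all assumptions for illustration. Every input combination is run to a fixed point, and inputs are grouped by the output pattern they settle into:

```python
from collections import defaultdict
from itertools import product

INPUTS = ["in1", "in2", "in3", "in4"]
# Hypothetical toy rules: next state of each internal node given the current state.
RULES = {
    "a": lambda s: s["in1"] or s["in2"],
    "b": lambda s: s["a"] and not s["in4"],
    "c": lambda s: s["in4"] or (s["in3"] and not s["b"]),
    "out1": lambda s: s["b"] and not s["c"],
    "out2": lambda s: s["c"] and not s["b"],
}
OUTPUTS = ["out1", "out2"]


def run_to_fixed_point(inputs, max_steps=50):
    """Synchronously update all internal nodes until the state stops changing."""
    state = dict(inputs)
    state.update({node: False for node in RULES})
    for _ in range(max_steps):
        new_state = dict(state)
        for node, rule in RULES.items():
            new_state[node] = bool(rule(state))
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("no fixed point within max_steps (possible oscillation)")


def equivalence_classes():
    """Group every input combination by the output pattern it settles into."""
    classes = defaultdict(list)
    for values in product([False, True], repeat=len(INPUTS)):
        final = run_to_fixed_point(dict(zip(INPUTS, values)))
        classes[tuple(final[o] for o in OUTPUTS)].append(values)
    return dict(classes)


if __name__ == "__main__":
    for pattern, members in equivalence_classes().items():
        print(pattern, len(members), "input combinations")
```

In this toy network the 16 input combinations collapse into only 3 of the 4 possible output patterns (6, 9 and 1 combinations), and `(True, True)` is never reached. Because these rules contain no feedback loops, each input combination has a single fixed point.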

Helikar T, Konvalina J, Heidel J, Rogers JA (2008) Emergent decision-making in biological signal transduction networks. Proc Natl Acad Sci USA 105: 1913-1918

January 18, 2008

Will probiotics bring systems biology to our table?

(via Scintilla)

The article on "Probiotics modulation of mammalian metabolism" published this week in Molecular Systems Biology by Jeremy Nicholson and colleagues (Martin et al, 2008) has attracted some attention in the (very) popular media (read the nice summary in Science News).

In this follow-up to a paper published last year (Martin et al, 2007), the team led by Jeremy Nicholson, in collaboration with Nestlé, demonstrates clear physiological effects of oral probiotic administration in mice harbouring a humanized microbiome. The effects are intricate: both the host flora and host metabolism are altered. By analyzing metabolite pools in several compartments (liver, blood, urine, feces, gut) while following the host microbiota in parallel, the authors reveal patterns of correlation between microbial species and metabolites that reflect the probiotic-induced modulation of microbial-mammalian interactions. But the actual paper is really just next door (synopsis), so have a look...
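One simple way to look for such species-metabolite patterns is a cross-correlation matrix between the two data tables, measured in the same animals. The Python sketch below computes plain Pearson coefficients. This is an assumption for illustration: the statistical methods actually used by Martin et al (2008) may differ, and the numbers in the example are invented.

```python
import numpy as np


def cross_correlation(species, metabolites):
    """Pearson correlation between every microbial species and every metabolite.

    species:     (n_species, n_animals) abundances
    metabolites: (n_metabolites, n_animals) levels, same animals in the same column order
    Returns an (n_species, n_metabolites) matrix of r values.
    Rows with constant values (zero standard deviation) give NaN.
    """
    s = np.asarray(species, dtype=float)
    m = np.asarray(metabolites, dtype=float)
    s = (s - s.mean(axis=1, keepdims=True)) / s.std(axis=1, keepdims=True)
    m = (m - m.mean(axis=1, keepdims=True)) / m.std(axis=1, keepdims=True)
    return s @ m.T / s.shape[1]  # mean product of z-scores = Pearson r


if __name__ == "__main__":
    species = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 2.0, 1.0]]      # hypothetical abundances
    metabolites = [[0.5, 1.0, 1.6, 2.1], [3.0, 2.0, 1.0, 0.5]]  # hypothetical levels
    print(np.round(cross_correlation(species, metabolites), 2))
```

When scanning many species-metabolite pairs, one would also need to correct for multiple testing before reading much into individual coefficients.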

How will these results translate to humans? What will be the best way to influence our microbiome? Drugs or yoghurt? These are fascinating questions and the understanding of how our physiology depends on the microbial flora could have profound consequences, particularly in these times when we seem to be in a "rush to gene-based solutions to all our problems" (Wilson, 2007). Will personal genomics have to ultimately develop into personal metagenomics to include our "extended" microbial genome?

Even if I usually prefer to resist the temptation of self-promotion in this blog, I find the media attention to this topic interesting (despite the usual variable accuracy of newspaper reports) because it points to an area where systems biology provides insights into topics of immediate interest to the general public.

The NIH has recently started its Human Microbiome Project. In this context, this study also underscores the importance of developing model systems and tools to manipulate the microbiome and to analyze the incredibly dense and intricate interactions that connect host and microbial species. This is a field where top-down systems biology seems a very pragmatic and promising approach.

January 14, 2008

Morphogen Paradoxes

A controversy seems to be brewing over some recent theories and quantitative analyses addressing the fundamental question of how the Bicoid morphogen gradient is established and decoded in early Drosophila embryos. The transcription factor Bicoid controls the anterior-posterior patterning of the developing embryo. It is translated from maternal mRNA localized at the anterior pole of the egg, and its graded distribution activates, in a concentration-dependent manner, the expression of gap genes, thus determining their spatial domains of expression. Synthesis from a localized source, combined with diffusion and uniform degradation of the Bicoid morphogen, provides one of the simplest models to explain the approximately exponential shape of its gradient. While patterning has historically been thought to rely on the gradient at steady state (that is, when synthesis, transport and degradation balance each other), the question arose as to whether a steady state can be reached rapidly enough in the quickly developing embryo (Lander, 2007).
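The synthesis-diffusion-degradation model mentioned above can be explored numerically. In a minimal 1D version with diffusion coefficient D and uniform degradation rate k, the steady-state profile away from the source is exponential, C(x) ∝ exp(-x/λ) with decay length λ = sqrt(D/k), and the time needed to approach that steady state is on the order of the degradation time 1/k, growing with distance from the source. The Python sketch below (explicit finite differences, zero-flux boundaries, synthesis confined to the first grid cell) checks the decay length; the function names and all parameter values are hypothetical and dimensionless, not Bicoid measurements.

```python
import numpy as np


def simulate_sdd(D=1.0, k=0.01, j=1.0, length=100.0, dx=0.5, t_end=1000.0, dt=0.05):
    """Explicit finite-difference solution of dC/dt = D d2C/dx2 - k C on [0, length].

    Morphogen is synthesized at total rate j in the first grid cell only, mimicking
    a localized anterior source; both ends are zero-flux boundaries.
    Stability of this explicit scheme requires D * dt / dx**2 <= 0.5.
    """
    if D * dt / dx**2 > 0.5:
        raise ValueError("time step too large for the explicit scheme")
    n = int(round(length / dx))
    c = np.zeros(n)
    for _ in range(int(round(t_end / dt))):
        padded = np.pad(c, 1, mode="edge")  # copy edge values -> zero-flux boundaries
        laplacian = (padded[2:] - 2.0 * c + padded[:-2]) / dx**2
        dcdt = D * laplacian - k * c
        dcdt[0] += j / dx  # localized synthesis at the anterior end
        c = c + dt * dcdt
    x = (np.arange(n) + 0.5) * dx  # cell centres
    return x, c


def fit_decay_length(x, c, x_min, x_max):
    """Fit log C = a - x / lambda over [x_min, x_max] and return lambda."""
    mask = (x >= x_min) & (x <= x_max)
    slope, _intercept = np.polyfit(x[mask], np.log(c[mask]), 1)
    return -1.0 / slope


if __name__ == "__main__":
    x, c = simulate_sdd()
    print("fitted decay length:", round(fit_decay_length(x, c, 10.0, 40.0), 2))
    print("predicted sqrt(D/k):", np.sqrt(1.0 / 0.01))
```

The fitted decay length comes out close to sqrt(D/k) = 10 in these units. Lowering k lengthens both the gradient and the time needed to reach steady state, which is the tension behind the question of whether steady state is reached in time.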

In February last year, Naama Barkai and colleagues published a study (Bergmann et al, 2007) proposing that the gradient is in fact interpreted before it reaches steady state, while it is still "moving". Experimental evidence for a dynamic evolution of the Bcd profile between cleavage cycles 11 and 12 is provided using a reporter gene driven by Bicoid-binding sites. The authors further show that a pre-steady-state model implies a reduced sensitivity of the gradient readout to variations in morphogen production at its source. One biologically relevant example of this robustness is the observation that the expression domain of hunchback, a Bicoid target gene, shifts much less in embryos from mothers with altered bicoid gene dosage than a steady-state model would predict.
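The steady-state prediction referred to here can be made explicit. Assuming the exponential steady-state profile of the synthesis-diffusion-degradation model, a source amplitude $C_0$ proportional to maternal bicoid dosage, and a target-gene boundary set by a fixed concentration threshold $T$ (all simplifying assumptions), the boundary position and its shift with a dosage factor $\alpha$ are:

```latex
\begin{aligned}
C(x) &= C_0\, e^{-x/\lambda}, \qquad \lambda = \sqrt{D/k} \\
C(x^\ast) = T \;&\Rightarrow\; x^\ast = \lambda \ln\frac{C_0}{T} \\
C_0 \to \alpha C_0 \;&\Rightarrow\; \Delta x^\ast = \lambda \ln\frac{\alpha C_0}{T} - \lambda \ln\frac{C_0}{T} = \lambda \ln \alpha
\end{aligned}
```

Doubling or halving the dosage should thus shift the boundary by $\lambda \ln 2 \approx 0.69\,\lambda$, whatever the threshold; the much smaller shift observed for hunchback is what motivates the pre-steady-state interpretation.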

A few months later, Thomas Gregor and colleagues published two papers (Gregor et al, 2007a, 2007b) reporting a detailed analysis of the profile and dynamics of the Bicoid gradient. Quantitative in vivo imaging of a transgenic bicoid-eGFP reporter revealed several paradoxes. While a stable gradient of nuclear Bicoid is quickly established (within 90 min, approx. cleavage cycle 9), the (local) diffusion coefficient of Bicoid deduced from photobleaching experiments appears far too small (D = 0.3 μm²/s, much lower than expected from earlier estimates made by injecting labeled dextran molecules) to be compatible with such rapid establishment of the (long-range) gradient by diffusion alone. These experiments further show that nuclear Bicoid is in a highly dynamic nucleocytoplasmic equilibrium, pointing to a fundamental role for the nucleus in gradient establishment and stability. Finally, the precision with which the Bicoid gradient is transformed into Hunchback expression (see illustration, after Gregor et al 2007b) is estimated to be around 10%. This remarkable level of precision would not only be close to the physical limits of the system, but also strikingly matches the accuracy required to detect changes in Bicoid concentration between adjacent cells (10%, equivalent to a difference of only 70 Bicoid molecules per nucleus) and the reproducibility of the absolute morphogen concentration from embryo to embryo (10% as well).
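A back-of-the-envelope calculation shows why D = 0.3 μm²/s looks too small. Using the rough 1D estimate t ≈ L²/(2D) for the time to diffuse a distance L, the Python sketch below assumes two illustrative length scales that are not given in the text: about 100 μm (of the order of the gradient's decay length) and about 250 μm (half of an embryo taken to be ~500 μm long). It also makes explicit the molecule count implied by the 10% / 70-molecule figure.

```python
SECONDS_PER_HOUR = 3600.0

D_BICOID = 0.3  # um^2/s, local diffusion coefficient quoted above (Gregor et al, 2007a)
ESTABLISHMENT_TIME_S = 90 * 60  # ~90 min for a stable nuclear gradient, as quoted above


def diffusion_time(distance_um: float, d_um2_per_s: float) -> float:
    """Characteristic time (s) to diffuse a distance L in 1D: t ~ L^2 / (2D)."""
    return distance_um ** 2 / (2.0 * d_um2_per_s)


if __name__ == "__main__":
    # Assumed length scales (not from the text): ~100 um, of the order of the
    # decay length, and ~250 um, half of a ~500 um embryo.
    for distance in (100.0, 250.0):
        t = diffusion_time(distance, D_BICOID)
        print(f"{distance:.0f} um: ~{t / SECONDS_PER_HOUR:.1f} h "
              f"({t / ESTABLISHMENT_TIME_S:.0f}x the ~90 min observed)")
    # A 10% difference equal to ~70 molecules implies ~700 molecules per nucleus there.
    print(f"implied Bicoid molecules per nucleus: {70 / 0.10:.0f}")
```

These give roughly 4.6 h and 29 h, well beyond the ~90 min in which a stable nuclear gradient is observed. A 10% difference corresponding to 70 molecules implies roughly 700 Bicoid molecules per nucleus at that position.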

In a Correspondence published last week, Bergmann and colleagues (2008) dispute these interpretations and claim that a "reanalysis of their [Gregor et al's] data demonstrates that their findings are consistent with the well-accepted paradigm of diffusion-based patterning and provides further support for the notion that the Bicoid profile is decoded prior to reaching its steady state". According to these authors, constant nuclear Bicoid levels do not indicate that the gradient itself has reached steady state, since cytoplasmic levels may still be changing. The small diffusion coefficient of Bicoid would then be an additional argument for a pre-steady-state decoding mechanism. If this is the case, the differences in Bicoid levels between adjacent cells would be much larger at cleavage cycle 9 (50%, instead of 10% at cycle 14), thus resolving the paradox of the high precision of the hunchback response.

In their response (Bialek et al, 2008), Gregor and colleagues reply that if cells made their decision by reading Bicoid concentration at cycle 9, the boundary between expression domains would be about 5 cells wide at cycle 14 ($\sqrt{2^{14}/2^{9}} = \sqrt{32} \approx 5.7$), whereas in reality it is only a single cell wide. While they agree that the overall gradient might not be at steady state at these early stages, they argue that the stability of nuclear Bicoid levels is functionally highly relevant, given that Bicoid is a transcription factor. Finally, they point out that the deduced local diffusion constant is so small that it is in fact incompatible with observing any Bicoid in the middle of the embryo in the first place, suggesting the existence of additional mechanisms to explain establishment of the gradient at the scale of the entire embryo. These and some additional arguments lead Bialek et al to conclude that "the small values of the diffusion constant for Bcd we reported are superficially consistent with their model, but the model provides no basis for understanding any of our observations."
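The numbers in this exchange can be connected with a short calculation. The square root in the boundary-width formula can be read as assuming that nuclei double at each cleavage cycle and spread over a two-dimensional cortical surface, so that nuclei per unit length scale with the square root of nucleus number; this interpretation is an assumption. Under it, the Python sketch below computes the spacing factor between cycles 9 and 14 and, for an exponential gradient in which neighbours differ by 10% at cycle 14, the corresponding neighbour difference at cycle 9. The function names are hypothetical.

```python
import math


def linear_density_ratio(cycle_early: int, cycle_late: int) -> float:
    """Factor by which nuclei per unit length increase between two cleavage cycles,
    assuming nuclei double each cycle and tile a 2D surface (so density along one
    axis scales with the square root of the nucleus number)."""
    return math.sqrt(2 ** cycle_late / 2 ** cycle_early)


def adjacent_difference(spacing_in_decay_lengths: float) -> float:
    """Fractional drop in an exponential gradient between neighbouring nuclei."""
    return 1.0 - math.exp(-spacing_in_decay_lengths)


if __name__ == "__main__":
    ratio = linear_density_ratio(9, 14)  # sqrt(2**5) = sqrt(32)
    # Spacing at cycle 14 chosen so that neighbours differ by 10% (figure quoted above).
    s14 = -math.log(0.9)
    s9 = s14 * ratio  # nuclei are `ratio` times further apart at cycle 9
    print(f"boundary one nucleus wide at cycle 9 spans ~{ratio:.1f} nuclei at cycle 14")
    print(f"neighbour difference at cycle 9: ~{adjacent_difference(s9):.0%}")
```

The spacing factor is about 5.7, matching the "5 cells wide" argument, and the neighbour difference at cycle 9 comes out at about 45%, in the same range as the ~50% quoted by Bergmann and colleagues.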

Mmmmh... not an easy one. Those who have additional insights into these subtle but fascinating questions, please let us know!

January 11, 2008

Consumer Health Information Technology

I highly recommend visiting the NIH VideoCasting page, which hosts many interesting videos/podcasts. Even if I realize that this is a bit old on the blogosphere time scale, I would like to point to this one: "The Future: Consumer Health Information Technology", featuring talks given at an NCI-sponsored meeting on Dec 10, 2007 by Adam Bosworth (formerly "Google Health architect", now starting his own company, Keas), Bern Shen (Intel) and Bill Crounse (Microsoft).

In his introduction to the meeting (the video is very long, so I give some pointers: 0h16min43sec), Bradford Hesse (NCI) colorfully summarizes one of the main concepts put forward by the speakers by comparing the future of healthcare to... an "IKEA flat pack": patients will progressively be empowered to assemble their own care at home, as they would build a piece of (cheap) furniture.

Adam Bosworth (0h25min53sec) presents his very pragmatic vision of how IT could concretely help healthcare (0h39min07sec): a) help consumers own and control their personal health data, starting with very simple, basic information; b) provide tools for doctors so that they can deliver personalized care as easily as producing a spreadsheet; c) develop tools for researchers to facilitate the design and implementation of new protocols and clinical trials.

Bill Crounse (Microsoft's other Bill... 1h14min30sec) sees 5 major current trends that will increasingly challenge the healthcare system and call for IT solutions (1h26min22sec): a) increasing personal responsibility ("the end of health insurance"); b) progressive "retailization" of healthcare services (eg the appearance of "retail minute clinics"); c) commoditization of healthcare providers; d) globalization of access to information (through the web, of course); e) globalization of healthcare services. I recommend his funny little anecdote about the high-tech, GPS-equipped, wireless-connected plumber (1h25min30sec), who appears to be better equipped than any practicing physician...

The speakers all insist on the need for massive data integration, enabled by interoperable formats and information coding, themes that probably sound familiar to many systems biologists.

Toward the end of his talk (1h35min00sec), Bill Crounse shows a short "science-fiction" movie on Microsoft's vision of the future of healthcare: a world full of credit-card-sized tablet PCs, touch screens and many other very exciting gadgets (I love gadgets!). But I can't help missing the warmth of human-to-human interactions within this jungle of virtual consultations, retail clinics, remotely controlled metabolic parameters, etc... and I didn't quite see, in that movie, the doctor spending more time with his patient or the daughter with her sick grandma. But this may of course only reflect an old-fashioned side of my temperament...

November 20, 2007

Personal genomics for a fistful of dollars

The wave of personal genomics is progressing rapidly. A string of four papers appeared recently (Porreca et al, 2007; Albert et al, 2007; Okou et al, 2007; Hodges et al, 2007) reporting microarray-based technologies that enable the enrichment of selected genomic fragments in a single, massively multiplexed reaction, thus greatly facilitating subsequent resequencing of predefined portions of the human genome (eg all coding exons). These technologies are expected to dramatically reduce the cost of targeted resequencing of individual genomes.

On the commercial front, deCODE and 23andMe have launched personal genome services offering genome-wide SNP profiling for a little less than $1,000 (NYT articles: Nicholas Wade, Amy Harmon, or Wired, ScienceRoll, Sandra, DNA and You).

The chips used by 23andMe are the "Illumina HumanHap550+ BeadChip, which reads more than 550,000 SNPs (single nucleotide polymorphisms) plus a 23andMe custom-designed set that analyzes more than 30,000 additional SNPs." The profile provided by deCODEme includes "over one million variants across the genome."

So what do you think?

November 18, 2007

Glia-neuron interactions

Nature Neuroscience has a nice special focus on glia and disease. The featured reviews and perspective articles discuss multiple aspects of neuron-glia interactions and their role in disease. I am highlighting this collection here because I have the feeling that this field could be a nice playground for systems biology.

For example, Rossi and colleagues (2007) review the various metabolic processes affected during brain ischemia. Several of the examples discussed illustrate very well how the extent of brain damage is determined by the concurrent dynamics of both harmful and protective processes engaging complex interactions between neurons and astrocytes. A critical determinant of ischemic damage is the catastrophic fall in ATP levels caused by deficient glucose and oxygen delivery. Astrocytes have glycogen stores that can normally be converted to lactate, which is exported to neurons to provide energy during phases of high activity. In the absence of oxygen, however, lactate can no longer be oxidized. In this case, glucose may help delay the loss of ATP via anaerobic glycolysis. But this beneficial effect might be counteracted by lactic acidosis caused by continued glycolysis in the absence of O2, which is known to accentuate ischemic damage in the case of hyperglycemia. Moreover, acidosis may activate Na+-H+ exchange, leading to cytosolic Na+ accumulation and reversal of Na+-Ca2+ exchange; the resulting Ca2+ overload in astrocytes may impair their protective functions or even kill them.

A similar complexity is seen in the events underlying ischemic glutamate release. Loss of cellular ATP impairs the function of the Na+-K+ ATPase and thus disrupts ionic gradients. The resulting depolarization leads to a large increase in extracellular glutamate that is amplified by positive feedback, ultimately resulting in neuronal death by excitotoxicity. Astrocytes may contribute to increased extracellular glutamate levels via direct vesicular glutamate release and via vesicular ATP release that in turn activates glutamate-permeable P2X receptors. Glutamate reuptake is normally carried out by five high-affinity sodium-dependent glutamate transporters. Disruption of the transmembrane potential and ionic gradients can cause transporter reversal, further contributing to glutamate release. This in turn depends on the intracellular glutamate concentration, which is much higher in astrocytes than in neurons, and determines the relative kinetics of neuronal and astrocytic reuptake/release as the ischemic perturbations progress. Further details are shown in Figure 3 of Rossi et al (2007).

Even if this short overview is condensed and incomplete, it suggests to me that quantitative measurements and integrated modeling could be quite helpful, if feasible, to understand the contributions of the many processes involved, to identify potential points of protective synergy, or to characterize regimes under which the stability of the astrocyte-neuron system is catastrophically compromised. Perhaps this type of model and its calibration could even serve as a starting point to investigate the involvement of astrocytes in computational aspects of neuronal function (Wang et al, 2006).