This weblog is licensed under a Creative Commons License.
August 18, 2008

SciFoo: scientific fireworks

In his list of eight 'generative' values (Better Than Free), Kevin Kelly includes 'embodiment', the actual physical realization of an item or event that could otherwise be freely distributed over the web. While we are all 'hyperlinked' on the Internet, the value of those unique qualities that cannot be generated or "copied" on the web increases dramatically. The intense emulation and shared excitement sparked at the recent Science Foo Camp (SciFoo 2008), organized by Nature, Google and O'Reilly, offered a wonderful example of the unique value of direct human exchange: an exclusive event bringing together roughly 200 top scientists, 'geeks' and other technologists at the Googleplex in Mountain View, California.

SciFoo is a so-called 'unconference': there is no program or, more precisely, as Timo Hannay explained during the opening of the conference, the attendees are the 'program'. The actual schedule was set only on the first evening, in a purposefully chaotic process open to anyone who wished to organize a session on any topic. For the next two days, in a festival of parallel sessions, astrophysicists, 'googlers', technologists, molecular biologists, taxonomists, game designers, flying car constructors, publishers, thinkers and (some) dreamers discussed and exchanged ideas with great enthusiasm and a rare intensity and openness.

Needless to say, deciding which session to attend was close to impossible... In any case, I ended up following three types of talks: a series on systems biology-related topics (data integration, machine learning, personal genomics, the baroque structure of the transcribed genome), several (of many) sessions on open data and open science, and finally some more eclectic sessions (only from my standpoint, of course) on topics as diverse as the foundations of the concept of time in physics, a demonstration of very simple yet powerful Python scripts for analyzing text, and the potential of game design to harness our 'cognitive surplus'. I cannot possibly summarize all the talks, interactions and impressions gathered at this meeting, but here are a few subjective excerpts:

  • There were quite a few sessions on open science and open data. Ernst Hafen made a strong case for a unique AuthorID that would help track the multiple aspects of researchers' scientific activities. With regard to data, Google announced that it will soon launch a new service, Google Research Datasets, offering free hosting for large datasets of any type. The service will allow the inclusion of some minimal metadata about the submitted datasets and will provide a mechanism to set a delay before a dataset is made publicly visible. This will probably become a very simple and convenient way to store data (in particular if a useful API is developed), so convenient, in fact, that we may need to be careful it does not become a temptation to bypass the 'minimal information...' standards usually required by traditional public databases.
  • George Church provided an overview of the Personal Genome Project (PGP) and described the types of biological data that will be integrated with the genomic and genetic information collected from consenting PGP volunteers: analysis of the transcriptome of pluripotent stem cells derived from the subjects; sequencing of the repertoire of recombined V-D-J regions in immune cells (the 'VDJome') to exploit correlations between given V-D-J sequences and antigen-specific stimulations; characterization of the microbiome, used as a tracer of environmental and physiological conditions; and recording of phenotypic traits and disease conditions using controlled vocabularies. Finally, George also emphasized the exponentially decreasing cost of sequencing, which will not only make large-scale sequencing of full personal genomes feasible but may also open up entirely new fields of application based on massive DNA sequencing.
  • Lee Smolin talked about the concept of time in physics and examined whether our perception of time as the 'experience of successive present moments' is 'real' or, alternatively, an emergent property of the laws of physics. I cannot claim to have followed the entire argument, but I learned that the mathematical representation of physical reality involves the geometrization of time (as one of the dimensions of the state space), leading in fact to a representation devoid of temporal flow (somehow the clock has to be outside the system). Physical laws are then associated with this geometrical representation and applied to initial conditions. If I understood correctly, this approach may have to be regarded as an approximation: it may be valid only for subsystems of the universe and may not be appropriate for a true cosmological theory of the entire universe, with possibly disturbing consequences for the nature of physical laws...
  • Believe it or not, music can be 'geekified' as well: later in the evening, Chris DiBona brought his Tenori-on for a fun demonstration. I want one of those!

The meeting ended with some final scientific fireworks, when some of the speakers gave a series of brilliant two-minute summary talks, providing a colorful overview of the many sessions we had inevitably missed. I have to admit that I like fireworks, and I would certainly have enjoyed a little more of this final kaleidoscopic view of science. Clearly, the authentic value of this conference lies in the unique and direct human interactions, but I nevertheless wish there were some way (perhaps by turning this last session into some form of outreach) to convey this pure joy of scientific diversity and curiosity to a broader audience.

Credits: illustrations from Bob Lee, Flickr, some rights reserved

July 26, 2008

Soon Sci Foo!

A last very quick post before going on vacation (Swiss Alps...). In two weeks I will have the great privilege of attending the legendary SciFoo 'un-conference' at the Googleplex in Mountain View, California. Many ideas for exciting sessions are already circulating. I would just like to add my support to Cameron Neylon's proposal for a discussion on building a 'Science Data Commons'. The availability and 'integrability' of scientific data probably represent some of the major challenges in scientific communication and, obviously, I would be excited to see whether the discussions at Sci Foo produce ideas on how scientific journals can take concrete, pragmatic steps to help make scientific data readily available in a useful form.

July 23, 2008

ISMB 2008: micro-blogging at its best

Probably like many others, I have often been puzzled by the phenomenon of 'micro-blogging', which consists of posting very short messages on the web (typically via sites such as Twitter) to provide an instantaneous description of the writer's activity, state of mind or thoughts. Over the last few days, a small group of bloggers attending the ISMB 2008 Conference in Toronto made intensive use of a form of collective micro-blogging on FriendFeed to cover many of the talks held at the conference.

Particularly interesting was the coverage of several keynote lectures, often commented on simultaneously in a single 'feed' by several bloggers in the audience, providing, so to speak, a real-time example of 'crowdsourcing'. The result is a surprisingly useful set of notes, in which the combined attention and complementary knowledge of the participants fill some gaps, provide additional information (including references or links) and follow the flow of the presentation as it unfolds. I provide below a few picks relevant to systems biology, while the rest can be consulted (and, importantly, searched!) in the 'ISMB 2008 Room' on FriendFeed. Good job & many thanks!

July 18, 2008

The impact of online publishing

"I haven't browsed a table of contents in ages; I find all my papers by Pubmed searches anyway". We have probably all heard this remark, which reflects how online publishing has changed the way we retrieve scientific publications. In a study published today in Science, Evans ("Electronic Publication and the Narrowing of Science and Scholarship", Evans, 2008) presents data on citation patterns showing that the advent of electronic publication has been accompanied by a decrease in the number of articles cited and a progressive restriction of citations to recent papers:

Collectively, the models presented illustrate that as journal archives came online, either through commercial vendors or freely, citation patterns shifted. As deeper backfiles became available, more recent articles were referenced; as more articles became available, fewer were cited and citations became more concentrated within fewer articles.

The interpretation offered is that online availability has made citations more focused, because less relevant articles are more easily filtered out. In addition, Evans argues that easy navigation through the network of hyperlinked citations may amplify the tendency to be influenced by others' choices when citing "reference" studies, and thus accentuate the dominance of a restricted number of articles:

By enabling scientists to quickly reach and converge with prevailing opinion, electronic journals hasten scientific consensus. But haste may cost more than the subscription to an online archive: Findings and ideas that do not become consensus quickly will be forgotten quickly.

It is probably difficult to be sure that all sources of bias and confounding factors have been eliminated in this type of analysis. For example, in the FriendFeed discussion thread, LJ Jensen asks whether the sheer amount of published research could explain why scientists restrict their citations to the most recent literature. See also some additional discussion in the associated News & Views (Couzin, 2008).

In any case, the study highlights two complementary strategies in information retrieval: finding relevant papers through targeted searches versus staying informed on a broad range of topics through systematic browsing. In our Google-driven era, we may tend to forget the importance of good old-fashioned 'table-of-contents skimming' to stimulate cross-disciplinary thinking, widen our horizons and cultivate scientific curiosity.

Perhaps it is a peculiarity of printed media that their "poor indexing" enforces broad exposure to unrelated areas of research. On the other hand, some web technologies already help us browse through vast amounts of online publications (for example, an RSS aggregator helps me generate a daily literature survey, which can be further combined with other community-centered feeds, as here on FriendFeed; other aggregators, such as Postgenomic and Scintilla, highlight information by automatic clustering). However, these tools remain imperfect and, in our reflection on the future of scientific publishing, we will need to find the right balance between the two strategies above and think about how the increasing efficiency of search engines can be complemented by means of continuous exposure to diversity.

July 10, 2008

Fascinating correlations or elegant theories?

Chris Anderson, Editor-in-Chief of Wired, wrote a few weeks ago a provocative piece, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete", arguing that in our Google-driven, data-rich era ("The Petabyte Age") the good old "approach to science — hypothesize, model, test — is becoming obsolete", giving way to a purely correlative vision of the world. There is a good dose of provocation in the essay, and it was quite successful in spurring a flurry of skeptical reactions in the blogosphere, in FriendFeed-land and, lately, in Edge's Reality Club.

I know it is a bit late to write a post on this, but this debate reminds me of the bottom-up vs top-down dialectic in (systems) biology. The tradition in molecular biology has been to focus on molecular mechanisms (series of molecular events) that explain given biological functions. With detailed knowledge of the properties of an increasing number of components, bottom-up mechanistic descriptions, or models, can be constructed that account for the experimental observations.

Of course, the purpose of models, at least of insightful ones, is more than merely providing mechanistic descriptions. As William Bialek writes, "Given a progressively more complete microscopic description of proteins and their interactions, how do we understand the emergence of function?" (Aguera y Arcas et al, 2003). There is therefore a subsequent, subtle transition from description to insight, from model to theory, from detailed and specific to simple and general (watch Murray Gell-Mann's TEDTalk on "Beauty and truth in physics").

Theories are elegant.

On the other hand, high-throughput technologies (microarrays, proteomics, metabolomics, ultra-high-throughput sequencing, etc.) are indeed profoundly changing molecular biology and flooding the field with experimental data like never before. Currently, only part of these data can be explained within the context of mechanistic models. Still, and this is probably Chris Anderson's main point, it turns out that if the data are rich enough, one can exploit them by looking at them globally, from the 'top', to reveal statistical patterns and correlations. Even if there are no mechanistic explanations (yet) for these correlations, they may reveal new worlds and novel structures, and detect relationships between processes that were previously considered unrelated.

Correlations are fascinating.

Correlations resulting from data-driven analysis may well in turn stimulate new mechanistic investigations and hopefully new understanding. On Edge, Sean Carroll summarizes it all: "Sometimes it will be hard, or impossible, to discover simple models explaining huge collections of messy data taken from noisy, nonlinear phenomenon. But it doesn't mean we shouldn't try. Hypotheses aren't simply useful tools in some potentially-outmoded vision of science; they are the whole point. Theory is understanding, and understanding our world is what science is all about."

BUT what is true for fundamental science is not necessarily the rule for more applied fields, where the priority may be less on understanding than on acting. In particular, in medically related fields, top-down, data-driven correlative approaches represent a pragmatic way to obtain predictive models without waiting for still-elusive, fully mechanistic models that would encompass the entire complexity of human physiology (Nicholson, 2006).

As often in science, as in other human activities, different but complementary views are championed by people with different temperaments: there are those who like to build an edifice piece by piece and those who want to explore new territories. I think (I hope) that progress in systems biology on both fronts, top-down and bottom-up, demonstrates that there is no need to turn this complementarity into an opposition.

June 18, 2008

2007 Impact Factor

The 2007 Impact Factors were published yesterday by Thomson Reuters.

The Impact Factor of Molecular Systems Biology for 2007 is 9.954

This represents a substantial increase over last year's Impact Factor (see chart) and we would like to warmly thank all our authors and reviewers who have contributed to this success. We will continue to work very hard to maintain the high standards of the journal and promote innovative and insightful research in systems biology.

The significance of Impact Factors suffers from intrinsic limitations (see Ian's post), and the interpretation of this metric is subject to much discussion (Rossner et al 2007, Thomson's Citation Impact Forum). These and other questions related to bibliometrics are also currently being debated in the Nature Network Citation in Science group.

May 20, 2008

Google Health, Biomedical Mutual Organizations and Open Consent

Google Health, the new service offered by Google, is now online (via bbgm, Life as a Healthcare CIO, GTO). This service helps users store, organize and share their health profiles and medical records, use a variety of health-related online services and search for medical information. Understandably, Google places great emphasis on data security and confidentiality. In this regard, I thought it might be worth highlighting several recent and thought-provoking discussions around the issues of data privacy and participative medical investigations.

In a provocative editorial (Bains, 2007, see also Nature Medicine News article), William Bains argues that collectives of individuals, so-called 'Biomedical Mutual Organizations', could organize themselves on a voluntary and self-funded basis to conduct clinical trials relying on extensive self-experimentation, data sharing and pooling of analytical resources. This proposal challenges the classical view that those who conduct a clinical trial should avoid conflicts of interest with respect to its outcome. On the other hand, Bains argues, this system would allow more innovative and radical trials to be performed, since the subjects of the trial would have increased trust in the research process (being their own trial managers) and, hopefully, a more accurate perception of the risk/benefit balance involved.

Another radical proposal is the concept of 'open consent', as currently applied within George Church's Personal Genome Project (Church, 2005). In a recent review (Lunshof et al, 2008), Jeantine Lunshof, George Church and colleagues highlight the limitations of the current definitions of genetic privacy and confidentiality in view of rapid advances in human genetics and personal genomics. In particular, the creation of large databases linking individual genome-wide genotypes to extensive phenotypic profiles will make de-identification of such datasets increasingly difficult, if not impossible (Lowrance and Collins, 2007). Under these conditions, the promise of absolute anonymity and confidentiality of private data is becoming unrealistic. Church and colleagues argue that an 'open-consent' policy would avoid making such false promises and would therefore be a more realistic way to obtain adequately informed consent from those agreeing to participate in a human genomics research study.

At last month's ESF Conference on Systems Biology, Hiroaki Kitano discussed the potential of multi-component, combinatorial therapies (see also Kitano, 2007). He introduced the tentative idea of an 'Open Pharma' strategy, which would attempt to exploit beneficial synergistic effects that may result from the combined administration of cheap generic drugs. He envisions that this type of approach could ultimately lead to novel and, hopefully, more affordable therapeutic strategies, providing a potential alternative to the current single-target, proprietary drug paradigm.

Observing the launch of Google Health within the context of this series of rather revolutionary proposals, it is tempting to imagine for a moment what would result from large-scale self-experimentation with multi-component generic drug cocktails combined with web-enabled data sharing under some form of open-consent... Will 'Participative Open Pharma' be our future?

April 29, 2008

Rewiring E. coli transcriptional network

Research highlight by Kazuharu Arakawa and Masaru Tomita, Institute for Advanced Biosciences, Keio University, Japan

Gene duplications and mutations are central driving forces in the evolution of genomes. Genomes must be robust to such changes in order to be evolvable, and many studies have probed genome robustness using systematic gene knockouts or overexpression experiments. In a recent paper, Isalan et al. (2008) took a new approach to testing the robustness of Escherichia coli gene circuitry by mimicking gene duplication events: they shuffled promoter-ORF pairs for about 300 transcription factors and introduced 598 recombined pairs one by one into E. coli to rewire its transcriptional network. Surprisingly, ~95% of these additions are robustly tolerated, and some rewired networks even exhibit greater fitness under various selection pressures. Moreover, the study shows that, contrary to naive expectations, the introduction of positive or negative feedback loops has little effect on the protein expression levels of the regulated ORFs.

Since radical rewiring of the gene circuitry appears to have only a limited impact on expression levels, this work suggests that gene regulatory networks are highly dynamic and underscores the potential importance of post-transcriptional mechanisms for the robustness of transcriptional regulation. Moreover, this work illustrates the fundamental robustness and evolvability of gene regulatory networks, which is reassuring news for synthetic biology.


Isalan M, Lemerle C, Michalodimitrakis K, Horn C, Beltrao P, Raineri E, Garriga-Canut M, Serrano L (2008) Evolvability and hierarchy in rewired bacterial gene networks. Nature 452:840

April 21, 2008

ESF-UB Conference on Systems Biology

The ESF meeting on Systems Biology, organized by Luis Serrano and Ruedi Aebersold, took place last week in Sant Feliu de Guixols, Spain. A lovely location (I took this picture with my iSight directly from my room...) for a small conference with a list of outstanding speakers. Together with the influence of the Mediterranean-Latin 'cultural jet lag' (read: going to bed very very very very late), the stage was set for intense networking among the participants.

The meeting had a broad scope, and I think the organizers did a very good job of covering the diversity of the field, from quantitative biology and mathematical modeling to network biology, large-scale phenotyping and synthetic biology. Even if I cannot summarize all the talks, here are some general impressions of a few of the directions taken.

First, the 'systematic' branch of systems biology appears to be extending progressively to the cellular level, thanks to progress in high-throughput imaging techniques and expression systems applied to mammalian systems. For example, large-scale sub-cellular (co-)localization of proteins is used to help deduce extensive maps of the molecular interactions that underlie the biological function of an organelle (Anthony Hyman), while the analysis of cell-to-cell variability in morphological or other cellular-level features reveals effects that would otherwise be undetectable (Lucas Pelkmans).

At the molecular level, the analysis of large biological networks (transcriptional, Luis Serrano; protein-protein interactions, Marc Vidal) is now progressing towards a large-scale analysis of the impact of perturbations of specific interactions ('edges') rather than the more conventional approach of looking at the absence/presence of individual 'nodes'. This emphasis on 'edges' is further illustrated by efforts in increasing the resolution of protein-protein interaction networks to the level of individual protein domains (Anthony Hyman, Marc Vidal).

The roles and consequences of biochemical interactions are seen somewhat differently by those who quantitatively study signal transduction mechanisms. There, great emphasis was put on the fact that seemingly simple biochemical interactions can result in surprisingly rich spatial and temporal behaviors (Boris Kholodenko), and that consideration of these dynamical aspects is crucial to provide fundamental mechanistic insights into the functions performed by signaling systems. As an example, the quantitative analysis of NF-kappaB signaling dynamics reveals that a sophisticated temporal code is used to discriminate between a variety of stimuli and achieve a stimulus-specific transcriptional response (Alexander Hoffmann).

Clearly, significant efforts remain to bridge large-scale 'systematic' systems biology and its small-scale 'quantitative' branch, and one may at first wonder whether these two visions belong to the same field. A recurrent and potentially unifying theme, however, was that both approaches attempt to understand the relationship linking a biological function to the components of the system that performs it. As nicely formulated by Tony Hyman, one of the key problems in (systems) biology is to understand how 'individuals' contribute to a 'collective behaviour' (Denis Noble also notes that the 'collective behaviour' can affect the properties of 'individuals'). This view of systems biology has the advantage of providing a similar objective for research applied at various scales (e.g. a cell, an organelle, a signaling pathway, a protein complex) without imposing arbitrary constraints in terms of experimental or computational approaches.

The engineering of biological systems able to perform a human-specified function is intimately related to advances in systems biology. Ron Weiss illustrated how far system-level engineering can be pushed: he is progressively implementing cell-cell communication, information processing and cell differentiation control circuits in mammalian stem cells, ultimately to enable rational 'programmed tissue engineering'. But such extremely complex circuits currently require enormous effort, and a major emphasis is on developing tools that allow proper engineering practice in biology. Such efforts are most advanced for systems hosted in bacteria, and Adam Arkin provided some spectacular examples of modular design, illustrating how well-designed circuits (e.g. an oxygen-sensing module from a tumour-invading bacterium) can be rapidly re-used to shorten enormously the development time required to engineer new functions (e.g. an artificial blood cell), without endless tweaking and tuning.

On a more frivolous note, it did not take us too many glasses of wine at dinner to start speculating with Hiroaki Kitano about mixing the Robocup and iGEM competitions to create a new 'bio vs nanomachine' league that would let nano-robots play against engineered microorganisms. As I said, we may not always have had enough sleep...

March 13, 2008

Contrasts: Craig Venter and NSABB on synthetic biology

Two rather contrasting videos on synthetic biology this month. In the first videocast, released by TED, Craig Venter lays out his grand vision of synthetic genomics. He insists on the notion of 'combinatorial genomics', which will combine the power of large-scale DNA synthesis ('robots that can make a million chromosomes a day') with a database of 20 million genes, 'the design components of the future'. This approach, a pragmatic mixture of rational function-oriented design and empirical large-scale selection, is envisioned as paving the way for a modern 'Cambrian explosion' of new synthetic species. It is good to see Craig Venter laughing as he casually announces the 'modest goal of replacing the entire petro-chemical industry'. In any case, Craig Venter appears to be more concerned that the technology may not develop rapidly enough to match the urgency and scale of the major ecological and medical challenges faced by our planet than about the potential threats posed by harmful biohacking and bioterror.

The second video, admittedly less entertaining, is a recording of the recent deliberations of the National Science Advisory Board for Biosecurity (NSABB). In his presentation entitled 'Assessing Biosecurity Concerns Related to Synthetic Biology', David Relman presents some preliminary findings and recommendations of the Working Group on Synthetic Genomics (jump to 1hr:34min:37sec). Interestingly, no consensus definition of synthetic biology exists among the various practitioners of the field, who use different blends of the bottom-up engineering approach (assembling circuits from standard components) and the top-down strategy (modifying existing genomes). Beyond the lack of a definition, the current ability to predict biological functions (e.g. virulence) from sequence remains very limited, which complicates realistic risk assessment. Finally, the development of synthetic biology can be seen as an extension of the success of 'kit-based' molecular biology, which opens these technologies to groups outside the traditional Life Sciences communities and institutions, making the mission of oversight, outreach and education more challenging. David Relman also clearly emphasizes the importance of not discouraging, through overzealous oversight and regulation, the enthusiasm directed towards potentially beneficial research and applications.

The two talks perhaps intersected when the question of virulence was raised (jump to 1hr:59min:35sec). The fraction of pathogenic agents is very small compared to the number of existing species, a point also made by Craig Venter, and the rate at which new pathogens appear is low. The question was then raised as to whether the risk of creating synthetic pathogens could be roughly estimated by calculating the likelihood that the amount of natural recombination responsible for the emergence of new pathogens 'in the wild' could be matched by an equivalent amount of experimental recombination in the laboratory. In other words, is there any way to estimate the probability that new forms of virulence could emerge from the announced synthetic 'Cambrian explosion'?