About March 2008

This page contains all entries posted to The Seven Stones in March 2008. They are listed from oldest to newest.

Creative Commons License
This weblog is licensed under a Creative Commons License.

March 2008 Archives

March 13, 2008

Contrasts: Craig Venter and NSABB on synthetic biology

Craig Venter: On the verge of creating synthetic life

Two rather contrasting videos on synthetic biology this month. In the first videocast, released by TED, Craig Venter lays out his grand vision of synthetic genomics. He insists on the notion of 'combinatorial genomics', which will combine the power of large-scale DNA synthesis ('robots that can make a million chromosomes a day') with a database of 20 million genes, 'the design components of the future'. This approach, a pragmatic mixture of rational function-oriented design and empirical large-scale selection, is envisioned to prepare a modern 'Cambrian explosion' of new synthetic species. It is good to see Craig Venter laughing as he casually announces the 'modest goal of replacing the entire petro-chemical industry'. In any case, Craig Venter appears to be more concerned that the technology may not develop rapidly enough to match the urgency and scale of the major ecological and medical challenges faced by our planet than about the potential threats of harmful biohacking and bioterror.

Webcast of the NSABB Meeting, Day 1

The second video, admittedly less entertaining, is a recording of the recent deliberations of the National Science Advisory Board for Biosecurity (NSABB). In his presentation entitled 'Assessing Biosecurity Concerns Related to Synthetic Biology', David Relman presents some preliminary findings and recommendations of the Working Group on Synthetic Genomics (jump to 1hr:34min:37sec). It is interesting to see that no consensus definition of synthetic biology exists among the various practitioners of the field, who all use different blends of the typical bottom-up engineering approach, which assembles circuits from standard components, and the top-down strategy, which is based on the modification of existing genomes. Beyond the lack of definition, the current ability to predict biological functions from sequence (e.g. virulence) remains very limited, which complicates any realistic risk assessment. Finally, the development of synthetic biology can be seen as an extension of the success of 'kit-based' molecular biology, which opens these technologies to groups outside the traditional Life Sciences communities and institutions and makes the mission of oversight, outreach and education more challenging. David Relman also clearly emphasizes the importance of not discouraging, through overzealous oversight and regulation, the enthusiasm directed towards potentially beneficial research and applications.

The two talks above perhaps intersected when the question of virulence was raised (jump to 1hr:59min:35sec). The fraction of pathogenic agents is very small compared to the number of existing species, a point also made by Craig Venter, and the rate of appearance of new pathogens is low. The question was then raised whether it would be possible to roughly estimate the risk of creating synthetic pathogens by calculating the likelihood that the amount of natural recombination responsible for the emergence of new pathogens 'in the wild' could be matched by an equivalent amount of experimental recombination in the laboratory. In other words, is there any way to estimate the probability that new forms of virulence could emerge from the announced synthetic 'Cambrian explosion'?

March 3, 2008

Less papers to read, more data to use...

In a nice post at bbgm, Deepak writes:

...historical online literature lacks the relevant structure and metadata to make our task easier, but it is time that publishers thought ahead about some of the advantages of online publishing.

I couldn't agree more. I have sometimes heard the claim that within 5-10 years, more than 95% of the scientific literature is going to be read by computers only. Possible. However, the converse might be interesting to consider: what if 95% of scientific papers could be 'written' by computers? Even if this formulation is obviously provocative and unrealistic, the point is that harnessing the 'network effect' of the web may have two complementary components, one community-driven, the other computer-driven. On the one hand, web 2.0 functionalities enable community-driven commenting, rating and even writing of scientific publications. On the other hand, semantic web technologies are expected to facilitate computer-driven integration of scientific data from multiple sources, which is likely to play an increasingly important role in science. Rather than mining thousands of unread papers, the scientist of the future may instead search the web for relevant data first and integrate it to generate (or 'write') novel insight. In fact, integration of large datasets already represents a major field of research in systems biology (see Chuang et al 2007, Xue et al 2007 or Mani et al 2008 as recent examples published in Mol Syst Biol).

It thus seems that, in addition to being web 2.0 enabled, new publishing models should 'embed' more structured data into online publications. In short, 'papers' could progressively transform into hybrid online objects that more closely resemble database records (see Timo Hannay's post on this topic) or highly structured documents. At the extreme, one could even imagine publishing 'naked' datasets, without any 'stories' around them. Of course, efficient data integration will require the data to be in a standard and structured format, and its quality will have to be well characterized. Neither requirement is trivial to meet.

Good old-fashioned papers are probably not going to disappear as publication units, in particular for high-impact studies reporting novel and deep insights. Nor is the point here to propose dumping every scientist's hard drive onto the web. Data-rich publications would be published only when the authors feel it appropriate. There might thus be an equilibrium to find between papers that will never be read except by a text mining engine and pure datasets, published as a resource, that are easier to search, to mine and to integrate. This dialectic may ultimately boil down to the question of how well text mining and data integration technologies will perform in the future.

In any case, within the context of the current debate about the saturation of the peer-review system, I wonder whether a data-centric form of scientific publishing could help to relieve some of the pressure. Reviewing datasets might be quicker and could rely more on standardized evaluation parameters. If accompanied by proper credit attribution mechanisms and metrics of impact, data-rich (or even data-only) publications may represent an alternative model complementing the traditional 'paper' format. It would prevent the loss of useful data otherwise buried in verbal descriptions and, most importantly, would hopefully stimulate web-wide integration of disparate datasets.