About Databases

This page contains an archive of all entries posted to The Seven Stones in the Databases category. They are listed from oldest to newest.

Creative Commons License
This weblog is licensed under a Creative Commons License.

Databases Archives

March 3, 2008

Less papers to read, more data to use...

In a nice post at bbgm, Deepak writes:

...historical online literature lacks the relevant structure and metadata to make our task easier, but it is time that publishers thought ahead about some of the advantages of online publishing.

I can't agree more. I have sometimes heard the claim that within 5-10 years, more than 95% of the scientific literature is going to be read by computers only. Possible. However, the converse alternative might be interesting to consider: what if 95% of scientific papers could be 'written' by computers? Even if this formulation is obviously provocative and unrealistic, the point is that harnessing the 'network effect' of the web may have two complementary components, one community-driven, the other computer-driven. On one hand, web 2.0 functionalities enable community-driven commenting, rating and even writing of scientific publications. On the other hand, semantic web technologies are expected to facilitate computer-driven integration of scientific data from multiple sources, which is likely to play an increasingly important role in science. Rather than mining thousands of unread papers, the scientist of the future may instead search the web for relevant data first and integrate it to generate (or 'write') novel insight. In fact, integration of large datasets already represents a major field of research in systems biology (see Chuang et al 2007, Xue et al 2007 or Mani et al 2008 as recent examples published in Mol Syst Biol).

It thus seems that, in addition to being web 2.0 enabled, new publishing models should 'embed' more structured data into online publications. In short, 'papers' could progressively transform into hybrid online objects that more closely resemble database records (see Timo Hannay's post on this topic) or highly structured documents. At the extreme, one could even imagine publishing 'naked' datasets, without any 'stories' around them. Of course, efficient data integration will require the data to be in a standard and structured format, and its quality will have to be well characterized. None of these requirements is trivial.

Good old-fashioned papers are probably not going to disappear as publication units, in particular for high-impact studies reporting novel and deep insights. Nor is the point here to propose dumping every scientist's hard drive onto the web. Data-rich publications would be published only when the authors feel it appropriate. There might thus be some equilibrium to find between papers that will never be read except by a text mining engine and pure datasets, published as a resource, that are easier to search, to mine and to integrate. This dialectic may ultimately boil down to the issue of how well text mining and data integration technologies will perform in the future.

In any case, within the context of the current debate about the saturation of the peer-review system, I wonder whether a data-centric form of scientific publishing could help relieve some of the pressure. Reviewing of datasets might be quicker and could rely more on standardized evaluation parameters. If accompanied by proper credit attribution mechanisms and metrics of impact, data-rich (or even data-only) publications may represent an alternative model complementing the traditional 'paper' format. It would prevent the loss of useful data otherwise buried in verbal descriptions and, most importantly, would hopefully stimulate web-wide integration of disparate datasets.

January 11, 2008

Consumer Health Information Technology

I highly recommend visiting the NIH VideoCasting page, which hosts many interesting video/podcasts. Even if I realize that this is a bit old by blogosphere standards, I would like to point to this one: "The Future: Consumer Health Information Technology", featuring talks given at an NCI-sponsored meeting on Dec 10, 2007 by Adam Bosworth (formerly "Google Health architect", now starting his own company Keas), Bern Shen (Intel) and Bill Crounse (Microsoft).

In his introduction to the meeting, Bradford Hesse (NCI) colorfully summarizes one of the main concepts presented by the speakers (the video is very long, so I give some pointers: 0h16min43sec) by comparing the future of healthcare to...an "IKEA flat pack": patients will progressively be empowered to assemble their own care from home, as they would build a piece of (cheap) furniture.

Adam Bosworth (0h25min53sec) presents his very pragmatic vision of how IT could concretely help healthcare (0h39min07sec): a) help consumers own and control their personal health data, starting with very simple basic information; b) provide tools for doctors so that they can deliver personalized care as easily as producing a spreadsheet; c) develop tools for researchers to facilitate the design and implementation of new protocols and clinical trials.

Bill Crounse (Microsoft's other Bill...1h14min30sec) sees 5 major current trends that will increasingly challenge the healthcare system and call for IT solutions (1h26min22sec): a) increasing personal responsibility ("the end of health insurance"); b) progressive "retailization" of healthcare services (eg appearance of "retail minute clinics"); c) commoditization of healthcare providers; d) globalization of access to information (through the web of course); e) globalization of healthcare services. I recommend his funny little anecdote on the high-tech GPS wireless-connected plumber (1h25min30sec) who appears to be better equipped than any practicing physician...

The speakers also all insist on the need for massive data integration promoted by the interoperability of formats and coding information, themes that probably sound familiar to many systems biologists.

Toward the end of his talk (1h35min00sec), Bill Crounse shows a short "science-fiction" movie on Microsoft's vision of the future of healthcare: a world full of credit-card sized tablet PCs, touch screens and many other very exciting gadgets (I love gadgets!). But I can't help missing the warmth of human-to-human interactions a bit within this jungle of virtual consultations, retail clinics, remotely monitored metabolic parameters, etc... and I didn't quite see in that movie that the doctor would spend more time with his patient or the daughter with her sick Grandma. But this may of course only reflect some old-fashioned side of my temperament...

November 20, 2007

Personal genomics for a fistful of dollars

The wave of personal genomics is progressing rapidly. A string of four papers appeared recently (Porreca et al, 2007, Albert et al, 2007, Okou et al 2007, Hodges et al, 2007) reporting on microarray-based technologies that enable the enrichment of selected genomic fragments in a single massively multiplexed reaction, thus greatly facilitating subsequent resequencing of pre-defined portions of the human genome (eg all coding exons). These technologies are expected to dramatically reduce the cost of targeted resequencing of individual genomes.

On the commercial front, deCODE and 23andMe have launched their personal genome services, offering genome-wide SNP profiling for a little less than $1,000 (NYT articles: Nicholas Wade, Amy Harmon, or Wired, ScienceRoll, Sandra, DNA and You).

The chips used by 23andMe are the "Illumina HumanHap550+ BeadChip, which reads more than 550,000 SNPs (single nucleotide polymorphisms) plus a 23andMe custom-designed set that analyzes more than 30,000 additional SNPs." The profile provided by deCODEme includes "over one million variants across the genome."

So what do you think?

May 5, 2007

Semantic zooming of networks

One can only agree with Euan Adie that "the way we present genomic and proteomic data on the web sucks" (read post on Nascent). And this holds for biological networks: depicting protein-protein interactions as colorful hairballs results in impressive figures but is not necessarily very useful. While the network representation is a powerful abstraction of biological processes, it is trivial to say that a graph (with its jungle of nodes and edges) is far from even remotely resembling an actual living cell as you see it under the microscope... In the crude visualization of biological processes as simple graphs, space, time, multi-scale structure and biological context are missing.

Charles DeLisi makes an attempt to tackle the problem of visualizing complex multi-scale biological networks by introducing the use of metagraphs (Hu et al, 2007, Nature Biotech 25:547). Metagraphs have so-called metanodes in addition to simple nodes. A metanode contains a subgraph composed of child (meta)nodes, which are revealed only when the metanode is in its "expanded" state. Edges link simple nodes while metaedges link "contracted" metanodes and are inferred from the links carried by nodes of the underlying subgraph. A key distinctive feature of metagraphs is that several instances (carrying different "labels") of a node can be shared between distinct metanodes (eg when a protein belongs to different complexes).
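
To make this description a bit more concrete, here is a minimal sketch in Python of how a metagraph could be represented. It is only an illustration and not how VisANT implements it: the class name `Metagraph`, its methods and the protein and complex names are all hypothetical, nesting is limited to one level (metanodes contain simple nodes only), and metanode names are assumed never to collide with simple node names. The rule used to infer metaedges (a contracted metanode inherits every link carried by its members) is one plausible reading of the paragraph above.

```python
from itertools import product


class Metagraph:
    """Toy metagraph: simple nodes, edges, and one level of metanodes."""

    def __init__(self):
        self.edges = set()     # frozenset({u, v}) between simple nodes
        self.members = {}      # metanode name -> set of simple node names
        self.expanded = {}     # metanode name -> True if expanded

    def add_edge(self, u, v):
        if u == v:
            raise ValueError("self-loops are not supported in this sketch")
        self.edges.add(frozenset((u, v)))

    def add_metanode(self, name, members, expanded=False):
        # A simple node may be listed in several metanodes
        # (eg a protein that belongs to different complexes).
        self.members[name] = set(members)
        self.expanded[name] = expanded

    def _representatives(self, node):
        """What is drawn for `node`: itself, or the contracted metanodes hiding it."""
        owners = [m for m, kids in self.members.items() if node in kids]
        if not owners:
            return {node}
        return {node if self.expanded[m] else m for m in owners}

    def visible_edges(self):
        """Edges and inferred metaedges for the current expand/contract state."""
        result = set()
        for edge in self.edges:
            u, v = tuple(edge)
            for a, b in product(self._representatives(u), self._representatives(v)):
                if a != b:  # links inside one contracted metanode stay hidden
                    result.add(frozenset((a, b)))
        return result


# Hypothetical example: protein B is shared between two complexes.
g = Metagraph()
g.add_edge("A", "B")
g.add_edge("B", "C")
g.add_edge("C", "D")
g.add_metanode("Complex1", ["A", "B"])
g.add_metanode("Complex2", ["B", "C"])

contracted_view = g.visible_edges()
g.expanded["Complex1"] = True   # "zoom in" on the first complex
zoomed_view = g.visible_edges()
```

With both complexes contracted, only the metaedges Complex1-Complex2 and Complex2-D are visible. Expanding Complex1 reveals A and B together with their links, while C stays hidden behind Complex2.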

Metanodes can represent directly the multi-scale modular hierarchy of a network, incorporate biological context (eg sets of proteins sharing the same GO annotation) or even represent groups of orthologous genes. With this representation, implemented in the software VisANT (http://visant.bu.edu/), "semantic zooming" into the network is made possible. This would be similar to zooming into a Google Map, when not only the scale of the map changes but also the resolution of the labels and various abstract annotations, as is best seen using the "hybrid" mode superposing annotations with the satellite picture.

This analogy with Google Map also illustrates the limits of the current network representation as "maps" of cellular processes. There is still a long way to go before the graphs representing biological networks can really be mapped onto cellular structures, to result in better visualization tools but also in more realistic computational models of the whole cell. In a sense, a "Google Cell" should also have a "hybrid" mode, where the abstract representation can be superposed onto the "satellite image" version of the biological object visualized. It would be as if tiny networks were folded inside each voxel of a full 3D reconstruction of a cell, such as the one recently published by Antony and colleagues (Höög et al, 2007, see post). Something like integrating interaction networks, "ORFeome"-like datasets and electron tomography...

February 2, 2007

Functional genomics of the neuron

Several recent publications seem to give a clear signal that the time has come for a functional genomic approach to key neuronal functions, such as neuronal differentiation or synaptic plasticity.

  • The Allen Institute for Brain Science in Seattle has completed the Allen Brain Atlas (Lein et al, 2007, see also our N&V by Sebastian Jessberger and Fred H Gage), cataloging the expression patterns of 20,000 genes in serial in situ hybridization sections and providing an exemplary web interface to query and retrieve the information.
  • Neurons are notoriously difficult to cultivate and transfect, making it difficult to probe gene function in a high-throughput fashion. Michael Greenberg describes in Neuron (Paradis et al, 2007) the results of a systematic RNAi screen to evaluate the function of roughly 150 genes in synapse formation.
  • Aplysia californica has been extremely useful as a model in identifying the signaling processes underlying synaptic plasticity and delineating the molecular mechanisms involved in learning and memory. The laboratory of Eric Kandel at Columbia University has now characterized the neuronal transcriptome of Aplysia (Moroz et al, 2006) not only in several distinct ganglia but also in individual identified neurons and even in neuronal processes from cells known to support local protein synthesis at their synaptic terminals.