About Systems Medicine

This page contains an archive of all entries posted to The Seven Stones in the Systems Medicine category. They are listed from oldest to newest.

Creative Commons License
This weblog is licensed under a Creative Commons License.
Systems Medicine Archives

August 18, 2008

SciFoo: scientific fireworks

In his list of eight 'generative' values (Better Than Free), Kevin Kelly includes 'embodiment'–the actual physical realization of an item or event which could be otherwise freely distributed over the web. While we are all 'hyperlinked' on the Internet, the value of those unique qualities that cannot be generated or "copied" on the web is dramatically increased. The type of intense emulation and shared excitement sparked at the recent Science Foo Camp (SciFoo 2008), organized by Nature, Google and O'Reilly, gave a wonderful example of the unique value of direct human exchange during an exclusive event bringing together roughly 200 top scientists, 'geeks' and other technologists at the Googleplex in Mountain View, California.

SciFoo is a so-called 'unconference': there is no program or more precisely, as Timo Hannay explained during the opening of the conference, the attendees are the 'program'. The actual schedule was defined only on the first evening in a purposefully chaotic process by anyone who wished to organize a session on any topic. For the next two days, in a festival of parallel sessions, astrophysicists, 'googlers', technologists, molecular biologists, taxonomists, game designers, flying car constructors, publishers, thinkers and (some) dreamers discussed and exchanged ideas with great enthusiasm and a rare intensity and openness.

Needless to say, deciding which session to attend was close to impossible... In any case, I ended up following three types of talks: a series on systems biology-related topics (data integration, machine learning, personal genomics, the baroque structure of the transcribed genome), several (of many) sessions focused on the theme of open data/science and finally some more eclectic sessions (only from my standpoint, of course) on diverse topics such as the foundations of the concept of time in physics, a demonstration of very simple yet powerful Python scripting exercises to analyze text, and the potential of game design to harness our 'cognitive surplus'. I cannot possibly summarize all the talks, interactions and impressions gathered at this meeting, but here are a few subjective excerpts:

  • There were quite a few sessions on open science and open data. Ernst Hafen made a strong case for the need for a unique AuthorID that would help in tracking the multiple aspects of researchers' scientific activities. With regard to data, Google announced that a new service will soon be launched, Google Research Datasets, offering to host, for free, large datasets of any type. The service will allow inclusion of some minimal meta-data about the submitted datasets and will provide a mechanism to define a delay before the dataset is made publicly visible. This will probably become a very simple and convenient way of storing data (in particular if a useful API is developed), so convenient, in fact, that we may have to be a little careful that it will not turn into a temptation to bypass the 'minimal information...' standards usually required by traditional public databases.
  • George Church provided an overview of the Personal Genome Project (PGP) and described the type of biological data that will be integrated with the genomic and genetic information collected from consenting PGP volunteers: analysis of the transcriptome of pluripotent stem cells derived from the subjects; sequence of the repertoire of recombined V-D-J regions in immune cells ('VDJome') to exploit correlations between given V-D-J sequences and antigen-specific stimulations; characterization of the microbiome used as a tracer of the environmental and physiological conditions; record of phenotypic traits and disease conditions using controlled vocabularies. Finally, George also emphasized the exponentially decreasing cost of sequencing, which will not only make large scale sequencing of full personal genomes feasible but will also potentially open entire new fields of applications based on massive DNA sequencing.
  • Lee Smolin talked about the nature of the concept of time in physics and investigated the question of whether our perception of time as the 'experience of successive present moments' is 'real' or, alternatively, an emergent property of the laws of physics. I cannot pretend I followed the entire argument, but I learned that the mathematical representation of the physical reality involves the geometrization of time (as one of the state space's dimensions), leading in fact to a representation devoid of temporal flow (somehow the clock has to be outside the system). Physical laws are then associated with this geometrical representation and applied to initial conditions. If I did not misunderstand it, it appears that this approach used in physics might have to be considered approximate, because it may only be valid for subsystems of the universe and might not be appropriate for a true cosmological theory of the entire universe, with possibly disturbing consequences for the nature of physical laws...
  • Believe it or not, music can be 'geekified' as well: later in the evening, Chris DiBona brought his Tenori-on for a fun demonstration. I want one of those!

The meeting ended with some final scientific fireworks, when some of the speakers gave a series of brilliant 2-minute summary talks, providing a colorful overview of the many sessions we had inevitably missed. I have to admit that I like fireworks and I would certainly have enjoyed having a little more of this final kaleidoscopic view of science. Clearly, the authentic value of this conference lies in the unique and direct human interactions, but I nevertheless wish there were some way–perhaps by using this last session in some form of outreach action–to disseminate this pure joy of scientific diversity and curiosity to a broader audience.

Credits: illustrations from Bob Lee, Flickr, some rights reserved

July 10, 2008

Fascinating correlations or elegant theories?

Chris Anderson, Editor-in-Chief of Wired, wrote a provocative piece a few weeks ago, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete", arguing that in our Google-driven data-rich era ("The Petabyte Age") the good old "approach to science – hypothesize, model, test – is becoming obsolete", giving way to a purely correlative vision of the world. There is a good dose of provocation in the essay and it was quite successful in spurring a flurry of skeptical reactions in the blogosphere, FriendFeed-land and lately in Edge's Reality Club.

I know that it is a bit late to write a post on this but this debate reminds me of the bottom-up vs top-down dialectic in (systems) biology. The tradition in molecular biology has been to focus on molecular mechanisms–a series of molecular events–that explain given biological functions. With detailed knowledge on the properties of an increasing number of components, bottom-up mechanistic descriptions–or models–can be constructed, which account for the experimental observations.

Of course, the purpose of models, at least for insightful ones, is more than merely providing mechanistic descriptions. As William Bialek writes, "Given a progressively more complete microscopic description of proteins and their interactions, how do we understand the emergence of function?" (Aguera y Arcas et al, 2003). There is therefore some subsequent subtle transition from description to insight, from model to theory, from detailed and specific to simple and general (watch Murray Gell-Mann's TEDTalk on "Beauty and truth in physics").

Theories are elegant.

On the other hand, high-throughput technologies (microarrays, proteomics, metabolomics, ultra high throughput sequencing, etc...) are indeed profoundly changing molecular biology and flooding the field with experimental data like never before. Currently, only part of this data can be explained within the context of mechanistic models. Still, and this is probably Chris Anderson's main point, it turns out that if the data is rich enough, one can exploit it by looking at the data globally, from the 'top', to reveal statistical patterns and correlations. Even if there are no mechanistic explanations (yet) for these correlations, they may reveal new worlds and novel structures and detect relationships between processes that were previously considered unlinked.

Correlations are fascinating.

Correlations resulting from data-driven analysis may well in turn stimulate new mechanistic investigations and hopefully new understanding. On Edge, Sean Carroll summarizes it all: "Sometimes it will be hard, or impossible, to discover simple models explaining huge collections of messy data taken from noisy, nonlinear phenomenon. But it doesn't mean we shouldn't try. Hypotheses aren't simply useful tools in some potentially-outmoded vision of science; they are the whole point. Theory is understanding, and understanding our world is what science is all about."

BUT, what is true for fundamental science is not necessarily a rule for more applied fields, where the priority might be less on understanding than on acting. In particular, in medically related fields, top-down data-driven correlative approaches represent a pragmatic way to obtain predictive models without waiting for still elusive fully mechanistic models that would encompass the entire complexity of human physiology (Nicholson, 2006).

As often in science, as in other human activities, different but complementary views are championed by people with different temperaments: there are those who like to build an edifice piece by piece and those who want to explore new territories. I think–I hope–that progress in systems biology on both fronts, top-down and bottom-up, demonstrates that there is no need to turn this complementarity into an opposition.

May 20, 2008

Google Health, Biomedical Mutual Organizations and Open Consent

Google Health, the new service offered by Google, is now online (via bbgm, Life as a Healthcare CIO, GTO). This service helps users to store, organize and share their health profile and medical records, to use a variety of health-related online services and to search for medical information. Understandably, Google places great emphasis on data security and confidentiality. In this regard, I thought it might be worth highlighting several recent and thought-provoking discussions around the issues of data privacy and participative medical investigations.

In a provocative editorial (Bains, 2007, see also Nature Medicine News article), William Bains advocates that collectives of individuals, so-called 'Biomedical Mutual Organizations', could organize themselves on a voluntary and self-funded basis to conduct clinical trials that would rely on extensive self-experimentation, data sharing and pooling of analytical resources. This proposal challenges the classical view that those who conduct a clinical trial should avoid conflicts of interest with respect to the outcome of the trial. On the other hand, Bains argues, this system would allow more innovative and radical trials to be performed, given that the subjects of the trial would have increased trust in the research process (being their own trial managers) and, hopefully, a more accurate perception of the risk/benefit balance involved.

Another radical proposal is the concept of 'open-consent' as currently applied within George Church's Personal Genome Project (Church, 2005). Jeantine Lunshof, George Church and colleagues highlight in a recent review (Lunshof et al, 2008) the limitations of the current definitions of genetic privacy and confidentiality in view of the rapid advances in the fields of human genetics and personal genomics. In particular, the creation of large databases interlinking individual genome-wide genotypes to extensive phenotypic profiles will make de-identification of such datasets increasingly difficult if not impossible (Lowrance and Collins, 2007). Under these conditions, it appears that the promise of absolute anonymity and confidentiality of private data is becoming unrealistic. Church and colleagues affirm that an 'open-consent' policy would avoid making such false promises and would therefore represent a more realistic way to formulate an adequately informed consent when agreeing to participate in a human genomic research study.

At last month's ESF Conference on Systems Biology, Hiroaki Kitano discussed the potential of multi-component, combinatorial therapies (see also Kitano, 2007). He introduced the tentative idea of an 'Open Pharma' strategy, which would attempt to exploit beneficial synergistic effects that may result from combined administration of cheap generic drugs. He envisions that this type of approach could ultimately lead the way to novel and hopefully more affordable therapeutic strategies, which would provide a potential alternative to the current single-target proprietary drug paradigm.

Observing the launch of Google Health within the context of this series of rather revolutionary proposals, it is tempting to imagine for a moment what would result from large-scale self-experimentation with multi-component generic drug cocktails combined with web-enabled data sharing under some form of open-consent... Will 'Participative Open Pharma' be our future?

February 21, 2008

Top-down mapping of gene regulatory pathways

In a very recent lecture (see full video from NIH VideoCasting) given for the NIH Systems Biology Special Interest Group, Trey Ideker presents a great overview of the various strategies his group has been developing in recent years to integrate multiple types of large-scale datasets. While one of the most pervasive 'memes' about high-throughput measurements is that they are "notoriously unreliable" (see Hakes et al, 2008, for a recent example), Trey beautifully illustrates how predictive computational models and novel biological insights can be generated by sophisticated data integration strategies. Three types of applications are presented in his talk:

  1. mapping of transcriptional response pathways
  2. functional mapping of protein complexes
  3. disease diagnosis and stratification

In the last section, Trey presents the study recently published in Molecular Systems Biology (Chuang et al, 2007, video: 00hr:39min:15sec) where the information provided by microarray expression profiling is superimposed on a protein-protein physical interaction network to identify 'subnetwork' biomarkers that classify metastatic vs non-metastatic breast tumors.
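
To make the idea of a 'subnetwork' biomarker a little more concrete, here is a minimal Python sketch. It assumes that each gene is z-scored across samples and that the scores of the member genes are summed and divided by the square root of the subnetwork size; the published method may differ in its details. The expression values, gene names and labels are invented for illustration, and the class-mean difference used for scoring is a simple stand-in chosen for this sketch, not necessarily the paper's scoring function.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical toy data: gene -> expression value in each of four samples.
expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 2.0, 4.0, 4.0],
    "G3": [5.0, 1.0, 4.0, 2.0],
}
labels = [0, 0, 1, 1]  # hypothetical classes: 0 = non-metastatic, 1 = metastatic


def zscores(values):
    """Standardize one gene across samples (assumes the gene is not constant)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def subnetwork_activity(expression, members):
    """Combine member z-scores into one activity value per sample."""
    z = [zscores(expression[g]) for g in members]
    k = len(members)
    return [sum(column) / sqrt(k) for column in zip(*z)]


def class_separation(activity, labels):
    """Stand-in score: difference between the mean activities of the two classes."""
    pos = [a for a, lab in zip(activity, labels) if lab == 1]
    neg = [a for a, lab in zip(activity, labels) if lab == 0]
    return mean(pos) - mean(neg)


connected_pair = subnetwork_activity(expression, ["G1", "G2"])
single_gene = subnetwork_activity(expression, ["G3"])
score_pair = class_separation(connected_pair, labels)
score_single = class_separation(single_gene, labels)
```

In this toy case the two-gene subnetwork separates the classes while the single gene G3 does not. In a real analysis the candidate subnetworks would be drawn from connected regions of the interaction network rather than chosen by hand.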

January 11, 2008

Consumer Health Information Technology

I highly recommend visiting the NIH VideoCasting page, which hosts many interesting video/podcasts. Even if I realize that this is a bit old according to the blogosphere time scale, I would like to point to this one: "The Future: Consumer Health Information Technology", featuring talks given at an NCI-sponsored meeting on Dec 10, 2007 by Adam Bosworth (formerly "Google Health architect", now starting his own company Keas), Bern Shen (Intel) and Bill Crounse (Microsoft).

In his introduction to the meeting, Bradford Hesse (NCI) colorfully summarizes one of the main concepts exposed by the speakers (the video is very long, so I give some pointers: 0h16min43sec) by comparing the future of healthcare to...an "IKEA flat pack": patients will progressively be empowered to assemble their own care from home, like they would build a piece of (cheap) furniture.

Adam Bosworth (0h25min53sec) presents his very pragmatic vision of how IT could concretely help healthcare (0h39min07sec): a) help the consumer to own and control his personal health data, and this already for very simple basic information; b) provide tools for doctors so that they can deliver personalized care as easily as producing a spreadsheet; c) develop tools for researchers to facilitate the design and implementation of new protocols and clinical trials.

Bill Crounse (Microsoft's other Bill...1h14min30sec) sees 5 major current trends that will increasingly challenge the healthcare system and call for IT solutions (1h26min22sec): a) increasing personal responsibility ("the end of health insurance"); b) progressive "retailization" of healthcare services (eg appearance of "retail minute clinics"); c) commoditization of healthcare providers; d) globalization of access to information (through the web of course); e) globalization of healthcare services. I recommend his funny little anecdote on the high-tech GPS wireless-connected plumber (1h25min30sec) who appears to be better equipped than any practicing physician...

The speakers also all insist on the need for massive data integration promoted by the interoperability of formats and coding information, themes that probably sound familiar to many systems biologists.

Toward the end of his talk (1h35min00sec), Bill Crounse shows a short "science-fiction" movie on Microsoft's vision of the future of healthcare: a world full of credit-card sized tablet PCs, touch screens and many other very exciting gadgets (I love gadgets!). But I can't help missing a bit the warmth of human-to-human interactions within this jungle of virtual consultations, retail clinics, remote controlled metabolic parameters, etc... and I didn't quite see in that movie that the doctor would spend more time with his patient or the daughter with her sick Grandma. But this may of course only reflect some old-fashioned side of my temperament...

September 7, 2007

How do we get from the Jimome & Craigome to systems biology?

by George M Church, live from the 9th International Meeting on Human Genome Variation and Complex Genome Analysis, Sep 6-8, 2007 in Barcelona.

Although Jim Watson's genome hasn't been through peer review yet, and Craig Venter’s genome doesn’t have a slick web browser like Jim’s genome yet, we’ve seen enough to ask – what next? Someone at the meeting today got some laughs accidentally when they said that they were comparing Craig’s genome to the human genome. Clearly this is a time requiring great caution. So our first question is: where are we with these first two complete diploid genomes? Well, they’re neither complete nor the first. The Craigome has over 4500 gaps (a bit more than the 341 gaps in the haploid 2004 HGP genome). The first human diploid sequence nod goes to the 269 HapMap genomes published in Oct 2005. Nevertheless we now have the first two non-anonymous personal genomes (hopefully millions someday). Oh, and what is it with the press-release claim that our genomes have higher variation than previously thought? The 0.5% variation observed includes a near-perfect fit to the long-known 0.06% SNP frequency, a 0.08% frequency of smaller indels about twice that seen in 330 genes from Seattle studies, and the remainder being copy-number variants (CNVs), 87% of which have been described previously. Just like the number of genes in the genome in 2001, the beauty and the news is in the details, not in the summary stats.

We can get from genome variations to systems biology “with focused population association studies, animal models, and functional genomics on the cells from the subjects” (Church 2005). To do genome-wide association studies (GWAS), we must ask where the technology costs are leading. Given the drop in price between the arrivals of the two genomes in the NCBI Personal Genomics directory -- Craig on June 27 at a cost of $70M, and Jim nine days later at a cost of $1M -- an over-zealous extrapolationist might be disappointed that the $1K genome did not arrive on July 25. Seriously now, the point is that neither study is inexpensive enough to scale to genome-wide association studies. SNP-chips at $250 each are scalable, but tend to miss new and/or rare SNPs and small indels. Next generation sequencing and short-read-pairs (Shendure et al 2005) may bring down costs by a factor of 10. Read-pairs seem ripe to become the method of choice for CNVs, smaller indels, and even inversions. Enrichment by hybridization for at least one read to be in an exon or cis-regulatory site might bring costs down another factor of 50. Even if these GWAS studies efficiently get us beyond “linked alleles” to “causative alleles”, they will generate gloriously more hypotheses than they test.

So, back to the other routes to systems biology, animal or cell models could be made to test the 4 million variants per genome (and combinations; oh my!) -- clearly indicating a need for automated homologous recombination methods and/or prioritization of these tests using the third route to systems biology -- “personal functional genomics”. Unlike the HapMap genomes, the Jimome and Craigome are not yet accompanied by extensive phenotypic trait data, nor any cell lines to do so. SNPs and CNVs that affect RNA levels have been elegantly mapped by Spielman et al. 2007 and Stranger et al 2007. Most effects map close to the transcription start sites. Assaying RNA by these standard assays or next-generation sequencing (Kim et al. 2007) from individuals enables comparisons of the sum of the two allelic expression levels from the two types of homozygotes (AA & aa) and the heterozygote (Aa) in a variety of different genetic backgrounds and cell-states. In contrast, genome-wide, allele-specific RNA assays would measure the expression from each haplotype in a heterozygote under the most nearly identical background state that can be arranged. The missing technology is one to gain access to all human tissues (since the list of volunteers for brain biopsies is short). This is yet another reason that we will be watching for methods to derive pluripotent stem cells from adult human tissues. Personal functional genomics assays on such personal cell lines are likely to arrive much earlier than (indeed pave the way for) therapeutic applications.


Church GM (2005) The Personal Genome Project. Mol Syst Biol 1:2005.0030

IHGSC et al. (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931-945.

International HapMap Consortium. (2005) A haplotype map of the human genome. Nature 437(7063):1299-320.

Kim JB, Porreca GJ, Gorham JM, Church GM, Seidman CE, Seidman JG (2007) Polony multiplex analysis of gene expression (PMAGE) in a mouse model of hypertrophic cardiomyopathy. Science 316(5830):1481-4.

Levy et al. (2007) The Diploid Genome Sequence of an Individual Human. PLoS Biol 5:e254.

Shendure, J, Porreca, GJ, Reppas, NB, Lin, X, McCutcheon, JP, Rosenbaum, AM, Wang, MD , Zhang, K, Mitra, RD, Church, GM (2005) Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome. Science 309(5741):1728-32.

Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG. (2007) Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet. 39(2):226-31.

Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, Tyler-Smith C, Carter N, Scherer SW, Tavare S, Deloukas P, Hurles ME, Dermitzakis ET. (2007) Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315(5813):848-53.

September 5, 2007

J Craig Venter's Genome

Many others have abundantly commented on the publication of Craig Venter's genome this week in PLOS Biology (Levy et al, 2007). The sequence of his full diploid genome (HuRef) reveals that the degree of genetic variability between maternal and paternal chromosomes is much higher (0.5%) than expected. Part of this variability is due to insertions/deletions (indels represent only 22% of variant events but amount to up to 75% of the variant nucleotides), alterations that are typically missed by SNP genotyping (SNPs represent 78% of the variants). Copy number variations (62, amounting to 10Mb) are also reported, albeit not determined by sequencing but via microarray genomic hybridization. With regard to variability in gene exons, the analysis shows that at least 44% of the genes are in the heterozygous state.

Besides its scientific content, the psychological impact of this study is considerable. The small interactive "toy" map (see illustration, modified from Levy et al, 2007) published on the PLOS Biology website is a particularly strong symbol: the entire genome of an individual human being displayed on a single page! The fact that a genome is a beautiful linear structure and can be displayed in such a simple and compact way, on a single poster, inevitably triggers the reflex to zoom in, focus on a favorite gene and speculate on the resulting phenotype. This almost unavoidable fascination for a linear interpretation of a linear structure (one gene maps to one disease) is illustrated by the many comments on Craig Venter's "genetic destiny", "wet earwax", predispositions (or lack thereof) to "alcoholism, coronary artery disease, obesity, Alzheimer’s disease, antisocial behavior and conduct disorder" and last but not least, to his blue eyes. Even if Venter himself makes it clear that it "is an impressive array of large sets of genes together with environmental conditions that will determine life outcomes" (Anderson Cooper Blog), it remains very hard to visualize this reality in an intuitive way. The PLoS poster shows a linear map, not an intricate probabilistic network of any sort. The educational effort required to replace, in the general public, the easy linear representation with a more "integrative" view is certainly going to be a major but decisive challenge if, with the advent of personal genomics, individuals are expected to exert more control and responsibility over their own health. Will Systems Biology manage to enter our daily lives?

Of course, this would imply deciphering the multigenic basis of complex human traits in the first place. But it is precisely this lack of current knowledge on the genotype-phenotype relationship which represents one of the strongest scientific incentives to sequence many more individual human genomes and correlate them with the respective medical, physiological and environmental parameters. In this regard, the availability of much cheaper and more efficient sequencing technologies (eg enabling sequencing, within 10 days, of 100 human genomes at 98% completion, 10^-5 accuracy and for $10,000 per genome, as challenged by the Archon X prize foundation, or allowing sequencing 1% of the genome for $1000 as in George Church's Personal Exome Project) may well represent an even more revolutionary advance than the first individual human genome published this week. As George Church wrote in his Editorial (The Personal Genome Project, Church (2005), Molecular Systems Biology 2005.0030),

Ready access to highly integrated and comprehensive human genome and phenome data sets is extremely important and increasingly feasible technically [...] As DNA is only a small part of destiny, personal genomics might fruitfully de-emphasize 'prediction' and focus on augmenting systems biology interpretations and prioritizations of actual day-to-day measurements of our physiological states.

May 16, 2007

The Human (Genetic) Disease Network

The relationship between genetic mutations and human diseases is often complex and ambiguous: a given disease can be associated with mutations in distinct genes and, conversely, mutations in a given gene can be associated with several diseases. Can this many-to-many relationship be exploited to construct a human disease network and extract information on the human disease landscape?

In their work just published in PNAS, Albert-László Barabási, Marc Vidal and colleagues reconstruct such a "diseasome" network in which disorders are linked to the respective associated disease genes (Goh et al, 2007 PNAS). Two projections of the network are presented: a) the Human Disease Network (HDN), in which diseases are connected to each other if they share a common disease gene; b) the Disease Gene Network (DGN), in which genes are connected if they are associated with a common disease. The HDN has a giant component comprising almost half of the diseases, in which some classes of disorders cluster naturally (eg cancers or neurological disorders, but not metabolic disorders). The DGN, when integrated with functional annotations, expression and protein-protein interaction data, provides a first step towards a "network-based explanation of the emergence of complex polygenic disorders" in the sense that it reveals, perhaps not too surprisingly, how functionally related genes can lead to similar disorders.
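
For readers who think more easily in code, the two projections can be sketched in a few lines of Python. This is only an illustration of the general idea of projecting a bipartite disease-gene graph, not the authors' pipeline; the disease and gene names are made up, and the function names are my own.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy associations (illustrative only, not data from Goh et al).
disease_genes = {
    "disease_A": {"GENE1", "GENE2"},
    "disease_B": {"GENE2", "GENE3"},
    "disease_C": {"GENE4"},
}


def project_diseases(disease_genes):
    """HDN-style projection: link two diseases if they share at least one gene."""
    edges = {}
    for d1, d2 in combinations(sorted(disease_genes), 2):
        shared = disease_genes[d1] & disease_genes[d2]
        if shared:
            edges[(d1, d2)] = shared  # keep the shared genes as edge annotation
    return edges


def project_genes(disease_genes):
    """DGN-style projection: link two genes if they share at least one disease."""
    edges = defaultdict(set)
    for disease, genes in disease_genes.items():
        for g1, g2 in combinations(sorted(genes), 2):
            edges[(g1, g2)].add(disease)
    return dict(edges)


hdn = project_diseases(disease_genes)
dgn = project_genes(disease_genes)
```

Here disease_A and disease_B end up linked through GENE2, while disease_C stays isolated, which is the kind of node that falls outside the giant component.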

The authors also look at the centrality of human disease genes in the protein-protein interaction network. An interesting twist comes when human disease genes are separated into essential and non-essential classes, according to the lethal or non lethal mouse phenotype resulting from the knockout of the respective orthologous genes. While essential genes tend to be associated with hubs in the interactome, disease genes that are non-essential (representing 78% of all disease genes) do not display a higher connectivity than non-disease genes. A somewhat complementary conclusion was recently reached by Lu and colleagues when looking at changes in gene expression in a mouse model of asthma: genes whose expression is the most affected by the disease have low connectivity while genes coding for hub proteins tend to display stable expression levels (Lu et al, 2007 Mol Syst Biol 3:98).
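
The comparison described above boils down to computing the connectivity (degree) of each protein in the interaction network and averaging it within each gene class. A minimal sketch, assuming plain degree as the measure of centrality and using an invented toy interaction list with invented class assignments (the published analysis is of course far more careful, with real data and statistics):

```python
from statistics import mean

# Hypothetical toy interaction list and gene classes (illustrative only).
interactions = [
    ("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "D"),
    ("A", "B"), ("E", "F"),
]
essential_disease = {"HUB"}
nonessential_disease = {"A", "E"}


def degrees(interactions):
    """Count the number of interaction partners of each protein."""
    deg = {}
    for u, v in interactions:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def mean_degree(genes, deg):
    """Average degree of a gene class; genes absent from the network count as 0."""
    return mean(deg.get(g, 0) for g in genes)


deg = degrees(interactions)
other = set(deg) - essential_disease - nonessential_disease
summary = {
    "essential disease": mean_degree(essential_disease, deg),
    "non-essential disease": mean_degree(nonessential_disease, deg),
    "other": mean_degree(other, deg),
}
```

In this contrived example the essential disease gene is the hub, while the non-essential disease genes have a mean degree close to that of the remaining genes.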

Reading this work, two main questions come to my mind:

First, if a majority of disease genes are not more central than non-disease genes, what will be the "network-based explanation" for the mere fact that they are implicated in a human disease? What kind of model will be needed to achieve this fundamental prediction?

Second and on a more general note, it seems to me that system-level approaches will be needed to integrate the environmental causes of human disease. While there is no question about the power of genetics and genomics to provide a global view on human diseases, I find it useful to remember that, as Jeremy Nicholson emphasizes,

the majority of people in the world die from what are, in the broadest sense, environmental causes. (Nicholson 2006, Mol Syst Biol 2:52)

Concrete achievements of Systems Biology in addressing significant human health problems may well require strong research efforts to bring system-level understanding into the impact of environmental factors on disease. This way, the Human Genetic Disease Network might ultimately be extended to a true Human Disease Network.

April 26, 2007

A Human Microbiome Project?

(via Jonathan A. Eisen, The Tree of Life)
What are the areas that will deeply transform biomedical research over the next decade? One of the possible areas identified for inclusion in the NIH Roadmap is research on the Microbiome (the entire set of microbial species living in the human body). A string of recent studies has revealed a profound impact of the enormously complex mammalian microbiome (Gill et al, 2006) on the metabolism and immune status of the host (for a few examples: Backhed et al, 2004, Dumas et al, 2006, Turnbaugh et al, 2006, Kitano & Oda, 2006, Nicholson et al, 2005). In his blog, J Eisen reports on some of the discussions held at an NIH sponsored workshop on the necessity of a Human Microbiome Project and lists possible research avenues for such a program. From his post:

1. Sequence many "reference genomes." By reference genomes here I mean genomes of cultured isolates that are closely related to organisms known in various human locations.
2. Do metagenomic sequencing of a variety of human microbiome samples.
3. Conduct large scale human microbiome diversity studies. This could involve rRNA PCR surveys as well as some amount of genome sequencing.
4. Develop the computational tools needed to analyze the massive amounts of data that will come out.
5. Encourage the development of new methods to aid in studies of the microbiome.

Perhaps one would like to add that an understanding of the symbiotic relationship between host and microbiome will also require the development of experimental approaches to manipulate the microbiome and measure its impact on the host physiology.

A friend of mine asked me recently what field might strike the popular consciousness in the coming years. Could it be that it will be the realization that we are all "superorganisms" (Lederberg, 2000) and that our health does not only depend on our personal genome (Church 2005) and our environment, but also on the extended genome provided by our very private microbiome?

February 14, 2007

Open Source Biology

Novartis, The Broad Institute, and Lund University today announced the completion of a genome-wide map of genetic differences in humans and their relationship to type 2 diabetes and other metabolic disorders. All results of the analysis are being made accessible, free of charge on the internet to scientists around the world (Novartis Media Release, Feb 12, 2007)

The results of this study are available at http://www.broad.mit.edu/diabetes/

Has the increasing complexity of genome-wide studies and other large-scale systems biology datasets reached a threshold that makes the open source option more attractive to the pharma industry?

Connecting disease state to genetic modules

Diseases such as cancer are often related to collaborative effects involving interactions of multiple genes within complex pathways, or to combinations of multiple SNPs. To understand the structure of such mechanisms, it is helpful to analyze genes in terms of the purely cooperative, as opposed to independent, nature of their contributions towards a phenotype (Anastassiou, 2007).
Two papers currently published in Molecular Systems Biology address this question:
  • Using an information-theoretic definition of synergy, Dimitris Anastassiou presents a computational approach to identify ab initio sets of interacting genes linked to a given disease state or phenotype (Anastassiou, 2007). This definition of synergy, derived from a generalization of the concept of mutual information, can connect two levels of organization (for example: genes and disease phenotype) and reveal the structure of the cooperative effects underlying a phenotypic state.
  • Jim Collins and colleagues apply network inference techniques to identify key pathways involved in prostate cancer progression (Ergün et al, 2007). A compendium of 1144 expression profiles spanning multiple cancer types is used to train the "mode-of-action by network identification" (MNI) algorithm. When applied to the test set of prostate cancer profiles, the androgen receptor and several of its cognate target genes are identified as top genetic mediators. This signaling pathway would not have been detected by expression change alone or by pathway analysis using Gene Set Enrichment Analysis (GSEA).
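
To give a feeling for what an information-theoretic synergy measures, here is a minimal Python sketch for the simplest case of two genes. It assumes the two-gene form as I understand it: the information that the pair jointly provides about the phenotype, minus the sum of what each gene provides on its own; the paper generalizes this to larger gene sets. The binarized gene states and the function names are my own illustration, not code from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits, estimated from paired discrete observations."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def synergy(g1, g2, phenotype):
    """Two-gene synergy: I(G1,G2;C) - [I(G1;C) + I(G2;C)]."""
    joint = list(zip(g1, g2))
    return mutual_information(joint, phenotype) - (
        mutual_information(g1, phenotype) + mutual_information(g2, phenotype)
    )


# Toy example: the phenotype is the XOR of two binarized gene states, so neither
# gene alone says anything about it, while the pair determines it completely.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
xor_phenotype = [a ^ b for a, b in zip(g1, g2)]

purely_cooperative = synergy(g1, g2, xor_phenotype)  # 1 bit of synergy
fully_redundant = synergy(g1, g1, g1)                # -1 bit: the second copy adds nothing
```

Positive values point to cooperative effects that would be invisible gene by gene, while negative values indicate redundancy between the two genes.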