Systems Medicine Archives

May 20, 2008

Google Health, Biomedical Mutual Organizations and Open Consent

Google Health, the new service offered by Google, is now online (via bbgm, Life as a Healthcare CIO, GTO). This service helps users to store, organize and share their health profile and medical records, to use a variety of health-related online services and to search for medical information. Understandably, Google places great emphasis on data security and confidentiality. In this regard, I thought it might be worth highlighting several recent and thought-provoking discussions around the issues of data privacy and participative medical investigations.

In a provocative editorial (Bains, 2007, see also Nature Medicine News article), William Bains advocates that collectives of individuals, so-called 'Biomedical Mutual Organizations', could organize themselves on a voluntary and self-funded basis to conduct clinical trials that would rely on extensive self-experimentation, data sharing and pooling of analytical resources. This proposal challenges the classical view that those who conduct a clinical trial should avoid conflicts of interest with respect to the outcome of the trial. On the other hand, Bains argues, this system would allow more innovative and radical trials to be performed, given that the subjects of the trial would have increased trust in the research process (being their own trial managers) and, hopefully, a more accurate perception of the risk/benefit balance involved.

Another radical proposal is the concept of 'open-consent' as currently applied within George Church's Personal Genome Project (Church, 2005). Jeantine Lunshof, George Church and colleagues highlight in a recent review (Lunshof et al, 2008) the limitations of the current definitions of genetic privacy and confidentiality in view of the rapid advances in the fields of human genetics and personal genomics. In particular, the creation of large databases interlinking individual genome-wide genotypes with extensive phenotypic profiles will make de-identification of such datasets increasingly difficult if not impossible (Lowrance and Collins, 2007). Under these conditions, it appears that the promise of absolute anonymity and confidentiality of private data is becoming unrealistic. Church and colleagues affirm that an 'open-consent' policy would avoid making such false promises and would therefore represent a more realistic way to formulate an adequately informed consent when agreeing to participate in a human genomic research study.

At last month's ESF Conference on Systems Biology, Hiroaki Kitano discussed the potential of multi-component, combinatorial therapies (see also Kitano, 2007). He introduced the tentative idea of an 'Open Pharma' strategy, which would attempt to exploit beneficial synergistic effects that may result from combined administration of cheap generic drugs. He envisions that this type of approach could ultimately lead the way to novel and hopefully more affordable therapeutic strategies, which would provide a potential alternative to the current single-target proprietary drug paradigm.

Observing the launch of Google Health within the context of this series of rather revolutionary proposals, it is tempting to imagine for a moment what would result from large-scale self-experimentation with multi-component generic drug cocktails combined with web-enabled data sharing under some form of open-consent... Will 'Participative Open Pharma' be our future?

February 21, 2008

Top-down mapping of gene regulatory pathways

In a very recent lecture (see full video from NIH VideoCasting) given for the NIH Systems Biology Special Interest Group, Trey Ideker presents a great overview of the various strategies his group has been developing in recent years in order to integrate multiple types of large-scale datasets. While one of the most pervasive 'memes' about high-throughput measurements is that they are "notoriously unreliable" (see Hakes et al, 2008, for a recent example), Trey beautifully illustrates how predictive computational models and novel biological insights can be generated by sophisticated data integration strategies. Three types of applications are presented in his talk:

  1. mapping of transcriptional response pathways
  2. functional mapping of protein complexes
  3. disease diagnosis and stratification

In the last section, Trey presents the study recently published in Molecular Systems Biology (Chuang et al, 2007, video: 00hr:39min:15sec) where the information provided by microarray expression profiling is superimposed on a protein-protein physical interaction network to identify 'subnetwork' biomarkers that classify metastatic vs non-metastatic breast tumors.

January 11, 2008

Consumer Health Information Technology

I highly recommend visiting the NIH VideoCasting page, which hosts many interesting video/podcasts. Even if I realize that this is a bit old according to the blogosphere time scale, I would like to point to this one: "The Future: Consumer Health Information Technology", featuring talks given at an NCI-sponsored meeting on Dec 10, 2007 by Adam Bosworth (formerly "Google Health architect", now starting his own company Keas), Bern Shen (Intel) and Bill Crounse (Microsoft).

In his introduction to the meeting, Bradford Hesse (NCI) colorfully summarizes one of the main concepts presented by the speakers (the video is very long, so I give some pointers: 0h16min43sec) by comparing the future of healthcare to...an "IKEA flat pack": patients will progressively be empowered to assemble their own care from home, as they would build a piece of (cheap) furniture.

Adam Bosworth (0h25min53sec) presents his very pragmatic vision of how IT could concretely help healthcare (0h39min07sec): a) help consumers to own and control their personal health data, starting with even very basic information; b) provide tools for doctors so that they can deliver personalized care as easily as producing a spreadsheet; c) develop tools for researchers to facilitate the design and implementation of new protocols and clinical trials.

Bill Crounse (Microsoft's other Bill...1h14min30sec) sees 5 major current trends that will increasingly challenge the healthcare system and call for IT solutions (1h26min22sec): a) increasing personal responsibility ("the end of health insurance"); b) progressive "retailization" of healthcare services (eg appearance of "retail minute clinics"); c) commoditization of healthcare providers; d) globalization of access to information (through the web of course); e) globalization of healthcare services. I recommend his funny little anecdote on the high-tech GPS wireless-connected plumber (1h25min30sec) who appears to be better equipped than any practicing physician...

The speakers also all insist on the need for massive data integration promoted by the interoperability of formats and coding information, themes that probably sound familiar to many systems biologists.

Toward the end of his talk (1h35min00sec), Bill Crounse shows a short "science-fiction" movie on Microsoft's vision of the future of healthcare: a world full of credit-card sized tablet PCs, touch screens and many other very exciting gadgets (I love gadgets!). But I can't help missing a bit the warmth of human-to-human interactions within this jungle of virtual consultations, retail clinics, remote controlled metabolic parameters, etc... and I didn't quite see in that movie that the doctor would spend more time with his patient or the daughter with her sick Grandma. But this may of course only reflect some old-fashioned side of my temperament...

September 7, 2007

How do we get from the Jimome & Craigome to systems biology?

by George M Church, live from the 9th International Meeting on Human Genome Variation and Complex Genome Analysis, Sep 6-8, 2007 in Barcelona.

Although Jim Watson's genome hasn't been through peer review yet, and Craig Venter’s genome doesn’t have a slick web browser like Jim’s genome yet, we’ve seen enough to ask – what next? Someone at the meeting today got some laughs accidentally when they said that they were comparing Craig’s genome to the human genome. Clearly this is a time requiring great caution. So our first question is: where are we with these first two complete diploid genomes? Well, they’re neither complete nor the first. The Craigome has over 4500 gaps (a bit more than the 341 gaps in the haploid 2004 HGP genome). The first human diploid sequence nod goes to the 269 HapMap genomes published in Oct 2005. Nevertheless we now have the first two non-anonymous personal genomes (hopefully millions someday). Oh, and what is it with the press releases claiming that our genomes have higher variation than previously thought? The 0.5% variation observed includes a near-perfect fit to the long-known 0.06% SNP frequency, a 0.08% frequency of smaller indels (about twice that seen in 330 genes from Seattle studies), and the remainder being copy-number variants (CNV), 87% of which have been described previously. Just like the number of genes in the genome in 2001, the beauty and the news is in the details, not in the summary stats.

We can get from genome variations to systems biology “with focused population association studies, animal models, and functional genomics on the cells from the subjects” (Church 2005). To do genome-wide association studies (GWAS), we must ask where the technology costs are leading. Given the drop in price between the arrivals of the two genomes in the NCBI Personal Genomics directory -- Craig on June 27 at a cost of $70M, and Jim nine days later at a cost of $1M -- an over-zealous extrapolationist might be disappointed that the $1K genome did not arrive on July 25. Seriously now, the point is that neither study is inexpensive enough to scale to genome-wide association studies. SNP-chips at $250 each are scalable, but tend to miss new and/or rare SNPs and small indels. Next generation sequencing and short-read-pairs (Shendure et al 2005) may bring down costs by a factor of 10. Read-pairs seem ripe to become the method of choice for CNVs, smaller indels, and even inversions. Enrichment by hybridization for at least one read to be in an exon or cis-regulatory site might bring costs down by another factor of 50. Even if these GWAS studies efficiently get us beyond “linked alleles” to “causative alleles”, they will generate gloriously more hypotheses than they test.

So, back to the other routes to systems biology: animal or cell models could be made to test the 4 million variants per genome (and combinations; oh my!) -- clearly indicating a need for automated homologous recombination methods and/or prioritization of these tests using the third route to systems biology -- “personal functional genomics”. Unlike the HapMap genomes, the Jimome and Craigome are not yet accompanied by extensive phenotypic trait data, nor any cell lines to do so. SNPs and CNVs that affect RNA levels have been elegantly mapped by Spielman et al. 2007 and Stranger et al. 2007. Most effects map close to the transcription start sites. Assaying RNA by these standard assays or next-generation sequencing (Kim et al. 2007) from individuals enables comparisons of the sum of the two allelic expression levels from the two types of homozygotes (AA & aa) and the heterozygote (Aa) in a variety of different genetic backgrounds and cell-states. In contrast, genome-wide, allele-specific RNA assays would measure the expression from each haplotype in a heterozygote under the most nearly identical background state that can be arranged. The missing technology is one to gain access to all human tissues (since the list of volunteers for brain biopsies is short). Yet another reason that we will be watching for methods to derive pluripotent stem cells from adult human tissues. Personal functional genomics assays on such personal cell lines are likely to arrive much earlier than (indeed pave the way for) therapeutic applications.


Church GM (2005) The Personal Genome Project. Mol Syst Biol 1:2005.0030.

IHGSC (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931-945.

International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437(7063):1299-1320.

Kim JB, Porreca GJ, Gorham JM, Church GM, Seidman CE, Seidman JG (2007) Polony multiplex analysis of gene expression (PMAGE) in a mouse model of hypertrophic cardiomyopathy. Science 316(5830):1481-1484.

Levy et al. (2007) The diploid genome sequence of an individual human. PLoS Biol 5:e254.

Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM (2005) Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309(5741):1728-1732.

Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG (2007) Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet 39(2):226-231.

Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, Tyler-Smith C, Carter N, Scherer SW, Tavare S, Deloukas P, Hurles ME, Dermitzakis ET (2007) Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315(5813):848-853.

September 5, 2007

J Craig Venter's Genome

Many others have abundantly commented on the publication of Craig Venter's genome this week in PLoS Biology (Levy et al, 2007). The sequence of his full diploid genome (HuRef) reveals that the degree of genetic variability between maternal and paternal chromosomes is much higher (0.5%) than expected. Part of this variability is due to insertions/deletions (indels represent only 22% of variant events but amount to up to 75% of the variant nucleotides), alterations that are typically missed by SNP genotyping (SNPs represent 78% of the variants). Copy number variations (62, amounting to 10Mb) are also reported, albeit not determined by sequencing but via microarray genomic hybridization. With regard to variability in gene exons, the analysis shows that at least 44% of the genes are in the heterozygous state.

Besides its scientific content, the psychological impact of this study is considerable. The small interactive "toy" map (see illustration, modified from Levy et al, 2007) published on the PLoS Biology website is a particularly strong symbol: the entire genome of an individual human being displayed on a single page! The fact that a genome is a beautiful linear structure and can be displayed in such a simple and compact way, on a single poster, inevitably triggers the reflex to zoom in, focus on a favorite gene and speculate on the resulting phenotype. This almost unavoidable fascination for a linear interpretation of a linear structure (one gene maps to one disease) is illustrated by the many comments on Craig Venter's "genetic destiny", "wet earwax", predispositions (or lack thereof) to "alcoholism, coronary artery disease, obesity, Alzheimer’s disease, antisocial behavior and conduct disorder" and last but not least, to his blue eyes. Even if Venter himself makes it clear that it "is an impressive array of large sets of genes together with environmental conditions that will determine life outcomes" (Anderson Cooper Blog), it remains very hard to visualize this reality in an intuitive way. The PLoS poster shows a linear map, not an intricate probabilistic network of any sort. The educational effort required to replace, in the general public, the easy linear representation with a more "integrative" view is certainly going to be a major but decisive challenge if, with the advent of personal genomics, individuals are expected to exert more control and responsibility over their own health. Will Systems Biology manage to enter our daily lives?

Of course, this would imply deciphering the multigenic basis of complex human traits in the first place. But it is precisely this lack of current knowledge on the genotype-phenotype relationship which represents one of the strongest scientific incentives to sequence many more individual human genomes and correlate them with the respective medical, physiological and environmental parameters. In this regard, the availability of much cheaper and more efficient sequencing technologies (eg. enabling sequencing, within 10 days, of 100 human genomes at 98% completion, 10^-5 accuracy and for $10'000 per genome, as challenged by the Archon X prize foundation, or allowing sequencing of 1% of the genome for $1000 as in George Church's Personal Exome Project) may well represent an even more revolutionary advance than the first individual human genome published this week. As George Church wrote in his Editorial (The Personal Genome Project, Church (2005), Molecular Systems Biology 2005.0030),

Ready access to highly integrated and comprehensive human genome and phenome data sets is extremely important and increasingly feasible technically [...] As DNA is only a small part of destiny, personal genomics might fruitfully de-emphasize 'prediction' and focus on augmenting systems biology interpretations and prioritizations of actual day-to-day measurements of our physiological states.

May 16, 2007

The Human (Genetic) Disease Network

The relationship between genetic mutations and human diseases is often complex and ambiguous: a given disease can be associated with mutations in distinct genes and, conversely, mutations in a given gene can be associated with several diseases. Can this many-to-many relationship be exploited to construct a human disease network and extract information on the human disease landscape?

In their work just published in PNAS, Albert-László Barabási, Marc Vidal and colleagues reconstruct such a "diseasome" network in which disorders are linked to their associated disease genes (Goh et al, 2007 PNAS). Two projections of the network are presented: a) the Human Disease Network (HDN), in which diseases are connected to each other if they share a common disease gene; b) the Disease Gene Network (DGN), in which genes are connected if they are associated with a common disease. The HDN has a giant component comprising almost half of the diseases, in which some classes of disorders cluster naturally (eg cancers or neurological disorders, but not metabolic disorders). The DGN, when integrated with functional annotations, expression and protein-protein interaction data, provides a first step towards a "network-based explanation of the emergence of complex polygenic disorders" in the sense that it reveals, perhaps not too surprisingly, how functionally related genes can lead to similar disorders.
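To make the two projections concrete, here is a minimal Python sketch. The disease and gene names are hypothetical toy data of my own, not taken from Goh et al, and the helper names `project` and `invert` are made up for illustration; the only thing borrowed from the paper is the linking rule described above (two nodes are connected if they share at least one partner in the bipartite disease-gene map).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy disease -> gene associations (illustrative only).
disease_genes = {
    "disease_A": {"gene_1", "gene_2"},
    "disease_B": {"gene_2", "gene_3"},
    "disease_C": {"gene_4"},
}

def project(bipartite):
    """Connect two keys whenever their associated sets share a member."""
    edges = set()
    for a, b in combinations(sorted(bipartite), 2):
        if bipartite[a] & bipartite[b]:
            edges.add((a, b))
    return edges

def invert(bipartite):
    """Turn a disease -> genes map into a gene -> diseases map."""
    inverted = defaultdict(set)
    for key, members in bipartite.items():
        for member in members:
            inverted[member].add(key)
    return dict(inverted)

# Human Disease Network: diseases linked by a shared gene.
hdn = project(disease_genes)
# Disease Gene Network: genes linked by a shared disease.
dgn = project(invert(disease_genes))

print(hdn)
print(dgn)
```

In this toy example disease_C stays isolated in the HDN, which is the kind of node that falls outside the giant component.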

The authors also look at the centrality of human disease genes in the protein-protein interaction network. An interesting twist comes when human disease genes are separated into essential and non-essential classes, according to the lethal or non-lethal mouse phenotype resulting from the knockout of the respective orthologous genes. While essential genes tend to be associated with hubs in the interactome, disease genes that are non-essential (representing 78% of all disease genes) do not display a higher connectivity than non-disease genes. A somewhat complementary conclusion was recently reached by Lu and colleagues when looking at changes in gene expression in a mouse model of asthma: genes whose expression is the most affected by the disease have low connectivity while genes coding for hub proteins tend to display stable expression levels (Lu et al, 2007 Mol Syst Biol 3:98).

Reading this work, two main questions come to my mind:

First, if a majority of disease genes are not more central than non-disease genes, what will be the "network-based explanation" for the mere fact that they are implicated in a human disease? What kind of model will be needed to achieve this fundamental prediction?

Second and on a more general note, it seems to me that system-level approaches will be needed to integrate the environmental causes of human disease. While there is no question about the power of genetics and genomics to provide a global view on human diseases, I find it useful to remember that, as Jeremy Nicholson emphasizes,

the majority of people in the world die from what are, in the broadest sense, environmental causes. (Nicholson 2006, Mol Syst Biol 2:52)

Concrete achievements of Systems Biology in addressing significant human health problems may well require strong research efforts to bring system-level understanding to the impact of environmental factors on disease. This way, the Human Genetic Disease Network might ultimately be extended to a true Human Disease Network.

April 26, 2007

A Human Microbiome Project?

(via Jonathan A. Eisen, The Tree of Life)
What are the areas that will deeply transform biomedical research over the next decade? One of the possible areas identified for inclusion in the NIH Roadmap is research on the Microbiome (the entire set of microbial species living in the human body). A string of recent studies has revealed a profound impact of the enormously complex mammalian microbiome (Gill et al, 2006) on the metabolism and immune status of the host (for a few examples: Backhed et al, 2004, Dumas et al, 2006, Turnbaugh et al, 2006, Kitano & Oda, 2006, Nicholson et al, 2005). In his blog, J Eisen reports on some of the discussions held at an NIH-sponsored workshop on the necessity of a Human Microbiome Project and lists possible research avenues for such a program. From his post:

1. Sequence many "reference genomes." By reference genomes here I mean genomes of cultured isolates that are closely related to organisms known in various human locations.
2. Do metagenomic sequencing of a variety of human microbiome samples.
3. Conduct large scale human microbiome diversity studies. This could involve rRNA PCR surveys as well as some amount of genome sequencing.
4. Develop the computational tools needed to analyze the massive amounts of data that will come out.
5. Encourage the development of new methods to aid in studies of the microbiome.

Perhaps one would like to add that an understanding of the symbiotic relationship between host and microbiome will also require the development of experimental approaches to manipulate the microbiome and measure its impact on the host physiology.

A friend of mine asked me recently what field might strike the popular consciousness in the coming years. Could it be that it will be the realization that we are all "superorganisms" (Lederberg, 2000) and that our health does not only depend on our personal genome (Church 2005) and our environment, but also on the extended genome provided by our very private microbiome?

February 14, 2007

Open Source Biology

Novartis, The Broad Institute, and Lund University today announced the completion of a genome-wide map of genetic differences in humans and their relationship to type 2 diabetes and other metabolic disorders. All results of the analysis are being made accessible, free of charge on the internet to scientists around the world (Novartis Media Release, Feb 12, 2007)

The results of this study are available at http://www.broad.mit.edu/diabetes/

Has the increasing complexity of genome-wide studies and other large-scale systems biology datasets reached a threshold that makes the open source option more attractive to the pharma industry?

Connecting disease state to genetic modules

Diseases such as cancer are often related to collaborative effects involving interactions of multiple genes within complex pathways, or to combinations of multiple SNPs. To understand the structure of such mechanisms, it is helpful to analyze genes in terms of the purely cooperative, as opposed to independent, nature of their contributions towards a phenotype (Anastassiou, 2007).
Two papers currently published in Molecular Systems Biology address this question:
  • Using an information-theoretic definition of synergy, Dimitris Anastassiou presents a computational approach to identify ab initio sets of interacting genes linked to a given disease state or phenotype (Anastassiou, 2007). This definition of synergy, derived from a generalization of the concept of mutual information, can connect two levels of organization (for example: genes and disease phenotype) and reveal the structure of the cooperative effects underlying a phenotypic state.
  • Jim Collins and colleagues apply network inference techniques to identify key pathways involved in prostate cancer progression (Ergün et al, 2007). A compendium of 1144 expression profiles spanning multiple cancer types is used to train the "mode-of-action by network identification" (MNI) algorithm. When applied to the test set of prostate cancer profiles, the androgen receptor and several of its cognate target genes are identified as top genetic mediators. This signaling pathway would not have been detected by expression change alone or by pathway analysis using Gene Set Enrichment Analysis (GSEA).
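
To give a feel for what "purely cooperative" means in the information-theoretic sense, here is a small Python sketch of the two-gene case. It assumes the two-gene form of synergy, Syn(G1,G2;C) = I(G1,G2;C) - [I(G1;C) + I(G2;C)], with binarized expression values and probabilities estimated by simple counting; the toy data and the function names are my own illustration, not code or data from the paper.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def synergy(g1, g2, phenotype):
    """Two-gene synergy: joint information minus the sum of the individual ones."""
    joint = list(zip(g1, g2))
    return mutual_information(joint, phenotype) - (
        mutual_information(g1, phenotype) + mutual_information(g2, phenotype)
    )

# Toy binarized expression (hypothetical): phenotype is the XOR of the two genes,
# so neither gene alone is informative but the pair determines the phenotype.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
xor_phenotype = [a ^ b for a, b in zip(g1, g2)]

# Redundant case: both genes carry the same information about the phenotype.
redundant = synergy(g1, g1, g1)
cooperative = synergy(g1, g2, xor_phenotype)

print(cooperative, redundant)
```

A positive value indicates that the pair tells us more about the phenotype than the two genes taken separately, while a negative value indicates redundancy.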