AGBT 16 Opening Session

Wednesday, February 10, 2016

Dale Yuzuki-AGBT Guest Blogger (Seracare)


Title: The importance of data sharing

Subtitle: David Haussler: “Every clinician should be able to compare their genome to others.”


Here at the first plenary session of the Advances in Genome Biology and Technology 2016 meeting, there is a mix of the strange and the familiar. Strange, in that we are not in Marco Island at the Marriott resort, and it took some time and effort to become familiar with the layout and the location of the important places (ahem, the vendor suites and public places to consume beverages). Familiar, in seeing old friends and meeting new ones, and remembering acquaintances from prior years. And familiar, too, in hearing where the latest research is leading, as well as progress on some long-standing problems.

David Haussler from the University of California Santa Cruz gave a plenary talk entitled “Global sharing of better and more genomes”, and something he said struck a chord with me: without the collaborative sharing of genomic information (both phenotype and genotype), the entire field of the clinical application of genomic data will simply not make progress. He illustrated this point with a slide stating the problem: genome data is held in silos, unshared, and not standardized for exchange.

What is great about the work of the Global Alliance for Genomics and Health <> (and on Twitter at @GA4GH <>) is that they are actively working on several fronts to solve this problem. But first a bit of context.

The Bermuda Principles, which committed the project to rapid public release of its sequence data, were set forth in 1996 during the Human Genome Project, at a meeting described by David Bentley as ‘one of the most important early meetings… to help shape the spirit in which it was carried out’ (an oral history recording of David Bentley about this meeting is available here.) <> In one of his talks, Francis Collins described what a courageous step this was: for a project as ambitious, resource-consuming, and large-scale as the Human Genome Project, nothing quite like this had been done before. (For additional details, a 2002 essay from John Sulston about that history is available here.) <>

Many years after the completion of the HGP, the completion of the HapMap project, over a thousand GWAS, and then the 1000 Genomes Project, here we are today along the path to making genomic discoveries have practical impact in healthcare. I’m reminded of Eric Green’s and Mark Guyer’s Nature Perspective piece in 2011, ‘Charting a course for genomic medicine from base pairs to bedside’ <>, which laid out the progression from understanding the structure of genomes, to understanding the biology of genomes, and then on to understanding the biology of disease and advancing the science of medicine, as a memorable ‘heat-map’ that spans the years from 2011 to 2020.

The problem remains, however, that with genomic data held in silos, unshared, and in non-standard formats, progress will not be made. There is a famous quotation from Stewart Brand from 1984 that is often shortened to “Information wants to be free”. However, the full quote is this:

“On the one hand information wants to be expensive, because it’s so valuable. The right information in the right place just changes your life. On the other hand, information wants to be free, because the cost of getting it out is getting lower and lower all the time.

So you have these two fighting against each other.”

This tension, between the value of genomic information in the right place that can literally save a person’s life, and the cost of information becoming lower all the time, is something that needs to be solved.

Dr. Haussler spoke of three major projects of the Global Alliance for Genomics and Health. The first, the ‘Beacon Project’, poses queries that get a simple yes/no answer, ‘to test the willingness of international sites to share genetic data in the simplest of all technical contexts’ (information about this is here). <> The second, the ‘BRCA Challenge’ <>, aims to pool BRCA1 and BRCA2 data into a global resource (headed by Sir John Burn of Newcastle University and Stephen Chanock of the NCI); there are over 12,000 coding variants known within these two genes, all ‘scattered everywhere’. The third, on which he spent some time, is the generation of a new form of human genome variation map.
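
To make the yes/no idea behind the Beacon Project concrete, here is a minimal sketch in Python. It is only an illustration of the concept: the class and function names (ToyBeacon, AlleleQuery, ask_all) are hypothetical, the coordinates are made up, and nothing here reflects the actual GA4GH Beacon interface, which was not described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleQuery:
    """A hypothetical query: 'has anyone seen this allele at this position?'"""
    chromosome: str
    position: int  # coordinate convention is an assumption of this sketch
    allele: str


class ToyBeacon:
    """An in-memory stand-in for one site's data silo (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self._alleles = set()  # private: the raw data never leaves the site

    def add_allele(self, chromosome, position, allele):
        self._alleles.add((chromosome, position, allele.upper()))

    def has_allele(self, query):
        # The only thing shared with the outside world is True or False.
        key = (query.chromosome, query.position, query.allele.upper())
        return key in self._alleles


def ask_all(beacons, query):
    """Ask every beacon the same question and collect the yes/no answers."""
    return {beacon.name: beacon.has_allele(query) for beacon in beacons}


if __name__ == "__main__":
    site_a = ToyBeacon("site_a")
    site_a.add_allele("1", 1000, "A")  # made-up variant
    site_b = ToyBeacon("site_b")       # holds no matching data

    print(ask_all([site_a, site_b], AlleleQuery("1", 1000, "a")))
```

The point of the design is that each site reveals a single bit per question, which is about as low a bar for participation as one could set.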

He laid out how this graph structure works: diverse genomic sequencing data are merged into one graph that reflects the total diversity discovered in a specific genomic region, with each element given a unique base-level identifier. The goal is to create a ‘Rosetta Stone of the human genome’, allowing for different kinds of ‘connections’, including inversions that intersect sequences on their sides. The more variation put into the graph, the better it becomes, he claimed (“we are beating GATK3 at variant calling”), and he showed data comparing several graph-based mapping methods across the MHC region.
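
For readers who think in code, the general idea of merging variation into one graph can be sketched as follows. This is a toy under stated assumptions, not the structure Dr. Haussler presented: the SequenceGraph class, the "node:offset" identifier scheme, and the example sequences are all invented for illustration, and inversions (which need edges that attach to either side of a node) are deliberately left out.

```python
class SequenceGraph:
    """A toy directed sequence graph: nodes carry bases, edges join them."""

    def __init__(self):
        self.nodes = {}   # node_id -> DNA sequence
        self.edges = {}   # node_id -> set of successor node_ids
        self._next_id = 1

    def add_node(self, sequence):
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = sequence
        self.edges[node_id] = set()
        return node_id

    def add_edge(self, src, dst):
        self.edges[src].add(dst)

    def base_id(self, node_id, offset):
        """Return a unique identifier for one base (hypothetical scheme)."""
        if not 0 <= offset < len(self.nodes[node_id]):
            raise IndexError("offset outside node sequence")
        return f"{node_id}:{offset}"

    def haplotypes(self, start, end):
        """Yield each sequence spelled by a path from start to end.

        Assumes the graph is acyclic, which holds for this toy example.
        """
        stack = [(start, self.nodes[start])]
        while stack:
            node, spelled = stack.pop()
            if node == end:
                yield spelled
                continue
            for successor in sorted(self.edges[node]):
                stack.append((successor, spelled + self.nodes[successor]))


def build_snp_bubble():
    """Merge two made-up haplotypes that differ at one base into one graph."""
    graph = SequenceGraph()
    left = graph.add_node("ACG")
    ref = graph.add_node("T")   # allele seen in the first haplotype
    alt = graph.add_node("C")   # allele seen in the second haplotype
    right = graph.add_node("GGA")
    for allele in (ref, alt):
        graph.add_edge(left, allele)
        graph.add_edge(allele, right)
    return graph, left, right


if __name__ == "__main__":
    graph, left, right = build_snp_bubble()
    print(sorted(graph.haplotypes(left, right)))
    print(graph.base_id(left, 2))
```

Shared sequence is stored once and each observed difference becomes an alternative path, so adding more genomes adds more paths through the same graph rather than more separate files.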

The Global Alliance for Genomics and Health is working on important and vital projects (he touched on the development of a common genome ‘API’ with many applications interfacing with it); will you get involved with the genomic data you have in your silo? GA4GH invites organizations and individuals to become members. <>