3rd Plenary Meeting

9 Jun 2015

The GA4GH 3rd plenary meeting focused on a push for implementation of tools and approaches, a need to integrate with other data sharing efforts, and the importance of global and sector diversity among GA4GH membership. More than 230 individuals attended the meeting, representing 152 organizations active in 24 countries.


EXECUTIVE SUMMARY

The Global Alliance for Genomics and Health (GA4GH) held its third plenary meeting on  Wednesday, June 10th in Leiden, the Netherlands, with ancillary meetings on June 9th, 10th, and  11th. More than 230 individuals attended the meeting, representing 152 organizations active in  24 countries. Key themes of the meeting included a push for implementation of tools and  approaches, a need to integrate with other data sharing efforts, and the importance of global  and sector diversity among GA4GH membership.  

In opening remarks, Steering Committee Chair David Altshuler (Vertex Pharmaceuticals) called on attendees to envision the future of the GA4GH with a “laser-like focus on making a difference” and to come up with concrete actions for the next two years. He also announced changes to the GA4GH leadership: Tom Hudson (Ontario Institute for Cancer Research) will serve as Chair Elect until Altshuler’s term ends later in 2015, while David Glazer (Google), Nazneen Rahman (Institute of Cancer Research), and Heidi Rehm (Partners Healthcare) will join the Steering Committee as Charles Sawyers and Betsy Nabel step down.

A keynote presentation from Ewan Birney (European Molecular Biology Laboratory-European  Bioinformatics Institute) focused on enabling secondary use of clinical data for research through  a federated approach to data sharing, increased global diversity in the GA4GH, and long-term  engagement with the clinical community. He noted that in the world of human genetics, whether  for rare or common diseases or cancer, “sample size is king.” To reach the millions of cases and  controls that will be necessary to identify a valuable second rare disease match or to  understand the biological bases of cancer, federation is critical. But more than sample size,  federation will also allow us to leverage things like genetic drift, an increasingly cosmopolitan  global population, and environmental impacts on disease. “The Global Alliance is not only a  good thing,” Birney said, “but a necessary thing for us to make progress over the next decade.  This is the only way I can imagine this working.” 

The GA4GH has made great strides in the two years since its founding, having grown to more  than 330 organizational members across more than 30 countries. Some 750 individual  members across 48 countries have developed dozens of priority products such as the genomics  API and the Framework for Responsible Sharing of Genomic and Health Related Data, which  are increasingly being used by organizations, clinicians, and researchers around the globe. But  while GA4GH has moved quickly, Altshuler said, the field and the world are evolving faster than  anticipated.  

Several national initiatives have sprung up around the world to assemble massive cohorts of genomic data. It will be critical to stay relevant and to engage with these efforts, Altshuler said. To that end, an afternoon session welcomed presentations from representatives of eight regions around the globe: Latin America, Singapore, China, the Netherlands, the United Kingdom, the United States, Australia, and Canada. The presenters summarized the current and future data sharing plans of their respective efforts and discussed ways they are leveraging GA4GH tools and approaches as well as ways GA4GH might help them overcome barriers to effective and responsible data sharing. The speakers expressed eagerness to work with GA4GH and said the Alliance could provide assistance in both general and specific areas, including: (1) taking pilot efforts to a national scale, (2) developing information dissemination technologies, (3) implementing tools and approaches within a national health system, and (4) creating a Brazilian national database that is representative of the unique mixed population.

In addition to these national initiatives, several organizations have commenced large-scale data sharing efforts. Security Working Group (SWG) Chair Dixie Baker facilitated a session on the challenges facing Big Data, which include an overarching lack of computational capability and infrastructure to generate, maintain, exchange, analyze, and visualize large-scale data sets. Five presentations introduced solutions being developed to overcome those challenges. Virtualizing computational resources, Baker said, may be the only approach for achieving scalability and elasticity for DNA sequencing and analysis, but that approach comes with its own set of hurdles. Plenary attendees heard about a wide range of activities, including a patient advocacy effort, a commercial cloud provider, and an academic medical center.

At the last GA4GH plenary meeting (San Diego, October 2014), the importance of demonstration projects rose to the surface, as did the need for actionable solutions to accompany the philosophical guiding principles being developed within GA4GH. A morning session at the June 2015 meeting focused on updates since the last plenary, with presentations from John Burn on the BRCA Challenge, Heidi Rehm on Matchmaker Exchange (MME), David Haussler on the Beacon Project and other activities of the Data Working Group (DWG), and Graeme Laurie and Susan Wallace on the Privacy & Security and Consent Policies developed by the Regulatory and Ethics Working Group (REWG).

The BRCA Challenge held its first meeting directly after the GA4GH Plenary at the UNESCO headquarters in Paris, France. The meeting led to the development of time plans for a BRCA web portal and phone app. The MME now has three databases using its API and is working to connect them together. The journal Human Mutation will release an edition dedicated to the MME in October 2015 with more than a dozen papers highlighting its activities over the past 1.5 years, including three that document rare disease cases solved through matchmaking. The Beacon Project has been working with several other data sharing initiatives to create a large, interconnected, international network of datasets. Its near-term goals include new development tools and use cases to foster implementations. The REWG’s Privacy & Security and Consent Policies aim to guide practical implementation of the Framework, which was designed to serve as a high-level starting point for deliberation and action. The policies are now available on the GA4GH website.

In addition to well-established projects, new activities have been percolating in the areas of eHealth and clinical cancer genomes. A session on these projects included presentations from John Mattison (Kaiser Permanente) and Mark Lawler (Queen’s University Belfast). The eHealth Task Team has created a catalogue of activities to guide increased data sharing collaboration and is working to manage the ecosystem of ontologies and semantic representations. The Clinical Cancer Genomes (CCG) Task Team recently surveyed the GA4GH community to identify ways to highlight and enable collaboration between ongoing data sharing efforts. It has also produced an opinion paper arguing that the international community must overcome several challenges, including lack of interoperability and reluctance to share, before Big Data genomics can advance human health.

Ancillary meetings were also held in conjunction with the main plenary session. The DWG and the Clinical Working Group (CWG) held a crosscutting meeting on June 9th, during which audience members heard 15 presentations on topics ranging from large-scale clinical data sharing initiatives to more granular efforts to improve standardization and consistency across  technologies. By the next plenary meeting, a newly formed planning team co-led by Lawler and  Adam Margolin (Oregon Health & Science University) will identify and launch a cross-cutting  global actionable cancer initiative that brings together the domain expertise and medical goals  of the CWG with the data models and APIs developed across the DWG.  

In addition to the cross-cutting cancer session, the CWG, the DWG, the SWG, and the Regulatory and Ethics Working Group (REWG), as well as MME and Beacon, held internal planning meetings, several cross-cutting planning meetings, and a Beacon hack-a-thon over the course of the three-day meeting. Working Group Co-Chairs provided summaries of these meetings on June 9th, followed by an open-air panel discussion facilitated by Martin Bobrow. Key themes that arose during this discussion centered on engaging patient, physician, and disease advocacy communities, increasing global representation of the GA4GH, and implementing GA4GH tools and approaches.

Finally, a closing session on June 10th facilitated by David Altshuler identified concrete action items for the coming months. The GA4GH Strategic Road Map for 2015 and 2016 helped guide the discussion, which focused on developing priority products, delivering packaged working solutions, supporting demonstration projects, communicating with key audiences, establishing GA4GH as a thought leader, and building leadership and participation.

Some specific recommendations included (1) an initiative of patient advocacy organizational  members of the GA4GH to drive increased patient engagement, (2) establishing thought  leadership in the area of data sharing incentivization, (3) creating a communications and  engagement working group, (4) developing ethics training tools for non-clinical researchers  working with genomic data, and (5) encouraging implementation of both the technical tools and  regulatory approaches being developed by the GA4GH Working Groups.

STRATEGIC ROADMAP (2015-2016)

Throughout the three-day plenary meeting, discussions tied back to the GA4GH Strategic Road Map, which outlines activities for 2015 and 2016 focused on priority needs to enable data sharing in areas that provide value to members and the field, so that the effort ensures results, relevance, and sustainability. The objectives and activity directives in the Road Map provide GA4GH Members with the prioritization and framing needed to develop systems, drive concrete actions, and create targeted progress in eight key areas:

  1. Develop priority products
  2. Deliver packaged, working solutions
  3. Support demonstration projects
  4. Align with major data collection and sharing efforts
  5. Communicate strategically with key audiences
  6. Establish GA4GH as a thought leader
  7. Build leadership and participation
  8. Expand organizational capacity and funding stream

INTRODUCTION

In opening remarks, Steering Committee Chair David Altshuler (Vertex Pharmaceuticals) reminded the audience of the history of the Global Alliance for Genomics and Health (GA4GH)  and called on them to envision its future with a “laser-like focus on making a difference.”  

In early January 2013, the founding members of GA4GH met with nothing but an idea: they  sought to create an organization that would bring together people from different sectors,  countries, and perspectives to enable genomic and clinical data sharing in order to improve  health. “It is remarkable and exciting to see what has happened,” Altshuler said.  

Now a network of more than 750 individuals across 320 organizations and 48 countries have  organized into four vibrant working groups, which are creating the tools, methods, and  frameworks critical to enabling genomic and clinical data sharing. Upon this solid foundation,  GA4GH members have established data sharing demonstration projects to put those ideas and  people to work.  

But while GA4GH has made great, quick strides, Altshuler noted, the world has moved even  quicker. “We’re pedaling hard and making progress but the world is speeding up,” he said. The  world is at an inflection point, moving from decades of talk to actually seeing genomic data  making their way into clinical medicine. Several large-scale clinical genomics projects are  developing or underway around the globe. In order for GA4GH to succeed—for data to be  shared responsibly and effectively—it must engage strongly with these efforts, Altshuler said. 

He also announced changes in the GA4GH leadership, which were established through a nominating committee led by Martin Bobrow (University of Cambridge). Current Steering Committee member Tom Hudson (Ontario Institute for Cancer Research) will serve as Chair Elect until Altshuler steps down later in 2015. “I’ve known Tom for 20 years,” Altshuler said, “and I can’t think of a better person to take this forward.” Additionally, he announced three new Steering Committee members: Heidi Rehm (Partners Healthcare), Nazneen Rahman (Institute of Cancer Research), and David Glazer (Google). GA4GH continues to search for one to two more Steering Committee members with a particular eye toward geographic diversity. Finally, Altshuler announced that founding Steering Committee members Charles Sawyers and Betsy Nabel will step down. Nabel will maintain a senior advisory role while Sawyers, a current Clinical Working Group Co-Chair, will place more focus on cancer data sharing projects.

Altshuler closed by asking members to consider where the GA4GH should be two years from  now. “I think everybody would agree we’d like to see tangible benefits to medicine, health care,  and patients,” he said. In order to achieve that goal, data sets will need to be brought together in  an appropriate, responsible way to enable discovery, better diagnosis, and interpretation of  genomic information. But, he said, we must first identify the actual steps to get there. “We need  to make sure we don’t, in the spirit of trying to solve all problems, not solve any,” he said.

KEYNOTE SPEAKER: CHALLENGES AND OPPORTUNITIES (2015-2020)

Ewan Birney (European Molecular Biology Laboratory (EMBL)-European Bioinformatics Institute (EBI)) delivered the keynote address, in which he argued for the importance of a federated approach to data sharing, increased global diversity in the GA4GH, and long-term engagement with the clinical community.

The worlds of practicing medicine and research are different, Birney said, operating in different linguistic, legal, financial, and regulatory contexts. Data in the clinic are often totally closed, while data in the research space are often totally open. But globally, research is becoming more relevant to health care, which depends on the skills and techniques researchers generate. Simultaneously, researchers are becoming more reliant on health care, which will generate a “remarkable data set over the next decade” that researchers want to use to feed back into health care. In order to do that well, Birney said, we must acknowledge the differences between these two worlds.

He used his own research as an example of how this can work: In collaboration with clinicians at  Imperial College London, Birney performed genome wide association studies against a set of  characteristic cardiac features as portrayed in MRIs from 1,500 healthy volunteers. “We are  repositioning a medical dataset that was generated for purposes of human health to understand basic things about cardiac structure,” Birney explained. 

In the current paradigm, which lacks federation and harmonization between clinical and  research datasets, Birney was still able to move his research forward. “Maybe people like me  can just charm clinicians and there’s enough there,” he said. “Maybe we don’t need to federate  more than swapping papers and collaborating with other scientists. Maybe medical data can  stay siloed.” But, he said, as human geneticists across all diseases know, “sample size is king.”  

With the case for sample size well established for common diseases, Birney focused this part of  his discussion on the value of large data sets for rare disease. Taken together, rare disease is  relatively common, affecting eight percent of the human population. Every time a gene involved  in a rare phenotype is identified, it becomes an amazing point of leverage into human biology,  he said. “It may have been discovered with a small set of people, but it immediately tells us  about other things, often common diseases or common phenotypes.” But to do this well, he  said, we need to screen millions upon millions of people in order to find a match between just  two individuals with precisely the same rare allele with the same rare phenotype. 

Sample size is also critical for effective cancer studies, since each somatic tumor is so unique.  “We need gob-smacking numbers of people to understand the relationship between somatic,  germline changes, and environment,” he said. “The calculations go above 10 million.” 

More than enabling large sample sizes, though, a federation of data sets would also allow us to  leverage things like genetic drift, human migration, and environmental factors. He also noted  that infectious disease research, though a complex area with an existing dynamic scientific community, would also benefit from this effort. Thus, Birney argued that federation is critical for  understanding a broad range of human diseases. 

As a result, Birney said, “the Global Alliance is not only a good thing, but a necessary thing for  us to make progress over the next decade. This is the only way I can imagine this working: a  federated system that repurposes clinical medical data for secondary research.” 

Over the last year, Birney said, GA4GH has created a forum not previously present, which ranges from low-level technical issues to big-picture clinical issues. In particular he noted that the effort to establish a reference genome graph shows how these disparate communities are working together to deliver a critical tool for the community. He attributed the success of this forum, whose kin often fail, to GA4GH’s persistent focus on a large but manageable scope. “My advice is to keep it in this manageable zone. It’s easy to try to bite off all of the problems in electronic health records, or all of the problems in genomics, but if we do that we become too diffuse,” he said, noting that the GA4GH could stand to expand its definition of the term “Genomics” to include RNA, proteins, metabolites, and other “omics” areas beyond the genome.

But, he said, GA4GH is still growing and still has places to go. He encouraged a push for  implementation of technical products as well as long-term engagement with the clinical  community. “We have to be in it for the long game,” he said. “Urgency is useful for motivation,  but research and health care have been working for a long time. It is more important to achieve  change in five, ten years time than pushing out an API two months earlier than deadline.”  

Birney also reiterated the concern for a more global GA4GH, with the current makeup too biased towards Anglophone countries. “We need to acknowledge that diversity is good and stretch out of our comfort zone to embrace that diversity,” he said. In particular, he encouraged members to look more broadly across Europe to places like Estonia, Scotland, and Denmark, which are taking unique approaches to ethical data access and regulation, as well as to the entire planet, where there is a huge diversity of talent, diseases, and thought processes.

Finally, he also discussed the various ways his organization is engaging with GA4GH, which  include a joint project with the Center for Genomic Regulation in Barcelona to establish a  Beacon on top of the European Genome-Phenome Archive, the development of reference and  annotation APIs for Ensembl variant data, and an effort to harmonize genomic data file formats.

ACTIONS SINCE 2ND PLENARY MEETING

BRCA Challenge

John Burn (Newcastle University) provided an update on the BRCA Challenge, which held its  first meeting directly after the GA4GH plenary  with sponsorship from AstraZeneca and hosted  by UNESCO at its Headquarters in Paris, France. Next generation sequencing is revealing  vast numbers of variants, Burn said, demanding  an ever more pressing need to collaborate. The  BRCA Challenge is an attempt to enable  collaboration in the area of variant interpretation  for the genes BRCA1 and BRCA2. In 2003, the Human Variome Project and the International  Society for Gastrointestinal Hereditary Tumors  (InSiGHT) launched a Variant Interpretation Committee (VIC) for mismatch repair genes, which brought together everyone working  individually in that space. The group established a database, which now receives thousands of  hits each month from the clinical and molecular genetics communities alike. At the first plenary meeting of the GA4GH, it was proposed to do the same thing across the whole genome, and to start with BRCA1 and BRCA2. Although not perfect genes from a bioinformatics perspective (they are big with lots of variation and noise), they are good because they are well known and already well studied—meaning there is a  significant amount of data to use as a test case for integration.  

The BRCA research landscape consists of many databases with unique variants, including  NCBI, ClinVar, LOVD, and the French Universal Mutation Database. Federation is needed to cull all of the definitive work around the world to present an authoritative network of databases  with a reliable curation system for variant annotation.  

Data security is key, Burn said, as it will enable critical buy-in from the public and from patient advocacy groups. For instance, Britain has a large amount of data in the National Health Service which could be beneficial to this effort, but it isn’t yet being shared because patients are nervous thanks to near-daily data breaches. An Ethical, Regulatory, and Advocacy (ERA) group akin to the REWG of the GA4GH is dedicated to this topic and held a special session at the Paris meeting. In addition to the ERA group, three new subcommittees recently formed to organize and guide activities of the BRCA Challenge. These include the Evidence Gathering Group, the Classified Variant Collection Group, and the Interpretation Group.

Burn identified three primary needs of the project, which are being actively worked on by the  subcommittees. First, legacy data in hundreds of diagnostic laboratories with up to 20 years of  experience issuing reports to families with a history of breast/ovarian cancer must be captured and retained. Second, an API must be developed for identifying BRCA1 and BRCA2 variants  from whole genomes and exomes. And third, data from the general population must be  collected. Until 2010, Burn said, only those with a family history of breast or ovarian cancer were  genotyped. According to research from Nazneen Rahman, one in every 250 people carries a  pathogenic BRCA variant. “Once people are aware of this, they’ll want to be tested,” Burn said. 

He went on to discuss the story of a Newcastle family that was given conflicting information about the pathogenicity of its variant, which Burn replicated by pretending to be the daughter of the sequenced carrier. He received four conflicting responses from regional diagnostic service  laboratories that used a standard approach but interpreted the slightly confusing literature  differently. They collectively spent eight hours incorrectly answering his question, Burn said.  “We don’t have this kind of time or manpower. We need a reliable reference database,” he said. 

A Q&A session addressed engagement with patient groups, engagement with Myriad, and the  project’s five-year horizon. “What I want is a database on my phone next year, so I can go to an  app and say which pathogenic variants are those,” he said. “If we can get to that point with  these two genes, then the next stop is 20,000 genes.” 

The BRCA Challenge meeting in Paris led to the development of time plans for such an app as well as for a web portal.

Matchmaker Exchange

Heidi Rehm (Partners Healthcare) provided an update on the Matchmaker Exchange (MME), which also held an internal planning meeting and a cross-cutting meeting with the Beacon Project on June 11th. MME is a federated network of genomic databases whose goal is to identify causal genes for rare disease by matching phenotypes and genotypes.

While a single case is never enough to implicate a gene as causative, a second case is sometimes all that is needed to do so for a rare disease, Rehm said. At the American Society of Human Genetics meeting in 2013, a group of rare disease researchers and clinicians set out to link their respective databases, each of which was then serving as a smaller, self-contained matchmaker, to help identify second cases and thus be collectively more productive in solving rare diseases.

The task of linking these databases presented a significant technical challenge, since each has  distinct data schemas and content types. Some may collect variant call files without identifying  candidate genes while others focus entirely on the gene. Still others may incorporate data from model organisms or human phenotype ontology terms. Furthermore, even if there is a match  between cases based on genes, the phenotypic data may not be the same.  

Two initial matchmaker databases—PhenomeCentral and GeneMatcher—developed an API  that sifts through these differences to identify commonalities between cases. They then worked  with the GA4GH Data Working Group (DWG) to align the API with GA4GH standards, and with  the Regulatory and Ethics Working Group (REWG) to create a consent policy, a user agreement to enable data donation and functional access, and a set of requirements for becoming a  Matchmaker service. They are currently working on security requirements for each of the  databases and the matching algorithms are continually under development. A third database—Decipher—has since launched the API internally and a number of other matchmakers are actively working to do the same. The team is now working to link the Matchmakers together so  that a query to one will also have the potential to return matches from the others. 

There are two ways to participate in the MME: Databases can link to the Exchange by  implementing the API, or a clinician with a single case can enter phenotype and/or genotype  information into one of the existing Matchmakers, which will return matches from its own  database as well as those to which it is linked through the Exchange. Rehm and others are  working to post a decision guide on matchmakerexchange.org to help clinicians identify the  most suitable database into which they should enter their case. For instance, Rehm said, if you  have just a gene, you might want to enter into GeneMatcher, but if you have an entire VCF file,  GEM.app might be the better choice. 

In a pilot phase, the team entered test cases into PhenomeCentral and GeneMatcher and successfully matched all those for which the case was present in both databases, demonstrating that the system works as intended. They then looked at 45 unsolved cases and identified 10 new matches. Two of these are thought to be potentially significant, two are still under consideration, and six were determined to be false positives (the genes but not the phenotypes were the same, emphasizing the need to optimize the matching algorithms).
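
The false positives described above (same gene, different phenotype) show why a match score has to weigh both signals. The following is a minimal sketch under stated assumptions, not the MME API or its actual matching algorithm: the `Case` record, the `match_score` and `find_matches` functions, the Jaccard overlap of phenotype terms, the 0.3 threshold, and the placeholder gene and phenotype labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Case:
    """Hypothetical case record: an ID, candidate genes, phenotype terms."""
    case_id: str
    genes: frozenset = field(default_factory=frozenset)
    phenotypes: frozenset = field(default_factory=frozenset)


def jaccard(a, b):
    """Set overlap: size of intersection over size of union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_score(query, candidate):
    """Return 0.0 unless a candidate gene is shared; otherwise phenotype overlap."""
    if not (query.genes & candidate.genes):
        return 0.0
    return jaccard(query.phenotypes, candidate.phenotypes)


def find_matches(query, database, threshold=0.3):
    """Return (score, case_id) pairs meeting the assumed threshold, best first."""
    scored = [(match_score(query, c), c.case_id) for c in database]
    return sorted(((s, cid) for s, cid in scored if s >= threshold), reverse=True)


# Placeholder labels only; these are not real gene symbols or ontology IDs.
query = Case("Q1", frozenset({"GENE_A"}), frozenset({"HP:1", "HP:2", "HP:3"}))
database = [
    Case("D1", frozenset({"GENE_A"}), frozenset({"HP:1", "HP:2", "HP:4"})),  # gene and phenotype agree
    Case("D2", frozenset({"GENE_A"}), frozenset({"HP:9"})),                  # gene only: likely false positive
    Case("D3", frozenset({"GENE_B"}), frozenset({"HP:1", "HP:2", "HP:3"})),  # phenotype only
]
matches = find_matches(query, database)
print(matches)
```

In this toy setup only D1 survives the threshold. D2 shares the gene but none of the phenotype terms, which is the pattern behind the six false positives in the pilot.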

A special issue of Human Mutation, guest edited by Rehm, Ada Hamosh (Johns Hopkins  Medicine), and Kym Boycott (Children’s Hospital of Eastern Ontario), summarizes the work to  date in 16 papers. These include an overview of the whole project, one paper on the API, two on the mathematics of the matching algorithms, several on the individual matchmaker systems,  and three that document cases solved through matchmaking. The issue will be released in  October 2015, in conjunction with the next ASHG meeting.  

Going forward, the team will work to enable hypothesis-free matching, since clinicians often may have sequencing data but still no candidate causal genes. The team wants to aggregate the datasets to look for commonalities that might not be obvious with only an individual case in hand. Additionally, they hope to enable broader sharing of at least some of the data. For instance, Decipher has already set aside 15,000 cases that can be searched freely. They are also working with ClinGen and the Genetic Alliance to incorporate patient-initiated matchmaking, which will engender a unique set of technical as well as ethical complexities. While still in the planning stages, it is expected that a group of physicians will initially receive results from any patient queries to weed out false positives and to serve as Matchmaker stewards.

In a Q&A session, Bartha Knoppers of the REWG noted that MME uses a tiered access  approach, in which consent is only required when variant data and other patient details are  entered. If only the gene name and structured phenotype terms are entered into the system, the  MME does not consider that identifiable, but rather the “practice of medicine” as it is carried out  today.  

Another audience member suggested that level two access be built into future standard patient  consent forms. Rehm agreed, noting that she and others are working with the NIH and the  REWG to develop a single page clinical consent form for patients to easily share their data to  advance human health. The only way to capture the most data is to advance clinical data  sharing, Rehm said, since the clinic, more than research, represents the majority of cases.

Beacon Project and Data Sharing APIs

David Haussler (University of California at  Santa Cruz) provided an update on the Beacon  Project and other activities of the Data Working  Group (DWG), which consists of more than 500  people across for- and nonprofit entities who are  all collaborating to build a working interface and  a common language for genomics data sharing.  

The group also held internal planning meetings  and crosscutting meetings with the other  working groups during the three-day event.  

The vital information we need to decode our  health is held in silos around the globe,  Haussler said, but we can change that. A world  of technology awaits us if we can break through  some of the social and technical issues. The  DWG is developing an “Internet of genetic  information that will make accessing genetic data as easy as finding a restaurant on your iPhone,” Haussler said. 

The general Internet consists of agreed-upon “names” for the objects we want to access (URLs), protocols for how to access them (HTTP), and a universal understanding of how to semantically interpret content (HTML). “For genomics, we just need to specialize this for the kind of information we are transmitting,” Haussler explained. The DWG has developed a means for naming objects, which doesn’t reveal anything about the object; methods for requesting data, which are used in all GA4GH tools and applications; and a schema for each type of data that denotes in precise mathematical terms how to interpret the content. Now we can reliably build apps that depend on that interpretation, Haussler said. Two such apps are the Beacon Project and the Reference Human Variation Map, and Haussler provided specific updates on each.

The Beacon Project, led by Mark Fiume (DNAStack), takes the simplest of all genetic queries and makes it a ubiquitous feature available on the Internet. First proposed by Jim Ostell (National Institutes of Health) at the 2014 GA4GH Plenary meeting in London, Beacons ask trusted data stewards “do you have any genomes in your database with a particular variant at a particular position?” The queried database responds with either “yes” or “no.” Beacons present no technical barriers, but there is a social barrier to exchanging even the simplest, most atomic unit of genetic information, Haussler said. The advantages of the Beacon approach are an open API, which enables interoperability between systems; a federated model, which negates the need for a centralized database and instead relies on trusted data stewards; and the technically simple, sufficiently primitive query, which mitigates (but does not completely eliminate) risk of identification. To date, fifteen organizations have lit 155 Beacons across 252 datasets. “This is a very dramatic uptake of a new technology that’s still in its early defining stages,” Haussler said.
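
The yes/no query and the federated model described above can be sketched in a few lines. This is an illustration only, not the Beacon API: the `Beacon` class, the `query_network` helper, the steward names, and the variant coordinates are hypothetical. The sketch assumes each steward keeps its variants private and exposes nothing but a boolean answer.

```python
class Beacon:
    """Hypothetical data steward that answers only yes or no about a variant."""

    def __init__(self, name, variants):
        self.name = name
        # Each variant is (chromosome, position, allele); the set itself is never exposed.
        self._variants = set(variants)

    def query(self, chromosome, position, allele):
        """Return True if any genome held here carries the allele at the position."""
        return (chromosome, position, allele) in self._variants


def query_network(beacons, chromosome, position, allele):
    """Federated query: ask each steward in turn and collect its yes/no answer."""
    return {b.name: b.query(chromosome, position, allele) for b in beacons}


# Made-up coordinates for illustration.
beacons = [
    Beacon("steward_a", [("13", 1000, "T"), ("17", 2000, "G")]),
    Beacon("steward_b", [("17", 2000, "G")]),
]
shared = query_network(beacons, "17", 2000, "G")
partial = query_network(beacons, "13", 1000, "T")
print(shared)
print(partial)
```

No central database is involved: the caller learns only which stewards answered “yes,” and each steward decides independently what it holds.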

Beacon currently has two levels of access: open and controlled. Open is intended for the  anonymous Internet user looking for any record of a variant anywhere in a database. Controlled is for those interested in a whole genome, which contains identifiable, private information and thus requires a full legal contract. A newly proposed access level is registered access for those  who want to dig deeper than allowed by open access but who don’t need a whole genome. It  would depend on a set of agreements and would require the user to provide credentials. The DWG sees this as an important intermediate level and is working with the Regulatory and Ethics Working Group to develop it. 

The Reference Human Variation Map, led by Gil McVean (University of Oxford) and Benedict Paten (University of California at Santa Cruz), is a comprehensive and unbiased representation of the human genome. The current reference genome is not representative of all human variation. “There is a huge amount of human diversity, and by leaving it out of the standard reference we have crippled ourselves in many ways,” Haussler said. A graph representation of all the variation in the human population gets around “what is essentially a giant Tower of Babel problem,” he said. Various efforts to codify genetic variation and link it to phenotype each use different schemes to represent the same thing and none is definitively comprehensive. The Reference Variation Task Team is building a graph reference that will be a comprehensive “Rosetta Stone” for human genetic variation. It does not replace existing references but translates them into one comprehensive archival representation, Haussler explained.

A single dominant line on the graph denotes the current human reference genome while  surrounding lines represent different individuals’ genomes. “Think of it like a theme in music with  beautiful variations,” Haussler said. “There may be different themes throughout a region.” At its  core, the graph consists of individual bases of DNA. Whereas the reference genome may have  a sequence of ACGGCC at a certain locus, a common variant among the human population  may turn that into GAGGCC. An alternate path on the graph represents this alternative variant.  Within the graph model, every base has a permanent identifier that won’t change when we learn  more about a locus. “How we identify a genetic variant should be durable over decades,”  Haussler said. “And it will be in this new structure.” 
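The ACGGCC / GAGGCC example above can be sketched as a small sequence graph. This is only an illustration under stated assumptions, not the Task Team's actual data model: the node identifiers (`n1`, `n2`, `n3`), the path names, and the `spell` helper are hypothetical, and each node here holds a short run of bases, so an individual base would be addressed as a node identifier plus an offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A run of bases with an identifier that never changes."""
    node_id: str
    bases: str


# Hypothetical graph for the locus in the text.
nodes = {
    "n1": Node("n1", "AC"),    # reference allele
    "n2": Node("n2", "GA"),    # alternate allele
    "n3": Node("n3", "GGCC"),  # sequence shared by both paths
}

# Directed edges: which node may follow which.
edges = {"n1": ["n3"], "n2": ["n3"], "n3": []}

# Each genome is a walk through the graph.
paths = {
    "reference": ["n1", "n3"],
    "variant": ["n2", "n3"],
}


def spell(path_name: str) -> str:
    """Return the sequence spelled by a path, checking every step is an edge."""
    ids = paths[path_name]
    for current, following in zip(ids, ids[1:]):
        if following not in edges[current]:
            raise ValueError(f"no edge from {current} to {following}")
    return "".join(nodes[node_id].bases for node_id in ids)


reference_sequence = spell("reference")
variant_sequence = spell("variant")
```

Adding a newly discovered variant means adding a node and a path; the identifiers of existing nodes stay the same, which is the durability property the text describes.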

These two examples offered a flavor of the many projects ongoing within the DWG, whose  ultimate goal is to develop a clean, secure, Internet-based system of accessing and exchanging  health information that is available to the doctor, the researcher, and the consumer alike. If we  can achieve this, Haussler said, then the whole Internet will be a health learning system and no  data will be left on the table. 

A Q&A session focused on issues of using crowd-sourced rankings to qualify annotations to the  current reference genome; the importance of healthy cohorts; the fact that most data currently  left on the table is clinical data; and pseudonymizing clinical data in order to capture its richness.  Finally, the distinction between Matchmaker Exchange (MME) and Beacon was addressed. The two are conceptually different, Haussler said. MME matches individuals across a  multidimensional query of genetic and phenotypic attributes whereas Beacon queries only  whether a database contains a genome with a specific variant. “You could think about it as a  Google search,” Haussler said, where Beacon is the initial high-level search and MME allows  you to drill down further like clicking on a web page. Heidi Rehm (Partners Healthcare) noted  that in most cases matching patients via MME requires different variants on the same gene  whereas Beacon looks at the same variant. Beacon and MME are complementary systems, she  said, and individual Matchmakers are working to launch their own Beacons.

Policy: Consent and Privacy and Security

Graeme Laurie (University of Edinburgh) and Susan Wallace (University of Leicester) presented two policies developed by the Regulatory and Ethics Working Group (REWG) that are meant to guide practical implementation of the Framework for Responsible Sharing of Genomic and Health-Related Data, which was designed to serve as a high-level starting point for deliberation and action. The policies aim to facilitate and operationalize the Framework’s various principles.

The Privacy and Security Policy demonstrates how an entity or individual involved in providing, storing, accessing, or managing data can and ought to promote privacy while also promoting  science and sharing. It serves as a framework for mutual recognition and trust, providing immediate and common ground for interaction. 

The policy can also help entities assess their readiness to share data based on GA4GH objectives and aspirations and identify unmet needs which GA4GH can help them address, though GA4GH has no authority to enforce the policy’s uptake. Privacy and security have different best practices, Laurie said, and while the fields overlap they are distinct. Privacy is a fundamental human value, enshrined in laws as a basis of rights and responsibilities. Security is a set of practical measures to manage risk. Users must determine which of the policy’s measures are most relevant to their pursuits and identify clear lines of responsibility and accountability in the implementation of the various elements of the policy.

The Consent Policy provides guidance around whether existing consents allow forward data  sharing and how to proceed if changes need to be made. It is intended to help in the design of  prospective consents and does not override existing consent gained from a data donor. It is  specifically geared toward international data sharing and offers practical measures based on a  series of best practices that map to the 10 core elements of the Framework. The consent policy  is based on five basic principles: That consent is an open, communicative, and continuing  process; that there is an intention to share data across institutions, jurisdictions, and national  borders with appropriate approvals; that plans for data sharing should be transparent,  understandable, and accessible; that data donors have a right to withdraw participation or not  participate at all with the understanding that it may not be possible to retrieve and/or destroy  data once shared; and that data users and producers will abide by applicable regulations and  ethical norms when seeking and conducting international data sharing.  

A Q&A session addressed issues of encouraging international data sharing through robust  consent policies, the dynamic state of privacy regulation in the European Union, and the distinction between data sharing and data discovery.

EMERGING PROJECTS

eHealth

John Mattison (Kaiser Permanente) provided an update on activities of the eHealth Task Team, first inviting Gil Alterovitz (Harvard University) to present the eHealth Catalogue of Activities, an online listing of international genomic and clinical data sharing initiatives. A visual representation of the catalogue reveals a geographic distribution of activities that aligns well with GA4GH membership. Using co-authorship between these activities as a proxy for collaboration, the eHealth Task Team identified which organizations are working together and which are isolated and found a direct correlation between collaboration and shared governance. Increasing leadership among shared initiatives, Alterovitz proposed, may help motivate collaboration across currently isolated groups. The eHealth Task Team will continue its work on the Catalogue, expanding its functionality and publicizing it to the broader community, with the hope that it will foster increased collaboration by providing a foundation for dialogue between institutions with shared missions and activities.

Mattison discussed the unchecked growth of semantic representations, data models, ontologies,  and APIs, and how to manage them for global “omics” collaborations. To constrain this growth,  a byproduct of many siloed activities around the globe, Mattison proposed several  recommendations, including the use of federated data enclaves with researcher credentialing  protocols, “lighter” data representations, micro-consent and micro-credit to illuminate data  hoarding, and data concierges to guide users of an organization’s shared dataset.  

Additionally, he proposed a matrix analysis to guide vendor adoption and market penetration of  the most valuable ontologies and semantic representations. Migration of data across different  institutions and associated semantic environments results in unnecessary semantic degradation  by virtue of multiple transformations, Mattison said. But as data sets become more federated,  the community will naturally arrive at that smaller, constrained set of representations, which will  also reduce semantic degradation. Finally, he posited that “ontology” is a one-word oxymoron,  since there is no single right way of organizing all of the information in a given reference model. 

A Q&A session addressed the best means for capturing data from clinical care systems, be it patient or vendor directed (Mattison said both are useful) as well as how to drive vendor uptake  and market penetration of GA4GH standards, which Mattison said should be done through vendors’ clients.

Clinical Cancer Genomes

Mark Lawler (Queen’s University Belfast) provided an update on activities of the Clinical Cancer Genomes Task Team, which aims to provide added value to and harmonize with ongoing projects and outputs in the clinical cancer genome space. The Task Team has submitted an opinion paper for publication, outlining the global cancer genomics landscape and highlighting GA4GH’s activities and potential in this precision medicine domain. The paper argues that Big Data genomics will not efficiently advance until the international community overcomes a number of barriers, including shortsightedness, lack of interoperability, and reluctance to share, and it outlines how the GA4GH vision can deliver for scientists, health care professionals, and, most importantly, patients. The Task Team is also deploying a survey to GA4GH members to identify ongoing efforts, highlight examples of best practice, and enable collaboration between data sharing initiatives in clinical cancer sequencing. Together, the survey and the opinion paper aim to delineate the importance and benefits of data sharing, discuss key challenges of merged data usability, identify best practices, develop solutions, and highlight the importance of implementation. The Clinical Cancer Genomes Task Team is also developing an Actionable Cancer Genome Initiative.

GA4GH has the opportunity to foster, nurture, and drive a global actionable cancer initiative, Lawler said, and he announced intentions to establish a cross-cutting cancer driver project, which will be a joint activity between the GA4GH Clinical and Data Working Groups. This joint activity will create an authoritative approach that defines clinical relevance of the somatic cancer genome. The group will also seek to establish the validity of actionable cancer panels, define optimal standards for technical practices, develop curation and annotation approaches to  somatic variant “Big Data,” facilitate regulatory and reimbursement pathways, and establish  flexible, clear principles to justify variant test selection, delivering clinical cancer diagnostic, prognostic or predictive tools that are useable, billable, payable, and sustainable. 

A lively Q&A session addressed the best first steps in an area as complex as somatic cancer, overcoming the challenge of working with data that is currently fragmented, aligning with  regulators to drive uptake, and leveraging crowdsourcing as a tool for integrating currently siloed data.

NATIONAL INITIATIVES

Facilitated by Mark Guyer (US Precision Medicine Initiative), the participants in this session were asked to introduce their national initiatives and answer three questions: (1) how are you leveraging or planning to leverage GA4GH in your national initiative? (2) how do you plan to share data or support data sharing? and (3) have you identified major barriers that GA4GH is currently not trying to address and/or are there challenges unique to your jurisdiction that GA4GH should be aware of?

Morris Swertz introduced Genome of the Netherlands, a project of the Dutch arm of the federated European Biobanking and Biomolecular Research Infrastructure (BBMRI), which seeks to harmonize and enrich existing biobanks around Europe in order to integrate and provide easy access to data. BBMRI-NL consists of 900,000 samples across 200 Dutch biobanks, representing about five percent of the total Dutch population. Four of the 12 broad-based projects were specifically relevant to genomics, including Genome of the Netherlands (GoNL), which was the first population-specific whole genome sequencing study in the world.

GoNL led to the discovery of many new variants, knowledge which is being put to immediate  clinical use. Two follow-up studies, Biobank Integrative OMICS Studies (BIOS) and the Society of Clinical Genetic Diagnostic Laboratories (VKGL), sought to integrate additional “omics” data  into the GoNL database and to allow clinicians to readily query existing knowledge on a  particular variant, respectively. The team is now focusing on ways to improve dissemination of  the information captured in these projects, with a particular focus on technological challenges.  They are eager to synergize with GA4GH in developing and implementing new APIs and  connecting with those that are already emerging.

Tim Hubbard introduced Genomics England and the 100,000 Genomes Project, a project of the National Health Service focused on treating rare disease and cancer. In addition to generating new clinically beneficial treatments through whole genome sequencing and enabling future research, it is also meant to stimulate activity in the UK economy through genomics-related spinoffs. Eleven Genomic Medicine Centres across 75 hospitals have been established to collect consents, DNA samples, and phenotypic data when a patient presents with a disease that cannot be diagnosed by an existing test. Illumina performs sequencing on the sample and all data are fed into a centralized Data Centre for analysis by Commercial Interpretation Services, which deploy their own algorithms within the Data Centre as virtual machines. A clinical report feeds back to the NHS for verification. Genomics England is a member of GA4GH and expects to implement its APIs within its infrastructure as they become broadly adopted as standards. Since the data are not consented to be redistributed outside of the Data Centre, the project will likely interface with GA4GH through the Beacon Project as well as Matchmaker Exchange at its lowest level of consent. Hubbard noted that GA4GH could provide more guidance on implementing its tools and approaches within a health system, pointing to the Global Genomic Medicine Collaborative (G2MC) as another group focused on this issue.

Paul Lasko introduced CARE for RARE (formerly the Canadian FORGE Project) as well as the International Rare Disease Research Consortium (IRDiRC). CARE for RARE aims to provide diagnoses and new therapeutics for pediatric rare disease and to work with patient organizations and other stakeholders to bring genomics into Canadian healthcare. It has led to the identification of causative mutations for 146 rare diseases, half of which were not previously linked to any rare disease, as well as diagnoses for 500 patients. The CARE for RARE database is called PhenomeCentral and is an integral part of the Matchmaker Exchange. While the project leads to clinical diagnoses for patients, it is a research project and thus data are consented for international sharing. It is fully supportive of a federated model of data sharing.

IRDiRC was launched in 2011 to facilitate international collaboration and data sharing among rare disease researchers. It has two goals: to catalyze the development of diagnostics for most rare diseases and to catalog 200 new therapies by 2020. To date, it has linked more than 3,000 genes to rare disease and identified 144 new rare disease therapies. Matchmaker Exchange started within IRDiRC, but took on a higher level of activity with the help of GA4GH. IRDiRC is also involved in GA4GH’s Machine Readable Consent Task Team and believes continuing collaborations between the two groups will be beneficial. Toward that end, Canada launched a funding opportunity in 2014 called Sharing Big Data for Health Care Innovation – Advancing Objectives of the Global Alliance for Genomics and Health, aimed specifically at strengthening the Canadian contribution to GA4GH. An announcement is forthcoming on its outcome.

Kathryn North presented two Australian initiatives to integrate genomics into everyday healthcare. First, the Melbourne Genomics Health Alliance (MGHA) is a collaboration of 10 research and healthcare organizations, which compared the clinical utility and cost-effectiveness of whole exome sequencing (WES) with standard clinical practice as a first-tier assay for germline and somatic conditions. MGHA members have developed a series of shared tools, including ethics and consent protocols, a clinical bioinformatics pipeline, and a clinical genomics data repository. In a pilot study, MGHA showed that WES resulted in a 54% diagnostic rate for childhood syndromes, compared to 20% for standard practice. The approach is now being expanded across the entire state of Victoria. Second, the Australian Genomic Health Alliance (AGHA) is a collaborative effort of 41 academic, diagnostic, and genetic services organizations across Australia, which has applied for funding from the National Health and Medical Research Council to develop a national approach to genomic medicine. The proposed program aims to develop a federated genomic data repository that will link phenotypic and genomic data with electronic health records, using the data sharing approaches promoted by the GA4GH. The group also plans to use the GA4GH regulatory and ethical standards for consenting patient data. The biggest challenge facing Australia in this effort is identifying ways to successfully translate the state-based approach into a national, federated system.

Iscia Lopes-Cendes introduced several activities in Latin America. The Latin American Collaborative Study on Congenital Malformations is a clinical and epidemiological investigation launched in 1967 across Latin American hospitals, which has recently begun incorporating genomic information into its database. The Study Group on Hereditary Tumors is working with two other international genomics initiatives: the Collaborative Group of the Americas on Inherited Colorectal Cancer and the International Society for Gastrointestinal Hereditary Tumors (InSiGHT). Two Brazilian projects that could benefit from collaboration with GA4GH are The National Institute of Population Medical Genetics (iNaGeMP), which was established in 2009 to study rare diseases via clinical, epidemiological, family history, and genomic data, and The Brazilian Epidemiological and Biobank Stroke Study, a two-year prospective population-based study of stroke across 5 cities using genomic and clinical imaging data. It expects to collect 2,400 patient samples and 5,000 control samples per year. Samples collected in a pilot study are now undergoing genomic analysis at the University of Campinas (UNICAMP) under Lopes-Cendes’ direction. Next, Lopes-Cendes introduced the School of Human and Medical Genetics, an educational initiative established in 2005 with the Latin American Network of Human Genetics, which has enrolled hundreds of students across every Latin American country as well as several Caribbean nations, with particularly strong representation from Brazil, Mexico, Argentina, and Colombia. Finally, Lopes-Cendes highlighted a collaborative project with GA4GH to address the fact that the Brazilian population is not well represented in international genomic databases. The country’s population has a complex genomic background for which other Latin American countries, including those with mixed populations such as Mexico, cannot be used as proxies. Lopes-Cendes and her team are producing a database of molecular profiles based on SNP arrays and whole exome sequencing. They are currently preparing an environment to publicly share the data, are integrating them into LOVD, and will soon light a Beacon on top of them.

Bin Tean Teh introduced various precision medicine activities in Singapore, which span basic science, translational research, and clinical applications in medicine. Three years ago, Singapore launched POLARIS, a pilot genomic medicine program to identify barriers to the clinical implementation of genomics. This proof-of-concept multi-institutional collaboration to translate Singapore research to improve health has established an infrastructure of CAP-certified genomics laboratories; Ministry of Health-compliant software for analyzing genomic data and patient reporting, compatible with local EMR systems; community standards for regulatory and ethical applications of genomics; and a recently launched Precision Medicine Institute. Other recent precision medicine efforts include genetic services to test for the TGFB1 mutation for genetic eye disease and multi-gene assays for sudden cardiac death. Over the next decade, the country will build a medical database consisting of whole genome sequencing data from 5,000 healthy volunteers integrated with serum metabolites, immunophenotyping data, cardiac imaging data, and EMR data. Finally, a centralized facility for clinical genomics will perform rapid data analysis and visualization in concert with a clinical follow-up mechanism. Singapore’s precision medicine leaders are eager to work with GA4GH to scale their efforts.

Mark Guyer closed the session with an introduction to the US Precision Medicine Initiative (PMI) and the Big Data to Knowledge (BD2K) program. BD2K was established to advance basic and translational science by facilitating and enhancing the sharing of research-generated data. It resulted in the creation of an open digital ecosystem to accelerate biomedical research and its application to human health. The Precision Medicine Initiative is a proposed program with three main components: (1) a near-term focus on cancer, (2) a long-term aim to generalize to the full range of human disease, and (3) an effort to advance the nation’s regulatory framework in order to implement precision medicine in practice. The second component will require the creation of a research cohort of at least 1 million participants that is representative of the American population. This will likely involve several existing cohorts. The NIH fully supports GA4GH data sharing goals and approaches and intends for the Precision Medicine Initiative to be consistent with its principles and objectives.

BIG DATA CHALLENGES AND SOLUTIONS

Dixie Baker (Martin, Blanck and Associates) facilitated the session, beginning with an overview of the challenges relating to Big Data. The overarching challenge, she said, is the lack of computational infrastructure needed to securely generate, maintain, transfer, analyze, and visualize large-scale data sets, and to integrate the various types of “omics” data with each other and with clinical data. Virtualizing computational resources, she said, may be the only approach for achieving the kind of scalability and elasticity needed for DNA sequencing and analysis. But virtualization raises new security and privacy challenges associated with the blurring of physical boundaries, “Big Data” technologies that render everything “discoverable,” and compliance with national laws and institutional policies. “The only thing we can be sure of is that the challenges will intensify and build over time,” Baker said. The session’s five speakers shared some of the solutions their organizations are pursuing to overcome those challenges.

Jun Wang (Beijing Genomics Institute) discussed BGI’s efforts to realize its Million Genomes Project, announced two years ago to build a strong national “multi-omics” database in three to five years. Complete Genomics (a BGI company) currently has capacity to sequence 10,000 genomes per year using its newly released Revolocity platform, which has a current upper limit of 30,000 genomes per year but is being expanded to reach 300,000 and has an ultimate goal of 1 million genomes per year. Achieving that capacity will require significant investment, for which BGI is exploring unique business models. The company anticipates that if Moore’s law continues to apply, the cost of sequencing a genome will drop to $1 per genome by 2019. BGI hopes to work together with GA4GH to develop an open research platform on top of the Revolocity system and aims to create a network of multi-omics data akin to the Internet. Wang said that cultural differences mean different issues of data privacy and ownership in China. As a pilot project, the company has collected a comprehensive set of multi-omics data on the Chinese wheat crop, foxtail, and is analyzing it in a controlled, machine learning environment with promising results.

Angel Pizarro (Amazon) presented Amazon Web Services (AWS), a global provider of on-demand, pay-by-the-hour, commitment-free “cloud” computing infrastructure. The ability to easily share data and applications across institutional boundaries and the ability to publish preconfigured resources to the community at large are key to collaboration. Before the cloud, population-scale science was limited to institutions with the most computational (and financial) resources. With AWS, users provision only the size of compute needed for a given project. Data can be put into the cloud and access can be requested and granted on a temporary basis. AWS operates 11 regions worldwide; once data are persisted in a region, they remain within that region. This enables AWS to comply with jurisdictional laws restricting the physical location of data. As an example, the National Database for Autism Research (NDAR) is storing its phenotype data on the AWS cloud. Credentialed collaborators that want to use NDAR data can bootstrap it into their own AWS environment, within which they use their own analytical processes or leverage AWS’s open-source ecosystem of preconfigured genome analysis pipelines. For data and service security, AWS uses a shared responsibility model similar to that described in the GA4GH Security Technology Infrastructure. It does not use customer data, but provides a suite of service offerings for the user to control all of its own risk mitigation strategies. Finally, a customer can have a third party audit their solution for compliance with industry standards.

Mathew Pletcher (Autism Speaks) presented MSSNG, an initiative to generate an open access database of 10,000 whole genomes and associated phenotypes from families with autism. The general goal of MSSNG is to accelerate understanding of the genetic underpinnings of autism and its specific goal is to introduce more categorical granularity into the diagnosis. As a patient advocacy group, Autism Speaks intends to provide a community portal to connect donor families to their data and, ultimately, to other families with a shared genetic subtype. MSSNG aims to interface with a diverse set of stakeholders, including academic researchers, large and midsize pharmaceutical and clinical centers, as well as non-traditional users such as entrepreneurs, the diagnostic industry, educators, parents, data donors, and non-academic researchers. MSSNG has so far generated more than 3,000 whole genome sequences and is on track to reach its goal of 10,000 by the beginning of 2016. The data will be freely available to credentialed researchers in the Google cloud environment. A web portal will be launched in July 2015 to support simple queries while the Google platform allows command line access for more in-depth investigations of the dataset. Autism Speaks is taking on all the costs associated with data hosting and analysis. It is working with the Public Population Project in Genomics and Society (P3G) to establish an access policy to reduce obstacles to data sharing while still protecting patient privacy and honoring consent. Current MSSNG data have not been consented for broad sharing, so Autism Speaks has begun a process of re-consent and going forward will use a universal consent, currently in development. A Q&A session touched on the expected number of data access requests, the breadth of associated clinical phenotype data, and issues of credentialing researchers in a tiered consent model.

Richard Gibbs (Baylor College of Medicine) provided an update on work performed on data from the CHARGE project, in a collaboration between BCM, DNAnexus, and Amazon Web Services (AWS) to perform the largest ever cloud-based genomic analysis. The group has expanded its scope to also include sequencing data from Alzheimer’s patients from the Alzheimer’s Disease Sequencing Project. Since last year, BCM has also grown its clinical datasets, thanks to the opening of a routine diagnostic lab that brought in more than 6,000 new cases. The group has made its data available to researchers through a series of portals based on selective access. Gibbs said that much of the success of the collaboration can be attributed to the technical solutions developed by the GA4GH community and requested a continued focus on refining the model. He also requested better solutions to integrating clinical data into the research arena, pointing to Matchmaker Exchange as a good, but as yet not readily scalable, option. He also noted that redundancy between the Matchmaker Exchange and the Beacon project needs to be addressed. Gibbs identified individual privacy, and HIPAA compliance, as the CHARGE project’s biggest challenge. He noted that the collaborations would be well served by an easier, more straightforward approach to data access and consent. The consortium has been working on this in a cohort of pancreatic cancer patients, for whom data are now freely available for querying via the GA4GH API on the DNAnexus website. A Q&A session addressed the favorability of federation versus a centralized hub of data, new analytical methods for interpreting the non-coding parts of the genome, protection of patient privacy, and the process of re-consenting patients after testing.

Arcadi Navarro (Centre for Genomic Regulation) presented the European Genome-phenome Archive (EGA), a federated collection of 1,300 datasets across Europe, spanning 73 studies from 308 data providers, including International Cancer Genome Consortium (ICGC), IRDiRC, UK10K, and the Wellcome Trust Sanger Institute. EGA includes data that may identify individuals, so the 1.7 petabytes of information it contains need to be kept secure, despite heterogeneous patient consents. Access to EGA data is controlled in accordance with Data Use Agreements. Furthermore, the data must be able to be redistributed across a variety of databases, each housed in different countries with varying regulatory contexts. The EGA, a project of the European Bioinformatics Institute (EMBL-EBI) and the Centre for Genomic Regulation (CRG), faces similar issues to those being addressed by GA4GH, including how to successfully federate. Luckily, EGA isn’t the first group to face this problem, Navarro said, and then he displayed a picture of Tony Stark in his Iron Man suit. Taking a cue from the great, if fictional, inventor, the EGA team set out to create a series of modular parts that interact with one another in a standardized way. Each can work independently, but can also be wrapped into a standard interface, Navarro said. The external services are the same, but the internal organization of “EGA 2.0” is based on micro-services modules. GA4GH can help the group tackle federation, Navarro said, particularly in the areas of secure computing clouds and federated authentication systems. In the second half of his presentation, Navarro addressed the importance of robust metadata for effective redistribution. Sharing relies on good metadata, he said, but there is currently no standardized or prescriptive approach to collecting that information. An EGA task force has been established to address this issue, and will likely engage Beacon and Matchmaker Exchange to do so.

WORKING GROUP PANEL DISCUSSION

Ancillary meetings took place on June 9th, June 11th, and the evening of June 10th, which included several crosscutting sessions as well as internal meetings of each of the four Working Groups, Matchmaker Exchange, Beacon Project, and several of the Task Teams (see appendix for details). Martin Bobrow facilitated a closing session in the afternoon of June 9th, during which Working Group Chairs provided summaries of discussions thus far followed by a panel discussion with Paul Flicek and Dixie Baker (SWG Co-Chairs), Kathryn North (CWG Co-Chair), David Haussler (DWG Co-Chair), Kazuto Kato (REWG Co-Chair), and Ewan Birney (plenary keynote speaker).

Bobrow noted that the landscape for data sharing has changed considerably over the past year. This is due in large part, he said, to GA4GH members who are spreading the word that genomic and clinical data will not improve health unless we share and analyze them in groups rather than “in our own little siloes.”

A summary of the ensuing discussion follows.  

  • There has been much activity around cancer. What are we doing in the rest of medicine?
    • The goal is to make the tools and the general principles of ethics, consent, and security applicable across a range of disorders. 
    • We are beginning to engage with the infectious disease community to identify ways to build on existing data sharing efforts using our principles and tools.
    • We could begin to catalyze projects in other disease areas, perhaps in a single area within neuroscience, such as dementia or epilepsy.
  • For each panel member, what is one thing that would help us be more global?
    • By expanding into the infectious disease space. 
    • By reaching out to non-English speaking countries in the developed world, perhaps through Task Teams proactively seeking input from those regions.
    • By producing useful tools that become the standards for clinical practice.
  • Is it time to present our APIs and other tools at clinical professional meetings and engage clinicians in the design? 
    • We want a few hearty physicians to join us in the building process, but we don’t yet have a complete API that a physician could immediately upload and use to improve patient care.
    • A number of upcoming papers, including a whole edition of Human Mutation, will highlight activity in the cancer space.
    • If we choose the right products to push early that have an impact, then we don’t have to have a fully tooled system for everybody.
  • How would you pitch a medical center overseeing a large clinical research project to encourage them to apply the GA4GH data sharing framework to their efforts?
    • You are intelligent people: You know where the future is going and that is toward a lot more molecular measurement of patients. And you know that the clinicians in your hospital will have to make informed decisions using those measurements. The power to make better decisions comes from a clinical research base pooled worldwide. So to invest in the GA4GH is to do two things: First, it is to help your clinicians get their heads around the future of medicine and, second, it is to be altruistic, to contribute to that pool from which you’ll benefit in the future.
    • Heads of hospitals and governments also need to hear the economic argument before they will begin to change policy.
    • In the act of sharing data, we often discover inconsistencies and a lack of standards inherent in our data. Realizing where data structures and terminologies differ allows harmonization that benefits both the submitter and the community at large.
  • Disease advocacy groups need to talk more with patients about the value of data sharing and to incorporate GA4GH into those discussions.
    • Patients and public enrollment are incredibly important. It is not enough to talk amongst ourselves; we also need to look outwards.
  • Many physicians are driven by two things: (1) compliance with regulations and professional licensing and (2) liability concerns. We need to promote the fact that data sharing is both legal and ethical and part of the responsibility of providing the best care. 
  • Can we leverage the pharmaceutical industry and its budget for genomic research?
    • The pharmaceutical industry is going through a complicated change: Their model does not generate enough drugs that get licensed at the rate needed to support them. But the industry is beginning to realize the benefit of studying genetic and other molecular measurements. We should all work with our local biotech and pharmaceutical industries to make public/private partnerships that leverage patient cohorts that are in line with the GA4GH framework. 
  • How can we think about not only sharing data but also sharing the outputs of data in order to engage regions or countries with less capacity? 
    • By making things as easy and accessible as possible. Large-scale sequencing data should be immediately converted into smaller, more useable forms.
  • 23andMe has a dataset of roughly 1 million individuals, many of whom have agreed to share their results for research purposes. We should look at how commercial and entrepreneurial genomics efforts are tackling this problem by cutting through middlemen and going straight to where the data are. One way to change behavior might be through leveraging patient power.

ACTION ITEMS

David Altshuler led a closing session on June  10th to discuss action items that emerged during  the first two days of meetings, inviting audience  members to identify the primary challenges that  the GA4GH is not currently taking on but should.  Altshuler used the Road Map that emerged after  the last Plenary to guide the discussion.  

  1. Develop priority products
    • Needed products:
      • Ethical training and a code of conduct on privacy, ontology, and ethics for non-clinical bioinformatics researchers working with clinical genomic data.
      • Thought processes, structures, APIs, and databases developed for the clinic as have been developed for research, and more interaction between the two.
      • Strategies and tools to standardize data curation efforts.
    • Products in development:
      • Accountability Task Team to address issues of researcher authenticity and validation in the context of increasing trust rather than a culture of caution.
      • A common lexicon to enable data sharing between entities that use disparate means of patient data anonymization, whether through de-identification, pseudonymization, encryption, or other approaches.
  2. Deliver packaged, working solutions
    • Engineers are needed to deliver easy-to-use software.
    • A downloadable start-up kit that provides clear guidelines for helping any group take GA4GH tools and approaches from principles to implementation.
    • Organizations should be encouraged to mention the Framework as the philosophical position guiding new consent forms and to build new consent forms upon the generic GA4GH template. We have seen two such consents emerge thus far.
    • Demonstrated uptake is one good way of measuring success. We need to continue thinking about how to make our products usable, for if they are not used they will not have impact.
  3. Support demonstration projects
    • We need to help scale successful existing projects and connect them to other similar groups, as was done for ClinGen via Matchmaker Exchange.
    • A catalog of examples to immediately demonstrate clinical value of medical genetics efforts and to incite new ideas, applications, and development.
  4. Align with major data collection and sharing efforts
    • No comments
  5. Communicate strategically with key audiences
    • Engagement with patients and healthy volunteers, not just advocacy groups, perhaps leveraging the rapidly growing DIY community.
    • More representation of vulnerable groups. We have established a Paediatric Task Team (launched June 10th) and will soon establish another focused on aging and cognitive impairment.
    • More communication and engagement with everyday physicians: not just clinical geneticists, but cardiologists, gastroenterologists, and others who are the lynchpins between patients, payers, and hospitals.
      • To help clinicians realize the value of what we’re doing, we need to convince them through demonstration. When the BRCA Challenge and Matchmaker Exchange lead to diagnoses, society, physicians, and patients will notice.
    • Proposal: A joint project between advocacy organizations within GA4GH membership (e.g., IRDiRC, Genetic Alliance, Global Genes, and others) to drive patient engagement. To take what we’re learning in academic medicine and implement it in the rest of the healthcare system, which is where the majority of care is delivered.
    • We have Quest Diagnostics and LabCorp, but we need more patient diagnostic companies. We also need to engage with organizations such as Flatiron Health, which process clinical transactions.
  6. Establish GA4GH as a thought leader
    • GA4GH could act as a standard-setting body to establish a much-needed robust nomenclature for joining variants across knowledge sources.
    • GA4GH should serve as the international face of the genomics community and act as a lobbying organization to spur change among government funding agencies and develop incentives that support a share-first, publish-later model.
      • Our best chance for changing the world’s incentive structures or physicians’ belief systems is through demonstration. When we show we can make discoveries and improve health, then we will see more widespread sharing.
  7. Build leadership and participation
    • After first defining our message, we could establish a communication and engagement Working Group to provide platforms for communicating the value demonstrated by flagship projects and driving engagement from other communities.
  8. Expand organizational capacity and funding streams
    • No comments
    • No comments

APPENDIX

Download the Meeting Report to view full details on crosscutting sessions and on meetings of the Security Working Group, Regulatory & Ethics Working Group, Clinical Working Group, Data Working Group, and Demonstration Projects.
