The GA4GH 3rd plenary meeting focused on a push for implementation of tools and approaches, a need to integrate with other data sharing efforts, and the importance of global and sector diversity among GA4GH membership. More than 230 individuals attended the meeting, representing 152 organizations active in 24 countries.
The Global Alliance for Genomics and Health (GA4GH) held its third plenary meeting on Wednesday, June 10th in Leiden, the Netherlands, with ancillary meetings on June 9th, 10th, and 11th. More than 230 individuals attended the meeting, representing 152 organizations active in 24 countries. Key themes of the meeting included a push for implementation of tools and approaches, a need to integrate with other data sharing efforts, and the importance of global and sector diversity among GA4GH membership.
In opening remarks, Steering Committee Chair David Altshuler (Vertex Pharmaceuticals) called on attendees to envision the future of the GA4GH with a “laser-like focus on making a difference” and to come up with concrete actions for the next two years. He also announced changes to the GA4GH leadership: Tom Hudson (Ontario Institute for Cancer Research) will serve as Chair Elect until Altshuler’s term ends later in 2015, while David Glazer (Google), Nazneen Rahman (Institute of Cancer Research), and Heidi Rehm (Partners Healthcare) will join the Steering Committee as Charles Sawyers and Betsy Nabel step down.
A keynote presentation from Ewan Birney (European Molecular Biology Laboratory-European Bioinformatics Institute) focused on enabling secondary use of clinical data for research through a federated approach to data sharing, increased global diversity in the GA4GH, and long-term engagement with the clinical community. He noted that in the world of human genetics, whether for rare or common diseases or cancer, “sample size is king.” To reach the millions of cases and controls that will be necessary to identify a valuable second rare disease match or to understand the biological bases of cancer, federation is critical. But more than sample size, federation will also allow us to leverage things like genetic drift, an increasingly cosmopolitan global population, and environmental impacts on disease. “The Global Alliance is not only a good thing,” Birney said, “but a necessary thing for us to make progress over the next decade. This is the only way I can imagine this working.”
The GA4GH has made great strides in the two years since its founding, having grown to more than 330 organizational members across more than 30 countries. Some 750 individual members across 48 countries have developed dozens of priority products such as the genomics API and the Framework for Responsible Sharing of Genomic and Health Related Data, which are increasingly being used by organizations, clinicians, and researchers around the globe. But while GA4GH has moved quickly, Altshuler said, the field and the world are evolving faster than anticipated.
Several national initiatives have sprung up around the world to assemble massive cohorts of genomic data. It will be critical to stay relevant and to engage with these efforts, Altshuler said. To that end, an afternoon session welcomed presentations from representatives of eight regions around the globe: Latin America, Singapore, China, The Netherlands, the United Kingdom, the United States, Australia, and Canada. The presenters summarized the current and future data sharing plans of their respective efforts and discussed ways they are leveraging GA4GH tools and approaches as well as ways GA4GH might help them overcome barriers to effective and responsible data sharing. The speakers were eager to work with GA4GH and said the Alliance could provide assistance in both general and specific areas, including: (1) taking pilot efforts to a national scale, (2) developing information dissemination technologies, (3) implementing tools and approaches within a national health system, and (4) creating a Brazilian national database that is representative of the unique mixed population.
In addition to these national initiatives, several organizations have commenced large-scale data sharing efforts. Security Working Group (SWG) Chair Dixie Baker facilitated a session on the challenges facing Big Data, which include an overarching lack of computational capability and infrastructure to generate, maintain, exchange, analyze, and visualize large-scale data sets. Five presentations introduced solutions being developed to overcome those challenges. Virtualizing computational resources, Baker said, may be the only approach for achieving scalability and elasticity for DNA sequencing and analysis, but that approach comes with its own set of hurdles. Plenary attendees heard about a wide range of activities, including a patient advocacy effort, a commercial cloud provider, and an academic medical center.
At the last GA4GH plenary meeting (San Diego, October 2014), the importance of demonstration projects rose to the surface as well as a need for actionable solutions to the philosophical guiding principles being developed within GA4GH. A morning session at the June 2015 meeting focused on updates since the last plenary, with presentations from John Burn on BRCA Challenge, Heidi Rehm on Matchmaker Exchange (MME), David Haussler on the Beacon Project and other activities of the Data Working Group (DWG), and Graeme Laurie and Susan Wallace on the Privacy & Security and Consent Policies developed by the Regulatory and Ethics Working Group (REWG).
The BRCA Challenge held its first meeting directly after the GA4GH Plenary at the UNESCO headquarters in Paris, France. The meeting led to the development of time plans for a BRCA web portal and phone app. The MME now has three databases using its API and is working to connect them together. The journal Human Mutation will release an edition dedicated to the MME in October 2015 with more than a dozen papers highlighting its activities over the past 1.5 years, including three that document rare disease cases solved through matchmaking. The Beacon Project has been working with several other data sharing initiatives to create a large, interconnected, international network of datasets. Its near-term goals include new development tools and use cases to foster implementations. The REWG’s Privacy & Security and Consent Policies aim to guide practical implementation of the Framework, which was designed to serve as a high-level starting point for deliberation and action. The policies are now available on the GA4GH website.
In addition to well-established projects, new activities have been percolating in the areas of eHealth and clinical cancer genomes. A session on these projects included presentations from John Mattison (Kaiser Permanente) and Mark Lawler (Queen’s University Belfast). The eHealth Task Team has created a catalogue of activities to guide increased data sharing collaboration and is working to manage the ecosystem of ontologies and semantic representations. The Clinical Cancer Genomes (CCG) Task Team recently surveyed the GA4GH community to identify ways to highlight and enable collaboration between ongoing data sharing efforts. It has also produced an opinion paper arguing that the international community must overcome several challenges, including lack of interoperability and reluctance to share, before Big Data genomics can advance human health.
Ancillary meetings were also held in conjunction with the main plenary session. The DWG and the Clinical Working Group (CWG) held a cross-cutting meeting on June 9th, during which audience members heard 15 presentations on topics ranging from large-scale clinical data sharing initiatives to more granular efforts to improve standardization and consistency across technologies. By the next plenary meeting, a newly formed planning team co-led by Lawler and Adam Margolin (Oregon Health & Science University) will identify and launch a cross-cutting global actionable cancer initiative that brings together the domain expertise and medical goals of the CWG with the data models and APIs developed across the DWG.
In addition to the cross-cutting cancer session, the CWG, the DWG, the SWG, and the Regulatory and Ethics Working Group (REWG), as well as MME and Beacon, all held internal planning meetings, several cross-cutting planning meetings, and a Beacon hack-a-thon over the course of the three-day meeting. Working Group Co-Chairs provided summaries of these meetings on June 9th, followed by an open-air panel discussion facilitated by Martin Bobrow. Key themes that arose during this discussion centered on engaging patient, physician, and disease advocacy communities, increasing global representation of the GA4GH, and implementing GA4GH tools and approaches.
Finally, a closing session on June 10th facilitated by David Altshuler identified concrete action items for the coming months. The GA4GH Strategic Road Map for 2015 and 2016 helped guide the discussion, which focused on developing priority products, delivering packaged working solutions, supporting demonstration projects, communicating with key audiences, establishing GA4GH as a thought leader, and building leadership and participation.
Some specific recommendations included (1) an initiative of patient advocacy organizational members of the GA4GH to drive increased patient engagement, (2) establishing thought leadership in the area of data sharing incentivization, (3) creating a communications and engagement working group, (4) developing ethics training tools for non-clinical researchers working with genomic data, and (5) encouraging implementation of both the technical tools and regulatory approaches being developed by the GA4GH Working Groups.
Throughout the three-day plenary meeting, discussions tied back to the GA4GH Strategic Road Map, which outlines activities for 2015 and 2016 focused on priority needs to enable data sharing in areas that provide value to members and the field, so that the effort ensures results, relevance, and sustainability. The objectives and activity directives in the Road Map provide GA4GH Members with the prioritization and framing needed to develop systems, drive concrete actions, and create targeted progress in eight key areas:
In opening remarks, Steering Committee Chair David Altshuler (Vertex Pharmaceuticals) reminded the audience of the history of the Global Alliance for Genomics and Health (GA4GH) and called on them to envision its future with a “laser-like focus on making a difference.”
In early January 2013, the founding members of GA4GH met with nothing but an idea: they sought to create an organization that would bring together people from different sectors, countries, and perspectives to enable genomic and clinical data sharing in order to improve health. “It is remarkable and exciting to see what has happened,” Altshuler said.
Now a network of more than 750 individuals across 320 organizations and 48 countries has organized into four vibrant working groups, which are creating the tools, methods, and frameworks critical to enabling genomic and clinical data sharing. Upon this solid foundation, GA4GH members have established data sharing demonstration projects to put those ideas and people to work.
But while GA4GH has made great, quick strides, Altshuler noted, the world has moved even quicker. “We’re pedaling hard and making progress but the world is speeding up,” he said. The world is at an inflection point, moving from decades of talk to actually seeing genomic data making their way into clinical medicine. Several large-scale clinical genomics projects are developing or underway around the globe. In order for GA4GH to succeed—for data to be shared responsibly and effectively—it must engage strongly with these efforts, Altshuler said.
He also announced changes in the GA4GH leadership, which were established through a nominating committee led by Martin Bobrow (University of Cambridge). Current Steering Committee member Tom Hudson (Ontario Institute for Cancer Research) will serve as Chair Elect until Altshuler steps down later in 2015. “I’ve known Tom for 20 years,” Altshuler said, “and I can’t think of better person to take this forward.” Additionally, he announced three new Steering Committee members: Heidi Rehm (Partners Healthcare), Nazneen Rahman (Institute of Cancer Research), and David Glazer (Google). GA4GH continues to search for one to two more Steering Committee members with a particular eye toward geographic diversity. Finally, Altshuler announced that founding Steering Committee members Charles Sawyers and Betsy Nabel will step down. Nabel will maintain a senior advisory role while Sawyers, a current Clinical Working Group Co-Chair, will place more focus on cancer data sharing projects.
Altshuler closed by asking members to consider where the GA4GH should be two years from now. “I think everybody would agree we’d like to see tangible benefits to medicine, health care, and patients,” he said. In order to achieve that goal, data sets will need to be brought together in an appropriate, responsible way to enable discovery, better diagnosis, and interpretation of genomic information. But, he said, we must first identify the actual steps to get there. “We need to make sure we don’t, in the spirit of trying to solve all problems, not solve any,” he said.
Ewan Birney (European Molecular Biology Laboratory-European Bioinformatics Institute, EMBL-EBI) delivered the keynote address, in which he argued for the importance of a federated approach to data sharing, increased global diversity in the GA4GH, and long-term engagement with the clinical community.
The worlds of practicing medicine and research are different, Birney said, operating in different language, legal, financial, and regulatory contexts. Data in the clinic are often totally closed, while data in the research space are often totally open. But globally, research is becoming more relevant to healthcare, which depends on the skills and techniques researchers generate. Simultaneously, researchers are becoming more reliant on health care, which will generate a “remarkable data set over the next decade,” which researchers want to use to feed back into health care. In order to do that well, Birney said, we must acknowledge the differences between these two worlds.
He used his own research as an example of how this can work: In collaboration with clinicians at Imperial College London, Birney performed genome-wide association studies against a set of characteristic cardiac features as portrayed in MRIs from 1,500 healthy volunteers. “We are repositioning a medical dataset that was generated for purposes of human health to understand basic things about cardiac structure,” Birney explained.
In the current paradigm, which lacks federation and harmonization between clinical and research datasets, Birney was still able to move his research forward. “Maybe people like me can just charm clinicians and there’s enough there,” he said. “Maybe we don’t need to federate more than swapping papers and collaborating with other scientists. Maybe medical data can stay siloed.” But, he said, as human geneticists across all diseases know, “sample size is king.”
With the case for sample size well established for common diseases, Birney focused this part of his discussion on the value of large data sets for rare disease. Taken together, rare disease is relatively common, affecting eight percent of the human population. Every time a gene involved in a rare phenotype is identified, it becomes an amazing point of leverage into human biology, he said. “It may have been discovered with a small set of people, but it immediately tells us about other things, often common diseases or common phenotypes.” But to do this well, he said, we need to screen millions upon millions of people in order to find a match between just two individuals with precisely the same rare allele with the same rare phenotype.
Sample size is also critical for effective cancer studies, since each somatic tumor is so unique. “We need gob-smacking numbers of people to understand the relationship between somatic, germline changes, and environment,” he said. “The calculations go above 10 million.”
More than enabling large sample sizes, though, a federation of data sets would also allow us to leverage things like genetic drift, human migration, and environmental factors. He also noted that infectious disease research, though a complex area with an existing dynamic scientific community, would also benefit from this effort. Thus, Birney argued that federation is critical for understanding a broad range of human diseases.
As a result, Birney said, “the Global Alliance is not only a good thing, but a necessary thing for us to make progress over the next decade. This is the only way I can imagine this working: a federated system that repurposes clinical medical data for secondary research.”
Over the last year, Birney said, GA4GH has created a forum not previously present, which ranges from low-level technical issues to big-picture clinical issues. In particular he noted that the effort to establish a reference genome graph shows how these disparate communities are working together to deliver a critical tool for the community. He attributed the success of this forum, where similar efforts often fail, to GA4GH’s persistent focus on a large but manageable scope. “My advice is to keep it in this manageable zone. It’s easy to try to bite off all of the problems in electronic health records, or all of the problems in genomics, but if we do that we become too diffuse,” he said, noting that the GA4GH could stand to expand its definition of the term “Genomics” to include RNA, proteins, metabolites, and other “omics” areas beyond the genome.
But, he said, GA4GH is still growing and still has places to go. He encouraged a push for implementation of technical products as well as long-term engagement with the clinical community. “We have to be in it for the long game,” he said. “Urgency is useful for motivation, but research and health care have been working for a long time. It is more important to achieve change in five, ten years time than pushing out an API two months earlier than deadline.”
Birney also reiterated the concern for a more global GA4GH, with the current makeup too biased towards Anglophone countries. “We need to acknowledge that diversity is good and stretch out of our comfort zone to embrace that diversity,” he said. In particular, he encouraged members to look more broadly across Europe to places like Estonia, Scotland, and Denmark, which are taking unique approaches to ethical data access and regulation, as well as to the entire planet, where there is a huge diversity of talent, diseases, and thought processes.
Finally, he also discussed the various ways his organization is engaging with GA4GH, which include a joint project with the Center for Genomic Regulation in Barcelona to establish a Beacon on top of the European Genome-Phenome Archive, the development of reference and annotation APIs for Ensembl variant data, and an effort to harmonize genomic data file formats.
John Burn (Newcastle University) provided an update on the BRCA Challenge, which held its first meeting directly after the GA4GH plenary with sponsorship from AstraZeneca and hosted by UNESCO at its Headquarters in Paris, France. Next generation sequencing is revealing vast numbers of variants, Burn said, creating an ever more pressing need to collaborate. The BRCA Challenge is an attempt to enable collaboration in the area of variant interpretation for the genes BRCA1 and BRCA2. In 2003, the Human Variome Project and the International Society for Gastrointestinal Hereditary Tumors (InSiGHT) launched a Variant Interpretation Committee (VIC) for mismatch repair genes, which brought together everyone working individually in that space. The group established a database, which now receives thousands of hits each month from the clinical and molecular genetics communities alike. At the first plenary meeting of the GA4GH, it was proposed to do the same thing across the whole genome, and to start with BRCA1 and BRCA2. Although not perfect genes from a bioinformatics perspective (they are big with lots of variation and noise), they are good because they are well known and already well studied, meaning there is a significant amount of data to use as a test case for integration.
The BRCA research landscape consists of many databases with unique variants, including NCBI, ClinVar, LOVD, and the French Universal Mutation Database. Federation is needed to cull all of the definitive work around the world to present an authoritative network of databases with a reliable curation system for variant annotation.
Data security is key, Burn said, as it will enable critical buy-in from the public and from patient advocacy groups. For instance, Britain has a large amount of data in the National Health Service which could be beneficial to this effort, but it isn’t yet being shared because patients are nervous thanks to near-daily data breaches. An Ethical, Regulatory, and Advocacy (ERA) group akin to the REWG of the GA4GH is dedicated to this topic and held a special session at the Paris meeting. In addition to the ERA group, three new subcommittees recently formed to organize and guide activities of the BRCA Challenge. These include the Evidence Gathering Group, the Classified Variant Collection group, and the Interpretation Group.
Burn identified three primary needs of the project, which are being actively worked on by the subcommittees. First, legacy data in hundreds of diagnostic laboratories with up to 20 years of experience issuing reports to families with a history of breast/ovarian cancer must be captured and retained. Second, an API must be developed for identifying BRCA1 and BRCA2 variants from whole genomes and exomes. And third, data from the general population must be collected. Until 2010, Burn said, only those with a family history of breast or ovarian cancer were genotyped. According to research from Nazneen Rahman, one in every 250 people carries a pathogenic BRCA variant. “Once people are aware of this, they’ll want to be tested,” Burn said.
He went on to discuss the story of a Newcastle family that was given conflicting information about the pathogenicity of its variant, which Burn replicated by pretending to be the daughter of the sequenced carrier. He received four conflicting responses from regional diagnostic service laboratories that used a standard approach but interpreted the slightly confusing literature differently. They collectively spent eight hours incorrectly answering his question, Burn said. “We don’t have this kind of time or manpower. We need a reliable reference database,” he said.
A Q&A session addressed engagement with patient groups, engagement with Myriad, and the project’s five-year horizon. “What I want is a database on my phone next year, so I can go to an app and say which pathogenic variants are those,” he said. “If we can get to that point with these two genes, then the next stop is 20,000 genes.”
The BRCA Challenge meeting in Paris led to the development of time plans for such an app as well as for a web portal.
Heidi Rehm (Partners Healthcare) provided an update on the Matchmaker Exchange (MME), which also held an internal planning meeting and a cross-cutting meeting with the Beacon Project on June 11th. MME is a federated network of genomic databases whose goal is to identify causal genes for rare disease by matching phenotypes and genotypes.
While a single case is never enough to implicate a gene as causative, a second case is sometimes all that is needed to do so for a rare disease, Rehm said. At the American Society of Human Genetics meeting in 2013, a group of rare disease researchers and clinicians set out to link their respective databases, which were each currently serving as smaller, self-contained matchmakers, to help identify second cases and thus be collectively more productive for solving rare diseases.
The task of linking these databases presented a significant technical challenge, since each has distinct data schemas and content types. Some may collect variant call files without identifying candidate genes while others focus entirely on the gene. Still others may incorporate data from model organisms or human phenotype ontology terms. Furthermore, even if there is a match between cases based on genes, the phenotypic data may not be the same.
Two initial matchmaker databases—PhenomeCentral and GeneMatcher—developed an API that sifts through these differences to identify commonalities between cases. They then worked with the GA4GH Data Working Group (DWG) to align the API with GA4GH standards, and with the Regulatory and Ethics Working Group (REWG) to create a consent policy, a user agreement to enable data donation and functional access, and a set of requirements for becoming a Matchmaker service. They are currently working on security requirements for each of the databases and the matching algorithms are continually under development. A third database—Decipher—has since launched the API internally and a number of other matchmakers are actively working to do the same. The team is now working to link the Matchmakers together so that a query to one will also have the potential to return matches from the others.
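The goal of linking the Matchmakers, so that a query to one can return matches from the others, can be pictured with a small sketch. This is only an illustration under stated assumptions, not the MME API: the `Matchmaker` class, its methods, the node names A, B, and C, and the gene labels are all hypothetical, and matching is reduced to sharing a candidate gene.

```python
class Matchmaker:
    """A toy matchmaker node (hypothetical; not the real MME API)."""

    def __init__(self, name, cases):
        self.name = name
        self.cases = cases  # {case_id: set of candidate gene symbols}
        self.peers = []     # nodes linked through the exchange

    def link(self, other):
        """Link two nodes in both directions, ignoring repeats."""
        if other is not self and other not in self.peers:
            self.peers.append(other)
            other.peers.append(self)

    def local_matches(self, genes):
        """Cases in this node that share at least one candidate gene."""
        return [(self.name, case_id)
                for case_id, case_genes in sorted(self.cases.items())
                if case_genes & genes]

    def query(self, genes):
        """Search locally, then forward the query one hop to each peer."""
        results = self.local_matches(genes)
        for peer in self.peers:
            results.extend(peer.local_matches(genes))
        return results


# Made-up nodes and placeholder gene labels, for illustration only.
node_a = Matchmaker("A", {"a1": {"GENE_X"}})
node_b = Matchmaker("B", {"b1": {"GENE_X", "GENE_Y"}, "b2": {"GENE_Z"}})
node_c = Matchmaker("C", {"c1": {"GENE_Z"}})
node_a.link(node_b)
node_a.link(node_c)

print(node_a.query({"GENE_X"}))  # local hit in A plus a forwarded hit in B
```

Forwarding only one hop keeps the sketch free of query loops. In this toy setup a query to B does not reach C, because those two nodes are not linked directly.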
There are two ways to participate in the MME: Databases can link to the Exchange by implementing the API, or a clinician with a single case can enter phenotype and/or genotype information into one of the existing Matchmakers, which will return matches from its own database as well as those to which it is linked through the Exchange. Rehm and others are working to post a decision guide on matchmakerexchange.org to help clinicians identify the most suitable database into which they should enter their case. For instance, Rehm said, if you have just a gene, you might want to enter into GeneMatcher, but if you have an entire VCF file, GEM.app might be the better choice.
In a pilot phase, the team implemented test cases into PhenomeCentral and GeneMatcher and successfully matched all those for which the case was present in both databases, demonstrating that the system works as intended. They then looked at 45 unsolved cases and identified 10 new matches. Two of these are thought to be potentially significant, two are still under consideration, and six were determined to be false positives (the genes but not the phenotypes were the same, emphasizing the need to optimize the matching algorithms).
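The false positives in the pilot arose where two cases shared a gene but not a phenotype. One way to picture a stricter match is to require both a shared candidate gene and some overlap in structured phenotype terms. The sketch below is an assumption-laden illustration, not the MME matching algorithm: the `Case` type, the Jaccard overlap score, the 0.3 threshold, and the placeholder gene and phenotype identifiers are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    case_id: str
    genes: frozenset       # candidate gene symbols
    phenotypes: frozenset  # structured phenotype terms


def jaccard(a, b):
    """Set overlap |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match(query, candidates, min_phenotype_overlap=0.3):
    """Return (case_id, score) for cases sharing a gene AND enough phenotypes."""
    hits = []
    for other in candidates:
        if not (query.genes & other.genes):
            continue  # no shared candidate gene, so not a match at all
        score = jaccard(query.phenotypes, other.phenotypes)
        if score >= min_phenotype_overlap:  # threshold is an assumption
            hits.append((other.case_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Placeholder identifiers; these are not real genes or ontology terms.
query_case = Case("q1", frozenset({"GENE_A"}), frozenset({"P1", "P2", "P3"}))
database = [
    Case("c1", frozenset({"GENE_A"}), frozenset({"P1", "P2"})),  # gene + phenotype
    Case("c2", frozenset({"GENE_A"}), frozenset({"P9"})),        # gene only
    Case("c3", frozenset({"GENE_B"}), frozenset({"P1", "P2"})),  # phenotype only
]

print([case_id for case_id, _ in match(query_case, database)])
```

Here only `c1` is returned. `c2` shares the gene but none of the phenotype terms, which is the pattern the pilot flagged as a false positive.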
A special issue of Human Mutation, guest edited by Rehm, Ada Hamosh (Johns Hopkins Medicine), and Kym Boycott (Children’s Hospital of Eastern Ontario), summarizes the work to date in 16 papers. These include an overview of the whole project, one paper on the API, two on the mathematics of the matching algorithms, several on the individual matchmaker systems, and three that document cases solved through matchmaking. The issue will be released in October 2015, in conjunction with the next ASHG meeting.
Going forward, the team will work to enable hypothesis-free matching, since clinicians often have sequencing data but still no candidate causal genes. The team wants to aggregate the datasets to look for commonalities that might not be obvious with only an individual case in hand. Additionally, they hope to enable broader sharing of at least some of the data. For instance, Decipher has already made 15,000 cases freely searchable. They are also working with ClinGen and the Genetic Alliance to incorporate patient-initiated matchmaking, which will raise a unique set of technical as well as ethical complexities. While the effort is still in the planning stages, it is expected that a group of physicians will initially receive results from any patient queries to weed out false positives and to serve as Matchmaker stewards.
In a Q&A session, Bartha Knoppers of the REWG noted that MME uses a tiered access approach, in which consent is only required when variant data and other patient details are entered. If only the gene name and structured phenotype terms are entered into the system, the MME does not consider that identifiable, but rather the “practice of medicine” as it is carried out today.
Another audience member suggested that level two access be built into future standard patient consent forms. Rehm agreed, noting that she and others are working with the NIH and the REWG to develop a single-page clinical consent form that lets patients easily share their data to advance human health. The only way to capture the most data is to advance clinical data sharing, Rehm said, since the clinic, more than research, represents the majority of cases.
David Haussler (University of California at Santa Cruz) provided an update on the Beacon Project and other activities of the Data Working Group (DWG), which consists of more than 500 people across for- and nonprofit entities who are all collaborating to build a working interface and a common language for genomics data sharing.
The group also held internal planning meetings and crosscutting meetings with the other working groups during the three-day event.
The vital information we need to decode our health is held in silos around the globe, Haussler said, but we can change that. A world of technology awaits us if we can break through some of the social and technical issues. The DWG is developing an “Internet of genetic information that will make accessing genetic data as easy as finding a restaurant on your iPhone,” Haussler said.
The general Internet consists of agreed-upon “names” for the objects we want to access (URLs), protocols for how to access them (HTTP), and a universal understanding of how to semantically interpret content (HTML). “For genomics, we just need to specialize this for the kind of information we are transmitting,” Haussler explained. The DWG has developed a means for naming objects, which doesn’t reveal anything about the object; methods for requesting data, which are used in all GA4GH tools and applications; and a schema for each type of data that denotes in precise mathematical terms how to interpret the content. Now we can reliably build apps that depend on that interpretation, Haussler said. Two such apps are the Beacon Project and the Reference Human Variation Map, and Haussler provided specific updates on each.
The Beacon Project, led by Mark Fiume (DNAStack), pulls together the simplest of all genetic queries and makes it a ubiquitous feature available on the Internet. First identified by Jim Ostell (National Institutes of Health) at the 2014 GA4GH Plenary meeting in London, Beacons ask trusted data stewards “do you have any genomes in your database with a particular variant at a particular position.” The queried database responds with either “yes” or “no.” Beacons present no technical barriers, but there is a social barrier to exchanging even the simplest, most atomic unit of genetic information, Haussler said. The advantages of the Beacon approach are an open API, which enables interoperability between systems; a federated model, which negates the need for a centralized database and instead relies on trusted data stewards; and the technically simple, sufficiently primitive query, which mitigates (but does not completely eliminate) risk of identification. To date, fifteen organizations have lit 155 Beacons across 252 datasets. “This is a very dramatic uptake of a new technology that’s still in its early defining stages,” Haussler said.
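The yes-or-no query described above can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions, not the Beacon API: the class and function names, the 1-based position convention, and the example variants are all hypothetical, and a real Beacon would sit behind a web service run by a data steward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeaconQuery:
    chromosome: str
    position: int  # assumed 1-based coordinate
    allele: str


class ToyBeacon:
    """A data steward's dataset that answers only yes or no."""

    def __init__(self, name, variants):
        self.name = name
        # Each variant is stored as a (chromosome, position, allele) tuple.
        self._variants = set(variants)

    def has_variant(self, query):
        key = (query.chromosome, query.position, query.allele.upper())
        return key in self._variants


def ask_all(beacons, query):
    """Federated lookup: each beacon answers for itself; no data is pooled."""
    return {beacon.name: beacon.has_variant(query) for beacon in beacons}


# Made-up example data for two hypothetical beacons.
beacon_a = ToyBeacon("beacon_a", [("1", 12345, "T"), ("7", 500, "G")])
beacon_b = ToyBeacon("beacon_b", [("2", 999, "C")])

answers = ask_all([beacon_a, beacon_b], BeaconQuery("1", 12345, "t"))
```

The response reveals only whether a matching allele exists somewhere in each dataset, which reflects the "sufficiently primitive" design the text credits with mitigating, though not eliminating, the risk of identification.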
Beacon currently has two levels of access: open and controlled. Open is intended for the anonymous Internet user looking for any record of a variant anywhere in a database. Controlled is for those interested in a whole genome, which contains identifiable, private information and thus requires a full legal contract. A newly proposed access level is registered access for those who want to dig deeper than allowed by open access but who don’t need a whole genome. It would depend on a set of agreements and would require the user to provide credentials. The DWG sees this as an important intermediate level and is working with the Regulatory and Ethics Working Group to develop it.
The Reference Human Variation Map, led by Gil McVean (University of Oxford) and Benedict Paten (University of California at Santa Cruz), is a comprehensive and unbiased representation of the human genome. The current reference genome is not representative of all human variation. “There is a huge amount of human diversity, and by leaving it out of the standard reference we have crippled ourselves in many ways,” Haussler said. A graph representation of all the variation in the human population gets around “what is essentially a giant Tower of Babel problem,” he said. Various efforts to codify genetic variation and link it to phenotype each use different schemes to represent the same thing and none is definitively comprehensive. The Reference Variation Task Team is building a graph reference that will be a comprehensive “Rosetta Stone” for human genetic variation. It does not replace existing references but translates them into one comprehensive archival representation, Haussler explained.
A single dominant line on the graph denotes the current human reference genome while surrounding lines represent different individuals’ genomes. “Think of it like a theme in music with beautiful variations,” Haussler said. “There may be different themes throughout a region.” At its core, the graph consists of individual bases of DNA. Whereas the reference genome may have a sequence of ACGGCC at a certain locus, a common variant among the human population may turn that into GAGGCC. An alternate path on the graph represents this alternative variant. Within the graph model, every base has a permanent identifier that won’t change when we learn more about a locus. “How we identify a genetic variant should be durable over decades,” Haussler said. “And it will be in this new structure.”
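The graph idea can be illustrated with the ACGGCC / GAGGCC example from the text. The sketch below is a simplified assumption, not the Task Team's data model: it gives every base a node with an identifier that never changes once assigned, and represents the reference and the variant as paths over those nodes. The class and method names are hypothetical.

```python
class SequenceGraph:
    """Toy graph in which each base is a node with a permanent identifier."""

    def __init__(self):
        self._bases = {}   # node id -> base
        self._next_id = 1
        self.paths = {}    # path name -> ordered list of node ids

    def add_base(self, base):
        node_id = self._next_id
        self._next_id += 1
        self._bases[node_id] = base
        return node_id     # the id is never reassigned afterwards

    def add_path(self, name, node_ids):
        self.paths[name] = list(node_ids)

    def spell(self, name):
        """Read out the sequence along a named path."""
        return "".join(self._bases[n] for n in self.paths[name])


graph = SequenceGraph()

# Reference path: one node per base of ACGGCC.
ref_nodes = [graph.add_base(b) for b in "ACGGCC"]
graph.add_path("reference", ref_nodes)

# The variant differs only in its first two bases (GA instead of AC),
# so it gets two new nodes and then rejoins the reference nodes.
alt_nodes = [graph.add_base(b) for b in "GA"]
graph.add_path("variant", alt_nodes + ref_nodes[2:])

shared_nodes = set(graph.paths["reference"]) & set(graph.paths["variant"])
```

Adding the variant path did not renumber any reference node, which is the durability property Haussler describes: identifiers assigned to existing bases stay valid as more is learned about a locus.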
These two examples offered a flavor of the many projects ongoing within the DWG, whose ultimate goal is to develop a clean, secure, Internet-based system of accessing and exchanging health information that is available to the doctor, the researcher, and the consumer alike. If we can achieve this, Haussler said, then the whole Internet will be a health learning system and no data will be left on the table.
A Q&A session focused on issues of using crowd-sourced rankings to qualify annotations to the current reference genome; the importance of healthy cohorts; the fact that most data currently left on the table is clinical data; and pseudonymizing clinical data in order to capture its richness. Finally, the distinction between Matchmaker Exchange (MME) and Beacon was addressed. The two are conceptually different, Haussler said. MME matches individuals across a multidimensional query of genetic and phenotypic attributes whereas Beacon queries only whether a database contains a genome with a specific variant. “You could think about it as a Google search,” Haussler said, where Beacon is the initial high-level search and MME allows you to drill down further like clicking on a web page. Heidi Rehm (Partners Healthcare) noted that in most cases matching patients via MME requires different variants on the same gene whereas Beacon looks at the same variant. Beacon and MME are complementary systems, she said, and individual Matchmakers are working to launch their own Beacons.
Graeme Laurie (University of Edinburgh) and Susan Wallace (University of Leicester) presented two policies developed by the Regulatory and Ethics Working Group (REWG) that are meant to guide practical implementation of the Framework for Responsible Sharing of Genomic and Health-Related Data, which was designed to serve as a high-level starting point for deliberation and action. The policies aim to facilitate and operationalize the Framework’s various principles.
The Privacy and Security Policy demonstrates how an entity or individual involved in providing, storing, accessing, or managing data can and ought to promote privacy while also promoting science and sharing. It serves as a framework for mutual recognition and trust, providing immediate and common ground for interaction.
The policy can also help entities assess their readiness to share data based on GA4GH objectives and aspirations and identify unmet needs which GA4GH can help them address, though it has no authority to enforce the policy’s uptake. Privacy and security have different best practices, Laurie said, and while the fields overlap they are distinct. Privacy is a fundamental human value, enshrined in laws as a basis of rights and responsibilities. Security is a set of practical measures to manage risk. Users must determine which of the policy’s measures are most relevant to their pursuits and identify clear lines of responsibility and accountability in the implementation of the various elements of the policy.
The Consent Policy provides guidance around whether existing consents allow forward data sharing and how to proceed if changes need to be made. It is intended to help in the design of prospective consents and does not override existing consent gained from a data donor. It is specifically geared toward international data sharing and offers practical measures based on a series of best practices that map to the 10 core elements of the Framework. The consent policy is based on five basic principles: That consent is an open, communicative, and continuing process; that there is an intention to share data across institutions, jurisdictions, and national borders with appropriate approvals; that plans for data sharing should be transparent, understandable, and accessible; that data donors have a right to withdraw participation or not participate at all with the understanding that it may not be possible to retrieve and/or destroy data once shared; and that data users and producers will abide by applicable regulations and ethical norms when seeking and conducting international data sharing.
A Q&A session addressed issues of encouraging international data sharing through robust consent policies, the dynamic state of privacy regulation in the European Union, and the distinction between data sharing and data discovery.
John Mattison (Kaiser Permanente) provided an update on activities of the eHealth Task Team, first inviting Gil Alterovitz (Harvard University) to present the eHealth Catalogue of Activities, an online listing of international genomic and clinical data sharing initiatives. A visual representation of the catalogue reveals a geographic distribution of activities that aligns well with GA4GH membership. Using co-authorship between these activities as a proxy for collaboration, the eHealth Task Team identified which organizations are working together and which are isolated and found a direct correlation between collaboration and shared governance. Increasing leadership among shared initiatives, Alterovitz proposed, may help motivate collaboration across currently isolated groups. The eHealth Task Team will continue its work on the Catalogue, expanding its functionality and publicizing it to the broader community, with the hope that it will foster increased collaboration by providing a foundation for dialogue between institutions with shared missions and activities.
Mattison discussed the unchecked growth of semantic representations, data models, ontologies, and APIs, and how to manage them for global “omics” collaborations. To constrain this growth, a byproduct of many siloed activities around the globe, Mattison made several recommendations, including the use of federated data enclaves with researcher credentialing protocols, “lighter” data representations, micro-consent and micro-credit to eliminate data hoarding, and data concierges to guide users of an organization’s shared dataset.
Additionally, he proposed a matrix analysis to guide vendor adoption and market penetration of the most valuable ontologies and semantic representations. Migration of data across different institutions and associated semantic environments results in unnecessary semantic degradation by virtue of multiple transformations, Mattison said. But as data sets become more federated, the community will naturally arrive at that smaller, constrained set of representations, which will also reduce semantic degradation. Finally, he posited that “ontology” is a one-word oxymoron, since there is no single right way of organizing all of the information in a given reference model.
A Q&A session addressed the best means for capturing data from clinical care systems, be it patient or vendor directed (Mattison said both are useful) as well as how to drive vendor uptake and market penetration of GA4GH standards, which Mattison said should be done through vendors’ clients.
Mark Lawler (Queen’s University Belfast) provided an update on activities of the Clinical Cancer Genomes Task Team, which aims to provide added value to and harmonize with ongoing projects and outputs in the clinical cancer genome space. The Task Team has submitted an opinion paper for publication, outlining the global cancer genomics landscape and highlighting GA4GH’s activities and potential in this precision medicine domain. The paper argues that Big Data genomics will not efficiently advance until the international community overcomes a number of barriers, including shortsightedness, lack of interoperability, and reluctance to share, and it outlines how the GA4GH vision can deliver for scientists, health care professionals, and, most importantly, patients. The Task Team is also deploying a survey to GA4GH members to identify ongoing efforts, highlight examples of best practice, and enable collaboration between data sharing initiatives in clinical cancer sequencing. Together, the survey and the opinion paper aim to delineate the importance and benefits of data sharing, discuss key challenges of merged data usability, identify best practices, develop solutions, and highlight the importance of implementation. The Clinical Cancer Genomes Task Team is also developing an Actionable Cancer Genome Initiative.
GA4GH has the opportunity to foster, nurture, and drive a global actionable cancer initiative, Lawler said, and he announced intentions to establish a cross-cutting cancer driver project, which will be a joint activity between the GA4GH Clinical and Data Working Groups. This joint activity will create an authoritative approach that defines clinical relevance of the somatic cancer genome. The group will also seek to establish the validity of actionable cancer panels, define optimal standards for technical practices, develop curation and annotation approaches to somatic variant “Big Data,” facilitate regulatory and reimbursement pathways, and establish flexible, clear principles to justify variant test selection, delivering clinical cancer diagnostic, prognostic or predictive tools that are useable, billable, payable, and sustainable.
A lively Q&A session addressed the best first steps in an area as complex as somatic cancer, overcoming the challenge of working with data that is currently fragmented, aligning with regulators to drive uptake, and leveraging crowdsourcing as a tool for integrating currently siloed data.
Facilitated by Mark Guyer (US Precision Medicine Initiative), the participants in this session were asked to introduce their national initiatives and answer three questions: (1) how are you leveraging or planning to leverage GA4GH in your national initiative? (2) how do you plan to share data or support data sharing? and (3) have you identified major barriers that GA4GH is currently not trying to address and/or are there challenges unique to your jurisdiction that GA4GH should be aware of?
Morris Swertz introduced Genome of the Netherlands, a project of the Dutch arm of the federated European Biobanking and Biomolecular Research Infrastructure (BBMRI), which seeks to harmonize and enrich existing biobanks around Europe in order to integrate and provide easy access to data. BBMRI-NL consists of 900,000 samples across 200 Dutch biobanks, representing about five percent of the total Dutch population. Four of the 12 broad based projects were specifically relevant to genomics, including Genome of the Netherlands (GoNL), which was the first population-specific whole genome sequencing study in the world.
GoNL led to the discovery of many new variants, knowledge which is being put to immediate clinical use. Two follow-up studies, Biobank Integrative OMICS Studies (BIOS) and the Society of Clinical Genetic Diagnostic Laboratories (VKGL), sought to integrate additional “omics” data into the GoNL database and to allow clinicians to readily query existing knowledge on a particular variant, respectively. The team is now focusing on ways to improve dissemination of the information captured in these projects, with a particular focus on technological challenges. They are eager to synergize with GA4GH in developing and implementing new APIs and connecting with those that are already emerging.
Tim Hubbard introduced Genomics England and the 100,000 Genomes Project, a project of the National Health Service focused on treating rare disease and cancer. In addition to generating new clinically beneficial treatments through whole genome sequencing and enabling future research, it is also meant to stimulate activity in the UK economy through genomics-related spinoffs. Eleven Genomic Medicine Centres across 75 hospitals have been established to collect consents, DNA samples, and phenotypic data when a patient presents with a disease that cannot be diagnosed by an existing test. Illumina performs sequencing on the sample and all data are fed into a centralized Data Centre for analysis by Commercial Interpretation Services, which deploy their own algorithms within the Data Centre as virtual machines. A clinical report feeds back to the NHS for verification. Genomics England is a member of GA4GH and expects to implement its APIs within its infrastructure as they become broadly adopted as standards. Since the data are not consented to be redistributed outside of the Data Centre, the project will likely interface with GA4GH through the Beacon Project as well as Matchmaker Exchange at its lowest level of consent. Hubbard noted that GA4GH could provide more guidance on implementing its tools and approaches within a health system, pointing to the Global Genomic Medicine Collaborative (G2MC) as another group focused on this issue.
Paul Lasko introduced CARE for RARE (formerly the Canadian FORGE Project) as well as the International Rare Diseases Research Consortium (IRDiRC). CARE for RARE aims to provide diagnoses and new therapeutics for pediatric rare disease and to work with patient organizations and other stakeholders to bring genomics into Canadian healthcare. It has led to the identification of causative mutations for 146 rare diseases, half of which were not previously linked to any rare disease, as well as diagnoses for 500 patients. The CARE for RARE database is called PhenomeCentral and is an integral part of the Matchmaker Exchange. While the project leads to clinical diagnoses for patients, it is a research project and thus data are consented for international sharing. It is fully supportive of a federated model of data sharing.
IRDiRC was launched in 2011 to facilitate international collaboration and data sharing among rare disease researchers. It has two goals: to catalyze the development of diagnostics for most rare diseases and to catalog 200 new therapies by 2020. To date, it has linked more than 3,000 genes to rare disease and identified 144 new rare disease therapies. Matchmaker Exchange started within IRDiRC, but took on a higher level of activity with the help of GA4GH. IRDiRC is also involved in GA4GH’s Machine Readable Consent Task Team and believes continuing collaborations between the two groups will be beneficial. Toward that end, Canada launched a funding opportunity in 2014 called Sharing Big Data for Health Care Innovation – Advancing Objectives of the Global Alliance for Genomics and Health, aimed specifically at strengthening the Canadian contribution to GA4GH. An announcement is forthcoming on its outcome.
Kathryn North presented two Australian initiatives to integrate genomics into everyday healthcare. First, the Melbourne Genomics Health Alliance (MGHA) is a collaboration of 10 research and healthcare organizations, which compared the clinical utility and cost-effectiveness of whole exome sequencing (WES) with standard clinical practice as a first-tier assay for germline and somatic conditions. MGHA members have developed a series of shared tools, including ethics and consent protocols, a clinical bioinformatics pipeline, and a clinical genomics data repository. In a pilot study, MGHA showed that WES resulted in a 54% diagnostic rate for childhood syndromes, compared to 20% for standard practice. The approach is now being expanded across the entire state of Victoria. Second, the Australian Genomics Health Alliance (AGHA) is a collaborative effort of 41 academic, diagnostic, and genetic services organizations across Australia, which has applied for funding from the National Health and Medical Research Council to develop a national approach to genomic medicine. The proposed program aims to develop a federated genomic data repository that will link phenotypic and genomic data with electronic health records, using the data sharing approaches promoted by the GA4GH. The group also plans to use the GA4GH regulatory and ethical standards for consenting patient data. The biggest challenge facing Australia in this effort is identifying ways to successfully translate the state-based approach into a national, federated system.
Iscia Lopes-Cendes introduced several activities in Latin America. The Latin American Collaborative Study on Congenital Malformations is a clinical and epidemiological investigation launched in 1967 across Latin American hospitals, which has recently begun incorporating genomic information into its database. The Study Group on Hereditary Tumors is working with two other international genomics initiatives: the Collaborative Group of the Americas on Inherited Colorectal Cancer and the International Society for Gastrointestinal Hereditary Tumors (InSiGHT). Two Brazilian projects that could benefit from collaboration with GA4GH are The National Institute of Population Medical Genetics (iNaGeMP), which was established in 2009 to study rare diseases via clinical, epidemiological, family history, and genomic data, and The Brazilian Epidemiological and Biobank Stroke Study, a two-year prospective population-based study of stroke across 5 cities using genomic and clinical imaging data. It expects to collect 2,400 patient samples and 5,000 control samples per year. Samples collected in a pilot study are now undergoing genomic analysis at the University of Campinas (UNICAMP) under Lopes-Cendes’ direction. Next, Lopes-Cendes introduced the School of Human and Medical Genetics, an educational initiative established in 2005 with the Latin American Network of Human Genetics, which has enrolled hundreds of students across every Latin American country as well as several Caribbean nations, with particularly strong representation from Brazil, Mexico, Argentina, and Colombia. Finally, Lopes-Cendes highlighted a collaborative project with GA4GH to address the fact that the Brazilian population is not well represented in international genomic databases. The country’s population has a complex genomic background for which other Latin American countries, including those with mixed populations such as Mexico, cannot be used as proxies.
Lopes-Cendes and her team are producing a database of molecular profiles based on SNP arrays and whole exome sequencing. They are currently preparing an environment to publicly share the data, are integrating them into LOVD, and will soon light a Beacon on top of them.
Bin Tean Teh introduced various precision medicine activities in Singapore, which span basic science, translational research, and clinical applications in medicine. Three years ago, Singapore launched POLARIS, a pilot genomic medicine program to identify barriers to the clinical implementation of genomics. This proof-of-concept multi-institutional collaboration to translate Singapore research to improve health has established an infrastructure of CAP-certified genomics laboratories; Ministry of Health-compliant software for analyzing genomic data and patient reporting, compatible with local EMR systems; community standards for regulatory and ethical applications of genomics; and a recently launched Precision Medicine Institute. Other recent precision medicine efforts include genetic services to test for the TGFB1 mutation for genetic eye disease and multi-gene assays for sudden cardiac death. Over the next decade, the country will build a medical database consisting of whole genome sequencing data from 5,000 healthy volunteers integrated with serum metabolites, immunophenotyping data, cardiac imaging data, and EMR data. Finally, a centralized facility for clinical genomics will perform rapid data analysis and visualization in concert with a clinical follow-up mechanism. Singapore’s precision medicine leaders are eager to work with GA4GH to scale their efforts.
Mark Guyer closed the session with an introduction to the US Precision Medicine Initiative (PMI) and the Big Data to Knowledge (BD2K) program. BD2K was established to advance basic and translational science by facilitating and enhancing the sharing of research-generated data. It resulted in the creation of an open digital ecosystem to accelerate biomedical research and its application to human health. The Precision Medicine Initiative is a proposed program with three main components: (1) a near-term focus on cancer, (2) a long-term aim to generalize to the full range of human disease, and (3) an effort to advance the nation’s regulatory framework in order to implement precision medicine in practice. The second component will require the creation of a research cohort of at least 1 million participants that is representative of the American population. This will likely involve several existing cohorts. The NIH fully supports GA4GH data sharing goals and approaches and intends for the Precision Medicine Initiative to be consistent with its principles and objectives.
Dixie Baker (Martin, Blanck and Associates) facilitated the session, beginning with an overview of the challenges relating to Big Data. The overarching challenge, she said, is the lack of computational infrastructure needed to securely generate, maintain, transfer, analyze, and visualize large-scale data sets, and to integrate the various types of “omics” data with each other and with clinical data. Virtualizing computational resources, she said, may be the only approach for achieving the kind of scalability and elasticity needed for DNA sequencing and analysis. But virtualization raises new security and privacy challenges associated with the blurring of physical boundaries, “Big Data” technologies that render everything “discoverable,” and compliance with national laws and institutional policies. “The only thing we can be sure of is that the challenges will intensify and build over time,” Baker said. The session’s five speakers shared some of the solutions their organizations are pursuing to overcome those challenges.
Jun Wang (Beijing Genomics Institute) discussed BGI’s efforts to realize its Million Genomes Project, announced two years ago to build a strong national “multi-omics” database in three to five years. Complete Genomics (a BGI company) currently has capacity to sequence 10,000 genomes per year using its newly released Revolocity platform, which has a current upper limit of 30,000 genomes per year but is being expanded to reach 300,000 and has an ultimate goal of 1 million genomes per year. Achieving that capacity will require significant investment, for which BGI is exploring unique business models. The company anticipates that if Moore’s law continues to apply, the cost of sequencing a genome will drop to $1 per genome by 2019. BGI hopes to work together with GA4GH to develop an open research platform on top of the Revolocity system and aims to create a network of multi-omics data akin to the Internet. Wang said that cultural differences mean different issues of data privacy and ownership in China. As a pilot project, the company has collected a comprehensive set of multi-omics data on the Chinese wheat crop, foxtail, and is analyzing it in a controlled, machine learning environment with promising results.
Angel Pizarro (Amazon) presented Amazon Web Services (AWS), a global provider of on-demand, pay-by-the-hour, commitment-free “cloud” computing infrastructure. The ability to easily share data and applications across institutional boundaries and the ability to publish preconfigured resources to the community at large are key to collaboration. Before the cloud, population-scale science was limited to institutions with the most computational (and financial) resources. With AWS, users provision only the size of compute needed for a given project. Data can be put into the cloud and access can be requested and granted on a temporary basis. AWS operates 11 regions worldwide; once data are persisted in a region, they remain within that region. This enables AWS to comply with jurisdictional laws restricting the physical location of data. As an example, the National Database for Autism Research (NDAR) is storing its phenotype data on the AWS cloud. Credentialed collaborators that want to use NDAR data can bootstrap it into their own AWS environment, within which they use their own analytical processes or leverage AWS’s open-source ecosystem of preconfigured genome analysis pipelines. For data and service security, AWS uses a shared responsibility model similar to that described in the GA4GH Security Technology Infrastructure. It does not use customer data, but provides a suite of service offerings that let users control their own risk mitigation strategies. Finally, a customer can have a third party audit their solution for compliance with industry standards.
Mathew Pletcher (Autism Speaks) presented MSSNG, an initiative to generate an open access database of 10,000 whole genomes and associated phenotypes from families with autism. The general goal of MSSNG is to accelerate understanding of the genetic underpinnings of autism and its specific goal is to introduce more categorical granularity into the diagnosis. As a patient advocacy group, Autism Speaks intends to provide a community portal to connect donor families to their data and, ultimately, to other families with a shared genetic subtype. MSSNG aims to interface with a diverse set of stakeholders, including academic researchers, large and midsize pharmaceutical and clinical centers, as well as non-traditional users such as entrepreneurs, the diagnostic industry, educators, parents, data donors, and non-academic researchers. MSSNG has so far generated more than 3,000 whole genome sequences and is on track to reach its goal of 10,000 by the beginning of 2016. The data will be freely available to credentialed researchers in the Google cloud environment. A web portal will be launched in July 2015 to support simple queries while the Google platform allows command line access for more in-depth investigations of the dataset. Autism Speaks is taking on all the costs associated with data hosting and analysis. It is working with the Public Population Project in Genomics and Society (P3G) to establish an access policy to reduce obstacles to data sharing while still protecting patient privacy and honoring consent. Current MSSNG data have not been consented for broad sharing, so Autism Speaks has begun a process of re-consent and going forward will use a universal consent, currently in development. A Q&A session touched on the expected number of data access requests, the breadth of associated clinical phenotype data, and issues of credentialing researchers in a tiered consent model.
Richard Gibbs (Baylor College of Medicine) provided an update on work performed on data from the CHARGE project, a collaboration between BCM, DNAnexus, and Amazon Web Services (AWS) to perform the largest ever cloud-based genomic analysis. The group has expanded its scope to also include sequencing data from Alzheimer’s patients from the Alzheimer’s Disease Sequencing Project. Since last year, BCM has also grown its clinical datasets, thanks to the opening of a routine diagnostic lab that brought in more than 6,000 new cases. The group has made its data available to researchers through a series of portals based on selective access. Gibbs said that much of the success of the collaboration can be attributed to the technical solutions developed by the GA4GH community and requested a continued focus on refining the model. He also requested better solutions for integrating clinical data into the research arena, pointing to Matchmaker Exchange as a good, but as yet not readily scalable, option. He also noted that redundancy between the Matchmaker Exchange and the Beacon project needs to be addressed. Gibbs identified individual privacy and HIPAA compliance as the CHARGE project’s biggest challenges. He noted that the collaborations would be well served by an easier, more straightforward approach to data access and consent. The consortium has been working on this in a cohort of pancreatic cancer patients, for whom data are now freely available for querying via the GA4GH API on the DNAnexus website. A Q&A session addressed the favorability of federation versus a centralized hub of data, new analytical methods for interpreting the non-coding parts of the genome, protection of patient privacy, and the process of re-consenting patients after testing.
Arcadi Navarro (Centre for Genomic Regulation) presented the European Genome-phenome Archive (EGA), a federated collection of 1,300 datasets across Europe, spanning 73 studies from 308 data providers, including the International Cancer Genome Consortium (ICGC), IRDiRC, UK10K, and the Wellcome Trust Sanger Institute. EGA includes data that may identify individuals, so the 1.7 petabytes of information it contains need to be kept secure, despite heterogeneous patient consents. Access to EGA data is controlled in accordance with Data Use Agreements. Furthermore, the data must be able to be redistributed across a variety of databases, each housed in a different country with its own regulatory context. The EGA, a project of the European Bioinformatics Institute (EMBL-EBI) and the Centre for Genomic Regulation (CRG), faces similar issues to those being addressed by GA4GH, including how to successfully federate. Luckily, EGA isn’t the first group to face this problem, Navarro said, and then he displayed a picture of Tony Stark in his Iron Man suit. Taking a cue from the great, if fictional, inventor, the EGA team set out to create a series of modular parts that interact with one another in a standardized way. Each can work independently, but can also be wrapped into a standard interface, Navarro said. The external services are the same, but the internal organization of “EGA 2.0” is based on microservice modules. GA4GH can help the group tackle federation, Navarro said, particularly in the areas of secure computing clouds and federated authentication systems. In the second half of his presentation, Navarro addressed the importance of robust metadata for effective redistribution. Sharing relies on good metadata, he said, but there is currently no standardized or prescriptive approach to collecting that information. An EGA task force has been established to address this issue, and will likely engage Beacon and Matchmaker Exchange to do so.
Ancillary meetings took place on June 9th, June 11th, and the evening of June 10th. They included several crosscutting sessions as well as internal meetings of each of the four Working Groups, Matchmaker Exchange, Beacon Project, and several of the Task Teams (see appendix for details). Martin Bobrow facilitated a closing session in the afternoon of June 9th, during which Working Group Chairs provided summaries of discussions thus far, followed by a panel discussion with Paul Flicek and Dixie Baker (SWG Co-Chairs), Kathryn North (CWG Co-Chair), David Haussler (DWG Co-Chair), Kazuto Kato (REWG Co-Chair), and Ewan Birney (plenary keynote speaker).
Bobrow noted that the landscape for data sharing has changed considerably over the past year. This is due in large part, he said, to GA4GH members who are spreading the word that genomic and clinical data will not improve health unless we share and analyze them in groups rather than “in our own little silos.”
A summary of the ensuing discussion follows.
David Altshuler led a closing session on June 10th to discuss action items that emerged during the first two days of meetings, inviting audience members to identify the primary challenges that the GA4GH is not currently taking on but should. Altshuler used the Road Map that emerged after the last Plenary to guide the discussion.
Download the Meeting Report to view full details on the crosscutting sessions, the meetings of the Security, Regulatory & Ethics, Clinical, and Data Working Groups, and the Demonstration Projects.