Now open for comment: GA4GH Data Use Ontology

17 Jan 2019

The GA4GH Data Use Ontology (DUO) allows users to semantically tag genomic datasets with usage restrictions, so that the datasets become automatically discoverable based on a health, clinical, or biomedical researcher’s authorization level or intended use. DUO is based on the OBO Foundry principles and developed using the W3C Web Ontology Language (OWL). It is used in production by the European Genome-phenome Archive (EGA) at EMBL-EBI/CRG and by the Broad Institute in its Data Use Oversight System (DUOS).

Human subjects datasets often carry restrictions such as “only available for cancer use” or “only available for the study of pediatric diseases.” These restrictions are derived from the informed consent form used when the original biospecimens were collected, and they must be respected when the datasets are shared and studied. Each institution describes secondary use restrictions and conditions in its own language in its informed consent forms. DUO provides a standard, universal system for categorizing these conditions, so that data access committees and researchers can interpret them in a consistent, structured way.
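The kind of automatic matching that structured data use tags make possible can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model: the category abbreviations (NRES, GRU, HMB, DS) follow the Consent Codes naming, but the permission table, the toy disease hierarchy, and all class and function names (`Dataset`, `ResearchPurpose`, `is_permitted`, `discoverable`) are hypothetical. Real DUO matching is defined over the ontology's own terms and relations, and a production system would use a disease ontology rather than a hand-written dictionary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy disease hierarchy (child -> parent). A real system would
# look these relations up in a disease ontology instead.
DISEASE_PARENT = {
    "leukemia": "cancer",
    "breast cancer": "cancer",
    "cancer": None,
    "pediatric asthma": "pediatric disease",
    "pediatric disease": None,
}

# Illustrative permission table (NOT DUO's formal semantics): for each dataset
# restriction category, the research-purpose categories it permits.
PERMITTED_PURPOSES = {
    "NRES": {"GRU", "HMB", "DS"},  # no restriction
    "GRU": {"GRU", "HMB", "DS"},   # general research use
    "HMB": {"HMB", "DS"},          # health/medical/biomedical research
    "DS": {"DS"},                  # disease-specific research only
}


@dataclass
class Dataset:
    name: str
    category: str
    disease: Optional[str] = None  # only meaningful when category == "DS"


@dataclass
class ResearchPurpose:
    category: str
    disease: Optional[str] = None  # disease studied, if disease-specific


def is_subsumed(term: Optional[str], ancestor: str) -> bool:
    """True if `term` equals `ancestor` or descends from it in DISEASE_PARENT."""
    while term is not None:
        if term == ancestor:
            return True
        term = DISEASE_PARENT.get(term)  # unknown terms end the walk
    return False


def is_permitted(dataset: Dataset, purpose: ResearchPurpose) -> bool:
    """Check a research purpose against a dataset's (simplified) restriction."""
    if purpose.category not in PERMITTED_PURPOSES[dataset.category]:
        return False
    if dataset.category == "DS":
        # Disease-specific data: the studied disease must fall under the
        # dataset's disease, e.g. leukemia research on "cancer only" data.
        return purpose.disease is not None and is_subsumed(purpose.disease, dataset.disease)
    return True


def discoverable(datasets: list, purpose: ResearchPurpose) -> list:
    """Names of the datasets a researcher with this purpose may discover."""
    return [d.name for d in datasets if is_permitted(d, purpose)]


catalog = [
    Dataset("cohort_a", "GRU"),
    Dataset("cohort_b", "HMB"),
    Dataset("cohort_c", "DS", "cancer"),
    Dataset("cohort_d", "DS", "pediatric disease"),
]

print(discoverable(catalog, ResearchPurpose("DS", "leukemia")))
# ['cohort_a', 'cohort_b', 'cohort_c']
print(discoverable(catalog, ResearchPurpose("GRU")))
# ['cohort_a']
```

Because the restriction is recorded as a structured term rather than free text, a leukemia study is matched to the “cancer only” dataset through the disease hierarchy alone, without anyone rereading the original consent form.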

DUO represents data use terms from three evolving efforts to standardize data use restrictions in the biomedical and genomics research domain:

  • NIH database of Genotypes and Phenotypes (dbGaP) data use categories. dbGaP is one of the largest public repositories of genomics data in the world.
  • Consent Codes – a global effort led by Stephanie OM Dyke (McGill University) and the GA4GH Regulatory and Ethics Work Stream to define ‘codes’ for specific categories of data use restrictions, based on the datasets of the main public genome archives (NCBI dbGaP and EMBL-EBI/CRG EGA).
  • The Automated Data Access Matrix (ADA-M) – work led by Anthony Brookes and other GA4GH members of the ADA-M task team to define a matrix of data use categories that can be used to specify data use restrictions and research purposes.

DUO is an evolving effort to provide a digital, ontological representation of all the data use categories defined by the efforts above. Its evolution is led by GA4GH Driver Projects such as the EMBL-EBI/CRG EGA, where it is currently used in production, the All of Us Research Program, and the NIH Data Commons Pilot.

DUO was submitted to the GA4GH Steering Committee for product approval on January 15, 2019, and is open for public comment until February 15, 2019. Technical comments are invited via the GitHub issue tracker; general comments should be sent to the GA4GH Data Use mailing list.