A harmonised resource for clinical interpretations of cancer mutations

26 Sep 2018

A new resource developed by the Variant Interpretation for Cancer Consortium (VICC), a GA4GH Driver Project, makes clinical interpretation of variants much more consistent by aggregating known information about mutations associated with non-hereditary cancer.

When a patient is diagnosed with cancer or is dealing with its later stages, her clinicians need to understand her unique cancer variants to determine the best course of treatment.

A new resource developed by the Variant Interpretation for Cancer Consortium (VICC), a GA4GH Driver Project, makes this clinical exercise much more consistent by aggregating known information about mutations associated with non-hereditary cancer.

The VICC Meta-Knowledgebase aggregates cancer variant interpretations from six distinct resources, or “knowledgebases”, that contain clinical information related to specific variants: Cancer Genome Interpreter (CGI), Clinical Interpretations of Variants in Cancers (CIViC), Jackson Labs Clinical Knowledgebase (JAX-CKB), MolecularMatch, OncoKB, and the Precision Medicine Knowledgebase (PMKB).

“The new resource serves as a kind of translator so that all of these knowledgebases can be searched simultaneously and their results presented in a common language,” said Obi Griffith, assistant director of the McDonnell Genome Institute (MGI) at Washington University School of Medicine in St. Louis and one of three senior authors on a paper about the resource published on bioRxiv in July. Together with his twin brother Malachi, Obi leads the Griffith Lab, which maintains the CIViC knowledgebase.

According to the Wellcome Sanger Institute’s Cancer Gene Census, more than 700 genes have been associated with cancer to date, and 90 percent of these contain mutations that are somatic (not passed down between generations). Unlike inherited mutations in BRCA1 and BRCA2, two commonly known genes associated with familial risk of breast and ovarian cancer, somatic mutations are spontaneous, unrelated to one’s family history, and specific to an individual’s cancer cells.

Hundreds of thousands of different mutations, or “variants”, in non-hereditary cancer genes have been identified and reported in databases like COSMIC, but only a small fraction of these variants are known to cause disease. The rest are dubbed “variants of uncertain significance,” or VUS, and being told you carry one of these mutations provides no insight into the causes of your disease or the best options for treating it.

“Distinguishing the VUSs from variants that are pathogenic (disease-causing) or predict patient outcomes or treatment responses is critical to providing effective care to cancer patients and those at risk of developing disease,” said Malachi Griffith, MGI Assistant Director and another senior author on the paper. But the world of cancer variant classification is a fragmented one, and hundreds of siloed efforts have sprung up around the globe to collate information from clinicians and researchers about whether any particular variant is worthy of concern.

Until the work of the VICC, however, none of these knowledgebases spoke the same language. Due to differences in the data structure, curation strategy, and primary literature used by each, curated knowledge is represented differently from one knowledgebase to the next. In practice, generating a consensus report across knowledgebases requires interpretation “polyglots” to assimilate the information from each resource into a cohesive summary. The training, skill, and effort required to perform this translation scale poorly as the number of knowledgebases continues to expand.

“This is a space that is still in the process of being defined,” said Alex Wagner, a post-doctoral fellow in the Griffith Lab and first author on the paper. “For years, each clinical site has been curating literature describing clinical interpretations of genetic variants and building their own knowledgebases to represent them, without clearly defined standards by which they should represent these data. Relevant standards and resources have only recently emerged, and are still rapidly developing.”

The VICC meta-knowledgebase represents an unprecedented framework for structuring and harmonizing clinical interpretations across the world’s knowledge of cancer-related genetic variation.

“We relied on established community resources, standards, and guidelines to transform the data from each knowledgebase into a consistent vocabulary,” said Wagner. “This involves connecting curated knowledgebases through custom-defined rules for translating data from each resource. As a result, we have consolidated interpretations into a single, harmonized open-access meta-knowledgebase that contains 12,856 harmonized interpretations supported by 4,354 distinct publications.”
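The rule-based translation Wagner describes can be pictured with a small sketch. What follows is an illustration only, not the VICC's actual code or schema: the two source names, their field names, the evidence labels, and the toy records are all hypothetical, chosen to show how per-source rules can map differently shaped records onto one shared vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    """A harmonized record in a shared (hypothetical) vocabulary."""
    source: str
    gene: str
    variant: str
    disease: str
    drug: str
    evidence: str

    def claim(self):
        # Everything except the source: what the record asserts.
        return (self.gene, self.variant, self.disease, self.drug, self.evidence)


# Hypothetical mappings from each source's own evidence scale to shared labels.
EVIDENCE_A = {"A": "validated", "B": "clinical"}
EVIDENCE_B = {"1": "validated", "2": "clinical"}


def from_source_a(rec):
    # Assumed layout: gene and variant packed into one string, e.g. "BRAF V600E".
    gene, variant = rec["variant_name"].split(" ", 1)
    return Interpretation("source_a", gene.upper(), variant,
                          rec["disease"].lower(), rec["drug"].lower(),
                          EVIDENCE_A[rec["level"]])


def from_source_b(rec):
    # Assumed layout: separate fields with different names and a numeric tier.
    return Interpretation("source_b", rec["gene"].upper(), rec["alteration"],
                          rec["tumor_type"].lower(), rec["therapy"].lower(),
                          EVIDENCE_B[rec["tier"]])


# One translation rule per source.
TRANSLATORS = {"source_a": from_source_a, "source_b": from_source_b}


def harmonize(records):
    """Apply the matching per-source rule to each (source, record) pair."""
    return [TRANSLATORS[source](rec) for source, rec in records]


# Toy records invented for this sketch, not taken from any knowledgebase.
RAW = [
    ("source_a", {"variant_name": "BRAF V600E", "disease": "Melanoma",
                  "drug": "Vemurafenib", "level": "A"}),
    ("source_b", {"gene": "braf", "alteration": "V600E",
                  "tumor_type": "melanoma", "therapy": "vemurafenib",
                  "tier": "1"}),
]

harmonized = harmonize(RAW)
print(harmonized[0].claim() == harmonized[1].claim())  # prints True
```

Once both records are expressed in the same terms, a single search can recognise that they describe the same interpretation, which is the "common language" the resource aims to provide. In practice, a real pipeline would rely on community ontologies and identifiers rather than the simple string normalisation assumed here.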

“This work was a few years in the making, and is ongoing,” said Wagner, who noted that the six knowledgebases currently aggregated by the VICC meta-knowledgebase represent some of the world’s preeminent publicly accessible knowledge on clinical interpretations of non-hereditary cancer variants. “But there are also a multitude of knowledgebases from academic and clinical centers that are not publicized, but would likely benefit from adopting the harmonization framework we outline in our work.” The team is encouraging institutions that maintain such resources to participate in the VICC to increase global consensus on the clinical relevance of each variant and improve cancer care around the world.

Research published by members of the GA4GH community does not necessarily reflect an approval as a GA4GH standard, a formal process which consists of five stages (proposed, submitted for approval, under review, approved, retired) and includes peer, security, and ethics reviews. Once approved, GA4GH specifications will be posted publicly with adequate documentation.
