Categorical Variation (CatVar)

Aims to develop a data model (and related tools) for categorical variants in genomic knowledge-bases.

When a patient has their DNA sequenced, the clinical test shows specific changes in their genes. But when researchers collect knowledge and make statements about genomic evidence, they typically look at categories of variation.

For example, take the phrase “TP53 R248 mutations.” A study on breast cancer might use that phrase to refer to point mutations at that position in TP53.* The phrase is therefore categorical: it aggregates all variants leading to changes at amino acid position R248 of protein TP53.

The rise of categorical variants presents two problems. First, to interpret whether an assayed variant leads to disease, you need to manually search and collate knowledge spread across multiple categorical variants — a slow, error-prone process. Second, complex relationships between categorical variants frustrate efforts to curate and maintain the knowledge-bases that are crucial for connecting genes and disease.

The Categorical Variation (CatVar) Study Group tackles these problems by exploring a formal, computable specification for categorical variants.

*Berns et al. 1998.


Benefits

  • Will make genomic knowledge associated with categorical variants computable and efficiently searchable
  • Promotes more efficient and consistent sharing, curation, and availability of genomic knowledge across GA4GH Work Streams and partner organisations
  • Ensures effective use of categorical variant knowledge — both now and as genomic knowledge grows in the future

Target users

Researchers, clinicians, clinical laboratories, data generators, data custodians, developers, and research institutes

Image summary: A three panel comic describing the challenges of searching for relevant categorical variants and how Cat-VRS can help.

Community resources

Dive deeper into this product!

Categorical variants pose a challenge for data sharing, storage, curation, and search. A categorical variant is a set of properties that define a domain of observed variants sharing those characteristics. Some common categorical variants describe the effects of splicing behaviour in a gene (e.g. “MET exon 14 skipping mutations”), a shared protein consequence (e.g. “mutations causing an EGFR L858R substitution”), or the expression or activity of a gene product (e.g. “loss of PTEN”).

However, several factors complicate this simple picture. First, new categorical variants are continuously created in the course of genomics research. Second, a single assayed variant belongs to many categories of variation simultaneously. For example, NC_000007.13:g.140453136A>T is at once a nucleotide sequence variant, a gene function variant, a BRAF gene variant, and a BRAF V600E variant. Third, categorical variants themselves have complex, often hierarchical relationships with one another. Finally, existing nomenclatures and knowledge-bases often disagree, both with each other and internally, about how to assign variant categories.
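The second point, one assayed variant falling into several categories at once, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `AssayedVariant` fields, the `Category` class, and the matching rules are hypothetical and are not part of any CatVar specification. The annotations on the variant (gene, protein change, functional effect) are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class AssayedVariant:
    """One observed variant plus a few pre-computed annotations (hypothetical fields)."""
    accession: str              # reference sequence, e.g. "NC_000007.13"
    position: int               # genomic position on that sequence
    gene: str                   # gene the variant falls in
    protein_change: str         # protein consequence, e.g. "V600E"
    alters_gene_function: bool  # assumed to come from an upstream annotation step


@dataclass(frozen=True)
class Category:
    """A category is modelled here as a name plus a membership test."""
    name: str
    matches: Callable[[AssayedVariant], bool]


# Illustrative categories mirroring the four named in the text.
CATEGORIES: List[Category] = [
    Category("nucleotide sequence variant", lambda v: v.accession.startswith("NC_")),
    Category("gene function variant", lambda v: v.alters_gene_function),
    Category("BRAF gene variant", lambda v: v.gene == "BRAF"),
    Category(
        "BRAF V600E variant",
        lambda v: v.gene == "BRAF" and v.protein_change == "V600E",
    ),
]


def categories_for(variant: AssayedVariant) -> List[str]:
    """Return the names of every category the variant belongs to."""
    return [c.name for c in CATEGORIES if c.matches(variant)]


variant = AssayedVariant(
    accession="NC_000007.13",
    position=140453136,
    gene="BRAF",
    protein_change="V600E",
    alters_gene_function=True,
)
print(categories_for(variant))
```

Even in this toy form, the hierarchy mentioned in the third point is visible: every variant matching "BRAF V600E variant" necessarily matches "BRAF gene variant" as well, yet nothing in the flat list records that relationship.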

The Study Group aims to address these challenges by exploring a formal model for categorical variants, together with an associated type logic for interpreting them. This data model will be implemented as a computable JSON schema and accompanied by a reference Python implementation for creating and validating categorical variation objects.
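To give a feel for what "creating and validating categorical variation objects" could mean in practice, here is a rough sketch using only the Python standard library. The object layout (`label`, `constraints`, `property`, `value`) and the function `validate_categorical_variant` are invented for this example; they are assumptions, not the Study Group's schema or its reference implementation.

```python
import json
from typing import Any, Dict, List

# Hypothetical required top-level fields and their expected Python types.
REQUIRED_FIELDS = {"label": str, "constraints": list}


def validate_categorical_variant(obj: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the object passed."""
    problems: List[str] = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in obj:
            problems.append(f"missing field: {field}")
        elif not isinstance(obj[field], expected_type):
            problems.append(f"{field} should be of type {expected_type.__name__}")

    # Each constraint is assumed to be a {"property": ..., "value": ...} pair.
    constraints = obj.get("constraints")
    if isinstance(constraints, list):
        for index, constraint in enumerate(constraints):
            if (
                not isinstance(constraint, dict)
                or "property" not in constraint
                or "value" not in constraint
            ):
                problems.append(f"constraint {index} needs 'property' and 'value'")
    return problems


# One of the categories named earlier, written in the hypothetical layout.
egfr_l858r = {
    "label": "mutations causing an EGFR L858R substitution",
    "constraints": [
        {"property": "gene", "value": "EGFR"},
        {"property": "protein_change", "value": "L858R"},
    ],
}

# Round-trip through JSON to show the object is plain, shareable data.
serialized = json.dumps(egfr_l858r, sort_keys=True)
restored = json.loads(serialized)

print(validate_categorical_variant(restored))
print(validate_categorical_variant({"label": "loss of PTEN"}))
```

A real implementation would more likely validate against the published JSON schema than against hand-written checks like these; the point of the sketch is only that a categorical variant can be expressed as plain structured data and checked mechanically.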


  • 27 Mar 2024: Now is a great time to get involved with these efforts.
  • 16 Nov 2023: Join four new GA4GH groups to help shape guidelines for pandemic prep, schema consensus, sequencing metadata, and categorical variants

CatVar meets twice a month, alternating between the first Wednesday and the third Tuesday of the month.

  • First Wednesday of the month: 19:00 UTC, 1 hour (repeats monthly)
  • Third Tuesday of the month: 12:00 UTC, 1 hour (repeats monthly)

Don't see your name? Get in touch:

  • Larry Babb
    Broad Institute of MIT and Harvard
  • Daniel Puthawala
    Nationwide Children’s Hospital
  • Alex Wagner
    Nationwide Children’s Hospital, Variant Interpretation for Cancer Consortium (VICC)

News, events, and more

Catch up with all news and articles associated with Categorical Variation (CatVar).

16 Nov 2023
Want to help shape guidelines for pandemic prep, schema consensus, sequencing metadata, and categorical variants? Join four new GA4GH groups!