Phenopackets: standardising and exchanging patient phenotypic data

22 Oct 2019

The GA4GH Steering Committee recently approved Phenopackets, a standard file format for sharing phenotypic information. The Phenopackets standard aims to facilitate communication between the research and clinical genomics communities by creating an ecosystem of interoperable tools and resources that can use phenotypic data with fewer barriers.

Image Credit: Stephanie Li, GA4GH

More than 60 million genomes are expected to be sequenced for healthcare purposes over the next five years. This mass of data has the potential to inform human health and medicine in unprecedented ways, but that promise will only be realized if the data can be shared across disciplines and effectively linked to clinical outcomes.

The majority of existing formats for describing genotype information do not include a means to share corresponding phenotypic information (e.g. observable characteristics, signs/symptoms of disease). While some genomic databases have defined their own formats for representing phenotypic information, the lack of uniformity amongst these organizations hinders communication and limits the ability to perform analyses across them.

A phenopacket file contains a set of mandatory and optional fields for sharing information about a patient's or participant's phenotype, such as clinical diagnosis, age of onset, lab test results, and disease severity. It can also link to a separate file containing the patient's genetic sequence, if available. Phenopackets are expected to standardize phenotypic data exchange across medical and scientific settings. This will allow phenotypic data to flow between clinics, databases, clinical labs, journals, and patient registries in ways currently feasible only for more quantifiable data, such as sequence data.
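As a rough illustration of what such a file might hold, the sketch below builds a small phenopacket-like record in Python and serializes it to JSON. The field names (`subject`, `phenotypicFeatures`, `htsFiles`) are modeled on the published schema but should be treated as assumptions here, and the helper `make_phenopacket`, the identifiers, and the file URI are hypothetical. It does not attempt to include every field the specification marks as mandatory.

```python
import json


def make_phenopacket(packet_id, subject_id, features, genome_file_uri=None):
    """Build a minimal phenopacket-like dict (illustrative, not schema-validated).

    features is a list of (ontology_term_id, label) pairs.
    genome_file_uri is optional, mirroring the optional link to sequence data.
    """
    packet = {
        "id": packet_id,
        "subject": {"id": subject_id},
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}}
            for term_id, label in features
        ],
    }
    # Only add the link to a separate sequence file when one is available.
    if genome_file_uri is not None:
        packet["htsFiles"] = [{"uri": genome_file_uri, "htsFormat": "VCF"}]
    return packet


# Hypothetical example: one phenotypic feature, no sequence file.
packet = make_phenopacket(
    "example-packet-1",
    "patient-1",
    [("HP:0001882", "Leukopenia")],  # decreased white blood cell count
)
print(json.dumps(packet, indent=2))
```

Because the sequence link is only added when a URI is supplied, a record without genomic data simply omits that field rather than carrying an empty placeholder.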

“Phenotype data is, by its nature, complex due to the wide array of modalities used to capture trait information,” said EMBL-EBI Bioinformatician Terry Meehan, who has implemented Phenopackets within the International Mouse Phenotyping Consortium (IMPC). “This complexity leads to challenges in data interoperability as differing languages are used between biomedical databases to describe similar results—a serious bottleneck in translating research for clinicians.”

The standard is of significant relevance to the rare disease and cancer communities, in which clinical data—such as lab test results, physical attributes, or disease progression and severity—are often used to differentiate between conditions that share similar phenotypes.

“Phenopackets will greatly simplify representation and exchange of phenotypic information, opening the door for matching rare disease patients in federated query systems supported by GA4GH,” said Metadata Standards Coordinator at EMBL-EBI, Melanie Courtot, who is leading implementation of the Phenopackets standard within the BioSamples database.

Using Phenopackets, clinicians can search for genetic variants associated with phenotypes similar to their patient's and determine which variant best matches. Such matching supports faster and more accurate diagnosis and treatment, and may improve the chances of remission. Phenopackets also benefit researchers by opening up opportunities to analyze more data and strengthen our understanding of human health and disease.
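To make the idea of phenotype matching concrete, here is a toy sketch that ranks candidates by how much their phenotype term sets overlap with a patient's, using Jaccard similarity. This is a deliberately simple stand-in chosen for illustration, not the method used by Exomiser or any GA4GH tool. The term identifiers and candidate names are hypothetical placeholders.

```python
def jaccard(terms_a, terms_b):
    """Jaccard similarity of two term collections: |A & B| / |A | B|."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0  # define similarity of two empty sets as 0
    return len(a & b) / len(a | b)


def best_match(patient_terms, candidates):
    """Return the candidate name whose term set is most similar to the patient's.

    candidates maps a candidate name to a collection of phenotype term IDs.
    """
    return max(candidates, key=lambda name: jaccard(patient_terms, candidates[name]))


# Hypothetical placeholder term IDs, not real ontology terms.
patient = {"TERM:1", "TERM:2", "TERM:3"}
candidates = {
    "candidate_A": {"TERM:1", "TERM:9"},
    "candidate_B": {"TERM:1", "TERM:2", "TERM:3", "TERM:4"},
}
print(best_match(patient, candidates))
```

Real matching tools generally account for the ontology's hierarchy and term specificity, which a flat set overlap ignores.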

“Clinicians and researchers with varying degrees of genomics expertise will find the file format useful,” said Melissa Haendel, Principal Investigator for the Monarch Initiative and Lead of the GA4GH Clinical & Phenotypic Data Capture Work Stream. “Phenopackets provide different levels of complexity so that we can exchange both high-level clinical phenotype information as well as in-depth data.” For instance, the standard can be used to describe anything from abnormal fetal movement or decreased white blood cell count to eye color or height.

Most of the fields within the file are optional, giving clinicians and researchers freedom to report only the phenotypic information they choose. If specific lab tests are not administered or a patient’s whole genome is not sequenced, those data do not need to be included in a phenopacket that stores other related information. This flexibility will also allow for the omission of identifiable information, such as date of birth or name, to preserve patient privacy.
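The omission of identifiable information could look something like the sketch below, which copies a phenopacket-like record and drops selected fields from the subject before sharing. The field names `dateOfBirth` and `name`, the helper `redact_subject`, and the sample values are assumptions made for illustration. Which fields actually count as identifying depends on the schema version and on local privacy rules.

```python
import copy

# Hypothetical list of subject fields treated as identifying in this sketch.
IDENTIFYING_SUBJECT_FIELDS = ("dateOfBirth", "name")


def redact_subject(packet, fields=IDENTIFYING_SUBJECT_FIELDS):
    """Return a deep copy of packet with the given subject fields removed.

    The input dict is left unchanged; fields that are absent are ignored.
    """
    redacted = copy.deepcopy(packet)
    subject = redacted.get("subject", {})
    for field in fields:
        subject.pop(field, None)
    return redacted


# Hypothetical record containing an identifying date of birth.
original = {
    "id": "example-packet-1",
    "subject": {"id": "patient-1", "dateOfBirth": "1990-01-01T00:00:00Z"},
    "phenotypicFeatures": [{"type": {"id": "HP:0001882", "label": "Leukopenia"}}],
}
shareable = redact_subject(original)
print(shareable["subject"])
```

Working on a copy keeps the full record available locally while only the reduced version is exchanged.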

To read a phenopacket file, researchers and clinicians can use existing software, such as Phenotools (for validating phenopackets) and Exomiser (for annotating variants).

Peter Robinson, a computational biologist and pediatric physician at the Jackson Laboratory, leads Phenopackets development. Robinson notes that the team hopes to release a guide soon for implementing Phenopackets within electronic health records built on the HL7 FHIR framework (the leading standard for storing electronic health data) in order to drive uptake among the clinical community. The development team is also working with journals to require that phenotype data be submitted in the Phenopacket format, which will encourage research scientists to adopt the standard in practice.

“Phenopackets enable a massive network of genomic data sharing, not only within the research or clinical communities, but also between the two groups,” said Robinson. “Now researchers can use patient phenotype information to further their understanding of human biology, and clinicians can reap the benefits of research findings in healthcare.”

Related Work Streams

Clinical & Phenotypic Data Capture (Clin/Pheno) Work Stream
