GDPR Brief: can genomic data be anonymised?

10 Oct 2018

Anonymisation is the irreversible alteration of data so that its human subjects are no longer identifiable. Though it rules out longitudinal follow-up, and is therefore generally discouraged in precision medicine, it can be an attractive way to comply with data protection law. Indeed, the GDPR does not regulate anonymised data at all, and requires that data be kept in an identifiable form for no longer than necessary for the purposes for which they are processed.

But researchers should never assume that genomic data are anonymous. This may surprise those familiar with US Institutional Review Boards, which regularly view rich genomic datasets as sufficiently de-identified that their analysis does not qualify as human subjects research regulated by the US Common Rule.

The GDPR links the assessment of identifiability to available technology. This determination cannot ignore that genomic re-identification strategies can now:

determine people’s names based solely on their DNA and trace amounts of associated metadata; and
retrieve personal data from aggregates of single-nucleotide polymorphisms (SNPs) across numerous individuals, even when using as few as 25 randomly selected loci.
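
As a rough intuition for why so few loci can be identifying, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration and does not reproduce either of the re-identification strategies mentioned above. It assumes biallelic SNPs in Hardy-Weinberg equilibrium, statistical independence between loci, and a hypothetical allele frequency of 0.5 at every locus; the function names are invented for this example.

```python
from typing import Iterable


def genotype_match_probability(p: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumption: Hardy-Weinberg equilibrium, with allele frequency p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    # Frequencies of the three possible genotypes (AA, Aa, aa).
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    # Two people match if both carry the same genotype.
    return sum(f * f for f in genotype_freqs)


def profile_match_probability(allele_freqs: Iterable[float]) -> float:
    """Chance that two unrelated people match at every listed locus.

    Assumption: loci are independent, so per-locus probabilities multiply.
    """
    prob = 1.0
    for p in allele_freqs:
        prob *= genotype_match_probability(p)
    return prob


if __name__ == "__main__":
    loci = [0.5] * 25  # hypothetical allele frequencies, one per locus
    print("possible genotype profiles:", 3 ** len(loci))
    print("chance two people match:", profile_match_probability(loci))
```

Under these simplifying assumptions, 25 loci allow 3^25 (about 847 billion) distinct genotype profiles, and the chance that two unrelated people match at all 25 is on the order of 10^-11. Real allele frequencies and correlations between loci would change the figures, but not the basic point that a short list of SNPs can single out one person.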

Genomic datasets that have been coded to allow re-identification, even when they may be considered de-identified according to the HIPAA Privacy Rule, can only be considered pseudonymised at best under the GDPR. Recital 26 states that pseudonymised data remain personal data.

Yet it would be going too far to state that genetic or genomic data can never be anonymised. The mere observation, for example, that the prevalence of a BRCA mutation is roughly 0.25% of a national population is both “genetic” and “data”, but will generally not fall within the GDPR’s notion of personal (i.e. identifiable) data.

To take a practical example, the International Cancer Genome Consortium determined that although it should largely treat the non-cancerous sequencing data it had collected as personal data, genetic variants specific to tumour cells were nonetheless anonymous, with rare exceptions. It freely distributes the anonymous variants to other researchers in accordance with the principle of open science.

Therefore, whether genomic data can be anonymised for the purposes of the GDPR has to be determined on a case-by-case basis, taking into account:

all the means of identification, direct or indirect, reasonably likely to be used by any person, and
objective factors, including the costs of and the amount of time required for identification, the available technology at the time of the processing, and technological developments.

Further Reading

EU Article 29 Working Party, Opinion 05/2014 on Anonymisation Techniques
EU Article 29 Working Party, Opinion 04/2007 on the Concept of Personal Data
UK Information Commissioner’s Office, Anonymisation Code of Practice

Relevant GDPR Provisions

Recital 26 – anonymous data are not subject to the GDPR
Article 5(1)(e) – duty to anonymise as soon as is practicable
Article 4(1) – definition of personal data
Recital 26 – identifiability criteria
Article 4(5) – definition of pseudonymisation
Recital 26 – pseudonymised data remain personal data
Article 4(13) & Recital 34 – definitions of genetic data

Mark Phillips is a lawyer with a background in computer science, and an academic associate at McGill University. He advises clients on and writes about various data protection issues.

Please note that GDPR Briefs neither constitute nor should be relied upon as legal advice. Briefs represent a consensus position among Forum Members regarding the current understanding of the GDPR and its implications for genomic and health-related research. As such, they are no substitute for legal advice from a licensed practitioner in your jurisdiction.

Related Work Streams

Regulatory & Ethics Work Stream (REWS)
