GDPR Brief: can genomic data be anonymised?

10 Oct 2018

Anonymisation is the irreversible alteration of data so that its human subjects are no longer identifiable. Though this makes it incompatible with longitudinal follow-up, and is therefore generally discouraged in precision medicine, it can be an attractive option to comply with data protection law. Indeed, the GDPR does not regulate anonymised data at all, and insists on keeping data in an identifiable form for no longer than necessary for the purposes for which it is processed.

But researchers should never assume that genomic data are anonymous. This may surprise those familiar with US Institutional Review Boards, which regularly view rich genomic datasets as sufficiently de-identified that their analysis does not qualify as human subjects research regulated by the US Common Rule.

The GDPR links the assessment of identifiability to available technology. This determination cannot ignore that genomic re-identification strategies can now:

Genomic datasets that have been coded in a way that allows re-identification, even when they may be considered de-identified according to the HIPAA Privacy Rule, can nonetheless only be considered pseudonymised at best under the GDPR. Recital 26 states that pseudonymised data remain personal data.
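To illustrate why coded data count as pseudonymised rather than anonymised, here is a minimal Python sketch. It is not drawn from the GDPR or from any particular biobank's practice: the key, the function names `pseudonymise` and `reidentify`, and the example identifiers are all hypothetical, and a keyed hash (HMAC-SHA256) is only one of several ways a dataset might be coded. The point is simply that whoever holds the key can link a code back to a person.

```python
import hashlib
import hmac
from typing import List, Optional

# Hypothetical secret held by the data custodian (an assumption for this sketch).
SECRET_KEY = b"custodian-secret-key"


def pseudonymise(participant_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable 16-character code for a participant using HMAC-SHA256."""
    digest = hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def reidentify(code: str, known_ids: List[str], key: bytes = SECRET_KEY) -> Optional[str]:
    """Link a code back to a person, given the key and a list of candidate identifiers."""
    for pid in known_ids:
        if hmac.compare_digest(pseudonymise(pid, key), code):
            return pid
    return None


# Hypothetical participants; the coded dataset stores only the codes.
participants = ["alice@example.org", "bob@example.org"]
coded_dataset = {pseudonymise(p): "genomic record placeholder" for p in participants}

# The key holder can reverse the coding, so the data remain identifiable.
some_code = pseudonymise("alice@example.org")
print(reidentify(some_code, participants))  # alice@example.org
```

The sketch covers only the coded identifier. As the brief notes, genomic data may also be identifiable through the sequence information itself, which removing or hashing an identifier does nothing to address.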

Yet it would be going too far to state that genetic or genomic data can never be anonymised. The mere observation, for example, that the prevalence of a BRCA mutation is roughly 0.25% of a national population is both “genetic” and “data”, but will generally not fall within the GDPR’s notion of personal (i.e. identifiable) data.

To take a practical example, the International Cancer Genome Consortium determined that although it should largely treat the non-cancerous sequencing data it had collected as personal data, genetic variants specific to tumour cells were nonetheless anonymous, with rare exceptions. It freely distributes the anonymous variants to other researchers in accordance with the principle of open science.

Therefore, whether genomic data can be anonymised for the purposes of the GDPR has to be determined on a case-by-case basis, taking into account:

  • all the means of identification, direct or indirect, reasonably likely to be used by any person, and
  • objective factors, including the costs of and the amount of time required for identification, the available technology at the time of the processing, and technological developments.

Mark Phillips is a lawyer with a background in computer science, and an academic associate at McGill University. He advises clients on and writes about various data protection issues.

Please note that GDPR Briefs neither constitute nor should be relied upon as legal advice. Briefs represent a consensus position among Forum Members regarding the current understanding of the GDPR and its implications for genomic and health-related research. As such, they are no substitute for legal advice from a licensed practitioner in your jurisdiction.
