Genomics England implements GA4GH API to provide secure access to genomic data for the NHS

2 Dec 2019

Genomics England has implemented the standard GA4GH API htsget to serve all of its genomic data from the 100,000 Genomes Project and the Genomic Medicine Service.


The htsget standard was developed by the Large Scale Genomics Work Stream of the Global Alliance for Genomics and Health (GA4GH). It is a genomic data retrieval specification that lets users stream the data for selected regions of the genome, so they no longer need to download the entire files in which the data reside. Genomics England is using the API to stream all of its VCF, BAM, and CRAM files, thereby providing direct access to the raw data for clinical care teams across the UK.
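As a rough illustration of how region-based streaming works in htsget: a client requests a region from a reads or variants endpoint, and the server answers with a JSON "ticket" listing data blocks (inline data URIs or further URLs, optionally with headers) that the client fetches and concatenates in order. The Python sketch below follows that general shape. The server address, sample identifier, and function names are hypothetical and chosen only for illustration; this is not Genomics England's implementation, and it handles only the basic query parameters.

```python
import base64
from typing import Callable, Optional
from urllib.parse import urlencode


def build_htsget_url(base_url: str, kind: str, record_id: str,
                     reference_name: str, start: int, end: int,
                     fmt: str) -> str:
    """Build an htsget request URL for one genomic region.

    kind is "reads" (BAM/CRAM) or "variants" (VCF/BCF). start is 0-based
    inclusive and end is exclusive, as in the htsget specification.
    """
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,
        "end": end,
    })
    return f"{base_url.rstrip('/')}/{kind}/{record_id}?{query}"


def assemble_from_ticket(
    ticket: dict,
    fetch: Optional[Callable[[str, dict], bytes]] = None,
) -> bytes:
    """Concatenate the data blocks listed in an htsget ticket, in order.

    fetch is a caller-supplied function (hypothetical here) that takes a
    URL and a dict of headers and returns the response body as bytes.
    """
    chunks = []
    for block in ticket["htsget"]["urls"]:
        url = block["url"]
        if url.startswith("data:"):
            # Inline block: the payload follows the first comma.
            header, payload = url.split(",", 1)
            if not header.endswith(";base64"):
                raise ValueError("this sketch handles only base64 data URIs")
            chunks.append(base64.b64decode(payload))
        else:
            if fetch is None:
                raise ValueError("a fetch callable is needed for remote blocks")
            # Remote block: pass along any headers the ticket supplies.
            chunks.append(fetch(url, block.get("headers", {})))
    return b"".join(chunks)
```

For example, calling build_htsget_url("https://htsget.example.org", "reads", "sample-001", "chr1", 100000, 200000, "BAM") builds a request for a 100 kb window of one hypothetical sample, so only the blocks covering that window need to be transferred rather than the whole BAM file.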

“As a streaming technology, htsget allows our systems to only move the data required for analysis,” said Augusto Rendon, Chief Bioinformatician at Genomics England. “Using a standard means that many downstream tools such as genome browsers or analysis pipelines can connect to the data without cumbersome intermediates.”

The htsget API is being consumed in several ways across Genomics England. Laboratory scientists can now browse genomic data more efficiently via software packages such as Genoverse and IGV.js. Moreover, services are now built to call the htsget API directly. One such service is used for sample matching to confirm that the DNA held at the requesting lab is from the same individual as the genome held at Genomics England.

Because htsget offers out-of-the-box compatibility with samtools/htslib, it is very simple to integrate into other systems. Importantly, it also helps to abstract services more cleanly by avoiding difficult dependencies on file systems.

“Implementing htsget into our ecosystem has been a very positive experience. It was very straightforward to integrate it with pieces of software that were already using samtools/htslib,” said Antonio Rueda, head of the interpretation platform at Genomics England. “This demonstrates the importance of converging on standard and unified protocols. Overall, this means that the labs using these data to interpret our patients’ genomes can simplify their workflows, deploy a wider range of tools, and hopefully diagnose more patients.”  

In the hope that others can benefit from this standard, Genomics England has made the code open source. The implementation is available at https://gitlab.com/genomicsengland/htsget/gel-htsget
