refget

Uses a checksum algorithm to unambiguously identify reference sequences for genomic analysis

The first step in sequencing-based genomic analysis is typically mapping the new sequence data to a reference sequence. For humans, this is a sequence of roughly three billion base pairs that is generally accepted as “normal” for a given population or subgroup. However, there are no standard conventions for naming and identifying reference sequences, and different organisations refer to the same sequence differently. Developed by the Large Scale Genomics (LSG) Work Stream, the refget API provides a framework for retrieving reference sequences using a unique identifier that an algorithm derives from the sequence itself. The same identifier can then be used to verify the integrity of a retrieved sequence.
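The identifier and integrity check described above can be sketched in Python with the standard library. This is a minimal sketch, not the reference implementation. It assumes, based on our reading of the refget v2.0 specification, that checksums are computed over a normalised sequence (uppercased, with every character outside A-Z removed) and that two digests are in use: a lowercase hexadecimal MD5, and a GA4GH identifier made of `SQ.` followed by the base64url encoding of the first 24 bytes of a SHA-512 digest (often called sha512t24u). The helper names `normalise`, `md5_id`, `ga4gh_id` and `verify` are hypothetical. Check the specification for the exact normalisation rules before relying on them.

```python
import base64
import hashlib
import re

# Hypothetical helper names; a sketch under the assumptions stated above.


def normalise(sequence: str) -> bytes:
    # Assumption: refget-style normalisation = uppercase, then drop every
    # character outside A-Z (whitespace, newlines, gap symbols, digits).
    return re.sub(r"[^A-Z]", "", sequence.upper()).encode("ascii")


def md5_id(sequence: str) -> str:
    """Lowercase hex MD5 digest of the normalised sequence."""
    return hashlib.md5(normalise(sequence)).hexdigest()


def ga4gh_id(sequence: str) -> str:
    """Assumed GA4GH identifier: 'SQ.' + base64url(first 24 bytes of SHA-512)."""
    truncated = hashlib.sha512(normalise(sequence)).digest()[:24]
    # 24 bytes encode to exactly 32 base64url characters, so there is no '=' padding.
    return "SQ." + base64.urlsafe_b64encode(truncated).decode("ascii")


def verify(sequence: str, expected_id: str) -> bool:
    """Integrity check: recompute the identifier from the content and compare."""
    if expected_id.startswith("SQ."):
        return ga4gh_id(sequence) == expected_id
    return md5_id(sequence) == expected_id.lower()


if __name__ == "__main__":
    print(md5_id("ACGT"))
    print(ga4gh_id("ACGT"))
    print(verify("acgtACGTnnnn", ga4gh_id("ACGTACGTNNNN")))  # True
```

Because the identifier depends only on content, a client can recompute it on whatever sequence a server returns and compare it with the identifier it requested. Under the assumed normalisation, case and line breaks do not change the identifier, while changing any single base does.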

All sequencing-based genomic analysis uses a genomic reference sequence, a baseline of knowledge against which variations are observed. There are multiple human reference sequences of increasing accuracy, and they are not named consistently. For example, two organisations may refer to the same sequence using different names, or reuse a name to refer to different reference releases. Reliable, reproducible genomic analysis depends on clear provenance back to the reference data. refget enables unambiguous access to reference sequences across different databases and servers by using a checksum identifier computed from the sequence content itself.
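Because the identifier comes from the content rather than from a name, the same request can be sent unchanged to any refget server. The sketch below uses only the Python standard library. It assumes the endpoint layout as we understand it from the refget specification: `GET /sequence/{id}` with optional 0-based `start` (inclusive) and `end` (exclusive) query parameters, and `GET /sequence/{id}/metadata`. The server URLs and the helpers `sequence_url`, `metadata_url` and `fetch_first` are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Iterable, Optional

# Hypothetical helpers; the endpoint paths are an assumption based on the refget spec.


def sequence_url(base_url: str, seq_id: str,
                 start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Build {base_url}/sequence/{seq_id}, adding start/end only when given.

    Assumed semantics: start is 0-based inclusive, end is 0-based exclusive.
    """
    url = f"{base_url.rstrip('/')}/sequence/{urllib.parse.quote(seq_id, safe=':')}"
    params = {k: v for k, v in (("start", start), ("end", end)) if v is not None}
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def metadata_url(base_url: str, seq_id: str) -> str:
    """Build the (assumed) metadata endpoint URL for a checksum identifier."""
    return f"{base_url.rstrip('/')}/sequence/{urllib.parse.quote(seq_id, safe=':')}/metadata"


def fetch_first(servers: Iterable[str], seq_id: str,
                start: Optional[int] = None, end: Optional[int] = None,
                timeout: float = 10) -> Optional[str]:
    """Send the same checksum-based request to each server until one answers."""
    for base in servers:
        try:
            with urllib.request.urlopen(sequence_url(base, seq_id, start, end),
                                        timeout=timeout) as resp:
                return resp.read().decode("ascii")
        except OSError:
            # Unreachable server or HTTP error (HTTPError is an OSError): try the next one.
            continue
    return None
```

For example, `fetch_first(["https://refget-a.example.org", "https://refget-b.example.org"], seq_id, start=0, end=100)` would ask each hypothetical server in turn for the first 100 bases of the same sequence. Trying servers in order keeps the example short. A fuller client would also check the response content type and, for full-length fetches, recompute the checksum on the returned sequence.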


Benefits

  • Reliably access reference sequences for genomic studies
  • Create unambiguous identifiers for reference sequences

Target users

Researchers


Community resources



  • 29 Feb 2024: Please review the document by Wednesday, 30 March 2024.
  • 17 Jul 2023
  • 15 Feb 2023: Please review and provide your feedback for CRAM v3.1 and refget v2.0 by 14 March 2023.

Version release dates:

  • 17 Jul 2023
  • 27 Feb 2023
  • 9 Mar 2020


Related Driver Projects and Organisations

European Joint Programme on Rare Diseases (EJP RD)
ELIXIR Beacon
European Genome-phenome Archive (EGA), EMBL's European Bioinformatics Institute (EBI), Centre for Genomic Regulation

Contributors

  • Jeremy Adams
    DNAstack
  • Shakuntala Baichoo
    University of Mauritius
  • Timothe Cezard
    EMBL's European Bioinformatics Institute (EBI)
  • Robert Davies
    Wellcome Sanger Institute (WSI)
  • Sveinung Gundersen
    Centre for Bioinformatics, University of Oslo
  • Reece Hart
    MyOme
  • Muhammad Haseeb
    EMBL's European Bioinformatics Institute (EBI)
  • Oliver Hofmann
    University of Melbourne Centre for Cancer Research
  • Rasko Leinonen
    EMBL's European Bioinformatics Institute (EBI)
  • John Marshall
    University of Glasgow
  • Nathan Sheffield
    University of Virginia
  • Reggan Thomas
    EMBL's European Bioinformatics Institute (EBI)
  • Andy Yates
    EMBL's European Bioinformatics Institute (EBI)

News, events, and more

Catch up with all news and articles associated with refget.

17 Jul 2023
refget v2.0 links the hidden dictionaries of DNA
8 Jul 2021
GA4GH standards in a global learning health system
5 Dec 2018
Using the GA4GH toolkit: refget API for retrieving reference sequences via checksum