9 Apr 2019
On Friday April 5, GA4GH held the #CRAM4GH Twitter chat. Guest “panelists” and experts James Bonfield, Thomas Keane, and Ewan Birney helped answer questions on the CRAM file format for genomic data compression.
On Friday, April 5th, GA4GH held a live Twitter chat on the CRAM file format for genomic data compression. Using the #CRAM4GH hashtag, three guest experts answered questions from the community about the file format: James Bonfield, Principal Software Developer at the Wellcome Sanger Institute and lead CRAM maintainer; Thomas Keane, Team Leader at EMBL-EBI and co-lead of the GA4GH Large Scale Genomics Work Stream; and Ewan Birney, Director of EMBL-EBI and GA4GH Chair.
Visit the #CRAM4GH conversation reel to explore the full Twitter chat, or view the highlights below.
CRAM is a file format that uses various algorithms to compress genomic data. By storing only the parts of a sequence that differ from a reference sequence, CRAM keeps files small and easily accessible.
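The Python sketch below is a toy illustration of the reference-based idea, not CRAM's actual encoding. It assumes each read is already aligned to the reference at a known start position with no insertions, deletions, or clipping. The function names `encode_read` and `decode_read` are hypothetical. Real CRAM also records insertions, deletions, base qualities, read names, and other fields, and then compresses the resulting data with block compression codecs.

```python
from typing import List, Tuple

# Toy model of reference-based compression (NOT the real CRAM encoding).
# Assumption: the read is aligned at `start` with no indels or clipping.
Diff = Tuple[int, str]  # (offset within the read, base that differs)


def encode_read(reference: str, read: str, start: int) -> List[Diff]:
    """Keep only the bases where the read differs from the reference."""
    ref_slice = reference[start:start + len(read)]
    if len(ref_slice) != len(read):
        raise ValueError("read runs past the end of the reference")
    return [(i, base) for i, (base, ref_base) in enumerate(zip(read, ref_slice))
            if base != ref_base]


def decode_read(reference: str, start: int, length: int, diffs: List[Diff]) -> str:
    """Rebuild the read by copying the reference and re-applying the differences."""
    bases = list(reference[start:start + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTACGTACGTACGT"
read = "GTACTTACGA"  # aligned at position 2, with two mismatches
diffs = encode_read(reference, read, start=2)
print(diffs)  # [(4, 'T'), (9, 'A')]
assert decode_read(reference, 2, len(read), diffs) == read  # lossless round trip
```

Only the start position, the read length, and the list of mismatches need to be stored. The more closely a read matches the reference, the less there is to keep.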
To start the ball rolling… there has been confusion over the years as to what CRAM can and cannot do (even where it came from).
For more info, see “Cram dispelling the myths”:https://t.co/4vxB1VyYWQ #CRAM4GH
— James Bonfield (@BonfieldJames) April 5, 2019
As genome sequencing becomes more routine, storing data efficiently and sustainably is essential. CRAM offers immediate savings for the wider community:
Genome sequencing is well on the way to be coming a routine clinical assay. BUT genomes are big, e.g. a single human genome in BAM is ~100GB. With CRAM this can be reduced by 50-60%, so that’s immediate $$ savings and enables faster transfer of human data. #CRAM4GH
— Thomas Keane (@drtkeane) April 5, 2019
…to more than the genomics community. It is a fundamental step for implementing genomics in public health
— Mauro Petrillo (@PetrilloMauro) April 7, 2019
Although converting requires careful attention to data access controls and permissions, clinical genomics collaborators also stand to benefit from CRAM. In response to a question from @GeneFiddler (Hywel Williams of Cardiff University) on the feasibility of adding CRAM support to clinical diagnostic pipelines, all three panelists emphasized the interoperability already present in widely used tools:
Good question Hywel. If the pipeline is predominantly using samtools/picard/GATK or their libraries (htslib, htsjdk) then it should be quite painless. If you have parts that don’t fit well, then it’s perfectly fine to use BAM and convert to CRAM at the end.
— James Bonfield (@BonfieldJames) April 5, 2019
Hi Hywel, CRAM is already compatible with some of the most highly used NGS tools, e.g. Samtools/htslib/htsjdk/GATK. I guess you would need to audit your pipeline to find out which tools are/aren’t already compatible. #CRAM4GH
— Thomas Keane (@drtkeane) April 5, 2019
They are probably using BAM now via GATK and Samtools. It should be an easy replacement but in a clinical context you’d need to do it carefully. I would audit the pipeline for the access of BAMs, check by hand each access works with CRAM #CRAM4GH >>
— Ewan Birney (@ewanbirney) April 5, 2019
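The panelists suggested two steps: auditing a pipeline for every place it touches BAM files, and, where a step does not fit well, keeping BAM and converting to CRAM at the end. The Python sketch below shows one possible starting point for both. The helper names and the example pipeline are hypothetical. A plain text search will miss BAM access that is hidden inside wrapper scripts or libraries, so every flagged line still needs the hand check Birney describes. The conversion command uses the `samtools view` options `-C` (write CRAM), `-T` (reference FASTA), and `-o` (output file).

```python
import re
import shlex
from typing import List, Tuple

# Hypothetical audit helper: flag lines that mention .bam files (case-insensitive),
# including index files such as sample.bam.bai.
BAM_PATTERN = re.compile(r"\.bam\b", re.IGNORECASE)


def find_bam_accesses(script_text: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) pairs that mention BAM files."""
    return [(number, line.strip())
            for number, line in enumerate(script_text.splitlines(), start=1)
            if BAM_PATTERN.search(line)]


def bam_to_cram_command(bam: str, reference_fasta: str, cram: str) -> List[str]:
    """Build a samtools command that converts a finished BAM into CRAM.

    CRAM is reference-based, so the reference FASTA used for alignment
    must be supplied with -T.
    """
    return ["samtools", "view", "-C", "-T", reference_fasta, "-o", cram, bam]


# Hypothetical three-step pipeline script to audit.
script = """\
bwa mem ref.fa r1.fq r2.fq | samtools sort -o sample.bam
gatk HaplotypeCaller -R ref.fa -I sample.bam -O sample.vcf.gz
bcftools index sample.vcf.gz
"""

for number, line in find_bam_accesses(script):
    print(number, line)  # lines 1 and 2 are flagged for a manual CRAM check

print(shlex.join(bam_to_cram_command("sample.bam", "ref.fa", "sample.cram")))
# samtools view -C -T ref.fa -o sample.cram sample.bam
```

Building the command as a list keeps it easy to inspect. Where samtools is installed, the list can be passed to `subprocess.run(cmd, check=True)` to perform the conversion.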
Like all GA4GH standards, CRAM is developed and maintained within an open forum, enabling greater collaboration and evolution of data analysis. Our panelists shared their reasons for supporting open standards and open software:
Many reasons. First it taps into the broad creativity of academic and commercial community. Second it enables science. Third it is transparent for one of the key data types we have as individuals #CRAM4GH
— Ewan Birney (@ewanbirney) April 5, 2019
With so many human genomes that will be generated across the world, we absolutely need open standards to ensure that the data can be analysed with the best algorithms. Open standards enable this to happen by ensuring interoperability. #CRAM4GH
— Thomas Keane (@drtkeane) April 5, 2019
I am a firm believer in freely available data for scientific research.
Data can only truly be free if the file I/O software is also free for all users, which in turn means the format itself has to be free of royalty-based patents. #CRAM4GH
— James Bonfield (@BonfieldJames) April 5, 2019
The current version of CRAM (v3.0) reduces disk space by 30-50% compared to BAM. Bonfield envisions even greater storage savings in future versions, along with new features such as support for longer chromosomes and faster handling of long reads:
CRAM roadmap. 3.0 (now) -> 3.1 (archive mode; summer?) -> 4.0 (features; longer chromosomes, faster long-read support).
3.1 is expected 10-30% smaller than 3.0, depending on input data.
Eg see BAM v CRAM3 v CRAM3.1 for HiSeq2K and NovaSeq. (All lossless)#CRAM4GH pic.twitter.com/vZX4rRy8Eg
— James Bonfield (@BonfieldJames) April 5, 2019
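As a rough worked example, the Python snippet below combines the figures quoted in this article. It starts from Keane's 100 GB BAM for a single human genome, applies the 30-50% saving for CRAM 3.0, and then applies Bonfield's expected 10-30% further reduction for CRAM 3.1. It assumes the reductions simply multiply. That is an illustration only, since actual savings depend on the input data. Note that Keane quoted a larger 50-60% saving for CRAM during the chat.

```python
def shrink(size_gb: float, reduction: float) -> float:
    """Apply a fractional size reduction, e.g. 0.30 for '30% smaller'."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be a fraction in [0, 1)")
    return size_gb * (1.0 - reduction)


BAM_GB = 100.0  # one human genome in BAM, per Thomas Keane's tweet

# CRAM 3.0 vs BAM: 30-50% smaller. CRAM 3.1 vs 3.0: 10-30% smaller.
# Ranges are (smallest file, largest file).
cram30_range = (shrink(BAM_GB, 0.50), shrink(BAM_GB, 0.30))
cram31_range = (shrink(cram30_range[0], 0.30), shrink(cram30_range[1], 0.10))

print(f"CRAM 3.0: {cram30_range[0]:.0f}-{cram30_range[1]:.0f} GB")
print(f"CRAM 3.1: {cram31_range[0]:.0f}-{cram31_range[1]:.0f} GB")
# CRAM 3.0: 50-70 GB
# CRAM 3.1: 35-63 GB
```

Under these assumptions, a CRAM 3.1 file would be roughly 37-65% smaller than the original BAM.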
Community members are invited to share ideas for CRAM v4 here:
James and the folks in the GA4GH file formats group have a public list of potential improvements coming down the pipe for CRAM v4. Again, all publicly documented and welcome contributions https://t.co/d9HBRLo0Oc #CRAM4GH
— Thomas Keane (@drtkeane) April 5, 2019
Learn more about CRAM at www.ga4gh.org/cram/.