What do we mean by “more diverse” data?: GA4GH’s new product encourages a holistic approach to diversity in datasets

12 Nov 2024

The Diversity in Datasets policy framework encourages research teams to consider exactly what types of genetic diversity and what range of non-genetic factors are needed for their project. 

Image summary: This image is a vibrant illustration of diverse individuals, depicted in a stylized, minimalist manner. Each person has distinct features, hairstyles, and clothing, representing a variety of cultures, identities, and backgrounds.

By Connor Graham, GA4GH Senior Marketing and Digital Communications Specialist

The Global Alliance for Genomics and Health (GA4GH) is calling for a broader approach to defining and pursuing diversity in the data used in genomics research, urging scientists to go further than thinking in terms of “ancestral diversity”. The GA4GH Diversity in Datasets policy framework encourages research teams to consider exactly what types of genetic diversity and what range of non-genetic factors (e.g. socioeconomic status, education level, gender identity, sexual orientation, neurodiversity, cultural diversity) are needed for their project. 

Spearheaded by the GA4GH Regulatory & Ethics Work Stream, the Diversity in Datasets policy framework was motivated both by the numerous calls for more diverse data and by the lack of detailed guidance helping researchers reflect on exactly what “more diverse” data means. The framework highlights that diversity in data is always a means to an end, and is not necessarily about representative data. Research studies should therefore take a more inclusive approach to diversity and think beyond traditional parameters. By prompting more reflection on what types of diversity are needed, and how they can be pursued, the policy guidance aims to make research more relevant, effective, and equitable.

Many of the calls for more diverse data have focused on diversity of genetic ancestry, typically couched in terms of continental categories. Studies that do not consider other forms of diversity, however, risk being incomplete, potentially leading to gaps in understanding and missed opportunities for meaningful insights. By expanding the scope, researchers can foster a more comprehensive understanding of health and disease.

Led by Anna Lewis, a Research Scientist at Brigham and Women’s Hospital and Harvard Medical School, the Diversity in Datasets policy framework helps researchers identify the types of diversity that are most relevant to their work by guiding them to consider the outcomes they aim to achieve. The framework offers a systematic approach to integrating various forms of diversity throughout the research process. 

“This is crucial not only for gathering robust data but also for ensuring that the research benefits a broader range of people in equitable and meaningful ways,” said Lewis.

The framework is structured around key questions that prompt researchers to evaluate their work at every stage of the data lifecycle:

  1. What are the goals of the project, and how can these goals be framed in a way that maximally aligns with ethical norms?
    For every research project (whether basic, translational, or clinical), teams should clarify the goals and align them with key ethical principles like public benefit, justice, fairness, and respect for individuals and communities. Ensuring goals support these norms is crucial to ethically advancing healthcare. Teams should also consider additional principles relevant to their specific project and the communities involved.
  2. Given these goals, what type(s) of diversity in data are the most important?
    Project goals should clarify the types of diversity in data that are most relevant, prompting teams to consider who needs representation and which data points about them are crucial. When the necessary diversity is unclear, teams should transparently acknowledge this and consult existing literature and communities for guidance.
  3. What are the implications for choices made throughout the data lifecycle?
    Researchers are encouraged to consider how decisions at each step of the data lifecycle — from data collection and analysis to interpretation and reporting — impact the overall goals. This reflection ensures that diversity considerations are embedded within the entire process, leading to more accurate and applicable findings.
  4. How can lawful, contextually appropriate benefit sharing be supported at all relevant stages of the data lifecycle?
    The framework highlights the importance of lawful and context-sensitive benefit-sharing with participants, local communities, and local researchers, ensuring that the benefits of research are distributed fairly.

By considering various dimensions of diversity, researchers can better design studies that not only reflect human diversity, but also deliver meaningful benefits across diverse groups.

“We recognise that true diversity in the data we use in genomic research means more than just ancestry,” said Lewis. “It requires a comprehensive understanding of all the ways people are different. Diversity in Datasets is about equipping researchers with the tools they need to incorporate these differences into their work effectively and ethically.”

GA4GH invites researchers to explore the Diversity in Datasets policy framework, to engage in practices that foster inclusive, equitable, and impactful research, and to discover how expanding genetic research to underrepresented populations can drive more equitable health outcomes.

For more information on how to incorporate a holistic approach to diversity, see the Diversity in Datasets policy framework and read the team’s latest article in Nature Genetics.
