PATH

Harmonizing Global Datasets

Integrating large-scale health datasets

Harmonizing Data Silos

A typical multiomic dataset contains thousands of measurements. Yet datasets often remain isolated from one another because they differ in naming conventions, measurement methods, or even the types of data collected, which makes them difficult to combine and analyze together. Because different datasets often measure different things, it is also hard to transfer machine learning models between them.

Our goal is to break down these data silos by harmonizing large-scale human phenomic datasets. This means creating standards and tools to combine and analyze data from different sources.

We will do this in two steps:

  1. Develop standardized protocols for harmonizing and integrating data from diverse profiling technologies. This will ensure that data from different sources can be combined and analyzed together.

  2. Develop algorithms and pipelines to make machine learning models trained on deep phenotyping data more portable. This will allow models to be transferred between datasets, even when those datasets measure different things.
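
To make the first step concrete, here is a minimal Python sketch of what harmonizing two records might look like. It is an illustration only, not our actual protocol: the site names, field names, and the `harmonize` helper are all hypothetical. It assumes each source can be described by a mapping from its local field name to a shared canonical name plus a unit conversion factor.

```python
# Illustrative sketch only: sites, field names, and this mapping are hypothetical.
# Each source field maps to (canonical_name, factor_to_canonical_unit).

GLUCOSE_MMOL_L_TO_MG_DL = 18.016  # glucose molar mass is about 180.16 g/mol
LB_TO_KG = 0.45359237             # exact definition of the avoirdupois pound

FIELD_MAP = {
    "site_a": {
        "glucose_mgdl": ("glucose_mg_dl", 1.0),
        "wt_kg": ("weight_kg", 1.0),
    },
    "site_b": {
        "GLU (mmol/L)": ("glucose_mg_dl", GLUCOSE_MMOL_L_TO_MG_DL),
        "Weight_lb": ("weight_kg", LB_TO_KG),
    },
}


def harmonize(record, site):
    """Rename fields to canonical names and convert values to canonical units."""
    mapping = FIELD_MAP[site]
    harmonized = {}
    for name, value in record.items():
        if name not in mapping:
            continue  # this sketch simply drops fields it has no mapping for
        canonical_name, factor = mapping[name]
        harmonized[canonical_name] = round(value * factor, 2)
    return harmonized


# Two records that describe the same kinds of measurement in different ways.
combined = [
    harmonize({"glucose_mgdl": 90.0, "wt_kg": 70.0}, "site_a"),
    harmonize({"GLU (mmol/L)": 5.0, "Weight_lb": 154.0}, "site_b"),
]
```

After harmonization, both records share the same field names and units, so they can be placed in one table and analyzed together. A real protocol would also need to handle differences in measurement methods, which a simple rename and rescale cannot capture.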

By harmonizing these data silos, we can:

  • Gain a more complete understanding of human health and disease.  
  • Develop more accurate machine learning models that can be used across different datasets.
  • Accelerate research and improve healthcare outcomes.

Leadership

Noa Rappaport, PhD

Chief Data Officer, Phenome Health