Many national and international studies focus on human health; these studies often examine specific risk behaviours, their determinants and related health risk indicators. Studies may be cross-sectional or longitudinal in nature, or may involve an intervention. Some data is made publicly available, other data may be shared with permission, and some data is not accessible for a number of valid reasons. Findings from these studies are often shared as original research publications or summarised in reviews of the extant literature. Secondary analysis of a single data set is often encouraged, with a focus on new hypotheses or questions. However, there is additional potential to exploit existing data by merging data from two or more data sets.
Integrative data analysis allows the pooling of data sets from multiple studies, followed by a harmonisation process (data alignment), which uses definitional and statistical procedures to enhance the comparability of targeted variables. Harmonisation can occur at either end of a study lifecycle: prospectively, at the input stage (plan forward), or retrospectively, at the output stage (look back). Examples of prospective and retrospective harmonisation activities include ENPADASI, EuroDISH, Maelstrom, ICAD, BioSHaRE-EU, POLARIS and DEDIPAC. The potential benefits of data harmonisation are: increased statistical power; expanded subgroup analysis and comparative research; greater exposure heterogeneity; improved generalisability; implementation of standards for data inclusion; further exploitation of available data resources; a basis for multi-centre collaborative research; and increased cost-efficiency of research programmes (Gallacher, 2007; Doiron et al., 2013; Fortier et al., 2017). The PESS Department and the Determinants of Diet and Physical Activity Knowledge Hub (DEDIPAC) have recently developed a compendium of 150 European data sets relevant to diet, physical activity and sedentary behaviour and their determinants. For each data set, the compendium provides the project identifier, contact person details, website URL (if any), a brief description of the project, relevant publications, nations involved, sample size, gender, age, the measurement of physical activity, sedentary behaviour and their correlates, indices of inequality/ethnic minorities and level of accessibility. This resource is now available at www.dedipac.eu.
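To make the idea of harmonisation concrete, the following is a minimal sketch in Python of retrospective harmonisation for two small studies. Everything in it is an assumption made for illustration: the study records, the variable names, the sex coding for study B and the choice of minutes per week as the common unit are hypothetical and are not drawn from DEDIPAC or any of the projects named above. A real harmonisation would follow a documented protocol and would also have to deal with differences in measurement instruments, not only in units and codes.

```python
# Hypothetical records from two studies that measured the same behaviour
# (moderate-to-vigorous physical activity, MVPA) in different ways.
study_a = [
    {"id": "A1", "sex": "F", "mvpa_min_per_day": 30.0},
    {"id": "A2", "sex": "M", "mvpa_min_per_day": 45.0},
]
study_b = [
    {"pid": 1, "gender": 2, "mvpa_hours_per_week": 3.5},
    {"pid": 2, "gender": 1, "mvpa_hours_per_week": 7.0},
]

# Assumed codebook for study B (hypothetical): 1 = male, 2 = female.
SEX_CODES_B = {1: "M", 2: "F"}


def harmonise_a(record):
    """Map a study A record onto the common (target) variables."""
    return {
        "study": "A",
        "participant": record["id"],
        "sex": record["sex"],
        # minutes per day -> minutes per week
        "mvpa_min_per_week": record["mvpa_min_per_day"] * 7,
    }


def harmonise_b(record):
    """Map a study B record onto the common (target) variables."""
    return {
        "study": "B",
        "participant": f"B{record['pid']}",
        "sex": SEX_CODES_B[record["gender"]],
        # hours per week -> minutes per week
        "mvpa_min_per_week": record["mvpa_hours_per_week"] * 60,
    }


# Pool the harmonised records, keeping the study label so that
# between-study differences can still be examined after pooling.
pooled = [harmonise_a(r) for r in study_a] + [harmonise_b(r) for r in study_b]

for row in pooled:
    print(row)
```

Keeping one mapping function per study records each harmonisation decision explicitly, so the decisions can be reviewed and repeated.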
Guidelines for effective retrospective data harmonisation have been published and should be followed (Fortier et al., 2017, doi: 10.1093/ije/dyw075). Challenges to effective retrospective data harmonisation include the timely availability of all data, lack of data accessibility, limited data on the behaviour of interest, and limited comparability of the data and of the data collection approaches used. To minimise such challenges in the future, it is essential that researchers nationally and internationally begin to engage in the prospective harmonisation of data collection methodologies around a shared research agenda. Without this, further insight into challenging societal issues, and/or their resolution, may not occur or may take much longer. In the context of prospective data harmonisation, the FAIR principles have been developed (Wilkinson et al., 2016). The FAIR principles suggest that each data resource, its associated metadata and its complementary files should be easy to find (‘Findable’); they should be accessible using a standardised communication and approval protocol (‘Accessible’); they should be ‘Interoperable’, and thus use a consistent data format and taxonomy for knowledge representation; and finally, they should be ‘Reusable’, i.e. ready to be used.
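As an illustration only, the four FAIR facets can be thought of as a checklist applied to the metadata that accompanies a data set. The sketch below, in Python, assumes a hypothetical set of metadata fields for each facet; the field names, the example record, the placeholder identifier and the example.org URL are invented for this illustration and do not come from Wilkinson et al. (2016) or from any official FAIR assessment tool.

```python
# Assumed mapping from each FAIR facet to metadata fields that would
# evidence it. These field names are hypothetical, not a standard.
FAIR_FIELDS = {
    "Findable": ["identifier", "title", "keywords"],
    "Accessible": ["access_url", "access_procedure"],
    "Interoperable": ["data_format", "vocabulary"],
    "Reusable": ["licence", "provenance"],
}


def missing_fair_fields(metadata):
    """Return, for each facet, the fields that are absent or empty."""
    gaps = {}
    for facet, fields in FAIR_FIELDS.items():
        absent = [name for name in fields if not metadata.get(name)]
        if absent:
            gaps[facet] = absent
    return gaps


# Hypothetical metadata record for a data set (all values are placeholders).
example_metadata = {
    "identifier": "doi:10.0000/placeholder",
    "title": "Example physical activity survey",
    "keywords": ["physical activity", "sedentary behaviour"],
    "access_url": "https://example.org/data-request",
    "access_procedure": "Written request reviewed by the data custodian",
    "data_format": "CSV",
    "vocabulary": "",  # no shared taxonomy recorded yet
    "licence": "CC BY 4.0",
    # "provenance" is missing entirely
}

gaps = missing_fair_fields(example_metadata)
for facet, fields in gaps.items():
    print(f"{facet}: missing {', '.join(fields)}")
```

A checklist of this kind only shows whether the information has been recorded; it does not show whether the data can actually be found, accessed or reused in practice.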
To begin to maximise the collective potential and cost-effectiveness of ongoing scientific research, further co-ordination of research agendas and associated data collection methodologies must take place. The existing EU Joint Programme Initiatives are a significant step in this direction. A number of data harmonisation projects using European data sets are currently ongoing within PESS (Alan Donnelly, Matthew Herring, Cillian Mc Dowell, Ann Mc Phail, Ciaran Mac Donncha, Rhoda Sohun).
References:
- Gallacher JE. The case for large scale fungible cohorts. Eur J Public Health 2007;17:548–49.
- Doiron D, Burton P, Marcon Y, Gaye A, Wolffenbuttel BHR, Perola M, et al. Data harmonization and federated analysis of population-based studies: the BioSHaRE project. Emerg. Themes Epidemiol. 2013;10:12.
- Fortier I, et al. Maelstrom Research guidelines for rigorous retrospective data harmonization. Int J Epidemiol 2017;46(1):103–105. doi: 10.1093/ije/dyw075
- Wilkinson MD, Dumontier M, Aalbersberg IjJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data. 2016;3:160018.
Ciarán MacDonncha is a Lecturer in Health and Physical Activity in the PESS Department.
Ciaran’s Email Address: Ciaran.MacDonncha@ul.ie