Understanding barriers to linking novel lifestyle data for health research, results from the LifeInfo Survey: a topic modelling approach
Lifestyle data from supermarket loyalty cards or fitness apps could help researchers understand risk factors for disease. Through text analysis of survey responses, this project aims to understand what would influence people to share these data.
Modern technologies generate vast amounts of data about individual diet and exercise patterns, and these data could help researchers understand and fight disease. The LifeInfo Survey consulted the public on their willingness to share data from supermarket loyalty cards and mobile phone apps, websites or wearable devices for health research. With over seven thousand responses, this survey provides a unique insight into public attitudes towards linking novel lifestyle data with health records.
As part of the LifeInfo Survey, those against sharing their lifestyle data were asked what might make them change their mind. This provided thousands of free text responses detailing respondents’ concerns and suggestions for data sharing. The aim of this project was to analyse these textual data in order to identify barriers to data linkage. We believe this intelligence will allow future research to address common concerns and build greater trust and transparency into data sharing practice.
Data and methods
Free text survey responses were analysed using latent Dirichlet allocation (LDA) modelling. This is a technique frequently used in text analysis that uncovers ‘topics’ within texts, based on the co-occurrence of terms. Each topic is modelled to be made up of several words or phrases, and survey responses are probabilistically categorised into topics.
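To illustrate how LDA assigns responses to topics probabilistically, the following is a minimal sketch of LDA fitted with a collapsed Gibbs sampler in plain Python. It is not the implementation used in this project: the toy documents, the number of topics and the hyperparameters `alpha` and `beta` are all assumptions chosen for illustration.

```python
import random

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenised documents with a collapsed Gibbs sampler."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (document, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates: topic mix per document, word mix per topic.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return vocab, theta, phi

# Made-up tokenised responses, for illustration only (not survey data).
docs = [
    ["data", "security", "protection", "privacy"],
    ["privacy", "security", "wrong hands"],
    ["step", "run", "not accurate", "activity"],
    ["activity", "exercise", "not accurate", "step"],
]
vocab, theta, phi = fit_lda(docs, n_topics=2)

for k, word_probs in enumerate(phi):
    top = sorted(zip(word_probs, vocab), reverse=True)[:3]
    print(f"topic {k}:", [w for _, w in top])
for d, mix in enumerate(theta):
    print(f"response {d}:", [round(p, 2) for p in mix])
```

Each row of `theta` is one response's probability of belonging to each topic, and each row of `phi` is one topic's distribution over terms. In practice an established library would normally be used rather than a hand-written sampler.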
To perform LDA modelling, free text survey responses were first transformed into meaningful ‘tokens’ through data cleaning and processing procedures.
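A rough sketch of what such a cleaning step might look like is given below. The exact procedures used in the project are not described here, so the steps shown (lowercasing, stripping punctuation, removing stopwords and adding two-word phrases) and the small `STOPWORDS` set are assumptions for illustration.

```python
import re

# Hypothetical stopword list. "not" is deliberately kept so that
# phrases such as "not accurate" survive cleaning.
STOPWORDS = {"the", "a", "an", "is", "are", "my", "i", "it", "to", "of", "and", "be"}

def tokenise(response, stopwords=STOPWORDS):
    """Turn one free text response into single-word and two-word tokens."""
    words = re.findall(r"[a-z']+", response.lower())
    words = [w.strip("'") for w in words]
    kept = [w for w in words if w and w not in stopwords]
    # Adjacent pairs of kept words act as candidate phrases.
    bigrams = [f"{a} {b}" for a, b in zip(kept, kept[1:])]
    return kept + bigrams

print(tokenise("The data is NOT accurate!"))
# → ['data', 'not', 'accurate', 'data not', 'not accurate']
```

Keeping two-word phrases alongside single words allows a topic to be described by terms such as "not accurate" rather than by "accurate" alone.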
LDA modelling uncovered topics which commonly arise as issues for sharing both supermarket loyalty card data and health/fitness app data. The key barriers identified include: data security and protection, personal privacy, a need for more information about the research purpose and benefits, fear that the data could get into the ‘wrong hands’, feelings of ‘big brother watching’, problems with data accuracy, and not understanding the reason for data linkage. These barriers can potentially be addressed by researchers with varying degrees of ease. Other respondents said that nothing would make them share these data, or that they do not use store loyalty cards or health/fitness apps.
These barriers are identified by observing the terms most relevant to each topic. For example, Figure 1 shows the 20 topics created when modelling the survey responses to the health and fitness app question. Topic 11 is highlighted, and the most relevant terms for this topic include “not accurate”, “reliable”, “exercise”, “step”, “activity” and “run”. This topic can be summarised as issues with data accuracy. Within this topic, respondents say, for example, that they do not record all their activity and that the data would give only a partial and inaccurate picture of their exercise patterns.
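How term relevance was calculated is not stated here. One widely used measure, from the LDAvis visualisation tool, balances how probable a term is within a topic against how distinctive it is compared with the whole corpus. The sketch below assumes that measure; the probabilities and the weight `lam` are made-up values for illustration.

```python
import math

def relevance(p_term_given_topic, p_term, lam=0.6):
    """Weighted mix of a term's log probability in the topic and its log lift."""
    lift = p_term_given_topic / p_term
    return lam * math.log(p_term_given_topic) + (1 - lam) * math.log(lift)

def top_terms(topic_probs, corpus_probs, n=3, lam=0.6):
    """Rank the terms of one topic by relevance, highest first."""
    scores = {
        term: relevance(p, corpus_probs[term], lam)
        for term, p in topic_probs.items()
        if p > 0
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical probabilities: "data" is common everywhere,
# while "step" and "run" are concentrated in this topic.
topic_probs = {"step": 0.2, "data": 0.3, "run": 0.1}
corpus_probs = {"step": 0.02, "data": 0.3, "run": 0.01}

print(top_terms(topic_probs, corpus_probs, lam=0.6))  # → ['step', 'run', 'data']
print(top_terms(topic_probs, corpus_probs, lam=1.0))  # → ['data', 'step', 'run']
```

With `lam=1.0` terms are ranked purely by their probability within the topic, so a generic term such as "data" comes first. Lower values favour terms that are distinctive to the topic, which tends to make topics easier to summarise.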
Value of the research
Novel lifestyle data provide exciting opportunities for research into risk factors for diseases related to diet and exercise, such as type 2 diabetes, heart diseases and certain cancers. These data are naturalistic and non-intrusive, meaning that large datasets can be collected and analysed by researchers at lower effort and cost, and without the selective reporting biases involved with traditional methods such as food logs.
Such research could significantly benefit human life and health: globally, one in five deaths, and the loss of 255 million disability-adjusted life years, have been attributed to poor diet, while physical inactivity is responsible for an estimated nine percent of global deaths.
To harness these data for health benefits, public support must be attained. Gathering people’s opinions and concerns about sharing these data is the first step in establishing best practice for initiatives that utilise novel lifestyle data.
- Novel lifestyle data could be utilised by health researchers to understand risk factors for disease.
- People are more willing to share their data if they believe in the purpose/benefit of the research and have a detailed understanding of how their data will be used.
- Common concerns about sharing these data are data security/protection, data inaccuracy, personal privacy and fear that the data could get into the ‘wrong hands’.
- Health informatics
- Public attitudes
- Natural Language Processing
Michelle Morris, University of Leeds Academic Fellow in Health Data Analytics
Stephen Clark, University of Leeds Academic, School of Geography