Opinion: Big Data in Public Health LIDA Seminar

The LIDA seminar, “Big Data in Public Health – future horizons, applications and ethical issues” (13th March), was one of the most well-attended and rigorously engaging we have been to at Leeds so far this year, and if the audience Q&A afterwards was any indication, it seems many others would agree. Dr Michelle Morris (University Academic Fellow in Health Data Analytics, affiliate of the CDRC and Leeds Institute for Data Analytics) began proceedings with a talk which presented the case for using Big Data to fill in the gaps in the narrative for public health research. Then Dr Jon Fistein (Associate Professor in Clinical Informatics, Division of Health Services Research, LIHS) took up the Big Data in public health baton to expertly shine a light on some of the critical legal and ethical questions surrounding data usage for public health research.

Dr Morris opened with a caution that Big Data is not a “magic pill” that will single-handedly solve all of the problems endemic in public health research, nor did she suggest that more ‘traditional’ data like the National Diet and Nutrition Survey or GP questionnaire data be replaced by Big Data. By examining Big Data for public health research through the lens of the current obesity epidemic (simply put, resulting from the imbalance of diet and physical activity), she gave a nuanced account of the ways in which Big Data can help to fill the gaps in public health research left by relying exclusively on so-called traditional data. For example, she pointed out that in the instance of the National Diet and Nutrition Survey, the data is never going to be as representative as researchers would like due to the limited sample size (6000 people every 6 years), despite great efforts made to recruit a representative sample. She described this data as ‘made’ data, where the burden of data generation is on the participants and researchers, and where there is likely to be an inherent bias because the data is being generated for a specific purpose. This contrasts with Big Data, which is ‘found’ data, such as transactional data available through supermarket storecards.

The example of Bounts’ physical activity app data, where users elect to share data and the scale of data available is vast, demonstrated the potential of Big Data in public health. Here, the generation of data is intuitive, the data itself is arguably more objective because it is captured directly from the source and not biased by recall from a faulty memory, and the burden of generation sits with businesses. Dr Morris’s specific example from 2016 Bounts data also illustrated some of the potential pitfalls in interpreting this kind of Big Data in isolation: peaks and troughs in activity were deemed to be congruent with changes in British Daylight Saving Time, whereas Bounts suggested other promotional and contractual reasons for the peaks and troughs evident in the data. Expanding on some of the challenges that must be addressed when using Big Data, Dr Morris pointed out that supermarket storecard data can frustrate attempts to isolate the consumer patterns of individuals due to unintentional aggregation, for instance when one household’s shopping is captured on one storecard. Nonetheless, the potential of engaging with the businesses and consumers that generate Big Data, and of opening up a dialogue with them about data use, was plain and compelling, especially in the age of new Big Data.

By asking the room a series of direct questions designed to make us think about our attitudes to sharing our own data for public health research, Dr Morris succeeded in pointing out that the collection of data and its applications in public health research are issues which affect us all, and about which we are bound to have an opinion. Perhaps the most interesting question was whether the audience felt that Big Data could be used exclusively to populate all of the domains identified on the Obesity System Map:

Figure: Whole Systems Approach map illustrating the factors impacting on obesity.


The feeling in the room was weighted in favour of ‘no’ in response to this question. Dr Morris explained that, whilst Big Data cannot populate all of these specific nodes, it can populate the majority (as per findings [paper forthcoming] from the Obesity Strategic Network, of which Dr Morris is Director). She concluded that, when used in conjunction with existing traditional data, Big Data is a significant contributor in providing answers to questions of great moment in public health research. Finally, the examples of two PhD projects, by Emma Wilkins and Rachel Oldroyd respectively, further demonstrated how consumer-generated Big Data can be timelier, include more metadata and be more representative than traditional data.

Having heard of the great potential of Big Data in public health, we the audience then enjoyed an exploration of the legal and ethical issues surrounding public health from Dr Fistein, who is trained as a medical doctor and barrister, and currently sits on the Independent Group that Advises NHS Digital on the Release of Data (IGARD). He agreed that there is more potential to link data in the era of Big Data (or new forms of data irrespective of size). Because Big Data is ‘found not made’, the challenge for public health research, ethically and legally speaking, is in determining whether new uses of such data are within the expectations of those to whom it relates when they originally ‘provided’ it.

After outlining some of the broad issues of privacy and consent in the digital age, Dr Fistein invited the audience to give their opinion on whether public health research is in the ‘public interest’, which would provide a legal gateway for the use of data for public health. The response was tentative at first, with people wondering if this was a question designed to catch them out, but there was general agreement that it was. Dr Fistein pointed out that, unfortunately, many legal definitions of ‘Public Interest’ do not include public health. He observed that there is therefore a tension between the legal position and that of public health practitioners, many of whom argue that public health activities generally are in the public interest, and that this should enable data to be used. An example of such uses is illustrated by the Learning Healthcare System in the following PDF.

The Learning Healthcare System

He noted that one proposed solution to the issue of using data for public health is to anonymise it (as this could potentially make the data use lawful). However, he pointed out that there are several issues that render this approach problematic. One issue is data quality, as it may be impossible to reliably link datasets together when ‘anonymised’, or it may be necessary to use identifiable data to ensure reliable linkage. He also described the challenges related to the sense of ownership individuals feel towards data which is generated about them. Citing the well-known example of the Catholic woman who uses a contraceptive pill in order to treat a health issue unrelated to contraception (from Pattinson, S., Medical Law and Ethics, Sweet and Maxwell), he pointed out she might be aggrieved were she later to learn that data about her usage of that pill were being used in a study on how to improve contraceptive treatments. He related this to the concept of ‘context collapse’ (as described by the Wellcome Trust), in which the patient receives care in one explicit context and then finds that data about them is being used for another purpose which has not been made explicit. In respect of this, Dr Fistein reminded everyone of Dame Fiona Caldicott’s principle that: “There should be no surprises in data use.” Dame Fiona also said in the 2017 National Data Guardian Report, “the most praiseworthy attempts at innovation falter if they lose public trust.”

Taking this further, Dr Fistein concluded by saying that trust is meaningless without a notion of ‘trustworthiness’ (Onora O’Neill https://www.youtube.com/watch?v=XWwTYy9k5nc) and that public health researchers have an obligation to demonstrate trustworthiness, for example by being transparent about the work they are doing and its benefits. On a practical note, he pointed out that as a bare minimum, public health practitioners and researchers should also know the law and be able to defend their position in relation to it. He recommended that they should consult experts early when they are considering data use, in order to anticipate and address any potential ethical or legal issues.

The two presentations were followed by an animated Q&A session with many of the audience members (a significant number of whom were public health registrars) remarking on how engaging they’d found the speakers and the subjects.