Using novel types of data to detect illness caused by contaminated food or drink

CDRC PhD student Rachel Oldroyd is one of the UK Data Service Data Impact Fellows. Rachel is a quantitative human geographer based at the Consumer Data Research Centre (CDRC) at the University of Leeds, and here discusses how novel types of data are used to detect illness caused by contaminated food or drink.

Affecting an estimated 1 million people at a cost of around £1.5 billion per year (Food Standards Agency, 2011), foodborne illness remains an unacceptably high burden on the UK population and economy. As many victims choose to recover at home without visiting their GP, the number of cases is difficult to measure and severely under-reported in national data.

But what is foodborne illness? The World Health Organisation defines it as an Infectious Intestinal Disease caused by the ingestion of a harmful parasite, virus or bacterium, known as a pathogen. A pathogen can infiltrate any part of the food supply chain and can be hard to detect, and can cause symptoms ranging from mild nausea to death. With around 500 annual deaths in the UK attributed to food poisoning, the Food Standards Agency (FSA) are continually developing methods to support their key objective of reducing its incidence.

I currently teach Geographic Information Systems in the School of Geography and I’m studying towards a PhD in the Consumer Data Research Centre, both at the University of Leeds. My research focuses on data analytics for food safety, in particular exploring the landscape of food safety in the UK and investigating the utility of novel types of data. For the first part of my research I plan to extract geo-demographic variables from the 2011 Census and investigate relationships between these variables and food safety measures (restaurant hygiene scores, hospital admissions and mortality). The second part of the research will focus on the use of novel types of data and Natural Language Processing to detect cases of foodborne illness reported through online reviews and social media. It is hoped that these datasets may provide additional information missed by the traditional GP reporting process.

Many US studies have researched the use of novel types of data for disease detection, reporting timeliness and the inclusion of additional information as key advantages compared to traditional GP data. For example, in a restaurant review, customers may comment on the cleanliness of the restaurant, the quality of the service and/or describe the food they ate. These user-generated comments are extremely useful and are not available from traditional data sources. However, extracting the reviews in which customers report illness can be difficult. It is not as simple as looking for specific keyword matches, as these will often return false positives; for example, searching for ‘sick’ may return ‘I’ve never known anyone get sick here’. This is where Natural Language Processing plays its part. A model can be trained to identify sequences of words which refer to illness and return relevant reviews, ignoring those which do not indicate illness.
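To make the difference between keyword matching and a trained model concrete, here is a minimal Python sketch. It is an illustration only, not the method used in this research: the eight training reviews are invented, the function names (keyword_match, features, train, reports_illness) are hypothetical, and the model is a simple Naive Bayes classifier over single words and two-word sequences, chosen because it fits in a few lines. A real system would need many more labelled reviews and a more capable model.

```python
import math
import re
from collections import Counter


def keyword_match(review, keyword="sick"):
    """Naive approach: flag any review that contains the keyword."""
    return keyword in review.lower()


def features(review):
    """Single words plus two-word sequences, so that phrases such as
    'get sick' are treated differently from 'sick after'."""
    tokens = re.findall(r"[a-z']+", review.lower())
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


# Invented training reviews (an assumption for illustration only).
# True = the reviewer reports illness, False = they do not.
TRAINING = [
    ("I got sick after eating the chicken", True),
    ("Food poisoning after dinner here", True),
    ("We were sick all night after the meal", True),
    ("Got sick the next day", True),
    ("Never known anyone get sick here", False),
    ("Great food and friendly service", False),
    ("Nobody got sick and the food was lovely", False),
    ("The chicken was lovely and fresh", False),
]


def train(examples):
    """Count how often each feature appears in each class."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in examples:
        counts[label].update(features(text))
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def reports_illness(review, model):
    """Naive Bayes with add-one smoothing: compare the log-probability
    of the review under each class and pick the larger."""
    counts, docs, vocab = model
    scores = {}
    for label in (True, False):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for feat in features(review):
            if feat in vocab:  # features never seen in training are skipped
                score += math.log((counts[label][feat] + 1) / (total + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]


model = train(TRAINING)

for review in ["I've never known anyone get sick here",
               "I got sick after the chicken"]:
    print(review, "| keyword:", keyword_match(review),
          "| model:", reports_illness(review, model))
```

Both example reviews contain the word ‘sick’, so the keyword search flags both, whereas the trained model uses the surrounding words to separate a genuine report from a passing mention. With so little training data the result is fragile, which is why a large, carefully labelled set of reviews matters in practice.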

It’s hoped that this research will have a strong policy impact and will be used to inform and improve the current restaurant inspection process in the UK. Throughout the research I plan to liaise with key industry professionals, including those from the FSA and the local authority, to keep the research relevant and timely. I’m delighted to be named as one of the UK Data Service Data Impact Fellows and plan to take full advantage of the scholarship by developing impact through presentations at national and international conferences, disseminating the research through suitable publications and holding stakeholder events and public seminars. Watch this space!