Job Opportunity: Business Development Manager

Are you skilled in building and sustaining relationships between academics and external organisations? This role offers an excellent opportunity for those keen to work in an exciting and multidisciplinary environment.

The CDRC continues to grow and as such is seeking a talented and highly motivated Business Development Manager, based at the University of Leeds, who can help us maintain and build our relationships with businesses and other external organisations. We are looking for someone who can oversee a portfolio of partnerships and who will contribute to the ongoing business development strategy of the CDRC.

You will provide a vital bridge between the Centre and the business sector, maintaining and building relationships with existing data providers and encouraging new data partners to work with the Centre. In this capacity, you will carry significant responsibility for building the Centre’s business and data portfolio upon which the Centre’s core services depend. You will also be responsible for working alongside professional service teams at the University of Leeds to articulate and execute the legal agreements and data sharing agreements which underpin consumer data operations. You will work directly with the Centre’s co-Directors on implementing the Centre’s business development strategy and work closely with the Centre’s Public Engagement and Communications Officer to ensure that the Centre achieves impact.

Find out more and view full candidate brief.

World Book Day 2021 – Data Science Interns

To celebrate World Book Day we spoke to some of our Data Science Interns about their favourite reads. 

George, Rosie, Stuart and Simon shared the stories they loved to read as children and teenagers, and discussed the books that have had the biggest impact on their careers to date.

George Breckenridge & Stuart Ross

Stuart and George have been working with us over the past six months to Analyse COVID-19 Mobility Responses through Passively Collected App Data. They shared some of their work in our recent blog analysing patterns of Christmas mobility in the UK.


What was your favourite book as a child?

‘Who Was Isambard Kingdom Brunel’ (2006) by Amanda Mitchison – I think I read this when I was about 8, which is crazy looking back! A short biography of Brunel and his engineering feats which I think instigated a life-long fascination with our collective journey into the depths of underground civil engineering. 

What was your favourite book as a teen?

‘Population 10 Billion’ (2013) by Danny Dorling – I’ve always loved Danny Dorling’s writing and this book on demography represented a cornerstone of my reluctantly-optimistic teenage outlook. Its insight that future global resource issues are mostly a product of imbalances in ‘consumption’ rather than global ‘overpopulation’ remains, in my personal view, underappreciated. 

Favourite data related book?

‘Urban Analytics’ (2017) by Alex Singleton, Seth Spielman & David Folch – This concise textbook served as my gateway into truly understanding the diversity and dynamism of urban analytics, perfectly pitched as an introductory text.

Book that has had the greatest impact on your career to date?

‘Imagined Londons’ (2002), edited by Pamela K. Gilbert – At a time when I needed it most, this book put theoretical rocket-boosters into my undergraduate dissertation on urban geography, which in turn contributed very helpfully to my BA classification!


What was your favourite book as a child?

Where’s Waldo by Martin Handford

What was your favourite book as a teen?

This one would be The Old Man and the Sea by Ernest Hemingway

Favourite data related book?

Python for Dummies by Stef Maruch – Great for learning the basics of Python and I still refer back to it from time to time to brush up before an interview. 

Book that has had the greatest impact on your career to date?

An Introduction to Species Distribution Modelling (SDM) Using QGIS and R by Colin D. MacLeod – This is the first book I actually followed all the way through and used as a tutorial to teach myself the basics of SDMs. 

Simon Leech

Simon is working with the CDRC team on a project with the Office for National Statistics (ONS), funded by Administrative Data Research UK (ADR UK) – The Local Data Spaces Pilot. He recently shared his experiences of hybrid working during the pandemic.

Simon, what was your favourite book as a child?

Any from the Horrid Henry Series by Francesca Simon – I remember reading so many of these as a child, and reading them over and over again!

What was your favourite book as a teen?

Gerrard: My Autobiography (Steven Gerrard) – I have to confess I did not read enough during my teenage years, but remember almost exclusively reading footballers’ autobiographies when I did pick up a book! As a Liverpool fan this is the only choice really!

Favourite data related book?

Algorithms of Oppression: How Search Engines Reinforce Racism (Dr Safiya Umoja Noble) – I attended the Open Data Institute 2020 Summit, and found the talk given on this subject very interesting and thought-provoking, so I went ahead and bought the book to learn more about the current information ecosystem.

Book that has had the greatest impact on your career to date?

Spatial Microsimulation with R by Robin Lovelace and Morgane Dumont – I followed this free book closely to produce a spatial microsimulation for assessing Vulnerability to Personal Carbon Allowances for my GIS Master’s Dissertation, something that pushed me to apply for this role as I enjoyed the work so much!

Rosie Martin

Rosie has been working with us for the last 6 months to explore Isolation and Inclusion in a Post-Social Distancing COVID World.

Rosie, what was your favourite book as a child?

Anything by Michael Rosen.

What was your favourite book as a teen?

The Count of Monte Cristo, by Alexandre Dumas.

Favourite data related book?

Moby-Duck: The True Story of 28,800 Bath Toys Lost at Sea, by Donovan Hohn – Set within an entertaining true story, this book introduced me to using data and spatial mapping to understand real events.

Book that has had the greatest impact on your career to date?

How to Lie with Maps, by Mark Monmonier – As an aspiring geographer at the time of reading, I found Monmonier was the first to teach me to develop a critical eye when looking at maps, and how to differentiate the good from the bad in a context where all maps must lie in one way or another.

New paper: Data considerations for the success of policy to restrict in‐store food promotions

Data considerations for the success of policy to restrict in‐store food promotions:
A commentary from a food industry nutritionist consultation

A recently published paper from CDRC researcher Vicki Jenneson discusses new plans to restrict in‐store price and location‐based promotions of less healthy foods and drinks in the UK, which aim to encourage healthier choices. With responsibility for implementation likely to fall to food retailers, it is important to understand the feasibility of implementation in order to ensure policy success. To comply, retailers will need to assess which products are restricted under the legislation. The large number of products in retailers’ portfolios poses a problem of scale.

A recent research case study found the data available to retailers to be insufficient to accurately apply the rules‐based approach set out by the policy proposal. Misclassification would result in some less healthy products being incorrectly promoted and vice versa. Problems with implementation feasibility have the potential to undermine the public health goals of the legislation. Interviews were carried out with nutrition representatives from the UK food retail and manufacturing sector, to understand the real‐world implications of the proposed legislation.

Industry nutritionists recommended a review of the use of the UK’s Nutrient Profiling Model as the legislative basis, proposed data‐related solutions to implementation problems and suggested a need for shared retailer‐manufacturer responsibility, given the context of data availability.

Read full paper

My Perspective on Hybrid Working: The New Normal?

Hi, I’m Simon, one of the LIDA Data Scientist Interns, in the unique position of splitting my working week between home working and working in one of LIDA’s Safe Rooms (for use when analysing controlled data in secure conditions without internet access). This is because I am currently working on a project with the Office for National Statistics (ONS), funded by Administrative Data Research UK (ADR UK) – The Local Data Spaces Pilot.

The Local Data Spaces Pilot project aims to develop novel insight for Local Authorities in response to the COVID-19 pandemic, providing up-to-date, high-quality analysis at granular levels.  Principally, we will use health data from the Test and Trace programme, non-health data provisioned by the ONS and the Joint Biosecurity Centre (JBC) and Local Authority ingested data to create novel and innovative insights in support of individual Local Authority policy needs. We aim for this work to inform impact monitoring, allocation of resources and a better understanding of the pandemic at local levels. 

Simon Leech

Simon is an Intern at Leeds Institute for Data Analytics, working with the CDRC team to apply data science solutions to solve complex, real-world challenges.

In this article I’m going to share some initial thoughts and feelings on how I’ve found building new working relationships remotely through the Programme, the ways in which my weekly routine has taken shape and the pros and cons of hybrid working.

At the beginning of this project I felt very overwhelmed at the thought of a newly-appointed Data Scientist Intern being thrown in at the deep end with expert colleagues, so I can only thank the rest of my team for their help and support throughout! The ADR UK Support Team is made up of three other academic researchers, among them Post-Doctoral Researchers and Lecturers. The wider support team spans Research Analysts, Directors and high-level leadership colleagues from the ONS and ADR UK. Across the four stakeholders, we have varying engagement.

Typically, the ADR UK Support Team will communicate almost daily, discussing the particular deliverables and current progress. We meet with the JBC, ADR UK and ONS on a fortnightly basis to provide a high-level overview. Insights from these fortnightly meetings are then disseminated to a wider group of stakeholders, and serve as a touch-point for mitigating risks. It is important to remember that this is a pilot study, so there is opportunity to learn what works best!

This project is inherently collaborative; we are working with and for Local Authorities, to provide them with the datasets, code and outputs related to the COVID-19 pandemic. By learning the skills, ways of working, and personalities of the other team members, I believe we have built a strong team dynamic, and one that fosters collaboration, innovation and insight, across the four actors involved.

I began this LIDA Data Scientist Internship having never had a full-time post, and I still have the bizarre knowledge that my colleagues of nearly 5 months are people I am yet to meet in the flesh.

However, I really feel I know them well, and feel we all made a concerted effort to gel together.  The Data Scientist Interns have a scheduled Friday evening after work social call, for us to chat and wind down after the week, and also scheduled Coffee Breaks in our calendars: a 30-minute break 3 times a week to get away from our work and simply chat to others in a similar position.

Whether this might be an informal discussion on the project itself, asking for help, or what everyone did at the weekend, these Coffee Breaks ensure we don’t feel isolated while working from home alone, foster friendships and work-place groups and enable us to provide help and support to others. Personally, I have been made aware of the Open Data Institute Conference and various training courses, events and online webinars through fellow Data Scientist Interns, and have sought their help with coding issues, and data visualisation techniques in R. The wealth of knowledge across the Data Scientist Interns is fantastic, and by ensuring we all have strong relationships, we know who to talk to for support on a particular issue, ensuring the collaborative aspect of the Data Scientist Internship Programme, even remotely!

At the time of writing this article, I am typically working four days a week in the Safe Room (for use when analysing controlled data on the ONS Secure Research Service without internet access) in LIDA and one day a week from home (the latter without access to the Secure Research Service where our project and data reside).

The routine in the Safe Room is one I am still getting used to and has definitely resulted in a different and perhaps more productivity-driven way of working. This is because I work in the Safe Room alone, and with the knowledge that this is my only opportunity to access the data being used on the project. I really feel productive in the Safe Room – the lack of internet access, distractions, and even colleagues enables me to power through the research. However, I feel the burden to work, all day 9am-5pm, more keenly whilst working in the Safe Room. Whereas in normal office-based employment I might stew over a coding problem during lunch or a quick break, I find that without someone to talk with, or simply to get outside for a quick walk with, I miss escapism and time away from a screen, and often find myself forsaking breaks in favour of continuing research. Before writing this blog post, I hadn’t really put this into words, so this self-reflection has really highlighted it.

Many of you might be wondering how I cope without access to the internet in the Safe Room, particularly when completing research using R. I have to say this has been the biggest challenge of the work. I liken this to completing an R exam: no notes, no internet, simply your own ability! Helpfully, the ONS allows you to import code into the Secure Research Service, so if I know what I will need to do next, I can prepare the code on my day at home and ask for it to be imported. The need to leave the room to search for the particular syntax for an issue can sometimes be frustrating, but that is the price of access to wonderful, highly granular data. It does at least ensure I get in my steps for the day, so that’s a little win!

The single day I am currently spending at home, without access to the project space, is a welcome change. The flexibility to work my hours in a different format enables me to get outdoors for a long walk during daylight, something I sorely miss inside the office. This agile method of working provides a different focus on my home-working days and gives me the freedom to continue personal development and research any techniques or code required, providing a space for more creative thinking around the challenges encountered. I sometimes find that within the Safe Room, your head begins to race around searching for answers and it is easy to get overwhelmed. By allowing myself time away from the screen and time outdoors, I am able to re-focus and consider how best to use the remaining 2 days in the Safe Room.

Hybrid Working – the New Normal?

Having started my first full-time post during the pandemic, I have only worked virtually, or worked alone in an empty office. This pandemic has shown that high-quality research can be completed collaboratively, remotely and across different regions – so I see no reason why this should not be a possibility moving forward. I like the agility of being able to adapt my working hours, particularly during these winter months. The ability to get outside during daylight hours really helps me, but I would also love to see some more faces in the office in the future, and hopefully meet all of my colleagues in the flesh!  Finally, I simply hope that, moving forward, working patterns will retain a level of flexibility to enable employees to find a hybrid approach that best suits them. 

Generation Rent and Supermarket Brands

The recent Channel 5 program “Inside Waitrose” again highlighted the existence of a ‘Waitrose effect’ on the UK housing market.

But what exactly is the effect, and does it apply to homeowners and renters alike?  We asked CDRC Researcher Dr Stephen Clark to explain. 

What is the ‘Waitrose effect’?

The program reported work by retail analyst James O’Malley which identified the location of ‘posh’, high personal income neighborhoods and found that the top 10 poshest neighborhoods contained a Waitrose.

The most often cited study of this ‘Waitrose effect’ was conducted by Lloyds Bank in 2018 and quantified the impact of various retail brands on local house prices. The study compared average house prices in the catchments of various supermarket brands with equivalent prices from farther afield and attributed the difference to the presence of the supermarket brand. It found an impact for most major brands, but the highest was for Waitrose, a +9.3% premium.

(Source: https://www.channel5.com/show/inside-waitrose/)

Does having a Waitrose or M&S nearby really make such a difference?

Well, what the study did not do is try to control for the other aspects that might drive this difference, for example Waitrose stores may be located close to larger, more expensive properties than in the wider area. This means that the Waitrose effect reported may also be picking up other differences between the local neighborhood and the wider areas.

Whilst this Waitrose effect on house sales is well reported, there is less understanding of the impact of retail brands on other aspects of the English housing market, such as rental properties. This gap in knowledge is what our recent study aimed to fill.

Also, we decided to control for other aspects that make properties more desirable, such as the size of the property, the affluence and character of the neighborhood, access to services and the quality of local schools. This allows us to strip out many of these ‘confounders’ and home in on a true retailer association.

Another aspect of our study is that we extended the understanding by examining different effects from smaller ‘convenience’ locations and more traditional larger store locations. This is important because the attractiveness of each type of location is different – the range of goods is more limited in convenience stores, but they are allowed to open for longer hours on Sundays.

What data did you use? 

For our study we used two main sources for our data. One was a database of over 1 million private rental listing prices taken from the popular Zoopla property listing site, provided by the Consumer Data Research Centre, from July 2014 to December 2015.

The second was the location of retail brands provided by the property analytics company GEOLYTIX, which allowed us to identify the location of stores, their brands and their size. Using powerful analysis software we were able to associate with each property listed for rent in England, the brand of the closest convenience store and the brand of the closest medium to large store.

Other ‘confounding’ information on the size of the property (bedrooms, bathrooms and reception rooms), the affluence of the neighbourhood (through a classification of areas and local incomes), the distance from the property ‘hot spot’ of West London, the quality of local schools and the access to transport hubs was also attached to the property. This comprehensive and diverse database on over 1 million properties allowed us to use regression analysis to measure the association of each of these aspects on the listing price of the property.
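
To make the regression step concrete, here is a minimal sketch of a hedonic-style model on made-up listings. The text above does not give the study's actual specification, so the log-linear form, the variable names and all numbers below are assumptions chosen purely for illustration. The sketch regresses log rent on the number of bedrooms and a dummy for the nearest store brand, then converts the brand coefficient into a percentage premium.

```python
import math
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols(design: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * target for row, target in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up listings: (bedrooms, 1 if the nearest store is the brand of interest, else 0).
listings = [(1, 0), (2, 0), (3, 0), (1, 1), (2, 1), (3, 1)]
# Made-up, noise-free log rents: intercept 6.0, 0.2 per bedroom, 0.05 for the brand.
log_rents = [6.0 + 0.2 * beds + 0.05 * brand for beds, brand in listings]

# Design matrix columns: constant, bedrooms, brand dummy.
design = [[1.0, float(beds), float(brand)] for beds, brand in listings]
intercept, per_bedroom, brand_coef = ols(design, log_rents)

# In a log-linear model, a dummy coefficient b implies a premium of exp(b) - 1.
brand_premium_pct = (math.exp(brand_coef) - 1) * 100
print(f"Brand premium: {brand_premium_pct:.1f}%")
```

Because the bedroom count sits in the same model, the brand coefficient reflects the rent difference between listings of the same size, which is the sense in which confounders are 'controlled for'. As the article stresses later, this still measures an association rather than a causal effect.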

So does the ‘Waitrose effect’ have a similar impact on the rental market?

Our model produced reassuring insights. The estimated effects of the confounding variables all had the expected direction and magnitude, e.g. higher rents for larger properties and for properties in more affluent areas that are closer to good schools and railway stations. The retail findings were also insightful.

Convenience store 

Properties that had a convenience store within a handy 500m walking distance showed a positive impact on rental prices, after controlling for all the above confounders, over having no convenience store close by.

The biggest effect associated with a brand was for the Little Waitrose brand, with a 5.6% increase in rental prices. The impact for Marks and Spencer Simply Food stores was almost as high, with a 5.1% increase. The smallest increase, just +0.8%, was for small Co-operative stores. Of the Big-4 retailers, Tesco’s convenience brand, Tesco Express, was favored over Sainsbury’s Local stores.

Medium to large store 

Looking at the influence of which brand of medium to large store was closest to the property (irrespective of distance) and measuring this relative to the Aldi brand, we again see distinct differences by brand. The premium is always positive, meaning that all brands command a premium (albeit small in some cases) over the Aldi brand.

Again the Waitrose effect is most pronounced, with an associated 11.1% increase in private rental prices. For comparison, Lloyds Bank estimated a 9.3% increase using house sales data. The second highest brand is again Marks and Spencer, with an 8.7% increase. Two of the Big-4 retailers, Asda and Morrisons, showed only a small premium over having an Aldi store as the closest retailer.

One interesting finding is that Lidl, Aldi’s fellow discounter, shows a price premium over an Aldi and these two Big-4 retailers. It will be interesting to work out why this might be the case.  

Is the ‘Waitrose effect’ London centric?  

Waitrose traditionally has its roots in Southern England, particularly around London, and London is recognised as an expensive location for property, both to buy and to rent. This leads us to ask: are we just picking up an affluence effect from South East England for our brands?

Well, the reason we introduced the confounders was to try and capture any affluence effect in the South East using other information, such as the local income levels, the affluence of the neighbourhood, its socio-demographic make-up and critically, a distance from West London to capture in this one term the reduction in private rental prices as one moves away from West London. What is left in the variation in rents after these confounders have been accounted for is then tested against the retail brands.

What about the future?

Our study gives us a good understanding of the interaction between retail provision and rental prices in the recent past, but this relationship is clearly dynamic. Questions arise around whether, in the post-Covid-19 living, working, leisure and retail landscape, city centres and their associated rental housing will be less attractive and are likely to remain so. Conversely, are more localised residential areas with associated small high streets now more desirable, with a positive knock-on impact on convenience grocery stores and nearby rental costs? Has consumers’ experience with on-line retailing weakened the geographic link between where people live and shop, or have local in-store loyalties transferred into the e-retailing sphere? Clearly this is an active and important area of research that is best facilitated by access to the types of novel data used here.

Do you have any plans for further work in this area?

In this study, for statistical reasons, we can only really claim an association between the retail brands in a neighbourhood and the rent for local private rental properties. To claim actual causation requires more sophisticated statistical techniques. One technique that we are actively investigating is Propensity Score Matching, which attempts to mimic a traditional randomised controlled trial. Here we are attempting to set up a group of control areas (that do not have the retail brand present) and a group of treatment areas (that have the retail brand present). If these two groups can be made to look ‘similar’ in every respect other than the presence of a retail brand, then any difference in the rental price may be attributed to the presence of the retail brand. Initial investigations are proving positive.
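
The matching idea can be sketched in a few lines. This is an illustrative outline only, not the study's method: it assumes each area already has a propensity score (the estimated probability of having the brand present, typically from a logistic regression on the confounders), and all scores and rents below are made up. Each treatment area is paired with the control area whose score is closest, and the rent gaps are averaged.

```python
from typing import List, Tuple

Area = Tuple[float, float]  # (propensity score, mean monthly rent), both hypothetical


def matched_difference(treated: List[Area], controls: List[Area]) -> float:
    """Average rent gap between each treated area and its nearest-score control.

    Matching is done with replacement: a control may be used more than once.
    """
    gaps = []
    for score, rent in treated:
        _, control_rent = min(controls, key=lambda c: abs(c[0] - score))
        gaps.append(rent - control_rent)
    return sum(gaps) / len(gaps)


def naive_difference(treated: List[Area], controls: List[Area]) -> float:
    """Unmatched comparison of group means, shown for contrast."""
    mean_treated = sum(rent for _, rent in treated) / len(treated)
    mean_control = sum(rent for _, rent in controls) / len(controls)
    return mean_treated - mean_control


# Made-up areas with the brand present (treated) and without it (controls).
treated_areas = [(0.80, 1200.0), (0.60, 1000.0)]
control_areas = [(0.78, 1100.0), (0.58, 950.0), (0.20, 700.0)]

effect = matched_difference(treated_areas, control_areas)
naive = naive_difference(treated_areas, control_areas)
print(f"Matched gap: {effect:.2f}, naive gap: {naive:.2f}")
```

In this toy data the naive gap is much larger than the matched one, because the low-score control area is unlike any treated area and drags the control mean down. Matching discards that comparison, which is the point of making the two groups look 'similar'.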

Further Information

For further information or a copy of our article, please contact Stephen Clark.

The full article is available here: Clark, S., Hood, N., & Birkin, M. (2021). A hedonic model of the association between grocery brand provision and residential rental prices in England. International Journal of Housing Markets and Analysis.

Job opportunity: Communications & Public Engagement Officer

Job opportunity: Communications & Public Engagement Officer (Leeds)

Are you an experienced communicator with the ability to write, edit and manage compelling content for a range of channels? Do you have experience of delivering multichannel communication plans? This role will appeal if you are looking for a busy and varied communications role, providing maternity cover for 12 months from May 2021.

Based at Leeds Institute for Data Analytics at the University of Leeds, you will be responsible for developing and implementing multichannel communications and public engagement plans for the CDRC, to raise awareness of data as a resource for academic and applied research; and to promote the Centre’s research, services, training and education activities.

You will be required to develop web and social media content, produce high quality publications such as reports, newsletters and promotional materials and deliver educational and public-engagement events.

You will have a graduate degree or equivalent (preferably in a communications-related subject) and specialist expertise in a wide range of communications and public engagement activities and practices. Knowledge of content management systems, social media and evaluating communications is also essential.

Find out more and view the full candidate brief.

CDRC Supporting Development of Sktime

Markus Löning is a PhD student at UCL with the CDRC, and is one of the lead developers of sktime – a Python library for time series machine learning. Time series analysis is a challenging area and many existing tools do not work well with time series data. 

Solving data science problems with time series data in Python is challenging.

Why? Existing tools are not well-suited to time series tasks and do not easily integrate together. Methods in the scikit-learn package assume that data is structured in a tabular format and each row is i.i.d. — assumptions that do not hold for time series data. Packages containing time series learning modules, such as statsmodels (https://www.statsmodels.org/stable/user-guide.html#time-series-analysis), do not integrate well together. Further, many essential time series operations, such as splitting data into train and test sets across time, are not available in existing Python packages.
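
As an illustration of why splitting across time differs from the usual random split, the sketch below holds out the most recent observations as the test set so that no future values leak into training. It is written in plain Python and is not sktime's own API; the function name and the data are hypothetical.

```python
from typing import List, Sequence, Tuple


def temporal_train_test_split(
    series: Sequence[float], test_size: int
) -> Tuple[List[float], List[float]]:
    """Split an ordered series so every test point comes after every training point."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    cutoff = len(series) - test_size
    return list(series[:cutoff]), list(series[cutoff:])


# Hypothetical monthly observations, oldest first.
monthly_values = [12.0, 13.5, 13.1, 14.8, 15.2, 16.0, 15.7, 17.3]
train, test = temporal_train_test_split(monthly_values, test_size=2)
print(train, test)
```

A random shuffle, as is common for tabular data, would place later observations in the training set and earlier ones in the test set, which gives an over-optimistic picture of forecasting performance.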

To address these challenges, sktime was created.

Logo of the sktime library (Github: https://github.com/alan-turing-institute/sktime)

sktime is an open-source Python toolbox for machine learning with time series. It is a community-driven project funded by the UK Economic and Social Research Council (https://esrc.ukri.org/), the Consumer Data Research Centre (https://www.cdrc.ac.uk/), and The Alan Turing Institute (https://turing.ac.uk/).

sktime extends the scikit-learn API to time series tasks. It provides the necessary algorithms and transformation tools to efficiently solve time series regression, forecasting, and classification tasks. The library includes dedicated time series learning algorithms and transformation methods not readily available in other common libraries.

sktime was designed to interoperate with scikit-learn, easily adapt algorithms for interrelated time series tasks, and build composite models. How? Many time series tasks are related. An algorithm that can solve one task can often be re-used to help solve a related one. This idea is called reduction. For example, a model for time series regression (use a series to predict an output value) can be re-used for a time series forecasting task (the predicted output value is a future value).
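
The reduction idea can be sketched without any library: slide a window over the series to build an ordinary table of inputs and targets, fit any tabular regressor on it, then feed in the latest window to forecast the next value. The code below is a simplified illustration rather than sktime's implementation; the class and function names are hypothetical, and the nearest-neighbour regressor merely stands in for any model with a fit and predict step.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    series: Sequence[float], window: int
) -> Tuple[List[List[float]], List[float]]:
    """Build tabular rows of `window` past values, each paired with the next value."""
    rows, targets = [], []
    for start in range(len(series) - window):
        rows.append(list(series[start:start + window]))
        targets.append(series[start + window])
    return rows, targets


class NearestNeighbourRegressor:
    """A deliberately simple tabular regressor (1-nearest neighbour)."""

    def fit(self, rows: List[List[float]], targets: List[float]):
        self.rows_, self.targets_ = rows, targets
        return self

    def predict_one(self, row: List[float]) -> float:
        distances = [
            sum((a - b) ** 2 for a, b in zip(row, seen)) for seen in self.rows_
        ]
        return self.targets_[distances.index(min(distances))]


def forecast_next(series: Sequence[float], window: int, regressor) -> float:
    """Reduce one-step forecasting to tabular regression on sliding windows."""
    rows, targets = sliding_windows(series, window)
    regressor.fit(rows, targets)
    return regressor.predict_one(list(series[-window:]))


# Hypothetical repeating series: 1, 2, 3, 1, 2, 3, ...
series = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0]
next_value = forecast_next(series, window=2, regressor=NearestNeighbourRegressor())
print(next_value)
```

Swapping in a different regressor changes the forecaster without touching the windowing step, which is what makes composite models straightforward to build.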

Mission statement: “sktime enables understandable and composable machine learning with time series. It provides scikit-learn (https://scikit-learn.org/stable/) compatible algorithms and model composition tools, supported by a clear taxonomy of learning tasks, with instructive documentation and a friendly community.”

sktime is a great example of the user community coming together to produce an understandable, compatible, standards-based, open-source tool to solve a specific problem. CDRC is proud to support the project through Markus’s involvement and aims to provide similar support to many other projects in the future.

For more details, please check out this blog post at https://towardsdatascience.com/sktime-a-unified-python-library-for-time-series-machine-learning-3c103c139a55 by Alexandra Amidon (https://alexandra-amidon.medium.com/).

Understanding and Comparing Mobility Data – 4th Feb 2021

Through the ABC (Accelerating Business Collaboration) Research Programme, funded by ESRC & UBEL, PhD candidate James Todd worked with Geolytix to validate the representativeness of mobile mobility data from Unacast. Geolytix were interested in gaining a deeper understanding of how comparable their (Unacast) data is to alternative mobility data sources as well as insights into the factors that influence the number of devices that are found within small geographical areas.

Overall, the analysis within this project finds that Unacast mobility data is comparable to many alternative mobility data sources, observing a 70-100% decline in activity by the start of April 2020 across the vast majority of mobility data sources.

This research project comprised two main methods. First, mobility trends in London were assessed descriptively by comparing Unacast mobility data with a large number of open mobility data sources (Google, Apple, Purple, Open Table, Transport for London, City Mapper, Santander Bike Sharing). Using this method, it was possible to visually compare multiple mobility data sources within the context of Covid-19 lockdown restrictions.

| Dataset | Description | Source (link) |
| --- | --- | --- |
| Unacast | Mobile mobility data | Geolytix (private) |
| Google | Categorised mobility data | Google (open source) |
| Apple | Categorised mobility data | Apple (open source) |
| SSS | Wifi footfall data | CDRC (private) |
| Purple | Wifi footfall data | Purple (open source) |
| Open Table | Restaurant reservation data | Open Table (open source) |
| TfL | Transport use data | TfL (open source) |
| City Mapper | Mobility index data | City Mapper (open source) |
| Santander Bike Sharing | Bikeshare activity data | CDRC (open source) |
| Open Street Map | Geographical features data | OSM (open source) |

Table 1. Sources of Mobility Data used in this analysis

To enable a deeper understanding of the representativeness of Unacast data, statistical regression analysis was conducted. A fixed-effect regression was used to assess the representativeness of Unacast mobile devices in relation to the Local Data Company’s (LDC) Smart Street Sensor (SSS) footfall data. In addition, a linear regression was used to find the relationship between Unacast mobility data and local geographic features taken from Open Street Map (OSM).
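
For readers unfamiliar with fixed-effect regression, the sketch below shows the 'within' estimator on made-up data. It is a generic illustration, not the project's model: which variable was treated as the outcome is not stated above, so using sensor footfall as the predictor and device counts as the outcome is an assumption, as are the location ids and all numbers. Each location's own average is subtracted from its observations before a single slope is fitted, so fixed differences between locations drop out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Observation = Tuple[str, float, float]  # (location id, sensor footfall, device count)


def within_slope(observations: List[Observation]) -> float:
    """Fixed-effects (within) estimator: demean x and y per location, fit one slope."""
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for location, x, y in observations:
        groups[location].append((x, y))

    numerator = 0.0
    denominator = 0.0
    for pairs in groups.values():
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        for x, y in pairs:
            numerator += (x - mean_x) * (y - mean_y)
            denominator += (x - mean_x) ** 2
    return numerator / denominator


# Made-up data: both locations share a slope of 0.5 but have different baselines.
observations = [
    ("A", 100.0, 60.0), ("A", 200.0, 110.0), ("A", 300.0, 160.0),
    ("B", 100.0, 20.0), ("B", 200.0, 70.0), ("B", 300.0, 120.0),
]
slope = within_slope(observations)
print(slope)
```

The two locations differ by a constant 40 devices at every footfall level, yet the estimated slope is unaffected, because only variation within each location over time contributes to it.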

Geolytix were very happy with the project. Blair Freebairn (CEO Geolytix Ltd), said “The work is valuable to us in and of itself, but also as it has sparked additional areas of interest. In particular the comparisons to other broad brush indicators of human movement has provided context and reassurance as to the high-level appropriateness of mobility data. The micro correlations at site level are well elucidated and have shed new light on the nature of mobility data.”

James Todd, PhD candidate, said “This experience has been extremely valuable as it has given me insights into the private sector’s area of interest in the context of mobility data, which I have been working on within my PhD. This has given me many ideas on how I would like to adapt my PhD to include similar analysis as part of an empirical chapter.”

Written by Dr Nick Bearman, Project Delivery Manager

COVID for Christmas? Analysing patterns of Christmas mobility in the UK

Examining Christmas 2020

The UK Christmas 2020 period attracted enormous public attention owing to the late cancellation of the 5-day ‘Christmas bubble’ policy to relax indoor household mixing from Dec 23rd-27th across the UK. Although Independent SAGE warned of the transmission risks from households mixing at Christmas, 44% of ONS survey respondents claimed they formed an exclusive Christmas bubble for Dec 25th.

Understanding human mobility responses has been transformed by the emergence of passively-collected smartphone app data. With most ‘Non-Pharmaceutical Interventions’ to curtail COVID-19 transmission relying upon the modification of physical behaviour [1], public health experts have called for the use of this systematic insight to monitor the effectiveness of national and regional ‘lockdown’ policies [1] [2] [3].

In response to public curiosity, and as a guide for health policymakers, this article provides a timely overview of actual mobility patterns observed over the 2020 Christmas period in the UK and their COVID-19 impacts. Using smartphone mobility data, it attempts to address:

  1. How mobile was the UK population over the Christmas period (23rd-27th Dec 2020)?
  2. Were these mobility patterns aligned with the UK government’s revised Christmas policy?
  3. Did mobility patterns have a detectable effect on UK COVID-19 growth rates?

Stuart Ross and George Breckenridge

Authors Stuart Ross and George Breckenridge are part of the Data Scientist Internship Programme at Leeds Institute for Data Analytics, which emphasises using data for the public good.

They are currently undertaking a joint 6-month research project examining patterns of mobility under the COVID-19 pandemic, funded by the Consumer Data Research Centre and supervised by Dr Mengdie Zhuang (UCL) and Prof Ed Manley (University of Leeds).

Data and Methods

For our data source, aggregated mobility data was provided by Cuebiq, a location intelligence and measurement platform. This first-party data is collected from users who have opted in to provide access to their location data anonymously, through a CCPA and GDPR-compliant framework. Through its Data for Good program, Cuebiq provides mobility insights for academic research and humanitarian initiatives. Cuebiq’s responsible data sharing framework enables Data for Good partners to query anonymised and privacy-enhanced data, by providing access to an auditable, on-premise sandbox environment. All final outputs provided to partners such as LIDA are aggregated in order to preserve privacy.

Our metrics seek to understand mobility through the recorded ‘destinations’ of devices outside their ‘home’ location area. The ‘home’ areas of users in this analysis were calculated using a frequency-based DBSCAN from the week prior to Christmas. Destinations were then calculated daily for each unique user through a space-time DBSCAN algorithm [4]. The results of these were then aggregated in spatio-temporal extent to daily and regional levels, so that the reported results protect privacy.
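As an illustration of the clustering step, the sketch below runs a plain DBSCAN over one hypothetical user's location pings and takes the centroid of the most populated cluster as a stand-in for ‘home’. This is a simplification, not the pipeline used here: it assumes coordinates already projected to metres, uses only a spatial threshold (ST-DBSCAN [4] adds a temporal one), and the `eps` and `min_pts` values are invented for the example.

```python
import math
from collections import Counter

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: -1 for noise, 0.. for clusters."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:   # core point: keep growing the cluster
                queue.extend(nbrs)
    return labels

# Hypothetical pings (x, y) in metres for a single user.
points = [(0, 0), (10, 5), (5, 10), (8, 8), (500, 500), (505, 498), (2000, 2000)]
labels = dbscan(points, eps=50, min_pts=3)

# Assumption: 'home' is the centroid of the cluster holding the most pings.
home_label = Counter(l for l in labels if l != -1).most_common(1)[0][0]
members = [p for p, l in zip(points, labels) if l == home_label]
home = (sum(p[0] for p in members) / len(members),
        sum(p[1] for p in members) / len(members))
print(labels, home)
```

In this toy input the four pings near the origin form the only cluster, and the remaining three are labelled as noise because they have too few neighbours within `eps`.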

In order to account for relative mobility during COVID-19, a baseline of 16th December 2020 was used. Figure 1 evaluates the geographic representativeness of the Cuebiq data for 16/12/20, by visualising the UK authorities where the % proportion of the total population in the Cuebiq data outweighs (blue) the equivalent proportion from the ONS 2019 Mid-Year Population Estimates. It demonstrates that although there is good overall geographic representativeness, with no single area more than 1 percentage point over or under its % proportion of the total population, there is also a distinct regional geography to Cuebiq overrepresentation, concentrated in South-East England.

Figure 1: UK choropleth map to evidence representativeness by county of Cuebiq ‘GB’ data, for date 16/12/20. Calculated per county by subtracting the % proportion of the actual UK population, as measured by ONS 2019 Mid-Year Population Estimates, from the % of total population in our Cuebiq dataset. Data Sources: Cuebiq; ONS UK Mid-Year Population Estimates 2019: https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationestimatesforukenglandandwalesscotlandandnorthernireland.
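The representativeness measure behind Figure 1 can be sketched as a difference in shares: each area's percentage of all users in the sample minus its percentage of the total population. The snippet below is an illustrative sketch only; the area names and counts are hypothetical and the function name is our own.

```python
def share_difference(sample_counts, population_counts):
    """Percentage-point gap between each area's share of the sample
    and its share of the reference population.

    Positive values mean the area is over-represented in the sample.
    """
    sample_total = sum(sample_counts.values())
    population_total = sum(population_counts.values())
    return {
        area: 100 * sample_counts[area] / sample_total
        - 100 * population_counts[area] / population_total
        for area in population_counts
    }

# Hypothetical counts, not real Cuebiq or ONS figures.
sample_users = {"Area A": 300, "Area B": 500, "Area C": 200}
ons_population = {"Area A": 25_000, "Area B": 50_000, "Area C": 25_000}

diff = share_difference(sample_users, ons_population)
for area, gap in diff.items():
    print(f"{area}: {gap:+.1f} percentage points")
```

By construction the gaps sum to zero across areas, so over-representation in one region is always balanced by under-representation elsewhere.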

Policy Adherence

Under the final Christmas policy, no household mixing was permitted in Tier 4 England. Elsewhere in Great Britain the 3-household ‘Christmas bubble’ relaxations were limited to Christmas Day only. In Wales, only 2 households could mix on 25th December, whilst in Scotland bubbles were limited to 8 people.

So, did we see the spike of mobility on Christmas Day that we would be expecting, in light of these policies? No.

By contrast, as Figure 2 shows, mean distance travelled per ‘GB’ Cuebiq user actually fell in the lead-up to Christmas Day. There is a clear separation between the baseline (16th) and the entire Christmas period, with mean mobility declining by 40% from 4.5mi to 2.7mi per user on Christmas Day. Furthermore, a second smaller layer of separation exists within the Christmas period itself, within which mean mobility declined by 29% from 3.8mi on 23rd December to 2.7mi on Christmas Day. This noticeable reduction in mobility activity on Christmas Day is unexpected, though it is corroborated by Citymapper data for Birmingham, Apple Mobility Trends Reports, and Google COVID-19 Community Mobility Reports for the UK.

Furthermore, lower mean mobility is accompanied by a higher % of users not going ‘out’ on an individual day, as well as a higher % of users staying outside their ‘home’ county. As shown by Figure 2, this is again true between both the baseline and the Christmas period, as well as within the 5-day Christmas period itself. A Multiple Comparison of Means test (Tukey HSD) showed a significant difference (P-value < 0.001) between each day’s % homestayers except between the 23rd-24th, and the 25th-26th.

Notably, the % of users outside their home county increased by just 0.2 percentage points between Christmas Eve and Christmas Day, alongside a peak level of users not moving at 65.8%. Rather than showing the expected Christmas Day spike, the shape of these 5-day figures relative to the 16th December baseline indicates patterns of mobility that are more closely aligned with the previous 5-day ‘Christmas bubble’ policy. On the one hand, this could indicate a higher-than-expected uptake of legitimate ‘support bubbles’ for the Christmas period, or a very localised geography to the households that did mix in their scaled-back ‘Christmas bubble’. On the other hand, these statistics could also support the speculation that adherence to the revised Christmas policy was lower than intended, with users following the withdrawn 5-day allowance.

Figure 2: Christmas Mobility for Cuebiq Users (Country: ‘GB’)
Date | Mean distance travelled (mi) | Users not moving (%) | Users outside their ‘home’ county (%)
16/12/2020 | 4.5 | 38.8 | 25.7
23/12/2020 | 3.8 | 52.1 | 29.2
24/12/2020 | 3.6 | 52.5 | 35.3
25/12/2020 | 2.7 | 65.8 | 35.5
26/12/2020 | 2.3 | 65.8 | 37.9
27/12/2020 | 2.4 | 62.4 | 36.0

Figure 2: Visualised table documenting GB user mobility statistics, 16th and 23rd-27th Dec 2020. Includes mean distance travelled (mi, 1dp), users not moving (%) and users outside their ‘home’ county (%).  Data Source: Cuebiq.
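The headline percentages quoted in this section follow directly from the Figure 2 values. As a quick check, the short sketch below recomputes the two relative declines in mean distance and the Christmas Eve to Christmas Day change in users outside their ‘home’ county; the variable and function names are our own.

```python
# Values copied from Figure 2 (Cuebiq, country 'GB').
mean_distance_mi = {"16/12/2020": 4.5, "23/12/2020": 3.8, "25/12/2020": 2.7}
outside_home_county_pct = {"24/12/2020": 35.3, "25/12/2020": 35.5}

def pct_decline(start, end):
    """Relative decline from start to end, as a percentage of start."""
    return 100 * (start - end) / start

baseline_to_christmas = pct_decline(mean_distance_mi["16/12/2020"],
                                    mean_distance_mi["25/12/2020"])
within_christmas = pct_decline(mean_distance_mi["23/12/2020"],
                               mean_distance_mi["25/12/2020"])

# A difference between two percentages is a change in percentage points.
point_change = (outside_home_county_pct["25/12/2020"]
                - outside_home_county_pct["24/12/2020"])

print(round(baseline_to_christmas), round(within_christmas), round(point_change, 1))
# prints: 40 29 0.2
```

Note the distinction between the first two figures, which are relative declines, and the third, which is an absolute difference between two percentages.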

Links to COVID-19 Growth Rate

So, did the British counties with the highest levels of recorded mobility over Christmas experience the consequence of higher COVID-19 incidence in early-January?

For this we’ve employed the proxy metric of ‘COVID-19 Growth Rate’, which is distinct from ‘R’ and is an approximation of the % change in the number of (recorded) infections each day. We’ve taken two measures of mobility from our analysis: total number of destinations visited per county and average number of destinations per individual per day per county.

The graphs in Figure 3 represent the correlations at British county level between measured mobility over the Christmas period 23rd-27th December 2020, and the COVID-19 growth rate as measured for these counties 5th-8th January 2021. As shown, the correlation for both mobility measures is positive but underwhelmingly weak. This counters our expectations that a strong positive correlation would exist.
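For readers who want to reproduce this kind of comparison, a correlation between a county-level mobility measure and a county-level growth rate can be computed as below. This is a sketch that assumes a Pearson correlation coefficient, which the analysis above does not explicitly name, and the county values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical county-level values, one entry per county.
destinations_per_user = [1.2, 1.5, 1.1, 1.8, 1.4]
covid_growth_rate = [300, 280, 350, 330, 310]

r = pearson_r(destinations_per_user, covid_growth_rate)
print(f"r = {r:.2f}")
```

A value of r near 0 indicates little linear association, while values near +1 or -1 indicate a strong positive or negative one.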

Though plagued by small sample sizes, the top 5 counties by COVID-19 growth rate saw a dramatic spike in early January, when a post-Christmas rise was anticipated. Despite this, their mobility is far from the highest observed. This raises the question: were their COVID-19 growth rates a product of visitors from other counties ‘bringing COVID with them’ over Christmas? Our analysis indicated this to be unlikely too. The two counties with usable sample sizes, Torbay and Stirling, both exhibited amongst their visitors a weighted COVID-19 county growth rate below the national average of 329, at 244.9 and 200.8 respectively. They did, however, each receive visitors from one Tier 4 county that was far above the national COVID-19 growth rate average: Hertfordshire (13.5% of visitors, 731) and Surrey (6.3% of visitors, 603), respectively.
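One way to read the ‘weighted COVID-19 county growth rate’ among visitors is as an average of the growth rates of visitors' origin counties, weighted by each origin's share of visitors. The sketch below shows that calculation under this assumption; the weighting scheme, origin names and all numbers are hypothetical rather than taken from the analysis.

```python
def weighted_growth_rate(visitor_share, origin_growth_rate):
    """Average of origin-county growth rates, weighted by visitor share."""
    total_share = sum(visitor_share.values())
    weighted_sum = sum(
        visitor_share[origin] * origin_growth_rate[origin]
        for origin in visitor_share
    )
    return weighted_sum / total_share

# Hypothetical destination county: share of visitors (%) by origin county,
# and each origin county's growth rate.
visitor_share = {"Origin A": 50.0, "Origin B": 30.0, "Origin C": 20.0}
origin_growth_rate = {"Origin A": 200.0, "Origin B": 300.0, "Origin C": 500.0}

weighted = weighted_growth_rate(visitor_share, origin_growth_rate)
print(weighted)
```

Under this reading, a single high-rate origin county with a small share of visitors raises the weighted figure only slightly, which is consistent with a county receiving visitors from a high-growth Tier 4 area while still sitting below the national average.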

Figure 3: Correlation graphs per British county between Cuebiq mobility 23rd-27th Dec 2020 and COVID-19 Growth Rate for 5th-8th January 2021. Both show weak correlations. COVID-19 Data Source: https://coronavirus.data.gov.uk/details/cases.

Despite this, it is very unlikely that mobility didn’t play a leading role in December 2020 COVID-19 transmission. A criticism of smartphone mobility data is that it is still unable to detect the fine-scale behaviours that can significantly reduce differential transmission risks, such as the adoption of sufficient social distancing, the wearing of face masks and whether households were definitively mixing [1] [2]. We are also forced to consider all mobile individuals as equally dangerous for transmission [1]. All factors considered, therefore, this analysis lends credibility to these critiques across a short-term period.

For further information please contact Professor Ed Manley, CDRC Co-Director and Professor of Urban Analytics at the University of Leeds.


[1] Grantz, K. H., Meredith, H. R., Cummings, D. A., Metcalf, C. J. E., Grenfell, B. T., Giles, J. R., Mehta, S., Solomon, S., Labrique, A., Kishore, N., Buckee, C. O., & Wesolowski, A. (2020). The use of mobile phone data to inform analysis of COVID-19 pandemic epidemiology. Nature Communications, 11(1), 1-8.

[2] Kishore, N., Kiang, M. V., Engø-Monsen, K., Vembar, N., Schroeder, A., Balsari, S., & Buckee, C. O. (2020). Measuring mobility to monitor travel and physical distancing interventions: a common framework for mobile phone data analysis. The Lancet Digital Health.

[3] Budd, J., Miller, B. S., Manning, E. M., Lampos, V., Zhuang, M., Edelstein, M., Rees, G., Emery, V. C., Stevens, M. M., Keegan, N., Short, M. J., Pillay, D., Manley, E., Cox, I. J., Heymann, D., Johnson, A. M., & McKendry, R. A. (2020). Digital technologies in the public-health response to COVID-19. Nature Medicine, 26(8), 1183-1192.

[4] Birant, D., & Kut, A. (2007). ST-DBSCAN: An algorithm for clustering spatial–temporal data. Data & knowledge engineering, 60(1), 208-221.

CDRC analysis uncovers new rural e-food deserts

A new small area ‘e-food deserts index’ (EFDI) produced by the CDRC reveals that food deserts are not solely an urban phenomenon associated with neighbourhood deprivation. Analysis reveals the presence of rural ‘e-food deserts’ – neighbourhoods that suffer a dual disadvantage of poor access to grocery stores alongside comparatively poor provision of groceries home delivery services.

The multi-dimensional composite index for GB measures the extent to which neighbourhoods exhibit characteristics associated with food deserts. It draws upon measures of accessibility to grocery retail facilities, neighbourhood socio-economic, demographic and mobility indicators and also novel measures of e-commerce availability and usage. Interactive maps highlight the EFDI scores for all LSOAs in England & Wales and Data Zones in Scotland. In common with prominent research into urban food deserts in the late 1990s, there is a clear relationship between neighbourhood level deprivation and the presence of food desert-like characteristics in urban areas.

Dr Andy Newing

Dr Andy Newing is an Associate Professor in Applied Spatial Analysis based in the Centre for Spatial Analysis and Policy (CSAP) at the University of Leeds.

Many of these neighbourhoods benefitted from considerable investment in grocery retail opportunities following the widespread interest in urban food deserts. These investments included large-format store development (such as the Tesco Extra store in Seacroft, South East Leeds constructed as part of a high profile ‘regeneration agenda’). The persistence of food desert-like characteristics in these neighbourhoods highlights the importance of characteristics such as transport availability, household composition (especially the presence of pensioners), personal mobility and income in driving groceries accessibility, all associated with urban deprivation.

The research also highlights new drivers of inequalities in access to groceries between rural areas. In East Anglia for example, the smaller cities of Cambridge (Cambridgeshire) and Ipswich (Suffolk) and the towns of Colchester (Essex) and Bury St Edmunds (Suffolk) fare very favourably on our indicator. These localities benefit from excellent local provision of grocery retail opportunities and are not associated with large pockets of urban deprivation.

The predominantly rural nature of East Anglia means that outside of these principal urban settlements, many neighbourhoods fare relatively poorly on our index, with a limited presence or choice of proximate physical retail facilities and comparatively poor transport provision. Nevertheless, ranking and scores on our EFDI index are boosted in many of these neighbourhoods by the relatively good coverage of groceries home delivery services in this area. Most neighbourhoods benefit from considerable choice in provider, with most of the major grocers offering coverage among these neighbourhoods for their home delivery services, considerably lessening the barriers to groceries access.

By contrast, in rural mid-Wales almost all households fall within our worst scoring decile. Access barriers associated with very limited provision of physical retail opportunities are exacerbated by comparatively poor provision of online groceries, in most cases with no choice of retailer and potentially very limited provision of delivery slots, coupled with a low propensity to shop online among households in many of these neighbourhoods. These areas, which we class as rural e-food deserts, suffer from the dual disadvantage of comparatively poor access to physical retail opportunities alongside more limited provision of online groceries (home delivery).

The index highlights the barriers in providing services within some of our most remote and rural areas, where population density doesn’t warrant comprehensive food store provision and where retailers also face considerable costs in providing groceries home delivery services to dispersed populations. We hope this indicator will help to focus attention on these inequalities. The index can be explored via interactive maps available for all LSOAs in England & Wales and Data Zones in Scotland. The data and a more detailed user guide can be downloaded via the CDRC website.