Analysing COVID-19 Mobility Responses through Passively Collected App Data (Case Study)

Using smartphone GPS mobility data to understand population-scale responses to COVID-19 ‘lockdown’ policies in England.

Project overview

COVID-19 has prompted the enhanced use of novel mobility data in public life, offering fascinating insights into population-wide behavioural responses to Non-Pharmaceutical Interventions (NPIs) such as ‘lockdown’ stay-at-home orders. Here, we use privacy-preserving smartphone data to understand these trends at a regional scale over a longitudinal period spanning January 2020 to May 2021 for England, with a specific focus on examining adherence to policy measures on household visitation.

Adherence to ‘lockdowns’, and fatigue with them, are highly debated ideas with limited observational evidence, despite their key role in supporting current policy assumptions. The SAGE report of 16th March 2020 underscored this when it said there was “(limited) evidence on whether the public will comply with the interventions in sufficient numbers and over time” (p.2) with respect to COVID-19 measures. Our study uses a novel measure of ‘house visits’ activity to cut out general noise and is explicitly intended to better inform health policy interventions in the context of a public health emergency.

Data and methods

According to UK Government polling for the Centre for Data Ethics and Innovation (CDEI), 58% of over 2000 UK adults surveyed in Sept 2020 were either ‘quite comfortable’ or ‘very comfortable’ with “researchers using data to improve knowledge to help keep the public safe” during COVID-19, with just 14% being ‘quite’ or ‘very uncomfortable’. This finding was positive overall across all UK regions, all age groups, all income levels, all education levels, and whether or not people were worried about COVID-19 itself. There were also 16.5 million voluntary downloads of the NHS COVID App for modern smartphones in England and Wales in 2021. Clearly, there is a public demand for the harnessing of data to help tackle COVID-19.

Our study used anonymous, privacy-enhanced GPS smartphone mobility data from users who opted in to data collection for research purposes under a GDPR-compliant framework. Data were supplied by the American and Italian location intelligence company Cuebiq, under their Data for Good program. We use unsupervised machine learning methods (DBSCAN) to make home and work area assignments, which are then removed from user activities. Through a validated ‘process of elimination’ using POI (point of interest) analysis, we can then generate an aggregate measure of the proportion of de-identified users making a house visit, for a given county area, on a given day. The output data are thus aggregated to the strict privacy requirements set by Cuebiq, at both temporal and spatial scales, before they are analysed, while still harnessing the precision inherent in such emerging data streams in order to inform public health policy under COVID-19. Limitations of the methods and data, including a potential lack of representativeness, were extensively discussed in the published findings. Importantly, the data could not accurately distinguish between visits inside homes and visits to outside garden areas.
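To illustrate the clustering step, the sketch below shows how DBSCAN can group a single user's GPS pings and take the centre of the largest cluster as a candidate home location. It is a minimal, standard-library illustration and not the study's pipeline: the coordinates are invented, and the 50 m radius, the minimum of three points and the "largest cluster is home" rule are all assumptions made for the example.

```python
from collections import Counter, deque
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def dbscan(points, eps_m, min_pts):
    """Plain DBSCAN: returns one label per point, with -1 meaning noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        return [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels


def largest_cluster_centroid(points, labels):
    """Mean (lat, lon) of the most populous cluster, or None if every point is noise."""
    counts = Counter(label for label in labels if label != -1)
    if not counts:
        return None
    top = counts.most_common(1)[0][0]
    members = [p for p, label in zip(points, labels) if label == top]
    return (sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members))


# Invented pings: four close together, two isolated.
pings = [(53.8008, -1.5491), (53.8009, -1.5492), (53.8007, -1.5490),
         (53.8008, -1.5493), (53.8100, -1.5600), (53.7900, -1.5300)]
labels = dbscan(pings, eps_m=50, min_pts=3)
home = largest_cluster_centroid(pings, labels)
```

In practice a library implementation (for example the DBSCAN class in scikit-learn) would normally replace the hand-written loop, which compares every pair of points and so scales poorly.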

Key findings

This LIDA project led to the publication of an original research paper ‘Household visitation during the COVID-19 pandemic’ in the Nature journal Scientific Reports in November 2021, detailing both methods and results.

Our results track the evolution of a measure of household visitation levels in English LTLAs (Lower-Tier Local Authorities) over time – notated as ‘HEngland,t’ throughout the study. This index is a national-level value, calculated as the mean of the weekly levels for each of England’s 315 LTLA areas, excluding the Isles of Scilly due to sample size issues. Each weekly level was measured against a pre-pandemic baseline taken from 13th January 2020 to 2nd March 2020. The baseline was specific both to each LTLA area and to each day of the week, to account for relative changes in each locality.
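The baseline comparison described above can be sketched in a few lines. This is an illustrative reading of the method rather than the published code: it works on daily values instead of weekly ones, uses the mean as the baseline statistic (an assumption), and the area names and visitation rates are invented.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Baseline window as stated in the text.
BASELINE_START = date(2020, 1, 13)
BASELINE_END = date(2020, 3, 2)


def relative_change(records):
    """records: list of (ltla, day, rate). Returns {(ltla, day): % change vs baseline}.

    The baseline is specific to each LTLA and each day of the week.
    """
    baseline = defaultdict(list)
    for ltla, day, rate in records:
        if BASELINE_START <= day <= BASELINE_END:
            baseline[(ltla, day.weekday())].append(rate)

    changes = {}
    for ltla, day, rate in records:
        reference = baseline.get((ltla, day.weekday()))
        if reference:  # skip areas/weekdays with no baseline observations
            ref_mean = mean(reference)
            changes[(ltla, day)] = 100 * (rate - ref_mean) / ref_mean
    return changes


def national_index(changes):
    """Mean of the per-LTLA percentage changes for each day."""
    by_day = defaultdict(list)
    for (_ltla, day), pct in changes.items():
        by_day[day].append(pct)
    return {day: mean(values) for day, values in by_day.items()}


# Invented data: two areas, two baseline Mondays and one lockdown Monday.
records = [
    ("Area A", date(2020, 1, 13), 0.10), ("Area A", date(2020, 1, 20), 0.10),
    ("Area B", date(2020, 1, 13), 0.20), ("Area B", date(2020, 1, 20), 0.20),
    ("Area A", date(2020, 3, 30), 0.05), ("Area B", date(2020, 3, 30), 0.15),
]
index = national_index(relative_change(records))
```

Here Area A is 50% below its Monday baseline and Area B is 25% below, so the national value for 30th March is their mean, -37.5%.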

Figure 1: Time-series showing levels of household visitation across Jan 2020-May 2021 (mean average Lower-Tier Local Authorities rate in England against area- and day-specific 2020 baseline), alongside new COVID-19 cases.  Source: https://www.nature.com/articles/s41598-021-02092-7/figures/1

Figure 1 from the paper shows the evolution of ‘HEngland,t’ across the full study period, alongside recorded COVID-19 cases. Levels of household visitation dropped dramatically in late March 2020, reaching a pandemic-period low of –56.4% relative to pre-pandemic baseline levels on 29th March 2020. In Figure 1 we have marked ‘national lockdown’ periods as those when stay-at-home orders were in place, during which time household visitation was prohibited in almost all cases. Taking mean averages across these periods, household visitation averaged 39.33% below baseline levels during the 1st National Lockdown (23/03/20 – 12/05/20), but only 15.28% below baseline during the 2nd National Lockdown (05/11/20 – 01/12/20). We did not see a large jump in household visitation in the immediate aftermath of the introduction of ‘support bubble’ exemptions in mid-June 2020.

Heading into the 3rd National Lockdown (06/01/21 – 07/03/21), mobility activity fell markedly ahead of the imposition of national restrictions, perhaps reflecting the impact of COVID-19 risk perception and/or the new Tiered restrictions announced on 19th December 2020 in response to the detection of the new Alpha variant in South-East England. These trends were reinforced by the imposition of the 3rd National Lockdown on 6th January 2021, which held household visitation at an average of 26.22% below baseline rates (06/01/21 – 14/02/21), between the levels of the 1st and 2nd National Lockdowns, until approximately mid-February 2021.

At this point the Prime Minister announced, during a televised 10 Downing Street coronavirus briefing to the nation, that 15 million people from the most vulnerable categories in JCVI Priority Groups 1-4 had received a first dose of COVID-19 vaccination. Almost immediately, our metric ‘HEngland,t’ recorded a significant rise in household visitation rates across England, such that by 7th March 2021 levels of household visitation were comfortably above the pre-pandemic baseline, even though coronavirus regulations had stayed the same.

Figure 2: Hex cartogram maps illustrating comparative levels of regional disparities in visitation rate across the whole COVID-19 period studied and for each of the three ‘National Lockdown’ periods. Source: https://www.nature.com/articles/s41598-021-02092-7/figures/2

Figure 2 here illustrates the geographical variation in these household visitation rates for Local Authority Districts at LTLA scale, as mean averages across a) the entire COVID-19 period, and then, for b)-d), across the three National Lockdown periods respectively. These are presented as hex cartograms, prepared with assistance from the UK House of Commons Library. Some regional disparities are shown, notably between North and South, and between urban and rural areas. London boroughs, in particular, appear to have consistently higher relative rates of visitation against the pre-pandemic baseline than elsewhere in England.

Figure 3: Time-series showing levels of household visitation in two LTLA local authorities – Leicester and Liverpool – which experienced ‘local lockdown’ policies impacting household visitation, against the national level for England in grey. Source: https://www.nature.com/articles/s41598-021-02092-7/figures/4

Figure 3 completes this summary of our key results by applying the measure to two individual local authority areas that experienced specific and rigorous local restrictions to tackle sudden outbreaks in cases over summer 2020 – ‘local lockdowns’ as they became known in England. Here, the cities of Leicester and Liverpool appear to have exhibited different profiles of adherence to ‘local lockdown’ measures on household visitation. In Leicester, despite a large reduction in visitation compared to the national trajectory when the local lockdown was at its strictest, a sharp rise in household visits (to above the national level for England) occurred around the time of the first relaxation on 1st August 2020, even though this did not revoke the restrictions prohibiting house visits. By contrast, in Liverpool house visits stayed meaningfully below the national figure for England throughout the summer period, including after regional measures were introduced on 22nd September 2020.

Value of the research

The research was designed directly to inform public policy, in line with LIDA’s commitment to using data for public good. Understanding actual levels of likely aggregate adherence to pandemic policy was highlighted as an area of importance by the joint report of the House of Commons Health and Social Care Committee and Science and Technology Committee into the UK coronavirus response – “Coronavirus: lessons learned to date” – published in September 2021.

Many activities driving virus transmission are intimately connected to the mixing and mobility of individuals. Our observational findings on behavioural responses in house visits will therefore allow public sector agencies to better understand how English populations responded to a range of lockdown impositions and relaxations, as well as allow us to see how these responses may have been complicated and/or influenced by concurrent public messaging and prevalent COVID-19 risks. A mix of past national and local lockdown policies can therefore be optimised and/or evaluated using our results. The Scientific Reports research paper disseminating the results was highlighted in the ‘Behavioural Science and Insights Unit Weekly Literature Report’ of the UK Health Security Agency (UKHSA) in late November 2021.

The findings received significant coverage in the national British press, featuring in Metro, The Daily Telegraph, The Independent, Daily Express, Daily Mail and the i paper, as well as in other national-scale publications including the Yorkshire Evening Post, The Conversation and current affairs magazine The Week. This was supplemented internationally by mass online coverage from Yahoo! and MSN. According to Altmetric, as of 20th January 2022, the research paper had also been shared on Twitter to a combined total of 2.69 million followers.

Quote from project partner Cuebiq

“We’re proud of the exceptional and novel research led by University of Leeds, not only because it created impactful public goods, but also because it was achieved with an uncompromising commitment to data privacy and governance.”

Insights

  • Measures indicate adherence to household visitation restrictions was relatively high overall but waned both within and between subsequent National Lockdowns in England. This is rare observational evidence for shorter- and longer-term ‘fatigue’ in compliance with COVID-19 restrictions, at various stages of the pandemic lifecycle.
  • Around 15th February 2021, when the Prime Minister informed the nation that 15 million of the most vulnerable people in JCVI Priority Groups 1-4 had been vaccinated, a significant and unprecedented rise in household visitation rates was witnessed nationally, to above pre-pandemic base rates, despite lockdown regulations staying the same. This indicates that people may have paid meaningful attention to levels of protection carried by the most vulnerable members of British communities when determining their visiting activities, and/or have adhered far less to relevant pandemic regulations once vaccinated.
  • Measures of household visitation indicate that household visitation activity was responsive to prevalent COVID-19 risk, ahead of the implementation of restrictions (i.e. Alpha variant in December 2020), as well as before they were officially lifted (1st and 3rd National Lockdowns), offering evidence individuals may respond to a perceived personal and/or collective risk of COVID-19 infection over and above current government policy or guidance.
  • Local lockdowns in Leicester and Liverpool indicated a likelihood of contrasting profiles of adherence over time to ‘local lockdown’ measures prohibiting household visitation, also highlighting the potential of smartphone mobility data to indicate waning population-wide adherence in a single aggregated local authority area (where sample size N > 10 is consistently satisfied, to protect against the risks from Statistical Disclosure).
  • Cuebiq mobility data for England is geographically representative across a series of temporal and spatial aggregations, and across several points in the pandemic for our sample, even if other factors of social representativeness remain rightly unknown.

Research theme

Health informatics & urban analytics.

People

Mr Stuart Ross, LIDA Data Scientist Intern

Mr George Breckenridge, LIDA Data Scientist Intern

Dr Mengdie Zhuang, Lecturer in Data Science, University of Sheffield

Prof Ed Manley, Professor of Urban Analytics & LIDA Fellow

Partners

Data provider: Cuebiq Inc., NYC, Milan

Funders: LIDA intern work funded by the CDRC (Consumer Data Research Centre), so in turn by the ESRC (Grant ES/L011891) of UKRI. Broader research project also supported by i-sense, so in turn by EPSRC (Grant EP/R00529X/1) of UKRI

Images

Open Access: Images licensed under a Creative Commons Attribution 4.0 International License, from Ross et al. (2021) Household visitation during the COVID-19 pandemic. Scientific Reports. Springer Nature.

Networking and Partnership Building: An intern’s perspective

Introduction

I want my work to have an impact and I believe that harnessing the increasingly abundant data now available is one way of ensuring this. LIDA presents the opportunity to combine my sociological and quantitative/computational skills, and I feel grateful that my internship project with the CDRC at LIDA matches my expectations perfectly.

I work on the OpenInfra project, exploring the potential of (crowd-sourced) open-access data (OpenStreetMap) in planning active travel infrastructure. Open data could lead to a more accessible and inclusive decision-making process by including citizens in the process of building the active travel infrastructure and network they want to use every day. However, the data is “messy”, constantly updated but still lacking completeness. It is open but not easy to access or use, and although it might have mapping protocols in place, this does not mean that there are no errors (my all-time favourite is the width value of –1).

To help reduce the scope of the problem, I decided to focus on accessible pedestrian infrastructure. One of the first things I did was search for relevant policy documents. The Inclusive Mobility guide was released over 10 years ago (it has now been recently updated), so I suspected that it might not contain the most up-to-date recommendations. I thought that familiarity with current qualitative research on accessible pedestrian infrastructure might identify what essential information on street elements might not be, as of now, representable in OpenStreetMap.

As I was searching for qualitative research on my subject, I discovered an ongoing project at the University of Leeds that has various synergies with my project, so I contacted the team. This was the first time I had ever reached out to someone to explore how two projects might collaborate, so it meant stepping out of my comfort zone. Whilst not easy, it is proving to be very rewarding. Here, I will share some lessons learnt that, I believe, laid the groundwork for successful networking and partnership building.

Seeking Partnership

Before I discuss building partnerships with external stakeholders, I want to highlight that the most important partnership to build is with your project team. Mutual trust and support between you and your supervisors are integral to advancing any project.

Writing that first email

In my case, I sought partnership to get a better grasp of my project and data needed for accessible pedestrian infrastructure planning. A clear idea of “why” gives purpose for reaching out. For me, it was helpful to think about the initial email as a cover letter. The following questions guided my email:

  • Introduce yourself: who are you? Why are you qualified to contact them?
  • The why: why are you contacting them (e.g. expertise in a domain, methods)? How did you find out about them?
  • Benefits: what are the potential benefits of them partnering with you?
  • Call to action: what do you want to achieve as a result of this email (e.g. organize a meeting)?

In my case, the trickiest part was to identify why they would be interested in meeting me. I approached this by reading their project website and an academic paper their team had published, trying to understand the project and agenda/factors that drove them as a team. I found that raising awareness of the struggles faced by people with disabilities is integral to their project. We also want to highlight the importance of mapping data relevant for accessible pedestrian infrastructure, so in my email I noted this overlap. I was careful not to overpromise or come across as too certain of their interest at this stage.

Initiating a partnership for the first time can be challenging. It took more than two weeks for me to sit down and write that email, not because of a busy schedule, but because I was worried about not receiving a reply. The key factor in overcoming this was acknowledging it and recognizing that it goes hand-in-hand with my imposter syndrome. Being honest with myself helped to put everything into perspective: nothing but time would be lost if I sent an email, but I would gain self-confidence and, potentially, a meeting.  I was also aware of my project team being positive about me contacting them, therefore trusting me enough to enable me to give a personal touch to the project. These little realizations, or rather self-reminders, were very reassuring and empowering, leading to my first successful initiation of partnership building.

Scheduling and running the meeting

When I got a positive reply from them, I was over the moon – proud of myself for having taken that first step! Yet I also knew that the next step was scheduling the meeting. In retrospect, I can say that scheduling requires active listening. For example, one person in their team currently lives in another time zone, so I was asked to schedule meetings after 4PM GMT. Little pieces of information like this might pave the way for a successful meeting before it even starts!

Leading a meeting was an unknown field for me. I had a myriad of questions ranging from chairing the first meeting to making sure that the meeting allowed for discussion of both projects in parity, as well as the potential bridges between them. Here, I took advantage of the fantastic LIDA community and asked my personal buddy to share her experience. I got an invaluable piece of advice – do not be afraid to communicate your aspirations and hopes for the meeting up-front. Indeed, from the first email enquiry, this collaboration was about communication and testing the ground, so the meeting did not have to be “perfect” to be productive. This realization took the pressure off my shoulders.

The meeting went really well: it reassured me that our project is timely and needed and, more importantly, it exposed me to new interdisciplinary ideas and applications of OpenStreetMap data. For example, we discussed the potential of addressing the qualitative-quantitative divide (often thought of as binaries), organizing walk-alongs and mapathons, and a question on using OpenStreetMap data for 3D modelling. Not all of these ideas may be realised, but the process of engagement and listening has broadened my perspective on OpenStreetMap and its applicability to qualitative research. Finally, it made me feel that I am working towards doing what I set out to do: combining my sociological and computational skills for social good.

Final thoughts

The entire experience of reaching out has not been just about networking and partnership building per se, but also stepping out of my comfort zone to suggest (and realise) ideas to my project team. It can be challenging to do if you (as I was) are assigned a project that is far from your field of expertise. Here, again, I want to reiterate the importance of building collaborative working with your project team – it takes time, trust, and willingness to communicate honestly, especially about fears and worries. Indeed, imposter syndrome can hinder my motivation more often than I would like, but moving one step at a time and, most importantly, collecting and appreciating those steps have been invaluable, especially in the face of stakeholder meetings.

The experience of networking and partnership building has strengthened the central position of communication in a (data science) project. Not only does it help to promote or disseminate its outputs, but also to shape one’s own perspective towards the project itself. For me, listening emerged as a key tool of effective communication, that perhaps needs to be given more credit in data science if the project is to have a real-life impact.


Author: LIDA Data Scientist Intern, Greta Timaite. Greta has a BA in Sociology and an MSc in Big Data and Digital Futures from Warwick University.

GOLIATH: Geographies of Lifestyle, Activity, Transport and Health (Case Study)

Consumer data can provide insight into a wide range of human activity, but there is a trade-off between privacy and utility of the data.

Project overview

Consumer data collected by commercial providers have huge potential for a range of research purposes but can be challenging to access as they are often held in secure environments. Secure handling of these datasets is crucial, as consumer data contain sensitive attributes (e.g. address) or commercially sensitive data (e.g. they have been purchased or contain licensed information). This project provides a proof of concept for creating enhanced and aggregated versions of consumer datasets for research purposes, and a dashboard for exploring those data.

Data and methods

Taking securely held consumer datasets within the Consumer Data Research Centre (CDRC), the objective of the project was to produce non-disclosive and aggregated versions of the data whilst maintaining the unique characteristics and value of those data. An R Shiny app visualising the aggregated data has been developed to showcase the utility of non-disclosive datasets for research purposes. Based on a randomised sample of Whenfresh/Zoopla consumer data, key metrics such as median price and affordability are calculated for different property types at the Middle Layer Super Output Area (MSOA) level. Additionally, open data is used to calculate further metrics, for example, the attractiveness of an area based on Census flow data. The next steps include improving the efficiency, loading and updating times of the R Shiny app so that it can be populated with additional datasets.
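As a rough illustration of non-disclosive aggregation, the sketch below computes a median price per MSOA and property type and suppresses any cell built from too few records. It is written in Python rather than the project's R, the records are invented, and the threshold of 10 is a hypothetical value chosen for the example, not the project's actual disclosure rule.

```python
from collections import defaultdict
from statistics import median

MIN_COUNT = 10  # hypothetical disclosure threshold, not taken from the project


def aggregate_prices(records, min_count=MIN_COUNT):
    """Median price per (MSOA, property type).

    Cells based on fewer than `min_count` records are suppressed (returned as None)
    so that individual properties cannot be inferred from the output.
    """
    groups = defaultdict(list)
    for msoa, property_type, price in records:
        groups[(msoa, property_type)].append(price)
    return {
        key: (median(prices) if len(prices) >= min_count else None)
        for key, prices in groups.items()
    }


# Invented records: eleven flats and two detached houses in one illustrative area.
records = (
    [("MSOA 001", "flat", 200_000 + 1_000 * i) for i in range(11)]
    + [("MSOA 001", "detached", 450_000), ("MSOA 001", "detached", 500_000)]
)
summary = aggregate_prices(records)
```

The flat cell is published because it rests on eleven records, while the detached cell is withheld because two records would be too easy to trace back to individual sales.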

Key findings

Using existing data, especially anonymised and aggregated consumer data, this research project can be seen as a proof of concept for an ‘alternative’ or ‘big data’ census. Different data types, e.g. time series, static, and origin-destination flow data, have successfully been combined and can be explored by the user in a dashboard (Figure 1).

Figure 1 – Screenshot of GOLIATH dashboard

Value of the research

The prototype R Shiny app forms the basis for further work in providing a dashboard for exploring local area statistics. Moving forward, other consumer data could be included as part of GOLIATH, for example, transport and lifestyle datasets. Utilising consumer data in addition to traditional census counts contributes to efforts to create an ‘alternative’ or ‘big data’ census.

Insights

  • Devised methods for the aggregation and calculation of metrics for secure consumer data
  • Developed a prototype R Shiny App for the visualisation of spatially disaggregated information

Research theme

Urban analytics

People

Maike Gatzlaff
LIDA Data Scientist Intern

Dr Nik Lomax
Co-Director of the Consumer Data Research Centre

Professor Mark Birkin
Co-Director of the Leeds Institute for Data Analytics

Dr Will James
Research Fellow, University of Leeds

Partners

The Consumer Data Research Centre

Funders

The data for this research have been provided by the Consumer Data Research Centre, an ESRC Data Investment, under project ID CDRC [Project Number], ES/L011840/1; ES/L011891/1.

Measuring Ambient Populations during COVID-19 in Leeds City Centre (Case Study)

The COVID-19 pandemic led to lockdowns being implemented all over the world, including in the UK. The aims of the project were to investigate relevant data sources for modelling the ambient population of Leeds City Centre during COVID-19 and to analyse the impacts that lockdown policies had on urban footfall. The research builds on previous work undertaken with Leeds City Council by intersecting key dates from the English lockdowns with footfall trends and integrating these dates into machine learning models to assess the importance of different aspects of lockdowns. It also predicts what “business as usual” may have been like had there been no pandemic.

Analysis notebooks and scripts can be accessed at https://github.com/tbalbone31

Data and methods

Leeds City Council have been collecting footfall data for more than a decade. The data were wrangled and aggregated to create a history going back to 2008. These data were then analysed alongside key lockdown dates to determine where trends in urban footfall intersected, raising questions about what aspects of these policies might have had the most impact. The data cover a relatively small geographical area of Leeds City Centre and only reflect pedestrian traffic going past the locations identified by the cameras. There are issues with data quality, such as potential double counting, periods of missing data and inconsistent file formatting; however, the data cover a large temporal scale and many problems can be worked around.

Google COVID-19 Community Mobility data was analysed as a potential alternative data source to the Council data.  It shows changes in mobility from a baseline for six different destinations (see the website for more details).  The smallest relevant spatial coverage is the Leeds City Region.  This was considered too large to isolate any changes impacting the city centre, making comparison of trends difficult.

The Council footfall data were resampled to show daily counts, on which analysis was then conducted. Visual analysis was undertaken to identify footfall trends over the course of the pandemic against key dates pertaining to the implementation and lifting of certain COVID-19 restrictions. These key dates were chosen from research into when major legislation came into force or government announcements about restrictions were made. The questions generated from the initial analysis were then explored by creating a series of machine learning models using Random Forest regression in the Python scikit-learn package.
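The resampling step can be pictured as summing each camera reading into a single total per calendar day. The sketch below assumes hourly readings per camera and sums across all cameras; both the input layout and the camera names are hypothetical, and the counts are invented.

```python
from collections import defaultdict
from datetime import date, datetime


def resample_daily(readings):
    """readings: iterable of (timestamp, camera_id, count).

    Returns {date: total count across all cameras}, ordered by date.
    """
    totals = defaultdict(int)
    for timestamp, _camera_id, count in readings:
        totals[timestamp.date()] += count
    return dict(sorted(totals.items()))


# Invented hourly readings from two hypothetical cameras.
readings = [
    (datetime(2020, 3, 16, 9), "camera_1", 120),
    (datetime(2020, 3, 16, 10), "camera_1", 150),
    (datetime(2020, 3, 16, 9), "camera_2", 80),
    (datetime(2020, 3, 17, 9), "camera_1", 60),
]
daily = resample_daily(readings)
```

With tabular data already loaded into pandas, the same result would more usually be obtained with a time-based resample or group-by.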

The first model included a series of input variables to represent different aspects of society that had restrictions placed on them alongside other external conditions (such as weather, school/bank holidays, day of week, etc).  Variable importance was used to identify what (if any) aspects of lockdown might be significant in predicting future changes in footfall. The second model omitted any lockdown related inputs and was designed to make predictions on what “business as usual” might have been like had the pandemic not happened. 

Due to the inherently ordered nature of time series data, both models were validated using a method known as “Walk-Forward Validation” instead of the standard cross-validation included in scikit-learn and often used on Random Forest ensembles. Walk-Forward Validation retrains the model after every prediction on the validation dataset, essentially “walking forward” through the time series. This avoids the data leakage that can occur when cross-validation folds, which ignore temporal order, train the model on observations from after the period being predicted.
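A minimal sketch of walk-forward validation is given below. To keep it self-contained it uses a stand-in model (the mean of the previous seven observations) instead of the Random Forest used in the project, and the series is invented; the function names are illustrative only.

```python
from statistics import mean


def walk_forward(series, n_train, fit, predict):
    """Expanding-window walk-forward validation.

    The model is refitted before every prediction, and each true value is
    added to the training history only after it has been predicted.
    Returns (predictions, mean absolute error).
    """
    history = list(series[:n_train])
    predictions = []
    for actual in series[n_train:]:
        model = fit(history)                 # retrain on everything seen so far
        predictions.append(predict(model, history))
        history.append(actual)               # reveal the truth only afterwards
    errors = [abs(p - a) for p, a in zip(predictions, series[n_train:])]
    return predictions, mean(errors)


# Stand-in model: forecast the mean of the most recent seven observations.
def fit_recent_mean(history):
    return mean(history[-7:])


def predict_constant(model, history):
    return model


# Invented daily footfall series with a steady upward trend.
footfall = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
predictions, mae = walk_forward(footfall, n_train=7,
                                fit=fit_recent_mean, predict=predict_constant)
```

Because the history only ever grows forwards in time, no prediction can draw on observations from after the day being predicted. scikit-learn's TimeSeriesSplit offers a related, fold-based way of respecting temporal order.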

Key findings

The chart below shows the resampled footfall data intersecting with key dates from COVID-19 restrictions.

Key dates are shown as a dotted line with a number relating to a key.  Red zones indicate “official” lockdowns whilst orange represents periods where a variety of restrictions were in place but in the process of being lifted/introduced individually.  A summary of how this impacted footfall is below:

  • Footfall started to drop immediately after the announcement on 16th March 2020, before any official restrictions were implemented.
  • After non-essential shops and schools reopened on 15th June 2020, footfall started to rise again.
  • Footfall continued to rise through summer until around 22nd September 2020, when some restrictions were announced.
  • Footfall rose whilst Leeds was in tiers 2 and 3, potentially because gatherings were only permitted in public spaces.
  • The second and third lockdowns drove footfall back down until restrictions began to ease again in April 2021.

The first machine learning model was intended to explore whether any lockdown variables would be significant in predicting future changes in footfall.  Variable importance (top 10) is shown below.

The most important lockdown-related features were indoor entertainment and non-essential retail.  Whilst this is only an initial model and not a definitive conclusion, it does help indicate what aspects of lockdown might have impacted pedestrian traffic in the city centre more than others.

The second model was designed to test how useful the data would be in predicting what “business as usual” may have been like.

There was little difference between error scores across different numbers of trees, so a compromise of the best score and least processing power (500 trees) was chosen.  The model predictions using this hyperparameter are shown below.

Results from this initial model are by no means definitive; however, they show the potential to quantify how much footfall has been lost.  For example:

  • Average daily footfall in the lead up to Christmas (taken as 30th November to 24th December 2020) was approximately 36% lower than predicted.
  • Average daily footfall over the school holidays was approximately 63% lower than predicted.
  • Approximate footfall loss for individual Bank Holidays was also calculated.  Most recorded over 90% lower than predicted values except for the August Bank Holiday which was around 22% lower.
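Figures of this kind can be derived by comparing average observed footfall with the average “business as usual” prediction over the same period. The sketch below shows one way to do that; it compares period means (an assumption, since the exact calculation is not stated here) and uses invented numbers, not the project's data.

```python
from statistics import mean


def percent_below_prediction(actual, predicted):
    """Percentage by which mean observed footfall falls short of the mean prediction."""
    predicted_mean = mean(predicted)
    return 100 * (predicted_mean - mean(actual)) / predicted_mean


# Invented daily counts for a three-day period.
actual_daily = [7_000, 8_000, 9_000]
predicted_daily = [10_000, 10_000, 10_000]
shortfall = percent_below_prediction(actual_daily, predicted_daily)
```

Here the observed mean of 8,000 against a predicted mean of 10,000 gives a shortfall of 20%.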

Value of the research

Initial analysis has already been delivered to Leeds City Council. An aggregated dataset of footfall camera data has been created and is available on the Consumer Data Research Centre (CDRC) Data Store for future research. The initial models can be used and refined by future researchers to develop more accurate predictions, whilst more specific time series packages can be explored.

Insights

  • Urban footfall and ambient population were significantly impacted by COVID-19 lockdown policies (as was intended).
  • Closure of Indoor Entertainment and Non-Essential retail appear to be the most important lockdown-related factors in predicting footfall change.
  • Consideration must be given to how time series data is processed in classic machine learning models such as Random Forests.

Research theme

Urban analytics

People

Tom Albone – Data Scientist Intern (LIDA)

Dr Nick Malleson – Professor of Spatial Science

Professor Alison Heppenstall – Professor in Geocomputation

Dr Vikki Houlden – Lecturer in Urban Data Science

Dr Patricia Ternes – Research Fellow

Partners

Leeds City Council

Funders

Consumer Data Research Centre

Isolation and Exclusion in a Social Distancing COVID-19 World
(Case Study)

CDRC Data Scientist Intern, Rosalind Martin, working with Professor Susan Grant-Muller, Professor Alison Heppenstall and Dr Vikki Houlden from the University of Leeds, and Professor Rachel Franklin from the University of Newcastle, has produced a dashboard that identifies geographical areas which might experience increased isolation and exclusion as we leave the COVID-19 pandemic and lockdowns.

Project overview

Although much work has already been completed which identifies individuals most at risk from health impacts of the COVID-19 pandemic, there is considerable uncertainty regarding which societal impacts will persist as the UK leaves COVID-19 lockdowns. This project was undertaken with the aim of advancing the understanding of the social and spatial impacts of emergence from lockdown, particularly understanding how previously implemented restrictions will have impacted individuals and households. Using SPENSER, a synthetic population, we have identified individuals and households at risk from five COVID-19 restrictions: shielding, school closures, limited household interaction, furlough and limited to local area, along with households at risk from unique combinations of these five scenarios. This has been translated onto a dashboard which displays additive counts of household level impacts at the Middle Layer Super Output Area (MSOA) level.

Data and methods

We applied five COVID-19 restrictions (that cover a breadth of socio-economic impacts) to individuals and households across Yorkshire and the Humber. Our population came from SPENSER, a synthetic micro-population, along with additional characteristics obtained from supplementary datasets. The criteria for an individual or household to be impacted by each restriction were influenced by external statistics and are as follows:

Shielding: a randomly extracted 4.83% of the population who had been classified as in poor health, based on answering that their day-to-day activities were limited a lot due to a long-term health problem or disability in the 2011 census. The ailing population is representative of MSOA level trends and split into four age categories (0-15, 16-49, 50-64 and 65 and over).

School closure: households with at least one child aged 13 or under. This age was chosen as it is the age cut-off for forming a COVID-19 ‘childcare bubble’.

Limited household interaction: all single-person households as determined by a household size of one (a pre-existing characteristic in the SPENSER data).

Furlough: the proportions of individuals working in three industry groups, (1) Accommodation and food service activities, (2) Arts, entertainment and recreation, and other service activities, and (3) Wholesale and retail trade, repair of motor vehicles and motorcycles, were identified at the MSOA level from 2011 census data and replicated proportionally in our SPENSER population. The average percentage of furloughed employees in each group was then identified: 61.3%, 67% and 13.8% respectively.

Limited to local area: all households who live in an MSOA where there is no accessible green space within 1km. These data were from CDRC’s Access to Healthy Assets and Hazards dataset.
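Some of the criteria above are simple rules and others are random draws at a stated rate. The sketch below illustrates both kinds under stated assumptions: it is not the project's code, the function names and short industry labels are hypothetical, and it assumes each furlough percentage is applied as a random draw from the workers in that industry. The same `sample_fraction` helper could serve for the shielding draw, although the text does not specify the exact sampling frame used there.

```python
import random

# Furlough rates quoted in the text, keyed by hypothetical short labels.
FURLOUGH_RATES = {
    "accommodation_food": 0.613,
    "arts_other_services": 0.67,
    "wholesale_retail": 0.138,
}


def has_school_closure_impact(ages):
    """School closure: at least one household member aged 13 or under."""
    return any(age <= 13 for age in ages)


def has_limited_interaction_impact(household_size):
    """Limited household interaction: single-person households."""
    return household_size == 1


def sample_fraction(ids, fraction, rng):
    """Randomly select round(len(ids) * fraction) ids without replacement."""
    k = round(len(ids) * fraction)
    return set(rng.sample(list(ids), k))


def furloughed_workers(workers_by_industry, rng):
    """Draw furloughed individuals per industry at the quoted rates (assumed rule)."""
    selected = set()
    for industry, ids in workers_by_industry.items():
        selected |= sample_fraction(ids, FURLOUGH_RATES[industry], rng)
    return selected


# Made-up worker ids for illustration only.
rng = random.Random(42)  # fixed seed so the draw is repeatable
workers = {
    "accommodation_food": list(range(0, 1000)),
    "arts_other_services": list(range(1000, 1100)),
    "wholesale_retail": list(range(1100, 2100)),
}
furloughed = furloughed_workers(workers, rng)
print(len(furloughed))
```

Fixing the random seed keeps the draw reproducible, which matters when the flagged individuals feed into later household-level counts.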

Once all the restrictions had been applied to the households, each household was assigned to a scenario which represented a unique combination of all of the five restrictions. There were 32 scenarios in total. This enabled additive counts of impacts on households to be calculated. These final outputs are displayed on the accompanying dashboard. Counts of household impacts are displayed alongside total household counts for each MSOA and Indices of Economic Insecurity, produced by Smith et al. (2020) and used with permission.
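The figure of 32 scenarios follows from five yes/no restrictions: 2^5 = 32 possible combinations. A minimal sketch of the scenario assignment and per-MSOA counting is given below. The field names, the example households and the reading of "additive counts" as the number of restrictions affecting a household are assumptions made for illustration, not details taken from the project's code.

```python
from collections import Counter
from itertools import product

# Hypothetical field names for the five restrictions described in the text.
RESTRICTIONS = (
    "shielding",
    "school_closure",
    "limited_interaction",
    "furlough",
    "local_area",
)

# Every on/off combination of the five restrictions: 2**5 = 32 scenarios.
ALL_SCENARIOS = list(product((False, True), repeat=len(RESTRICTIONS)))


def scenario_of(household):
    """Return the household's scenario as a tuple of five booleans."""
    return tuple(bool(household[name]) for name in RESTRICTIONS)


def impact_count(household):
    """Number of restrictions affecting the household (assumed additive count)."""
    return sum(scenario_of(household))


def scenario_counts_by_msoa(households):
    """Count households in each scenario, grouped by MSOA."""
    counts = {}
    for household in households:
        msoa_counter = counts.setdefault(household["msoa"], Counter())
        msoa_counter[scenario_of(household)] += 1
    return counts


# Made-up households for illustration only.
households = [
    {"msoa": "A", "shielding": False, "school_closure": True,
     "limited_interaction": False, "furlough": True, "local_area": False},
    {"msoa": "A", "shielding": False, "school_closure": True,
     "limited_interaction": False, "furlough": True, "local_area": False},
    {"msoa": "B", "shielding": True, "school_closure": False,
     "limited_interaction": True, "furlough": False, "local_area": True},
]
counts = scenario_counts_by_msoa(households)
print(len(ALL_SCENARIOS), impact_count(households[2]))
```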

Front page of the Isolation and Exclusion dashboard

Key findings

This project has resulted in the development of an interactive dashboard, showing counts of household-level impacts at the MSOA level for Yorkshire and the Humber. Although patterns of household-level impacts are difficult to see from these maps, this work has explored how to use proxy data in order to identify individual- and household-level impacts from COVID-19 restrictions, and begun to unpack the complexities of combining data at the household level. This is something that must continue going forward as academics and policy makers continue to face the challenges that accompany understanding the social and spatial impacts of the emergence from lockdown.

Through this work, it has become apparent that certain COVID-19-specific datasets do not yet exist (such as the uptake of ‘support bubbles’), so assumptions have to be made about the extent of impacts. This detail should be added to future tools when possible. Where data do exist, they often lack spatial resolution, so patterns can only be assumed at coarse geographies; finer detail should be added to future predictions when possible. Going forward, work must utilise more specific and detailed datasets.

The use of SPENSER as a micro-population has been foundational to understanding the impact of restrictions on individuals and households. It is recommended that any work going forward on this matter also uses small area population data as without it, any patterns of social and spatial impacts of emergence from lockdown will be coarse from the start.

Value of the research

The COVID-19 pandemic, with its associated lockdowns and restrictions, has brought vast change to the routines of families across the world. This work has played a small part in deciphering what these changes could mean for those across Yorkshire and the Humber. Dashboards with mapping have been shown to be an important tool for understanding how health impacts of COVID-19 are distributed, and the same logic applies to how lockdown restrictions combine spatially.

The dashboard can be found at: https://isolationpostcovid.azurewebsites.net/

Insights

  • COVID-19 causes health, social and economic impacts
  • Creation of a dashboard that displays different flavours of lockdowns
  • Supports pre-existing conclusions regarding the impact of COVID-19 lockdowns
  • Interrogation of complex layers of information aids policy reform
  • Current data are insufficient to capture COVID-19 lockdown impacts

Research themes

  • Urban Analytics
  • COVID-19
  • Spatial Inequality
  • Interactive Visualisation

People

Rosalind Martin, Data Scientist Intern at LIDA/CDRC

Professor Rachel Franklin, Professor of Geographical Analysis at the University of Newcastle

Professor Susan Grant-Muller, Chair in Technologies and Informatics at the University of Leeds

Professor Alison Heppenstall, Professor in Geocomputation at the University of Leeds

Dr Vikki Houlden, Lecturer in Urban Data Science at the University of Leeds

Partners

Consumer Data Research Centre (CDRC)

Funders

This project was funded by the Consumer Data Research Centre.

Funding for SPENSER is provided by The Alan Turing Institute, project reference R-LEE-004.

References

Smith, D., Moon, G. and Roderick, P. 2020. Indices of Economic Insecurity: Version 2, August 2020. GeoData Institute, University of Southampton. [Online]. [Accessed 18th March 2020]. Available from: https://www.mylocalmap.org.uk/iaahealth/

Photo of Rosalind Martin outdoors wearing a blue coat and a scarf

Being A Data Science Intern – insights, challenges and benefits

Rosalind is one of the current Data Scientist Interns at the Leeds Institute for Data Analytics (LIDA), with a background in Geography (BSc) and Geographical Information Systems (GIS MSc).

I’ve always been a fan of physical geography, but as module choices expanded throughout my degrees I was increasingly drawn to (spatial) data modules. I love using GIS and coding to solve big data challenges.

My internship has been made up of two six-month projects, both funded by the Consumer Data Research Centre (CDRC). My first project was titled ‘Isolation and Exclusion in a Social Distancing Covid World’. Here, I worked under the supervision of academics from the Universities of Newcastle and Leeds, aiming to identify people and households at risk of isolation and exclusion as a result of Covid lockdown rules.

My second project is in the world of nutrition where I’m working closely with Leeds academics, Dr Michelle Morris and Vicki Jenneson, and a retail partner. I am designing an open access tool which will assist retailers in implementing new policy restricting the promotion of foods that are high in fat, salt and sugar – a crucial part of reducing obesity in the UK.

What has been my experience of the LIDA Internship Programme?

Aerial view of desk with hands over a laptop keyboard, pot plant, glasses and pen

As I’m sure many people would echo, the Covid pandemic has placed our jobs in unfamiliar situations. The reality of this internship being my first full-time post means that I’ve not been comparing my days to ways I have worked in the past. Instead, my experience has been shaped by remote team working with virtual training, coffee breaks and meetings. Although working from home (WFH) comes with its own challenges and complexities, I believe this has given me the capacity to be thankful to work on engaging projects rather than pining for something I used to have!

Due to the pandemic, many interns have been able to experience otherwise inaccessible conferences and workshops as they’ve transitioned online. I’ve been to events held by The Alan Turing Institute, the Royal Society, CDRC and more! Working as a remote cohort, the interns have set up coffee breaks and a weekly “pub” session to replicate those water-cooler conversations, lost due to WFH. This space allows us to talk about our projects, seek help from others who have different skillsets and to simply get to know each other.

What have I been proud to have accomplished so far on the internship?

Coding while WFH has been a true test of my perseverance. In the absence of spinning my chair around to ask for a fresh pair of eyes, I’ve really had to learn how to use documentation and online forums to navigate my coding challenges. I’ve also learnt how best to send questions (with reproducible examples) to other interns or my supervisors. I’ve seen a visible increase in my confidence and ability between my first and second projects, and I know this skill will continue to serve me in future careers. 

What are my quick hacks for getting the most out of the internship?

  • Obtaining data always takes longer than you think: be proactive in learning methods, using dummy data and reading around the subject while you wait
  • Talk to the interns: each intern has a different background and therefore their own unique combination of skills. Ask questions and be ready to offer your own experiences if asked
  • Write detailed descriptions of your GitHub commits: your future self will thank you when you return from Annual Leave to find you have a detailed record of what you were working on before you left for your holiday

How has working with the Consumer Data Research Centre (CDRC) helped with the delivery of my first project?

My first intern project aimed to identify those at risk of isolation and exclusion under Covid lockdown rules. In order to make detailed predictions of impacted individuals and households, I worked with a micro-simulated synthetic population called SPENSER. This CDRC and Alan Turing Institute funded project was essential for me to make predictions at the household level. I also used other datasets to support my work, including CDRC’s Access to Healthy Assets and Hazards dataset. The availability of these datasets enabled me to explore the Covid restrictions that were thought to negatively impact an individual’s risk of isolation.

How will this Internship help me progress my career in data science?

I have learnt more of the mechanics of data access throughout both of my projects, ranging from obtaining freely available data through to applying for safeguarded datasets (including how long the process can sometimes take!). In my projects, I have had the opportunity to talk to the City Council, UK and international universities, not-for-profit organisations and retailers. Speaking to people in a wide range of data roles has helped me to better understand the opportunities available in data science, and how these roles interact with non-data scientists.

Why would I recommend the LIDA Data Science Internship?

The LIDA Data Science Internship has given me the opportunity to own the delivery of two data science projects situated in very different subject areas. This has really expanded my understanding of how data can be used to solve very complex but nationally topical challenges. Owning the delivery of the projects as someone straight out of their Master’s has been a challenge, but I have been well supported by experienced supervisors and the extended LIDA network. With the breadth of internship projects and collaborators available across and in partnership with LIDA, the internship is the place to be!

LIDA is currently recruiting for its next cohort of Data Scientist Interns, due to start at the end of September 2021, with several projects taking place within the CDRC. Click here for more information and to apply.

Celebrating collaboration: the CDRC Masters Dissertation Scheme

Celebrating collaboration: the CDRC Masters Dissertation Scheme. Thursday 29th April 2021, 10:30-15:00.

The CDRC Masters Dissertation Scheme, now in its tenth year, has been successfully run by the Consumer Data Research Centre for the last seven years. The event celebrated the success of the scheme, and explored the changing nature of academic-industry collaboration. Masters students who had gone through the scheme presented project case studies, and a selection of alumni spoke of the positive impact the scheme had had on their data science careers. A panel session rounded off the event with a discussion of the possibilities and ambitions for the next seven years of the Masters Dissertation Scheme. The event was attended by industry partners, MDS alumni, and the CDRC team including Paul Longley, Alex Singleton, and Jonathan Reynolds.

Speaker biographies

Programme

1030-1130: The Business of Engagement. Session recording (Longley 0:06, Dugmore 7:05, Reynolds 28:27, Squires 41:21)

  • Introduction & welcome: Professor Paul Longley, Director, CDRC
  • The evolution of academic-industry collaboration: Keith Dugmore, Demographic Decisions. Slides
  • CDRC: Where are they now? MDS 7 years on: Dr Jonathan Reynolds, Deputy Director (Oxford), CDRC. Slides
  • The business of engagement: the firm’s perspective: Martin Squires, Director of Advanced Analytics, Pets at Home. Slides

1145-1245: Alumni presentations. Session recording (Murage 2:16, Davies 25:10, Tonge & Montt 45:53)

  • Nombuyiselo Murage, Tamoco. Dissertation at Tamoco. MSc Geographic Data Science, University of Liverpool. Slides
  • Alec Davies, Pets at Home. Dissertation at Sainsbury’s. MSc Geographic Data Science, University of Liverpool, PhD Geographic Data Science. Slides
  • Christian Tonge, Movement Strategies. MSc Geographic Data Science, University of Liverpool, and Cristobal Montt, Movement Strategies. MSc Data Science, City, University of London. Dissertations at Movement Strategies. Slides

1400-1505: Alumni presentations (continued) and panel discussion. Session recording (Ushakova 1:48, Samson 21:29, Panel 37:26)

  • Alumni presentation: Dr Anastasia Ushakova, Senior Research Associate, University of Lancaster. Dissertation at British Gas.
    MSc Public Policy, UCL; PhD Computational Social Science. Slides
  • Alumni presentation: Nick Samson, Associate Director, CBRE. Dissertation at British Gas. MSc Geographic Information Science, UCL. Slides
  • Panel Discussion. The next 7 years. Achievements and ambitions: Alex Singleton, Deputy Director (Liverpool), CDRC;
    Samantha Hughes, Analytics Innovation Manager, Avon; Martin Squires, Director of Advanced Analytics, Pets at Home.
  • Thanks & conclusion: Professor Paul Longley, Director, CDRC

Nick Samson, 2014 MDS alumnus. Dissertation at British Gas. Project title: Can smart meters save consumers and British Gas money and carbon by pinpointing which consumers are most likely and best placed to install insulation in their homes?