CDRC-affiliated researchers show how check-in activity between different venues can support urban planning
Researchers in the Geographic Data Science Lab at the University of Liverpool have shown how mobility data can provide insights into the structure and function of cities.
They conducted a study of human behavioural patterns using a longitudinal mobility dataset from the Foursquare Future Cities Challenge (FCC) to describe check-in activity (movement) between different venues (points of interest) across ten global cities including London, New York and Singapore. From this analysis, the researchers developed a Geographic Data Science framework that transformed the Foursquare check-in locations and user origin-destination flow data into knowledge about the emerging forms and characteristics of cities’ neighbourhoods.
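As a rough illustration of how check-ins can be turned into origin-destination flows, the Python sketch below counts movements between consecutive venues for each user. It is not the researchers' framework: the function name `od_flows`, the record layout (user, timestamp, venue) and the sample values are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def od_flows(checkins):
    """Count movements between consecutive venues visited by the same user.

    checkins: iterable of (user_id, timestamp, venue) tuples (hypothetical layout).
    Returns a Counter keyed by (origin_venue, destination_venue).
    """
    visits_by_user = defaultdict(list)
    for user, timestamp, venue in checkins:
        visits_by_user[user].append((timestamp, venue))

    flows = Counter()
    for visits in visits_by_user.values():
        visits.sort()  # order each user's check-ins by time
        for (_, origin), (_, destination) in zip(visits, visits[1:]):
            if origin != destination:  # ignore repeat check-ins at one venue
                flows[(origin, destination)] += 1
    return flows

# Illustrative records only.
checkins = [
    ("u1", 1, "cafe"), ("u1", 2, "office"), ("u1", 3, "gym"),
    ("u2", 1, "cafe"), ("u2", 3, "office"),
]
flows = od_flows(checkins)
print(flows)
```

Aggregating flows by the neighbourhood containing each venue, rather than by venue, would be one way to move from individual movements to neighbourhood-level patterns.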
Work like this will enable urban planners to understand more about the nuances of human behaviour to create better experiences for the population and guide urban development.
Researchers demonstrate the value of consumer data for population dynamics and utilities research.
A team of five researchers, based in Liverpool’s Geographic Data Science Lab (GDSL), recently took part in a three-day hackathon organised by O2, Virgin Media and Northumbrian Water. The hackathon, which took place in a hybrid online/remote format, comprised teams from many different backgrounds, including app design, UX, digital services and academia.
The team, led by Patrick Ballantyne, comprised GDSL PhD students Danial Owen and Sian Teesdale and CDRC researchers Zi Ye and Meixu Chen, with project support from CDRC Deputy Director, Professor Alex Singleton.
Using some impressive datasets from O2/Virgin Media and Northumbrian Water, the team presented their take on “The value of consumer data for population dynamics and utilities research”.
Over the course of three days, the Liverpool team constructed a UK customer geodemographic, using data on audience insights from O2/Virgin Media, to capture the search interests of O2/Virgin Media customers.
They then applied their customer geodemographic, utilising the excellent O2 motion and Northumbrian Water datasets, to demonstrate applications to population dynamics and utilities research. They investigated the recovery of different residential areas following the COVID-19 pandemic (see application 1), and the use of geodemographics for selective targeting of water meters where they would provide better value for customers (see application 2).
Team leader Patrick Ballantyne reflected upon the experience: “This hackathon was a great opportunity to collaborate with some of my excellent colleagues on a short and exciting project, drawing upon our individual skills and talents. It enabled us to get a better understanding of how new sources of consumer data, such as O2 motion, can be used to provide solutions to societal problems.”
It is hoped that the CDRC will establish a partnership with O2/Virgin Media, enabling researchers to use their data, to provide solutions and empirical evidence to help answer societal problems.
CARTO add CDRC dataset to their Spatial Data Catalog and use it to explore UK retail centres
CARTO, one of the world’s leading location intelligence platforms, has added CDRC’s Retail Centre Boundaries dataset to its Spatial Data Catalog. The dataset contains spatial boundaries for Retail Centres across the UK, as well as measures of supply vulnerability, online exposure, a clone town measure, E-resilience, and a hierarchical classification. It is based on a hexagonal H3 grid, designed for spatial analysis at scale. This gives the dataset some unique benefits when cross-analysing with other datasets.
Miguel Álvarez, Lead Data Scientist, and Helen McKenzie, Geospatial Advocate at CARTO, took advantage of this to explore UK retail centres. They looked for patterns by cross-analysing CDRC data with external datasets from CARTO’s extensive Spatial Data Catalog. One insight from their analysis is that, on average, Regional Centres have by far the highest number of employees per centre, averaging around 40,000 people working within them. However, in total, Town Centres (e.g. Arnold, Nottingham) are by far the biggest employer, with over 600,000 people working there.
Another CDRC dataset available on CARTO’s Spatial Data Catalog is our “Retail Centres Typology”, which is a previous version of Retail Centre Boundaries, based on 2015-18 data. However, the two datasets are not directly comparable, as they contain different information and geometric types.
If you want to go exploring using CDRC data, please visit our data catalogue, which has over 80 datasets covering the following topics: Population & Mobility; Retail Futures; Transport & Movement; Finance & Economy; and Digital.
The CDRC team were delighted to collect four awards at the University of Leeds’ inaugural Research Culture and Engaged for Impact Awards earlier this week.
The awards celebrated the role that all members of the research community – participants; collaborators and partners; academic, research and technical staff; professional services and students – have to play in developing and promoting a positive and inclusive research culture, as well as contributing to the impact our research makes locally, nationally and internationally.
CDRC Co-Director, Professor Mark Birkin, commented “CDRC has prospered as a research centre through an inclusive approach with a commitment to invest in the future of all team members, and through the development of robust partnerships with outside organisations. I am delighted that these values have been recognised and validated so generously through the Research Culture and Impact Awards at the University of Leeds.”
This collaboration with the IGD (Institute of Grocery Distribution) and their 20 retailer and manufacturer members has enabled us to trial large-scale consumer interventions that incentivise healthy eating. This partnership has built on previous collaborations, such as the strategic partnership between Leeds Institute for Data Analytics and Sainsbury’s and the ESRC Strategic Network for Obesity, cementing a leading reputation for trusted cross-sector collaboration to effect change in the food system.
Team members: Dr Michelle Morris, Dr Victoria Jenneson, Dr Stephen Clark, Diogo Ann Onuselogu, Alexandra Dalton, Francesca Pontin, Hannah Skeggs (IGD), Becky Shute (Sainsbury’s), Paul Evans and Dr Emily Ennis.
Engaged for Impact – Finding a better way
This award recognises all the ways in which new thinking and acting, new products and knowledge, lead to creating and galvanising change and innovation.
This research revealed shortfalls in available nutritional information and practical implementation guidance for the UK Government’s Nutrient Profile Model (NPM). The NPM will be used as the basis for restricting the placement of certain foods in supermarkets as part of the UK Obesity Strategy – but data availability does not meet legislative purposes. Having consulted with industry nutritionists from retail and manufacturing companies to develop recommendations for industry and the UK Government, the team are now developing an online NPM calculator to support implementation of legislation in an open access, scalable and transparent way.
Team members: Dr Victoria Jenneson, Dr Michelle Morris and Rosalind Martin.
Engaged for Impact – Caring for the future
This award recognises research impact that’s likely to build over time, leading to a fairer, safer and more equitable world and healthier environment.
Winning project: Understanding and improving the carbon footprint of school meals in Leeds
Working with Leeds City Council, this project changed council practice for designing climate-friendly school menus, by co-creating a Carbon Calculator assessing food’s environmental impact. In collaboration with the Leeds Social Sciences Institute, the project team were able to design a suite of engagement activities supported by the ESRC-funded Local Accelerator Fund. This allowed the team to use data from the tool to develop an online game and classroom activities to encourage primary school children to think about the planet’s future through their own food choices.
Team members: Dr Emily Ennis, Alexandra Dalton, Dr Michelle Morris, Mel Green, Kevin Mackay (Rethink Food), Polly Cook (Leeds City Council), Ellie Salvidge (Leeds City Council) and Gillian Banks (Leeds City Council).
Research Culture – Open research and impact
This award recognises initiatives that increase the transparency, collaboration, inclusivity, reproducibility and efficiency of research processes to build trust and accountability. It focuses on aspects such as open access and open data, and promoting the use of open platforms for sharing research data, activities, outputs and impact.
Winning project: Opening up data science to solve real-world problems
CDRC Leeds were recognised for building trust and accountability through rigorous governance and infrastructures, including our virtual research environment (Leeds Analytics Secure Environment for Research) and our research management process, as well as for encouraging transparency and reproducibility by creating diverse types of derived data products aimed at diverse groups, from policy makers and researchers to activists and children.
“[The CDRC] appears to be a beacon of good practice and it would be useful to transfer the ways of working / methods and approaches to infrastructure and skills to others.”
Research Culture Awards Judging Panel
Team members: Professor Mark Birkin, Professor Ed Manley, Dr Nik Lomax, Dr Emily Ennis, Adam Keeley, Dr Pete Baudains, Kylie Norman, Robyn Naisbitt, Mel Green, Oli Mansell and Paul Evans.
CDRC’s Dr Michelle Morris and Kylie Norman were also included in an award won by our colleagues at Leeds Institute for Data Analytics:
Research Culture – Equality, diversity and inclusion in research
This award recognises initiatives that make positive changes to embed a culture of equality, diversity and inclusion in research.
Winning project: Championing recruitment for diversity on the LIDA Data Scientist Development Programme.
Team members: Kylie Norman, Dom Frankis, Dr Michelle Morris and Professor Nick Malleson.
Dr Emily Ennis, CDRC Research and Impact Manager, commented “These awards demonstrate CDRC’s commitment to fostering open, collaborative, and co-designed research with our external partners in a way that uses data science for public good. It has been inspiring to see our research projects recognised for their impact on society beyond academia, thanks to our partnerships in retail, education, local government and the charity sector, among others.
“Additionally, we have also seen recognition for research led by data scientists across a range of career stages and disciplinary backgrounds, as well as appreciation for the integral role professional services and technical staff play in building open and impactful research within CDRC.”
What we choose to put into our shopping baskets and how we make those choices will come under the microscope in a series of pilot trials designed to encourage healthy and sustainable diets.
Data analysts from the University of Leeds have joined forces with social impact organisation, the Institute of Grocery Distribution (IGD), to test different ways to encourage healthy and sustainable eating.
They are working in partnership with 20 leading retailers and manufacturers, including Morrisons, Sainsbury’s and Aldi, to trial different strategies, including signposting better choices, the positioning of products in shops and online, and the use of influencers and recipe suggestions.
Some have already begun to use some of those techniques in real-life settings as part of the research designed and implemented by the Leeds Institute for Data Analytics (LIDA) and the Consumer Data Research Centre (CDRC).
Researchers from LIDA and CDRC will analyse the results by capturing and measuring sales data from each intervention, enabling the project group to see exactly what is going on in people’s shopping baskets and assess what truly drives long-term behaviour change.
Dr Michelle Morris, who leads the Nutrition and Lifestyle Analytics team at LIDA and is a CDRC Co-Investigator, said: “I am passionate about helping our population move towards a diet that is both healthier and more sustainable. I believe that unlocking the power of anonymous consumer data, collected by retailers and manufacturers, is a really important step towards this goal.
“Working with the IGD and its members to evaluate their healthy and sustainable diets programme is very exciting – testing strategies to change purchasing behaviour and evaluating the wider impact of these changes.”
The pilot trials have been funded by IGD and form a key part of the charity’s Social Impact ambition to make healthy and sustainable diets easy for everyone.
Hannah Pearse, Head of Nutrition at IGD, said: “We want to lead industry collaboration and build greater knowledge of what really works. Our Appetite for Change research tells us that 57% of people are open to changing their diets to be healthy and more sustainable, and they welcome help to do it. But we also know that people don’t like to be told what to do and information alone is unlikely to change behaviour.
“We believe consumers will make this transition if we make it easier for them; that’s why we are delighted to be partnering with our industry project group and our research partners at the University of Leeds, to pilot this series of interventions over the coming months. The team at LIDA are experts in capturing, storing and analysing big data and have a variety of academic specialties that will be critical for this work.”
The work being carried out by CDRC researchers at the University of Leeds is unique because it will use the secure infrastructure at LIDA to allow retailers and manufacturers to share anonymised transaction data over a sustained period of time.
It is hoped that the results of the first pilot trial will be published towards the end of this year.
Successful roll-out of COVID-19 vaccines requires complex logistical delivery to help ensure everyone can receive their dose. England has established over 1700 vaccination sites distributed across the country to help provide vaccines to the population. Excellent progress has been made so far with over 95% of eligible people having received their first dose. Despite this, there are concerns that gaps in the location of vaccination sites may limit the opportunity for equitable uptake in certain communities.
This blog explores the distribution of vaccination sites across England to identify if it is an important factor in explaining vaccination uptake.
Measuring Accessibility to vaccination sites
We used open data on the location of vaccination sites on the 26th of March 2021 from NHS England. There were a total of 1753 sites across England. The NHS notes that 99% of the population live within 10 miles of their nearest vaccination site.
We measure accessibility by estimating the time-weighted road network distance from each postcode in England to its nearest vaccination site. We then calculate the average for MSOAs (Middle Super Output Areas), equivalent to large neighbourhoods within towns or cities (average population size ~7000 people).
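The aggregation step can be sketched in a few lines of Python. This is an illustration only, not the code from our repository: it assumes drive times from each postcode to each candidate site have already been computed on the road network, and the function name, postcode labels, MSOA codes and values are all made up for the example.

```python
from collections import defaultdict
from statistics import mean

def msoa_accessibility(drive_times, postcode_to_msoa):
    """Average, per MSOA, each postcode's drive time to its nearest site.

    drive_times: {postcode: {site_id: minutes}} (hypothetical, precomputed)
    postcode_to_msoa: {postcode: msoa_code}
    """
    nearest_by_msoa = defaultdict(list)
    for postcode, times in drive_times.items():
        # The nearest site is the one with the smallest drive time.
        nearest = min(times.values())
        nearest_by_msoa[postcode_to_msoa[postcode]].append(nearest)
    return {msoa: mean(values) for msoa, values in nearest_by_msoa.items()}

# Illustrative values only.
drive_times = {
    "PC1": {"site_a": 2.0, "site_b": 9.0},
    "PC2": {"site_a": 4.0, "site_b": 6.0},
    "PC3": {"site_a": 35.0, "site_b": 31.0},
}
postcode_to_msoa = {"PC1": "MSOA_1", "PC2": "MSOA_1", "PC3": "MSOA_2"}

result = msoa_accessibility(drive_times, postcode_to_msoa)
print(result)
```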
The map below shows accessibility to the nearest vaccination sites for England. As expected, accessibility is best in urban centres where the average drive-time to the nearest vaccination site is often <2 minutes, around 1 km on urban roads. In contrast, the largest travel times were in remote rural areas where access was poorest. Here, residents could often have to travel more than half an hour or over 20 km to access their nearest vaccination site. This suggests that the ‘as the crow flies’ metric to establish accessibility, used by the NHS to suggest that 99% of people in England live within 10 miles of their nearest vaccination site, may not suitably account for the true distance required to travel for the most remote populations. By our calculation 1.73% of postcodes in England are over 10 miles (~16 km) from their nearest vaccination site, with the most remote postcodes up to 57 km away by road. This may be a particular issue for those who are unable to drive or may be avoiding public transport due to the pandemic; those who are the most important to get vaccinated.
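To make the difference between the two metrics concrete, the Python sketch below computes a straight-line (haversine) distance and compares the share of postcodes beyond 10 miles under each measure. It is a hedged illustration: the function names, the spherical-Earth radius and the sample distances are assumptions for this example, not values from our analysis.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0        # mean Earth radius, spherical approximation
TEN_MILES_KM = 10 * 1.609344    # 10 miles expressed in kilometres

def haversine_km(lat1, lon1, lat2, lon2):
    """Straight-line ('as the crow flies') distance between two points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def share_beyond(distances_km, threshold_km=TEN_MILES_KM):
    """Percentage of postcodes whose distance exceeds the threshold."""
    return 100.0 * sum(d > threshold_km for d in distances_km) / len(distances_km)

# Made-up distances for four postcodes under each measure.
crow_km = [3.0, 14.0, 15.5, 40.0]
road_km = [4.1, 21.0, 16.0, 57.0]

print(haversine_km(0.0, 0.0, 1.0, 0.0))  # one degree of latitude, in km
print(share_beyond(crow_km), share_beyond(road_km))
```

In this toy data the second postcode is within 10 miles in a straight line but beyond it by road, which is the kind of case the straight-line metric misses.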
Does accessibility matter for understanding uptake?
We next compare our accessibility data to data on vaccination uptake from NHS England for the 1st of April 2021. As of this date, all people over 50 years old and the clinically vulnerable were eligible for a coronavirus vaccine dose. For this reason, we focus just on adults over 50 years old. We also use data on population estimates from ONS.
The following map shows the percentage of the population that received their first dose. The majority of MSOAs (85.4%) have an estimated uptake >90%, reflecting the success of the vaccination roll-out. However, there are some geographical inequalities, with areas in the north of England and towards Wales that were previously shown to have poorer accessibility also having lower uptake. Interestingly, there are also lower levels of vaccination uptake among over-50s in some urban centres, particularly in and around London.
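The uptake figures rest on a simple ratio of first doses to the estimated population in each MSOA. A minimal Python sketch of that calculation follows; the function names and the counts are invented for illustration and are not taken from the NHS or ONS data.

```python
def uptake_percent(first_doses, population):
    """Estimated uptake: first doses as a percentage of the eligible population."""
    return 100.0 * first_doses / population

def share_above(uptakes, threshold=90.0):
    """Percentage of areas whose uptake exceeds the threshold."""
    return 100.0 * sum(u > threshold for u in uptakes) / len(uptakes)

# Made-up (first doses, population aged over 50) pairs per MSOA.
msoas = {
    "MSOA_1": (1900, 2000),
    "MSOA_2": (1700, 2000),
    "MSOA_3": (2850, 3000),
    "MSOA_4": (1840, 2000),
}

uptakes = {code: uptake_percent(d, p) for code, (d, p) in msoas.items()}
share = share_above(list(uptakes.values()))
print(uptakes, share)
```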
There appear to be two key groupings among those areas with lower-than-average uptake: urban centres, where access is widely available, and the most remote areas, where access is very poor. The high drive times in these remote areas may be a barrier to uptake, reflecting isolated communities who are unable to make this journey or discouraged from making it. The disparity within urban centres is likely to reflect very different drivers, including poorer uptake among marginalised and ethnically diverse populations that Local Authorities are working hard to support.
Overall, it appears that poor access to vaccination sites may affect vaccination uptake only in the most extreme cases. Poor uptake in urban centres presents an equally worrying issue that requires further analysis. All the code and data to replicate our analyses can be found on GitHub.
The CDRC’s Masters Dissertation Scheme has bounced back this academic year after the impact of the pandemic in 2020. For 2021 we received a record total of 22 proposals from industry sponsors; 67 students applied for projects and 22 students were finally matched with 20 projects. Sponsors include: Barbour ABI (2 projects), Blinc Partnership, Cambridgeshire County Council, CARTO, Entain Group, Here Technologies, Idealista, Institute of Place Management, International Organization for Migration, Local Data Company, Movement Strategies (2 projects), Pet Care Provider, Sainsbury’s, Tamoco, The Data City, The Registry Trust and Walgreens Boots Alliance.
Applications were received from the following universities: UCL (38), City, University of London (6), Liverpool (6), Loughborough (6), Edinburgh (4), Westminster (2), Bristol (1), Glasgow (1), Leeds (1), Nottingham (1) and Oxford (1).
The broad appeal of the Masters Dissertation Scheme saw applications received from students studying across the following disciplines: GIS, Social and Geographic Data Science, Spatial Data Science, Advanced Quantitative Methods, Business Analytics, Environmental Change and Management, International Real Estate and Planning, Logistics and Supply Chain Management, Smart Cities and Urban Analytics and Sustainable Urbanism amongst many others.
For more details about the projects, please have a look at our website. If you are interested in participating in the scheme next year, please email Melanie Chesnokov.
CDRC data scientist intern Sebastian Heslin-Rees, working with Dr Nik Lomax, Dr Stephen Clark, and Dustin Foley, developed a classification of commercial and employment land use in England and Wales using location and time-series data.
Commercial areas and the businesses that inhabit them are not just an important addition to the vitality of urbanised areas but in many ways are essential to the ability of these places to flourish. This project has been utilising the newly available Whythawk dataset to construct a model for presenting, and thus understanding, the spatial distributions of commercial areas across England and Wales. Largely, this has involved clustering workplaces of similar characteristics to distil a set of key workplace types, which can then be mapped and analysed. The Whythawk dataset is more detailed and up-to-date than previous workplace/commercial classifications, which have been built from 2011 census data. Consequently, this could provide additional insights and novel avenues for academic research, policy initiatives and location analysis.
Data and methods
The Whythawk data contains details of commercial properties across England and Wales. It contains data such as the type of commercial property, the floor space, employee count and business revenue. The data comes from both the Valuation Office Agency and local councils.
At the heart of our methodology is an unsupervised machine learning approach known as K-means++, that is, k-means clustering with a seeding step that spreads the initial centroids apart. Essentially, it groups observations with similar characteristics into the same cluster, to distil a specified number, K, of distinct clusters. It does this by minimising the total squared Euclidean distance between each cluster centroid and the data points within that cluster. In our case we used the percentages of floor space of each commercial type per postcode zone (e.g. LS15 8G). To add another layer of nuance to our classification and help further the distinctions between the clusters, we also generated and included an array of additional factors. These factors were selected based upon how they could impact the perceived attractiveness of an area, especially when viewed through the lens of retail and commercial attractiveness. For this we created an index of commercial diversity and rates of crime per business, and included measures of the degree of urbanisation and accessibility by rail, road and bus.
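The clustering step can be illustrated with a small, self-contained Python sketch. It is not the project code: the function names, the toy floor-space percentages (retail, office, industrial) and the choice of K = 2 are assumptions for the example, and in practice a library implementation such as scikit-learn's `KMeans` with `init="k-means++"` would normally be used instead.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: favour points far from the centroids chosen so far."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight each point by its squared distance to the nearest centroid.
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Toy data: % of floor space that is (retail, office, industrial) per zone.
zones = [(80, 15, 5), (75, 20, 5), (85, 10, 5),
         (10, 85, 5), (5, 90, 5), (15, 80, 5)]
labels, centroids = kmeans(zones, k=2)
print(labels, centroids)
```

With these toy values the three retail-heavy zones end up in one cluster and the three office-heavy zones in the other; which cluster receives label 0 depends on the seeding.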
We produced nine distinct classification types from the k-means clustering algorithm, labelled as follows: Urban mixed commercial land use (Retail focused), Public services, Diverse Industrial and warehousing areas, Urban office spaces, Less urbanised mixed commercial land use (warehousing, retail and leisure spaces), Low diversity Industrial areas, More urbanised and diverse public services, High street retail and As yet untitled (mixed). Moreover, there was also substantial variation in distribution across the nine clusters when examining our additional variables (Crime per business, Diversity, Degree of urbanisation and Accessibility). For instance, Figures 1 and 2 below display an example of the composition of clusters 1 and 4. We can see that the clusters are distinct in their composition of commercial activity. Notably, cluster 1 demonstrates significant diversity of commercial activity, whilst incorporating a large retailing component, whereas cluster 4 has a very low diversity focusing mostly on office spaces.
Figure 1 Catplot displaying the composition of cluster one, urban mixed commercial land use
Figure 2 Catplot displaying the composition of cluster four, urban office spaces
The clusters were subsequently mapped at Unit Postcode level. All postcodes with a cumulative commercial floor space below 100 m² were removed, so that the spatial distribution and characteristics of key commercial space can be examined. Two examples of this mapping can be seen below in Figures 3 and 4.
Figure 3 Map of Greenwich (SE10) in South-East London by commercial cluster type
Figure 4 Map of Leeds city centre (LS1) by commercial cluster type
Lastly, this model can be combined with other data points to provide additional utility for businesses. One avenue for this is examining how business rateable and rentable values compare across the distinct cluster types. For instance, clusters 0, 2 and 4 have mean and median rental and rateable values significantly above those of clusters 1, 5 and 8.
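A comparison of this kind amounts to grouping property values by cluster label and summarising each group. The Python sketch below shows the idea with invented numbers; the function name, the record layout and the values are assumptions for illustration, not figures from the Whythawk data.

```python
from collections import defaultdict
from statistics import mean, median

def summarise_by_cluster(records):
    """Mean and median value per cluster.

    records: iterable of (cluster_label, value) pairs (hypothetical layout).
    """
    values_by_cluster = defaultdict(list)
    for cluster, value in records:
        values_by_cluster[cluster].append(value)
    return {
        cluster: {"mean": mean(values), "median": median(values)}
        for cluster, values in sorted(values_by_cluster.items())
    }

# Made-up rateable values per property, tagged with a cluster label.
records = [(0, 120.0), (0, 180.0), (0, 150.0), (1, 40.0), (1, 60.0)]
summary = summarise_by_cluster(records)
print(summary)
```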
Value of the research
The results could be used by businesses to readily locate commercial areas of interest when performing tasks such as determining optimal locations for new store outlets. Additionally, this model can be used in conjunction with many other research endeavours concerning urban analytics that seek to determine the characteristics and dynamics of urban areas. For example, this may be in terms of examining workplace and neighbourhood dynamics, commuting flows as well as retail and high-street health.
Utilising novel datasets combined with unsupervised machine learning.
Developing a unique classification concerning commercial land use across England and Wales.
Providing insight into urban dynamics.
Sebastian Heslin-Rees – Data Scientist Intern, University of Leeds
Dr Nik Lomax – Project supervisor, University of Leeds
Dr Stephen Clark – Research fellow, University of Leeds
Dustin Foley – Data scientist, University of Leeds
Covid-19 has strained already insufficient local authority resources, with infection and transmission of Covid-19 further exacerbating existing social inequalities. Four CDRC academic researchers (Dr Mark Green, Dr Jacob MacDonald, Dr Maurizio Gibin and Simon Leech) have been working for the past 6 months using the Office for National Statistics Secure Research Service (ONS SRS) on the Local Data Spaces project.
After engaging the Joint Biosecurity Centre (JBC) and 25 local authorities, we identified two consistent core research priorities, which focused on broader COVID-19 health impacts and inequalities, and on economic vulnerability and recovery potential. From this, we developed a series of nine reports leveraging the secured data available through the SRS infrastructure. The reports are replicable and were generated consistently for all local authority regions across the country (and are available via the CDRC Geodata Packs platform).
For each local area, a set of reports is built to profile the themes of:
Demographic Inequalities in COVID-19;
Ethnic Inequalities in COVID-19;
Geospatial Inequalities in COVID-19;
Population, Housing and Affordability;
Industry Densities;
Economic Vulnerabilities;
One of the outputs in the reports, allowing users to compare changes in retail and recreation over time for the country (area) and their local authority (line).
We made use of the highly detailed administrative and survey datasets held securely within the Office for National Statistics (ONS) Secure Research Service (SRS), including core national data products such as NHS Test and Trace, the COVID-19 Infection Survey, the Business Structure Database (BSD) and the Business Register and Employment Survey (BRES). Non-disclosive research work was conducted within the SRS environment and packaged into a series of reports for each area across England. These data sources were supplemented where relevant with openly available datasets such as the ONS Population Estimates, Google Mobility Data, and CDRC open data products such as the CDRC Business Census, and Access to Healthy Assets and Hazards (AHAH).
From our meetings with local stakeholders, the huge variation in resources available for research and analytical capacity became clear, as did the extent to which the Covid-19 pandemic has stretched resourcing within local authorities. Co-designing analyses with local authorities ensured the reports generated were relevant and useful, and helped fill evidence gaps at local levels.
We created non-disclosive outputs from the ONS SRS packaged into a series of reports for each local authority district in England. These reports are available through the CDRC Geodata Packs platform for any local stakeholder to download. All R scripts, both for data cleaning and for analyses, are available for re-use by local authority analysts or local researchers in the future, enabling reproduction and even extension of the analyses. The openly available (appropriately disclosed where necessary) code and workflow pipelines used to clean and format these datasets and produce final reports provide a number of practical efficiencies. Where local analysts have limited resources or capabilities for accessing, working with and analysing massive national studies and datasets, cleaned scripts and code to bypass the data wrangling stage can be invaluable when rapid-response research outputs are needed. Alongside this, we hope this may empower those local authorities with lower analytical capacity to access granular data to inform local level evidence bases.
Another output from the data pack reports, allowing users to compare positive Covid-19 rates by work sector for England (green) and their area (purple).
In the short term, reports will be used by local authorities and stakeholders, giving them access to an evidence base on the impact of Covid-19 at a local level. Because the reports and replicable code are available to other accredited researchers within the SRS (and, appropriately disclosed, outside the SRS), local authorities can explore these avenues for their own local research priorities. Locally focused research and data are clearly in demand and this resource will be a key part of local authorities’ response to Covid-19.
Poor diet is a leading cause of death in the United Kingdom (UK) and around the world. Methods to collect quality dietary information at scale for population research are time consuming, expensive and biased. Novel data sources offer potential to overcome these challenges and better understand population dietary patterns.
In a recent paper in Nutrients, CDRC researchers Dr Stephen Clark and Dr Michelle Morris used 12 months of supermarket sales transaction data, from 2016, for primary shoppers residing in the Yorkshire and Humber region of the UK (n = 299,260), to identify dietary patterns and profile these according to their nutrient composition and the sociodemographic characteristics of the consumers purchasing according to these patterns.
Results identified seven dietary purchase patterns that they named: Fruity; Meat alternatives; Carnivores; Hydrators; Afternoon tea; Beer and wine lovers; and Sweet tooth. On average the daily energy intake of loyalty card holders – who may buy as an individual or for a household – is less than the adult reference intake, but this varies according to dietary purchase pattern.
In general, loyalty card holders meet the recommended salt intake, but purchase too much fat and protein and not enough carbohydrate or fibre. The dietary purchase pattern containing the highest amount of fibre (as an indicator of healthiness) is bought by the least deprived customers and the pattern with lowest fibre by the most deprived. In conclusion, supermarket sales data offer significant potential for understanding population dietary patterns.