
Beginner’s Python Review

Beginner’s Python for Data Analysis – Review

Adam Keeley, Analyst at Leeds Institute for Data Analytics, shares his thoughts on our recent online Beginner’s Python for Data Analysis course.

The CDRC Beginner’s Python for Data Analysis training course was outstanding. I’m actually not a complete novice when it comes to Python, and I suspect a few others in the class also had varying levels of experience already. The course was structured so that it covered the basics of an object-oriented language without dwelling on them. This helped fill in the gaps in my baseline understanding and ensured we were all quickly up to a similar standard.

The delivery of the content was well paced. When new concepts were introduced we were talked through some worked examples before being asked to work through a set of exercises on our own. Except we were never on our own; Fran and her demonstrators were always on hand to offer help and guidance in a friendly and constructive way. The course structure was logical and methodical, with each new concept built on principles previously established and embedded through active application. There were no great conceptual leaps, so at no point did I feel lost or find myself wondering ‘how on earth did we get to that?’.

Inevitably when remotely delivering practical training of this nature, there were some technical issues. Impressive efforts were made to proactively sidestep these through the use of online, containerised environments. Every one of us began working from a standardised online environment, set up in advance with all of the required software and package dependencies available. Unfortunately the internet connections of the class were of varying reliability and some of us failed to connect consistently. The order of the course schedule was changed to get us set up on our own machines early but again, the remote delivery meant that every machine was different and the subsequent technical issues threatened to dominate the experience. I was extremely impressed with how well the delivery team handled the situation, providing technical support to all of us both as a group and individually when required. We were soon working with data again and probably with a greater understanding of Python as a result.

One benefit of the online delivery was that the open chat enabled us to help each other out as we found solutions, not just in troubleshooting the Python environment but in the exercises too. I found my fellow students friendly and keen to offer help when they were able. Coupled with the delivery team’s knowledgeable and amiable nature, this helped foster an environment where asking questions was easy.

By the end of the two-day course many of us were thinking of ways to implement what we’d learnt in our workplace or include it in our research. Many of us were asking questions about the techniques and concepts we’d learnt, in terms of specific application to data and problems we face in the working world. These questions were answered helpfully and with a breadth of knowledge, with tips on where to learn more.

I would definitely recommend this course to anyone curious if Python might be useful to them, or for whom programming does not feel accessible. This course will let you in on the big secret: programming doesn’t have to be difficult!

A classification for English Primary Schools using open data



CDRC researchers, Dr Stephen Clark, Dr Nik Lomax and Professor Mark Birkin have developed a new classification for English primary schools to encourage better collaboration and enable more nuanced benchmarking.  

England has statutory regulations in place that ensure state-funded schools deliver broadly the same curriculum. However, there still exists a wide range of contexts in which this education takes place, including: the management of schools; how the schools choose to spend their budgets; individual policies with regard to staffing, behaviour and attendance; and, perhaps most importantly, the composition of the pupil population in the school. Given these contexts, one outcome of interest is the attainment profile of schools, and it is important that this performance is judged in context, for the benefit of pupils, parents and schools.

Developing a new classification  

To help provide this context CDRC researchers, Dr Stephen Clark, Dr Nik Lomax and Professor Mark Birkin have developed a new classification using contemporary data for English primary schools.  Thanks to the European Regional Science Association, they have made available a resource that identifies families of primary schools in England, where schools share common characteristics.  

The recently published study allocates schools into one of 32 sub-groups, allowing schools to compare their performance, either academically or financially with similar schools. These groupings allow the identification of “families of schools”, to act as a resource to foster better collaboration between schools and enable more nuanced benchmarking. 

Users are able to search by location to view schools in their area.

A novel approach to classifying schools  

Dr Stephen Clark explains: “We identified the two most important aspects as, firstly, the ethnic composition of the school, either from the background of the pupils or the number of pupils where English is not a first language.

The second is the level of affluence, measured by neighbourhood characteristics or the number of pupils eligible for free school meals. Other important aspects include the degree of oversubscription for popular schools, and the number of authorised or unauthorised absences.

In this study the academic performance of the school is not ‘baked into’ the classification, so that differences in schools’ performance can exist, and be identified and investigated or shared.”


Looking at the Midlands city of Derby, we see that a school like Reigate Park Primary would do better to look to schools further afield, such as St Mary’s Catholic or Ashgate schools, for appropriate peers to share experiences with, rather than to the nearby Brackensdale school.

Map of Derby – Background Map © OpenStreetMap contributors


Using open data  

The data for the study comes from the Department for Education in England, collected each year during the Spring term’s Annual Census of Pupils and Schools.   

This regularity of data allows the groupings to be revised over time, which is useful as the circumstances of a school are not fixed over time.  We would expect to see some changes to the groupings as schools are transformed through new leadership or by changes to catchment areas and their population.   

View the data

Find out more  

Secure Labs Reopening and Remote Data Service


It has been a very interesting few months, with many of our working practices changing, for better and for worse. I am very happy to announce that our secure labs in London and Liverpool are now open again, with Covid-safe rules to allow users safe access to the labs. We will be in touch with lab users; please do contact us if you have any questions.

As one of the Economic and Social Research Council’s data infrastructure investments, CDRC was asked to join a recent meeting discussing how we have been able to respond to COVID-19, both in terms of what our research has been used for, and how we have pivoted to provide more services online. We have had to adapt to and change how we work, often on a relatively short timescale, but hopefully for a better experience overall.  

New Remote Secure Data Facilities 

One of our main developments which is being rolled out is secure, remote access to some of our Secure data sets. Making data available through UCL’s Data Safe Haven allows us to provide access to some of our secure data sets which were previously only available within our secure labs, requiring a physical visit to London or Liverpool. We have had to renegotiate our data licensing agreements with our data partners to enable this, so currently only some secure datasets are available using this method.  

Data Safe Haven is an ISO 27001 accredited facility, with two-factor authentication ensuring that only those who are allowed to access the data can do so. We have also implemented our standard secure data output checking, ensuring that any outputs from the lab are secure and non-disclosing.

Remote Working 

All of our staff are now working from home, which has required us to update our working practices. Both UCL and the University of Liverpool have now adopted Microsoft Teams, and working within the Teams framework has allowed us to simplify and rationalise our collaboration, scheduling and document management processes. We must always remember the variety of people’s opinions, with some of our staff very keen on home working, and some very keen to get back to the office as soon as possible. With many of our staff in London, space at home is often at a premium, particularly for a full-time home office, which many of us had never envisaged.

One change this has precipitated is a move to electronic signatures for signing user agreements. Spearheaded by UCL Legal, we are now able to accept electronic signatures (using Adobe’s DocuSign process) on our user agreements, removing the need to print, physically sign and scan documents.  

The last six months have brought new ways of working, and new approaches to all of our lives – who would have thought that everyone wearing face coverings would become accepted in everyday life? We will continue to keep you up to date with developments with our Secure Labs and new remote secure data technologies.  

Remote Training 

We have also been able to move all of our training provision online, with a number of courses run through Zoom recently. We are also in the process of developing two new courses (Advanced GIS Methods Training: AHAH and Multi-Dimensional Indices, and Advanced GIS Methods Training: IUC and K-means Clustering), which will be delivered online in the autumn. Check out the links for more details. Whilst online training is not the same as in-person training, it does have the advantage of not requiring travel or overnight stays, which is a positive for many people.

We will continue to provide updates on how our service changes and develops. If you would like to use our data, or if you have any questions, please do get in touch at nick.bearman@ucl.ac.uk or info@cdrc.ac.uk.

Join the CDRC team: Research and Impact Manager

Are you an experienced and capable research manager who has an interest in working in the field of data analytics?

We’re looking for a talented and highly motivated Research and Impact Manager who can help us deliver impact from our data assets and affiliated projects. We are looking for someone who can oversee a portfolio of projects and deliver innovative support for both new and ongoing research.

Working with our team of academics and researchers you will help us to deliver excellent research which utilises our unique data assets. You will be involved in the complete lifecycle of the project, from inception and planning, through to delivery and dissemination of outputs. As such you will have the opportunity to be involved in the delivery of substantial projects which achieve wide reaching impact and ultimately raise the profile of the CDRC.

As Research and Impact Manager you will oversee the delivery of CDRC research projects and maximize their wider impact. You will be responsible for developing a project initiation process that ensures research delivers appropriate outcomes and impact, and for defining a framework for collating data and narratives on research impact. With these frameworks in place, you will track the progression of research projects, supporting academics and researchers in delivering objectives, and promoting collaboration with stakeholders.

Working with the wider CDRC team, you will help coordinate the delivery of diverse, high-impact research outputs, from journal papers and conference publications, to blog posts and tweets.

You will also play an important role in advising the Centre Manager and Directors on financial matters, producing forecasts and plans for expenditure throughout the grant.

Further information and Candidate Brief

Utilising novel data streams in the fight against COVID-19


Matthew Carter is a PhD student and part of the EPSRC CDT in Distributed Algorithms. Read his blog to find out how researchers at the University of Liverpool are utilising novel data streams, data science and machine learning to help in the fight against COVID-19.

“The outbreak of the Coronavirus Disease (COVID-19) has caused widespread disruption to societies and economies around the world. Despite their reliability, conventional data streams (such as the daily reports by the Office for National Statistics) provide limited insight into the pandemic. Our team of researchers, led by Professor Simon Maskell, is developing systems that utilise novel data streams to provide deeper insights into the pandemic as it unfolds.

With social distancing measures in place, a large amount of discourse relating to COVID-19 now takes place on social media platforms such as Twitter. These platforms contain a treasure trove of information that can help us answer questions such as: how many people are exhibiting Coronavirus symptoms today? However, not all information is created equal – these platforms also contain a lot of misinformation which could potentially cause harm to members of the public.

We developed a system that ‘listens’ for tweets mentioning symptoms of COVID-19 so that they can be tracked and analysed. Once identified, each tweet is fed through a machine learning classifier, which identifies whether it relates to a user’s personal symptoms or someone else’s symptoms, or whether it contains misinformation.

Dashboard used to monitor Twitter data

We can also use geolocation data to calculate the number of users who tweet about symptoms in each region of a given country (where geolocation is permitted by the user). It is also possible to determine the number of users who are travelling between different regions of a given country. This information could potentially help to identify new outbreak clusters within a country and provide insight into how members of the public responded to lockdown measures.

To make this information easily accessible, we developed a ‘Symptom Watch’ dashboard, which reports a daily count of the number of tweets that mention symptoms. These counts are currently provided per state in the USA and at various levels (local and upper tier authority, NHS region and national) in the UK. This functionality will be extended to other countries in the near future.
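The dashboard's code is not shown in this post. As a rough illustration only, a daily count per region can be sketched as a tally over (day, region) pairs. The records, the region names and the function name `daily_counts_by_region` are all hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (day the tweet was posted, region taken from geolocation).
symptom_tweets = [
    (date(2020, 6, 1), "North West"),
    (date(2020, 6, 1), "North West"),
    (date(2020, 6, 1), "London"),
    (date(2020, 6, 2), "London"),
]


def daily_counts_by_region(records):
    """Count symptom tweets for each (day, region) pair."""
    return Counter(records)


counts = daily_counts_by_region(symptom_tweets)
for (day, region), n in sorted(counts.items()):
    print(day.isoformat(), region, n)
```

The same tally could be repeated at each geographic level by swapping the region label for a coarser or finer one.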

We have also been working with Evergreen Life to analyse data from their health and wellness app. In response to COVID-19, Evergreen Life have been asking app users questions to gain insight into the pandemic. Users are asked to report, for example, if they are isolating or if they or someone in their household has symptoms. The depth and breadth of the data collected is really impressive and could answer an endless number of questions.

 Evergreen Life dashboard data display

The team has developed solutions to answer some of these questions, for example, the average length of time for which an individual experiences symptoms of COVID-19. User reports to the Evergreen Life app are sporadic, so we don’t see a complete timeline of reports for the full period during which an individual is exhibiting symptoms.

To deal with the sporadic nature of user reports, we defined and fit a Bayesian model in the ‘Stan’ programming language, which enabled us to determine that users were most likely to experience symptoms for 3.06 days.
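The team's Stan model is not reproduced in this post. Purely as an illustration of how sporadic reports can still inform a duration estimate, the sketch below assumes each user's symptom duration is only known to lie between a lower and an upper bound, assumes an exponential duration model with a flat prior, and replaces Stan's sampler with a simple grid approximation. The data and all names are hypothetical.

```python
import math

# Hypothetical interval-censored observations: each user's symptom duration
# (in days) is only known to lie between a lower and an upper bound.
intervals = [(1, 4), (2, 5), (0, 3), (3, 7), (1, 3)]


def log_likelihood(rate, data):
    """Log-likelihood of an exponential duration model for interval data."""
    total = 0.0
    for low, high in data:
        # P(low <= duration < high) under Exponential(rate).
        prob = math.exp(-rate * low) - math.exp(-rate * high)
        total += math.log(prob)
    return total


def grid_posterior(data, grid):
    """Normalised posterior over candidate rates, assuming a flat prior."""
    log_post = [log_likelihood(rate, data) for rate in grid]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # subtract peak to avoid underflow
    norm = sum(weights)
    return [w / norm for w in weights]


rates = [0.01 * i for i in range(1, 301)]  # candidate rates, per day
posterior = grid_posterior(intervals, rates)
best_rate = rates[posterior.index(max(posterior))]
print(f"Mean duration at the most probable rate: {1 / best_rate:.2f} days")
```

A grid is only practical for a one-parameter model like this one; a richer model would need a sampler such as the one Stan provides.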

Where users report a household member exhibiting symptoms, we can gain insight into the interaction of COVID-19 within households by determining the time between two household members falling ill. We also know whether a user is isolating and subsequently develops symptoms. From these reports, we can quantify whether isolating reduces your chances of developing coronavirus. We analysed data collected between March and June this year and determined that individuals who did not isolate were 35% more likely to report symptoms within 7 days of reporting that they were not isolating.
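For readers unfamiliar with "more likely" comparisons of this kind, the arithmetic is a ratio of two risks. The sketch below uses hypothetical counts chosen only to make the calculation easy to follow; they are not the study's data, and `relative_risk` is an illustrative helper.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Risk in group A divided by risk in group B."""
    return (events_a / total_a) / (events_b / total_b)


# Hypothetical counts: users reporting symptoms within 7 days, out of all
# users in each group (not isolating vs isolating).
ratio = relative_risk(27, 200, 20, 200)
print(f"{(ratio - 1) * 100:.0f}% more likely")
```

A ratio of 1 would mean no difference between the two groups.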

The work we have done so far demonstrates how novel data streams can be utilised to gain a deeper understanding of the COVID-19 pandemic.  When combined with more conventional data streams, these novel data streams could aid governments in making more informed decisions to combat the virus.”

GISRUK 2020 – CDRC Researchers

Earlier this month CDRC researchers and students from the CDT Data Analytics and Society presented papers (virtually) as part of the 28th Geographical Information Science Research UK Conference.

Measuring Lifestyle Courier Work Area Similarity – Kostas Cheliotis, Sarah Wise, Fraser McLeod, Tom Cherrett, Julian Allen, Oliver Bates, Maja Piecyk and Tolga Bektas. 

High-Frequency and High-Resolution Neighbourhood Housing Price Dynamics: Identifying Spatio-Temporal Hotspots and At-Risk Areas for London, England – Jacob L. Macdonald. 

Evaluating the impact a restaurant aggregator might have on a UK National Restaurant Chain and with that impact in mind consider whether prevailing retail theories apply in the online world – Jason Dalrymple.

Estimation of small-area tourist demand using Airbnb accommodation in London – Zi Ye, Graham Clarke and Andy Newing. 

Must all maps be the same? A Mixed Quantitative-Qualitative Approach to Resource Allocation – Andrew Renninger. 

Towards data-driven human mobility analysis – Terje Trasberg and James Cheshire. 

Measuring Beauty in Urban Settings – Alessia Calafiore.

A Global Comparison of Bicycle Sharing Systems – James Todd, Oliver O’Brien and James Cheshire. 

Proposing a new method for creating integer weights for Iterative Proportional Fittings based on locally calibrated model – Jason Chi Sing Tang and James Cheshire. 

Retail Vibrancy and the Composition of the UK High Streets – Abigail Hill and James Cheshire. 

Towards Real-time Agent-Based Pedestrian Simulation using the Ensemble Kalman Filter – Keiran Suchak, Nick Malleson, Jon Ward and Minh Kieu

GISRUK is the largest academic conference in Geographic Information Science in the UK. Since 1993, GISRUK has attracted international researchers and practitioners in GIS and cognate fields, including geography, computer science, data science, and urban planning to share the latest advances in spatial computing and analysis.

COVID 19 – Providing new insights to aid societal recovery


Earlier this year we announced our involvement in the Emergent Alliance – a not-for-profit community established to share data, expertise and resources to work together to aid economic recovery in 2020 and shape a new normal.

We are working through our parent organisation Leeds Institute for Data Analytics to provide the COVID-19 data alliance with scientific expertise and access to global academic research networks.

What is the Emergent Alliance?

The Emergent Alliance is a collaboration of large organisations, small businesses, institutes and individuals – founding members include Leeds Institute for Data Analytics, IBM, Google Cloud, The Data City, Truata, Rolls-Royce, Microsoft, ODI Leeds, SATAVIA, Fieldfisher and Whitespace.  Since launching in April we have added a further 29 members and our Alliance continues to grow globally.

Drawing on this diverse collaboration of corporates, individuals, NGOs and Governments, the Alliance will contribute expertise, data, and resources to inform decision making on regional and global economic challenges to aid societal recovery post Covid-19.

We want to have the ability to take a broad set of economic, behavioural and sentiment data, fuse it together and provide new insights and practical applications to the global Covid-19 response.

The models produced will identify lead indicators signalling economic recovery cycles that global businesses and government can use to build operating confidence in investment and activities that shorten or limit recessionary impacts.

Who can the Alliance help, and how?

Governments, businesses and individuals around the world have been challenged by Covid-19 to act quickly, decisively and effectively using the best available scientific evidence and insight.

The Alliance provides a much-needed independent means of sourcing and shaping ideas regarding Covid-19 recovery and longer-term sustainability.

By mobilising skills, data and analytics from a wide range of organisations and institutions at a global scale, the Emergent Alliance can help to contribute to a step-change in the use of data to address the economic and social challenges which come in its wake.

The outputs from the alliance are intended to help understand economic implications and aid recovery, to provide insight beyond what is currently known.

Our role in the Alliance

LIDA has a number of roles within the Alliance, including:

  • providing a secure, accredited and independent data sharing platform;
  • offering data analytics support for the execution of projects;
  • leading the preparation of ‘challenge statements’; and
  • helping to shape the alliance’s strategic direction, as one of a small number of members with that responsibility.

CDRC is working through Leeds Institute for Data Analytics to provide the Alliance with scientific expertise and access to global academic research networks.

Current Challenges

Activities are structured around a series of Challenge Statements, which articulate the social and economic problems to be addressed. The Alliance currently has four challenges underway:

As well as sharing results from challenges, members are asked to make their data available publicly wherever possible. There has been a lot of progress in curating data sets and sharing information through an online catalogue developed by Open Data Institute (ODI) Leeds.

Whilst the Emergent Alliance does focus on the recovery from Covid-19, its cross-industry collaboration model and practices for data sharing could be applied to any number of pre-existing societal challenges.

Ads for junk food in the UK seem to be concentrated in poorer areas


Billboards advertising unhealthy food are concentrated in poorer areas and areas with a higher proportion of overweight children in Liverpool, UK. These findings may also apply across the country.

Using a combination of artificial intelligence and street-view images, CDRC researcher Mark Green and his colleagues at the University of Liverpool mapped the content and geographical location of more than 10,000 outdoor adverts in the city.

The research featured in New Scientist this week, where subscribers can read the article in full.

Dr Mark Green

Dr Mark Green is a Senior Lecturer in Health Geography at the Geographic Data Science Lab at the University of Liverpool.

Geographical determinants of health

Mark’s research explores the ways in which features of the neighbourhoods we live and interact with daily imprint on our health. His previous work includes:

Find out more about Mark’s research and the Geographic Data Science Lab at the University of Liverpool.

Championing Localism – No more road to nowhere


Innovative roadmap launched to lead the way towards better, healthier and more successful towns

Today sees the launch of a new and unique approach to support towns and the challenging journeys they face. The Better Towns Roadmap consortium has created a highly visual and interactive roadmap to help transform towns across the UK. The consortium brings experienced practitioners together with academics who share a passion for repurposing towns, rejuvenating forgotten spaces and engaging their communities.

A collaboration between HLM Architects, Didobi and realestateworks has developed a step-by-step roadmap to realise any town’s short or longer term goals. It creates clear links between a town’s vision, goals, and targeted outcomes, presenting customised outcomes that defy traditional ‘one size fits all’ approaches to regeneration, adaptation and change.

The decision to create #bettertowns was driven in part by frustration at the lack of effective collaboration across the industry and the desire to join up projects from their conception to delivery.

The belief is that multi-disciplinary, collaborative, data-driven approaches can transform any town’s prospects through a succession of deliverable projects linked by a unique vision.

The #bettertowns roadmap facilitates the successful repurposing of towns where rigour, process and logic are applied at every stage.
Each town’s journey proceeds by successively creating a baseline, defining a mission, appraising options, creating an action plan and delivering outcomes. Each step is further unpacked on the bettertowns website and is supported by an extensive virtual library and self-assessment questionnaires.

Olivia Paine, HLM Architects’ Asset & Workplace Project Lead, has worked with many local authorities to help rationalise and revitalise their assets. She is passionate about the collaborative potential of #bettertowns, which will help local authorities not only to achieve goals but to understand the process behind the journey:

“Towns have faced increasing challenges over the years and the current pandemic has seen a shift in the pace of change and an increased necessity for well-informed, connected local authorities. We understand that every town has the potential to be unique and welcome the return of ‘localism’ to the agenda. It is not about selling a generic product, it is about sharing knowledge and information to support towns achieving their desired outcomes.”

Brian Thompson, founding Director of realestateworks, a niche consultancy specialising in public sector and collaborative asset management, warns against quick fixes to systemic issues facing the sector:

“Every town is influenced by but also shapes the economy of neighbouring towns and communities. We look beyond quick fixes and beyond simply the High Street for only by doing so can sustainable, viable, and long-lasting strategies be defined.”

Matthew Hopkinson, Co-Founder and Director at Didobi, is a well known and highly respected practitioner in this space having been involved in a number of High Street Reviews as well as having created data analytics, aide-memoires and research on towns:

“I am delighted that after a year of planning we are able to launch the Better Towns Roadmap. I believe that this highly visual, clearly staged and evidence driven approach is what our towns need in order to address the challenges they face today and in the future. It is a long journey and one that needs to be measured against key milestones if success is to be realised and I hope that by sharing the Better Towns Road Map approach we can support all towns on their journeys.”

A partnership with the Consumer Data Research Centre (CDRC) brings additional benefits to towns and their communities by providing access to additional insights, tailored research, unique datasets and intelligence from the academic domain.

Professor Paul Longley, Director and Principal Investigator at the CDRC welcomes the opportunity to support the repurposing of towns using unique and timely analysis ready data:

“The CDRC is delighted to contribute to the Better Towns Roadmap consortium and strengthen academic links with businesses and local authorities, particularly at this challenging time. We believe that we bring in-depth expertise in data and analysis to this important initiative.”

The consortium will be crowdsourcing data via a self-assessment page on the website. Data will be captured from representatives of councils, businesses and communities on their towns in order to help understand the issues and opportunities that everybody faces. The data will be aggregated, analysed, anonymised and shared in order to better understand different perceptions of salient shared issues.

A frequently updated key publications library will be maintained on the website. This will help raise the overall level of awareness of the tools, techniques, guidance and success factors among those that share interest in creating better towns.

Visit www.bettertowns.co.uk for more information on the roadmap and details of the self-assessment.

Notes to editors:

Formed in 2020, the Better Towns Roadmap consortium works with towns both large and small, bringing a multi-disciplinary approach to repurposing towns from the beginning of the journey through to the execution of a successful plan.

Based on the collective idea that multi-disciplinary, collaborative, data-driven approaches can transform any town’s prospects, the Better Towns Roadmap consortium consists of a group of respected experts who individually bring unique knowledge and expertise.

They understand towns from an occupier, investor, local authority and community perspective, and have witnessed first-hand the benefits of combining architectural creativity, data-led insights, effective stakeholder engagement and feasibility testing. They bring a deep knowledge of retail data, places and trends and are rigorous, independent and objective in their approach, design, analysis, data and systems.


What can tweets about contact tracing apps tell us about attitudes towards data sharing for public health? (Part 3)


At the end of my last blog post about Covid-19 apps, I speculated that it was unlikely that the UK’s Track and Trace app would gain enough public trust and support to be a success. Since that blog post was published, it was announced that the UK’s app might not be ready until winter, followed by news that the centralised NHSX app has been abandoned for a decentralised alternative developed by Apple/Google.

Many people have reacted to this news on Twitter resulting in a spike of tweets about the Track and Trace app (Figure 1). In this blog post I will present findings from sentiment analysis on these tweets to understand people’s reactions to the new decentralised app and discuss the future of data-sharing post-Covid-19.  

Figure 1: Daily number of tweets about all ‘Covid-19’/‘Coronavirus’ apps from all countries (blue) and only the UK’s ‘Track/Test and Trace’ app (orange), collected 24 April to 16 June 2020 and 17 June to 25 June 2020 respectively.

Holly Clarke

Leeds Institute for Data Analytics

Holly Clarke is an Intern at Leeds Institute for Data Analytics, applying data science solutions to solve complex, real-world challenges. She is working for the LifeInfo project with Michelle Morris, researching attitudes towards novel lifestyle and health data linkages and how access to this information could improve public health. 

Read the previous parts of this blog:

Read part 1
Read part 2

The positives and negatives of Covid-19 apps  

This sentiment analysis includes tweets about the UK’s Track and Trace app posted between 17th and 25th June 2020, thereby focusing on recent events. Sentiment analysis matches words within tweets against common positive and negative words categorised in the “Bing” dataset, developed by Bing Liu, in order to identify their sentiment. Overall, this analysis shows that recent tweets about the Track and Trace app contain more negative words than positive ones, indicating the tweets hold mainly negative content (Figure 2).

Figure 2: Proportion of sentiment words in tweets that are positive and negative for tweets about all Covid-19 apps, collected 24 April to 16 June 2020, and tweets about the UK’s Track/Test and Trace app, collected 16 June to 25 June 2020.  
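
As a rough illustration of the lexicon matching described above, the sketch below counts positive and negative words in a handful of made-up tweets and reports their proportions. It is not the code used for this analysis: the seven-word `TOY_LEXICON`, the example tweets and the function names are all assumptions for illustration, standing in for the full Bing lexicon and the collected tweets.

```python
import re
from collections import Counter

# Toy stand-in for the Bing lexicon (assumption: the real lexicon is far larger).
TOY_LEXICON = {
    "trust": "positive",
    "safe": "positive",
    "protection": "positive",
    "failure": "negative",
    "shambles": "negative",
    "fiasco": "negative",
    "risk": "negative",
}


def tokenize(text):
    """Lower-case a tweet, normalise curly apostrophes and split into words."""
    return re.findall(r"[a-z']+", text.lower().replace("’", "'"))


def sentiment_proportions(tweets, lexicon):
    """Return the share of matched sentiment words that are positive/negative."""
    counts = Counter()
    for tweet in tweets:
        for word in tokenize(tweet):
            label = lexicon.get(word)
            if label is not None:  # words outside the lexicon are ignored
                counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        return {"positive": 0.0, "negative": 0.0}
    return {label: counts[label] / total for label in ("positive", "negative")}


# Made-up example tweets, not taken from the dataset.
example_tweets = [
    "What a shambles, a total failure",
    "I trust the app to keep my data safe",
    "Another fiasco",
]
proportions = sentiment_proportions(example_tweets, TOY_LEXICON)
print(proportions)
```

Only words that appear in the lexicon contribute to the proportions, which is why the figures describe "sentiment words" rather than whole tweets.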

The nature of these positive and negative words is also very telling. The negative words refer predominantly to the management of the app rather than issues about data privacy and the app itself; “failure”, “incompetence”, “fiasco”, “shambles”, “chaos”, “disaster”, “lying” and “debacle” all feature prominently (see Figure 3).  As a comparison, sentiment analysis on all general tweets from 24 April to 16 June 2020 about Covid-19 apps (Figure 4) shows common negative words to be more technology focused and in line with common concerns about data-sharing – “breach”, “risk”, “concerns”, “issues”.  

Figure 3: 50 most frequently used positive and negative sentiment words used in tweets about the UK’s Track/Test and Trace app, collected 16 June to 25 June 2020.  

The positive words refer to more common topics around data sharing and technology, e.g. “trust”, “protection” and “safe”, across both datasets of tweets. This indicates engagement with the topic of data sharing, and a significant proportion of the sentiment words are positive in both datasets.

As part of the sentiment analysis I have controlled for negation, inverting the positive/negative categorisation if a common negator comes directly before the word (e.g. “not good” or “don’t trust”). In the figures these are shown with the prefix “neg_”. However, linguistic features such as sarcasm, humour and questioning are not easily picked up through sentiment analysis. Some instances of positive words like ‘wow’ or ‘promises’ may also be used in a critical way.
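
The negation rule can be sketched along the following lines. This is a minimal illustration, not the code behind the figures: the small lexicon, the list of negators and the function name `label_with_negation` are assumptions chosen for the example.

```python
import re

# Toy lexicon and negator list (assumptions for illustration only).
TOY_LEXICON = {"good": "positive", "trust": "positive", "failure": "negative"}
NEGATORS = {"not", "no", "never", "don't", "doesn't", "won't", "can't"}
FLIP = {"positive": "negative", "negative": "positive"}


def tokenize(text):
    """Lower-case a tweet, normalise curly apostrophes and split into words."""
    return re.findall(r"[a-z']+", text.lower().replace("’", "'"))


def label_with_negation(text, lexicon=TOY_LEXICON, negators=NEGATORS):
    """Label sentiment words, flipping the label when a negator directly precedes."""
    tokens = tokenize(text)
    labelled = []
    for i, word in enumerate(tokens):
        label = lexicon.get(word)
        if label is None:
            continue  # not a sentiment word
        if i > 0 and tokens[i - 1] in negators:
            # Negated word: add the "neg_" prefix and invert the category.
            labelled.append(("neg_" + word, FLIP[label]))
        else:
            labelled.append((word, label))
    return labelled


result = label_with_negation("I don't trust it, not good, total failure")
print(result)
```

Because the rule only looks one word back, a phrase such as “not very good” would still be counted as positive, which is one reason the results should be read with the caveats above in mind.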

Overall, although both datasets of tweets include more negative than positive words, the recent events around the UK’s Track/Test and Trace app seem to have framed the app more negatively than Covid-19 apps in general, owing to “waste” and issues around the app’s development.

Figure 4: 50 most frequently used positive and negative sentiment words used in tweets about ‘Covid-19’/‘Coronavirus’ apps from all countries, collected 24 April to 16 June 2020.

What will Track and Trace mean for people’s attitudes to data sharing?  

When I began writing this blog series on Covid-19 apps, countries across the world were rapidly launching contact tracing apps to quell the spread of coronavirus through technology, and the UK was poised to trial its app on the Isle of Wight. Two months later, the Track and Trace app journey has certainly not been smooth and the app’s importance has been downgraded from “world beating” to “the cherry on the cake”. But what will this late-stage Apple/Google switch mean for public opinion?

Research on attitudes to data-sharing, as discussed in my last blog post, frequently finds that people’s willingness to share their data is dependent on which actors are involved. People tend to have high trust in the NHS and the lowest trust in private companies. Hence, we might expect the shift from an NHSX app to one involving tech giants Apple and Google to be met with opposition. However, the sentiment analysis indicates conversation about the Track and Trace app mainly focuses on the wastefulness and “shambles” of the switch rather than inherent mistrust in private companies.  

Initial findings from my work with the LifeInfo project, exploring public opinion about linking lifestyle data (e.g. supermarket loyalty card or fitness app data) with health records, may explain this. My analysis highlights that the relationship between data sharing and trust in actors is not as straightforward as might be expected.

Although people generally have high levels of trust in health organisations, respondents repeatedly expressed concerns that their supermarket loyalty card data might be seen by their GP if these data were linked for health research. Many worried their GPs would unfairly judge their diet and lifestyle, and even withhold treatment. Yet, respondents were happy for supermarkets (private companies in which research finds people to have the least trust) to store and use their loyalty card data.  This indicates that attitudes about data sharing are not simply informed by trust in actors but are also influenced by the type of data involved and social norms about how it is currently used.  

In the context of coronavirus apps, this could mean that users are more comfortable with mobile phone providers using data to alert them about exposure to coronavirus than with the government or NHS doing so. Many mobile phone users share vast amounts of data with technology companies through everyday use of apps and services, data which they may be uncomfortable sharing with the government or healthcare providers. Therefore, a contact tracing system involving Apple and Google, and especially a decentralised one which enables more data privacy, might encourage wider use than the NHSX app.

The future of data sharing post Covid-19 

The Covid-19 pandemic will undoubtedly create lasting change across many aspects of our lives, including attitudes towards data sharing. The pandemic has led us to consider sharing unprecedented amounts of data; it has also made clear the inadequacies of our medical data sharing systems.

In the context of the LifeInfo study, access to lifestyle data linked to health records could help researchers better understand and prevent diseases such as diabetes, certain cancers and heart disease. The World Health Organization attributes 30% of yearly global deaths to poor diet and physical inactivity, so it is a substantial challenge. However, for participants to willingly share their data they must trust organisations to use it safely, responsibly and transparently.

Successful contact tracing apps had the potential to demonstrate that data sharing could help improve health while maintaining personal privacy and data security. Yet technological failings, privacy concerns, and government mismanagement in the UK could turn public opinion against data sharing initiatives in the same way other high-profile failings such as care.data did.

Above all, the Track and Trace app highlights how detailed consideration of people’s attitudes towards data sharing is vital for initiatives to be successful.