What can tweets about contact tracing apps tell us about attitudes towards data sharing for public health? (Part 2)
In my last blog post I wrote about the importance of understanding attitudes towards data-sharing for public health and its link to my work with the LifeInfo Project. I also argued that, by analysing Twitter conversations about using contact tracing apps to manage the spread of Covid-19, we could uncover some of people’s underlying thoughts, fears, and hopes about personal data sharing for better health.
That analysis showed that many people had taken to Twitter to share their thoughts, sparked in particular by the introduction of the Australian ‘COVIDSafe app’ and the Isle of Wight trial of the UK ‘NHS track and trace app’. In this post, I will focus on the textual content of these tweets, using Natural Language Processing (NLP) and topic modelling, and discuss how it relates to opinions on data-sharing in the time of Covid-19.
Hot Topics – what do tweets say?
Continued tweet scraping means 18,170 tweets on the topic of Covid-19 apps have now been collected, posted between 24th April and 1st June 2020. As seen in my last blog, the most frequently used words in these tweets indicate that the conversation has been shaped around particular contexts. Nationalities and geographies are prominent – ‘Isle [of] Wight’, ‘Australian’, ‘UK’, ‘India’, ‘Icelanders’ – as are technological and state actors – ‘government’, ‘NHS’, ‘apple’, ‘google’. This suggests that the wider conversation about Covid-19 apps is made up of smaller topic clusters about different places, specific apps, and associated news stories.
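As a rough illustration of how such a frequency count works, here is a minimal Python sketch. It is not the pipeline used for this analysis: the three example tweets, the short stop-word list and the `top_words` helper are all invented for demonstration.

```python
import re
from collections import Counter

# Hypothetical stop-word list. A real analysis would use a fuller list and
# would also drop the search terms themselves (here just 'app').
STOP_WORDS = {"the", "is", "and", "a", "to", "of", "in", "on", "for", "app"}

def top_words(tweets, n=5):
    """Return the n most frequent lower-cased words, excluding stop words."""
    counts = Counter()
    for tweet in tweets:
        tokens = re.findall(r"[a-z']+", tweet.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Invented example tweets, for illustration only.
example_tweets = [
    "The government app is on trial in the Isle of Wight",
    "Is the NHS app safe? The government says yes",
    "Government and NHS launch the app",
]

print(top_words(example_tweets, n=2))
```

In this toy example ‘government’ and ‘nhs’ come out on top, which is the same kind of signal, at a much larger scale, that points to state actors shaping the conversation.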
Topic modelling is a technique often used to uncover latent topics within a set of documents (in this case tweets about Covid-19 apps) through statistical analysis of semantic similarities. The Latent Dirichlet Allocation (LDA) algorithm is built on the principle that each document is made up of a small number of topics, and each topic is identifiable by the words it uses.
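To make that principle concrete, the sketch below fits LDA to four toy ‘tweets’ with a very small collapsed Gibbs sampler. This is an illustration only, not the implementation behind the results in this post: the function name `fit_lda`, the hyperparameters `alpha` and `beta`, the iteration count and the example documents are all assumptions, and a real analysis would rely on an established topic modelling library.

```python
import random

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA.

    docs is a list of token lists. Returns (theta, phi, vocab), where
    theta[d][k] estimates P(topic k | document d) and
    phi[k][w] estimates P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                            # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample in proportion to P(topic | doc) * P(word | topic).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi, vocab

# Invented documents, for illustration only.
docs = [
    "privacy surveillance privacy law".split(),
    "bluetooth battery android bluetooth".split(),
    "law privacy surveillance".split(),
    "battery android bluetooth battery".split(),
]
theta, phi, vocab = fit_lda(docs, n_topics=2)
for k, row in enumerate(phi):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print("Topic", k, [word for _, word in top])
```

The two outputs map onto the two halves of the principle: each row of `theta` is one document’s mix of topics, and each row of `phi` is one topic’s distribution over words, from which its ‘top words’ are read off.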
This algorithm was employed to uncover 12 topics within the Twitter conversation about Covid-19 apps, as displayed in Figure 1. A data-driven approach was used to identify the number of topics, taking the model with the greatest probabilistic coherence across topics¹. Table 1 gives a full breakdown of topic interpretations, and the words most associated with each topic are displayed in Figure 1.
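The selection step can be sketched as follows. The formula is not spelled out in this post, so the code assumes one common definition of probabilistic coherence: for each pair of a topic’s top words, with word a ranked above word b, take P(b|a) minus P(b), estimated from how often the words appear in the same document, and average over all pairs. The helper names, the toy documents and the two candidate models are invented for illustration.

```python
from itertools import combinations

def probabilistic_coherence(top_words, docs):
    """Average of P(b | a) - P(b) over ordered pairs of a topic's top words.

    top_words is ranked from most to least probable in the topic.
    docs is a list of token lists; probabilities are document frequencies.
    """
    doc_sets = [set(doc) for doc in docs]
    n_docs = len(doc_sets)

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    scores = []
    for a, b in combinations(top_words, 2):   # a is ranked above b
        n_a = doc_freq(a)
        if n_a == 0:
            continue
        scores.append(doc_freq(a, b) / n_a - doc_freq(b) / n_docs)
    return sum(scores) / len(scores) if scores else 0.0

def mean_coherence(topics, docs):
    """Mean coherence across all topics of one candidate model."""
    return sum(probabilistic_coherence(t, docs) for t in topics) / len(topics)

# Invented documents and candidate models (number of topics -> top words).
docs = [
    "privacy surveillance privacy law".split(),
    "bluetooth battery android bluetooth".split(),
    "law privacy surveillance".split(),
    "battery android bluetooth battery".split(),
]
candidates = {
    1: [["privacy", "battery", "law"]],
    2: [["privacy", "surveillance", "law"], ["bluetooth", "battery", "android"]],
}

best_k = max(candidates, key=lambda k: mean_coherence(candidates[k], docs))
print("Selected number of topics:", best_k)
```

A topic whose top words tend to appear together scores above zero, while one that mixes unrelated words scores around or below zero, which is why averaging the score across topics gives a usable criterion for comparing models of different sizes.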
Topics, however, need to be interpreted. Some contain finer-grained strands of related content; Topic 9, for example, focuses on symptom tracking apps, predominantly in the UK context but also in Kogi State, Nigeria. Other topics are modelled as separate but found to be lexically similar through hierarchical clustering. This is displayed by the dendrogram at the top of Figure 1, which indicates how closely related the twelve topics are to each other². For example, Topics 11 and 12 both focus on rights, and on concerns about the security and privacy of coronavirus apps. Yet the former takes a more technological perspective, using words like ‘cyber’, ‘tech’ and ‘digital’, while the latter uses more legal jargon – ‘legislation’, ‘laws’, ‘legal’.
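The idea of clustering topics by lexical similarity can be sketched in a few lines. The Hellinger distance is computed between topics’ word distributions (phi), and the two closest clusters are merged repeatedly. The phi values and topic labels below are invented, and average linkage is an assumption, since the linkage method is not stated in this post.

```python
import math
from itertools import combinations

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 means identical)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def average_linkage(a, b, phi):
    """Mean Hellinger distance between the topics in cluster a and cluster b."""
    return sum(hellinger(phi[i], phi[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(phi):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [(label,) for label in phi]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(*pair, phi))
        merges.append((a, b, round(average_linkage(a, b, phi), 3)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Invented P(word | topic) rows over a four-word vocabulary.
# A and B share their most probable word; C is dominated by a different one.
phi = {
    "A": [0.50, 0.40, 0.05, 0.05],
    "B": [0.50, 0.05, 0.40, 0.05],
    "C": [0.10, 0.10, 0.00, 0.80],
}

merges = agglomerate(phi)
for a, b, dist in merges:
    print(a, "+", b, "at distance", dist)
```

The order and height of the merges is what a dendrogram draws: topics that merge early, at a small distance, sit on neighbouring branches.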
Table 1: Interpretation of each topic found through topic modelling, developed by examining each topic’s ‘top words’ and examples from the original tweets³
| TOPIC | INTERPRETATION OF TOPIC |
|---|---|
| 1 | UK’s decision to develop a centralised app over the Google/Apple decentralised system. |
| 2 | GMB union tells UK care workers not to use Coronavirus ‘Care Workforce app’. |
| 3 | Track and Trace app trialled on the Isle of Wight. Links with Dominic Cummings (Government Advisor). |
| 4 | Topics of privacy, force, democracy and surveillance relating to Indian ‘Aarogya Setu app’. ‘Icelanders’ appears, perhaps because Iceland was also an early adopter of a contact tracing app. |
| 5 | Australians downloading COVIDSafe app. Political, with direct references to Scott Morrison (PM) and Greg Hunt (Health Minister), ‘LNP’ (Liberal National Party), ‘police’ and ‘auspol’ (Australian politics hashtag). |
| 6 | Several threads of news stories within this cluster: a Chinese city plans to transform its Corona app into a permanent health tracker; people using a symptom tracker app in Canada; ‘Dido’, referring to the involvement of Dido Harding (ex-director of TalkTalk) in the NHS app. |
| 7 | Technological issues with contact tracing apps – Bluetooth, battery, background data, Android, Apple. Also privacy, surveillance and ‘mission creep’ concerns with NHS app. |
| 8 | Broad topic with themes of safety (‘stay’, ‘safe’, ‘home’) and mobility (‘freedom’, ‘travel’). |
| 9 | Covid-19 Symptom Tracking app UK (Zoe) and research from King’s College with these data. Similar Kogi State (Nigeria) symptom self-assessment app. |
| 10 | Parallels drawn between data collection by contact tracing apps and by social media platforms. Negative words and expletives used, often ironically, about people rejecting apps due to privacy concerns while using Facebook etc. |
| 11 | Personal rights and privacy/security concerns about coronavirus apps, from a technological perspective – ‘experts’, ‘technological’, ‘cyber’. |
| 12 | Personal rights and privacy/security concerns about coronavirus apps, from a legal perspective – ‘misuse’, ‘access’, ‘laws’, ‘legislation’. |
Overall, topic modelling conveys several things about attitudes to Covid-19 apps. First, as hypothesised, conversations are strongly shaped around context, detailing prominent news stories and events, and this context also influences how apps are talked about. Topic 4 includes references to issues of ‘privacy’, ‘surveillance’, ‘democracy’ and ‘force’ when talking about the Indian ‘Aarogya Setu app’, which is mandatory for government employees. Topics 1 and 7 have more of a technological and practical focus within the UK context, discussing centralised/decentralised apps, Bluetooth and battery issues, although still in relation to personal privacy.
Second, although contact tracing apps are the predominant focus, topic modelling distinguishes other Covid-19 apps that share different kinds of data. Topic 9 focuses on symptom trackers and Topic 2 on the ‘Care Workforce app’, which is used to disseminate information.
Third, most topics report events rather than represent attitudes or positions, although many do contain negative words such as ‘concern’, ‘warn’ or ‘worries’. Topic 10, however, stands out by including expletives – indicating anger is evident within this part of the conversation.
Each topic makes up a substantial share of the total tweets, ranging from 6.6% for Topic 2 to 10.6% for Topic 10. An even split across twelve topics would be about 8.3% each, so no single topic dominates.
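One way to arrive at such shares is to sum each topic’s weight across all tweets and express it as a percentage of the total. The sketch below does this for an invented document-topic matrix; whether the figures above were computed this way, or by assigning each tweet to its single most likely topic, is not stated here, so treat the method as an assumption.

```python
def topic_shares(theta):
    """Percentage of the corpus attributed to each topic.

    theta[d][k] is P(topic k | document d); shares are the column sums of
    theta, rescaled to add up to 100.
    """
    n_topics = len(theta[0])
    totals = [sum(row[k] for row in theta) for k in range(n_topics)]
    grand_total = sum(totals)
    return [100 * t / grand_total for t in totals]

# Invented document-topic weights for four tweets and two topics.
theta_example = [
    [0.8, 0.2],
    [0.5, 0.5],
    [0.2, 0.8],
    [0.9, 0.1],
]

shares = topic_shares(theta_example)
print([round(s, 1) for s in shares])
```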
COVIDSafe vs Track&Trace: Actors Matter
Alongside creating a dataset of tweets about Covid-19 apps, I have also been collecting tweets about specific apps – the Australian ‘COVIDSafe app’ and the UK’s ‘NHS Covid-19 app’/‘Track&Trace app’. My aim was to compare attitudes towards these apps, expose potential commonalities and differences on issues such as personal privacy, surveillance and data security, and link these to policies and practices. The overwhelming conclusion from this analysis, however, is that actors matter.
Support for or opposition to data sharing is greatly influenced by who we are giving access to these data, and by our trust in these actors. Research consistently shows that, in the UK, we have the highest trust in the NHS, lower trust in central government and the lowest trust in private and technological companies. In a blog post, Helen Kennedy suggests that, because contact tracing apps involve all three actors, the public will not know whether to trust them or not.
The word ‘government’ appears prominently in both the COVIDSafe (Australia) and Track&Trace (UK) word clouds as one of the most frequently used terms, showing the centrality of the state to conversations about contact tracing apps. Relatively high levels of trust in governments to use data appropriately mean they may be in an advantageous position to convince people to share their personal data to help track and quell the spread of coronavirus. It is also possible that the ‘NHS’ branding of the UK’s app could influence people to support and use it, due to high levels of trust in this organisation.
It would be remiss not to mention, however, the frequent references in the UK data to Dominic Cummings, the government’s chief adviser, who during the tweet collection period was scrutinised over his journey to Durham during lockdown. Many frequently used words reference a rumour that his sister is involved with the contact tracing app – ‘Idox’, ‘sister’, ‘director’, ‘contract’, ‘Alice’. Although this has been found to be untrue by the fact-checking organisation Full Fact, the association could undermine public trust and create confusion about who has access to data collected from the app.
As shown in Figure 3, the number of tweets referring to an ‘NHS Covid-19 app’⁵ (the official name of the UK’s app) is in line with the number about a ‘Track and Trace app’ until mid-May. Past this point, tweets about the NHS app are eclipsed by those about ‘Track and Trace’, showing the decline of the NHS branding. At the same time, tweets about the ‘Track and Trace app’ contain fewer references to the ‘NHS’ and ‘government’ and more references to ‘cummings’ (Figure 4). This seems indicative of a shift in the public mind away from the app as a neutral, technological health tool towards something more political.
By comparison, while references to Scott Morrison (PM) and Greg Hunt (Health Minister) are evident within the Australian dataset, the conversation seems to focus on the government as a unified actor. The UK has not yet announced a date for the national release of the track and trace app but, given that downloads have fallen short of targets in Australia, it seems unlikely that mass support will be mobilised here.
In my next blog post I will look at what sentiment analysis of tweets can tell us about people’s attitudes to contact tracing apps, follow up on any current developments, and round off this blog series with a discussion of how the Covid-19 pandemic might affect data-sharing practices and attitudes going forward and what this could mean for projects like LifeInfo.
1 Ten models were created with the number of topics ranging from 1-101 in intervals of ten, to estimate an optimal ‘topic window’ where topics were found to have the greatest probabilistic coherence. A further 20 models were then created within this window (11-31 topics) to select the model with the highest overall topic coherence.
2 Closeness of topics was found through hierarchical clustering using the Hellinger distance between topics’ phi values, where phi is P(token|topic).
3 Top words are those with the highest phi values per topic, where phi is P(word|topic).
4 Common ‘stop words’ such as ‘is’ or ‘and’ are excluded, as are words directly related to the search terms for each app, e.g. ‘covidsafe’, ‘track’ or ‘app’.
5 Inclusive of any reference to NHS ‘corona’ or ‘covid’ app.