
CDRC-GISRUK Data Challenge: Papers online now

The Consumer Data Research Centre (CDRC) collaborated with GISRUK to host a data challenge.

Delegates were asked to develop a novel analysis or visualisation of CDRC and associated data in order to investigate the hypothesis set out in the Economist article “The immigration paradox: Explaining the Brexit vote”, which argued that the rate of change in the number of migrants in an area, rather than the total headcount, influenced the Brexit vote.

We homed in on four finalists, who presented their papers at GISRUK 2018 on 17 April; the winning paper was announced on 19 April. A summary of the event – including the reasons for selecting the winning paper – can be found here.

You can access each individual paper below:

Paper 1 (winning paper):
Title: SpaCular – Disclosure of spatial peculiarities of the Brexit
Authors: Joao Porto De Albuquerque (University of Warwick), Konstantin Klemmer (University of Warwick), Rene Weserholt (University of Heidelberg), Andra Sonea (University of Warwick)
Abstract: Immigration has consistently rated as the most important issue the UK faces, to a much higher degree than the average in the EU, despite the UK not being among the EU countries with the highest share of foreign-born residents or with the highest increase in the foreign-born population. Whilst the UK experienced a 76% increase in immigration between 2002 and 2015, on closer inspection the data contradict the stereotypical image of the immigrant so often misused during the Brexit vote: 47% of UK immigrants in the 15-64 age range have tertiary education, by far the highest proportion of highly qualified immigrants among all EU countries. Additionally, non-European immigration consistently formed the majority of immigration, even after the A2 EU accession in 2007.

Paper 2:
Title: Tension Points: A Theory & Evidence on Migration in Brexit
Author: Levi John Wolf, University of Bristol
Abstract: The GISRUK Data Challenge asked: was Brexit primarily driven by the rate of change in migration, rather than the total headcount? To interrogate this, I used local regression methods, hierarchical models, migration data provided by the Office for National Statistics, and a novel method to extract population volatility from fine-grained Consumer Data Research Centre data. Depending on the implicit hypothesis used to operationalize the contest question, I find that Brexit voting at the local authority level was driven in part by the rate of change in population structure, but some types of change drove Leave voting while others drove Remain.

Paper 3:
Title: Rapid change in ethnic composition – part of a wider Brexit picture?
Authors: Edward Abel et al., University of Manchester
Abstract: Correlations have been reported by The Economist between a 13-year rate of change in the proportion of foreign-born individuals in UK local areas and voting in the 2016 European referendum. Using regression and principal components analysis, we confirm the significance of the rate of change in ethnic diversity, driven by changes in the White British, White Other and Black populations. This varied by region, and the time window used for comparison also significantly impacts model results. Superficial correlations between changes in the Asian and Black populations and ‘leave’ voting were eliminated by including model variables linked to urban living. Age composition, turnout and population density all had smaller effect sizes than changes in ethnic composition.

Paper 4:
Title: Investigation of the impact of changes in ethnic mix on the EU referendum result
Authors: Aihua Zhang and Paul King, University of Leicester
The authors have asked the CDRC not to publish this paper so that it can be developed into a journal article.
Abstract: The Economist article “The immigration paradox: Explaining the Brexit vote” (14 July 2016) argues that the rate of change in the number of migrants in an area, rather than the total headcount, influenced the Brexit vote. This argument, however, was made simply by looking at the individual factor of ‘foreign-born’ (or ‘UK-born’) population in isolation, with no formal analysis. By contrast, in her recent research paper published in World Development (Volume 102, February 2018), Zhang applied two statistical analyses (Multivariate Regression and Logit Regression) to the actual referendum voting data obtained from the Electoral Commission and the UK’s latest census data. She found that the impact of the ‘UK-born’ (and thus ‘foreign-born’) population proportion on the EU referendum results was insignificant, while other factors, such as Higher Education, Turnout and Gender, dominated the impacts on the outcome of the EU referendum. To address the question of ‘whether the rate of change in the number of migrants in an area influenced the Brexit vote’, we apply the two aforementioned statistical approaches to the CDRC geographical dataset of 11 ethnicity categories, mapped to the referendum results by Local Authority district or Council Area. Total headcount/level of immigration had no significant impact on the Brexit vote; the rate of change in ethnic mix had some minor impact on the referendum result; areas in England and Wales with higher rates of increase in British Chinese populations tended to vote Remain.
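For readers curious what an analysis along these lines might look like in practice, here is a minimal, hypothetical sketch of a multivariate (OLS) and a logit regression of referendum outcomes on changes in ethnic composition by Local Authority district. It is not the authors’ code: the input file, column names and control variables are illustrative assumptions only.

```python
# Hypothetical sketch only: regress referendum outcomes on the rate of change in
# ethnic composition by area. File and column names are assumptions, not the
# authors' actual data or code.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Assumed merged table: one row per Local Authority district / Council Area,
# with Electoral Commission results joined to CDRC ethnicity estimates.
df = pd.read_csv("referendum_by_ethnicity_change.csv")

groups = ["white_british", "white_other", "asian", "black", "chinese"]
for group in groups:
    # Change in each group's population share over an assumed 1998-2017 window,
    # expressed as percentage-point change per year.
    df[f"{group}_change"] = (df[f"{group}_share_2017"] - df[f"{group}_share_1998"]) / 19

predictors = [f"{g}_change" for g in groups]
predictors += ["turnout", "degree_share"]  # illustrative control variables

# Multivariate (OLS) regression on the Leave vote share
ols = smf.ols("leave_share ~ " + " + ".join(predictors), data=df).fit()
print(ols.summary())

# Logit regression on whether the area returned a Leave majority
df["voted_leave"] = (df["leave_share"] > 0.5).astype(int)
logit = sm.Logit(df["voted_leave"], sm.add_constant(df[predictors])).fit()
print(logit.summary())
```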

CDRC Intern wins award from the International Association of Law Enforcement Intelligence Analysts (IALEIA)

CDRC and LIDA Intern Natacha Chenevoy recently received an award from the International Association of Law Enforcement Intelligence Analysts (IALEIA) for her work with Lancashire Constabulary on identifying online hate crime.

The award recognises individuals for ‘outstanding contributions as intelligence analysts, investigators, or prosecutors utilising intelligence products leading to the achievement of the organisation’s objectives’.

Natacha’s projects, in collaboration with Lancashire Constabulary, explored the use of social media data to identify online hate crimes. You can read more about each project below:

Project 1: Application of Natural Language Processing for identification of online hate on Twitter

Project 2: Analysis of police-recorded hate crime in Lancashire

Natacha collected her award at the IALEIA Annual Training Event in Los Angeles, which was attended by over 700 participants from across the globe.

CDRC-GISRUK Data Challenge: winner announced

We invited researchers intending to register as GISRUK 2018 conference delegates to develop a novel analysis or visualisation of CDRC and associated data, in order to investigate the hypothesis set out in the Economist article “The immigration paradox: Explaining the Brexit vote”, which argues that the rate of change in the number of migrants in an area, rather than the total headcount, influenced the Brexit vote. The article can be viewed here.

Issues that we asked participants to consider included (but were by no means limited to):

  • Whether Local Authority district is the most appropriate scale at which to ground analysis
  • Whether country of birth or ethnicity as defined by CDRC is the best predictor of voting behaviour
  • Whether the country of birth of recent immigrants plays any role in shaping voting intentions
  • Whether enfranchised members of recently arrived ethnic minority groups are themselves likely to vote for Brexit
  • Whether established party political affiliations affect the share of the Brexit vote
  • Whether voting behaviour varies according to other local, regional or national circumstances.

We provided two datasets to form the main sources of data for the challenge (an illustrative sketch of how they might be combined follows the list):

  • CDRC small area predicted ethnicity data from 1998-2017.
  • A copy of the Electoral Commission’s official results of the UK’s EU referendum results, by voting area (council areas in Scotland, constituencies in England and Wales, and a single result for Northern Ireland).
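As a rough illustration only, delegates would typically join these two sources on a shared area identifier and derive a change measure before any modelling. The file names, column layout and years in the sketch below are assumptions for the example, not the actual schemas of the CDRC or Electoral Commission files.

```python
# Illustrative sketch: join the referendum results to the ethnicity estimates on a
# shared area code and derive a simple change measure per ethnic group.
# File names, column names and years are assumed for the example only.
import pandas as pd

ethnicity = pd.read_csv("cdrc_ethnicity_estimates.csv")  # assumed columns: area_code, year, group, share
results = pd.read_csv("eu_referendum_results.csv")       # assumed columns: area_code, leave_share, turnout

# One row per area, with each group's population share in the first and last year of the window
wide = ethnicity.pivot_table(index="area_code", columns=["group", "year"], values="share")

change = pd.DataFrame(index=wide.index)
for group in wide.columns.get_level_values("group").unique():
    change[f"{group}_change"] = wide[(group, 2017)] - wide[(group, 1998)]

# Attach the official results, ready for mapping or modelling
merged = change.reset_index().merge(results, on="area_code", how="inner")
print(merged.head())
```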

Judging the challenge
In second place, and highly commended by the judges, was “Tension Points: A Theory & Evidence on Migration in Brexit” by Levi John Wolf from the University of Bristol. The judges considered it strong conceptual work drawing on a variety of datasets, including the CDRC’s. It demonstrated a new application of a method, was clearly constructed and convincingly modelled, presented some interesting findings in a very clear style, explained the technical elements well and disseminated the results effectively.

However, the winner, by the narrowest of margins, was “SpaCular – Disclosure of spatial peculiarities of the Brexit”, authored by Joao Porto De Albuquerque, Konstantin Klemmer (who presented) and Andra Sonea from the University of Warwick, and Rene Weserholt from the University of Heidelberg. The judges thought this paper showed good conceptual work and a new application of a method, was very analytical, and made good use of the CDRC-specific data, particularly through exploration of its temporal variation. The delivery of the research was clear and effective.

On winning the prize, presenter Konstantin Klemmer stated “Thank you for the opportunity to participate in this fantastic challenge and, of course, for our award! As recent news regarding the “Windrush” generation show, immigration is one of the most important political issues in the UK. As such, the task set by the CDRC data challenge was particularly fascinating to our team. With backgrounds in geography, social science and computer science, we took an interdisciplinary approach to the challenge, focusing not only on the temporal aspects of immigration, but also the spatial dimension. Our findings not only support the initial hypothesis posed by “The Economist” that change in immigrant population drove the Brexit vote, but also that the spatial composition of immigration patterns is crucial! With information about spatial variation added, we can substantially enhance the initial temporal trend model. The CDRC challenge was an overall great experience for our team and has motivated us to explore our findings further and continue our studies. Thanks again to the whole CDRC team for hosting this brilliant challenge”.

The winning Warwick/Heidelberg team share a prize of £500 and a copy of the best-selling book “London: The Information Capital”, co-authored by CDRC co-investigator and UCL senior lecturer Dr James Cheshire.

The four short papers that were shortlisted for presentation, including the winning one, will be published on the CDRC website in due course.

Building simple smartphone apps workshop: Opinion from LIDA intern David Marshall

I recently attended a 1-day course to learn how to build simple smartphone apps without coding, hosted by the Consumer Data Research Centre (CDRC) in the Leeds Institute for Data Analytics (LIDA). The training was delivered by Dr Chris Birchall of the Leeds University School of Media and Communication. I went into the course with no knowledge of how apps work or how to create them, but by the end I was able to create my own simple app, which allowed users to fill in a form and collected their data into a spreadsheet. In just a couple of hours I had learned the basics of app development.

The course started with introductions, and I was impressed by the range of research and industry backgrounds the other delegates came from and their varied reasons for attending: a mix of students, researchers, teachers and industry professionals looking to develop apps for their businesses.

The software we used during the training was Thunkable, a free-to-use platform that lets users drag and drop visual objects to create an app that runs on Android or iOS devices. Once I’d registered with Thunkable and downloaded the app, creating the initial interface for my app was easy. Simple drop-down menus and drag-and-drop objects allowed me to create the interface for a potential login screen. Typing a code into the “Test” function on the Thunkable app then enabled me to test the app on my phone easily.

Creating a login screen for my app on Thunkable

The (slightly) trickier bit came when attempting to link the buttons and menus to other screens within the app. The approach is borrowed from a piece of software called Scratch, which lets users assemble blocks of code like jigsaw pieces – a much more accessible way for people with no coding experience to learn the basics. I was able to drag and drop jigsaw pieces from the left-hand side, joining them on the right-hand side, to create an app with a register page and a login page, as well as a home page once logged in.

Linking buttons and menus using the Thunkable app

I then linked the app I created to Google Sheets so that the data filled in by the user could be stored safely and easily in a secure database. It was also possible to display some results from the database (a list of names) live in the app itself. The idea is that you could display live events linked to a calendar, or a list of people attending an event, and so on.

Throughout the session I was able to browse some of the other features and possibilities of the software. Apps created with it can be exported and published to the Google Play Store for free at the touch of a button (publishing to iOS is not free, and the app must meet certain requirements). Images, passwords, buttons, menus, sliders and web browsers are all easy to add. The app can be linked to the smartphone’s camera, recorder, accelerometer, location and barcode scanner. It can also create easy links to external apps such as social media. From what I can tell, the only downside of the software is that creating better-looking graphics and interfaces is extremely difficult. However, as a free piece of software for creating simple apps, I found it impressive and very easy to use.

David Marshall is a Leeds Institute for Data Analytics intern who has just completed the research project, Textile Data Analytics (TDA): to enable Technology Innovations in Fashion Industry, in partnership with a high-end fashion retailer.

This training course is the first of two on Building Simple Smartphone Apps being hosted by the CDRC as part of Leeds Digital Festival 2018.

Oxford Retail Futures Conference – call for papers

Understanding outcomes of data-driven research in retailing

Background

Retailing is one of the first sectors to have employed large datasets at both strategic and operational levels for a variety of purposes, ranging from frequency marketing and store location to product selection and supply chain management.

The amount of data generated by internet users, mobile devices, sensors (the Internet of Things), and organisational and integrative IT systems continues to grow at a significant rate. A high volume of data, in a variety of formats, can be captured and stored relatively easily.

However, the challenge lies in the extent to which data-driven research and analysis can deliver real insights and outcomes which carry genuine business value, how the results of the analyses will be used, and how data-related tools can improve business performance and competitiveness.

Topic Selection

In this call for papers or extended abstracts (minimum 1 page of A4), we would like to capture the current state of the art in areas related to large data sets and real-time analytics in the retail sector and supply chains, particularly – but not exclusively – in relation to spatial data. Submissions may include theoretical and conceptual work, as well as examples from practice, but should focus on outcomes, impact, and/or managerial implications. Results of analyses of large data sets such as those of the ESRC Data Initiative’s Consumer Data Research Centre (https://data.cdrc.ac.uk/) are also welcome.

The call is focused, non-exclusively, on the following topics (applied in the retail context, both at the store-end and in the extended retail value/supply chain):

  • Analysing the spatial consequences of changes in customer shopping behaviour
  • Behavioural outcomes, including nudging effects
  • Changing role of performance measurement in relation to physical space
  • Consequences of data-driven analysis for retail business models
  • Data-driven changes in organisational practice
  • Ethical aspects of data collection & analysis
  • Evaluating the impact of new data sources for retail firms
  • Impacts of data on the efficiency of corporate decision-making by retailers, suppliers and third party business service firms
  • Implications of data-driven research in retailing for public policy
  • New methods and tools for analysis of retail data
  • Supply chain consequences of emerging retail distribution networks
  • The data requirements of omnichannel retailing

Papers submitted will be reviewed by the academic board. Extended abstracts and work in progress are welcome.

Deadlines

  • 20th August 2018 – draft abstract/paper submission
  • 24th September 2018 – notification of abstract/paper acceptance
  • 25th November 2018 – submission of final papers/extended abstracts

Members of the Conference Academic Board

  • Dr Richard Cuthbertson, OXIRM, Saïd Business School, University of Oxford, UK
  • Dr Jonathan Reynolds, OXIRM, Saïd Business School, University of Oxford, UK

Contact Details

The conference is being organised jointly by the Oxford Institute of Retail Management, Saïd Business School, University of Oxford and the Consumer Data Research Centre (CDRC).

Registration details can be found here.

For enquiries, please contact OXIRMEnquiries@sbs.ox.ac.uk.

Registration fee

£195

The fee can be waived for students and presenters; if this applies to you, please use the dedicated students or presenters registration link.

All other delegates, please use this registration link.

Opinion: Big Data in Public Health LIDA Seminar

The LIDA seminar “Big Data in Public Health – future horizons, applications and ethical issues” (13th March) was one of the best-attended and most engaging we have been to at Leeds so far this year, and if the audience Q&A afterwards was any indication, many others would agree. Dr Michelle Morris (University Academic Fellow in Health Data Analytics, affiliate of the CDRC and Leeds Institute for Data Analytics) began proceedings with a talk that presented the case for using Big Data to fill the gaps in the narrative of public health research. Dr Jon Fistein (Associate Professor in Clinical Informatics, Division of Health Services Research, LIHS) then took up the Big Data in public health baton to expertly shine a light on some of the critical legal and ethical questions surrounding data usage for public health research.

Dr Morris opened with a cautionary note that Big Data is not a “magic pill” that will single-handedly solve all of the problems endemic in public health research, nor did she suggest that more ‘traditional’ data, such as the National Diet and Nutrition Survey or GP questionnaire data, should be replaced by Big Data. Examining Big Data for public health research through the lens of the current obesity epidemic (simply put, the result of an imbalance between diet and physical activity), she presented a nuanced account of the ways in which Big Data can help fill the gaps in public health research created by using exclusively so-called traditional data. For example, she pointed out that, in the case of the National Diet and Nutrition Survey, the data are never going to be as representative as researchers would like because of the limited sample size (6000 people every 6 years), despite great efforts to recruit a representative sample. She described this as ‘made’ data, where the burden of data generation falls on the participants and researchers, and where bias is likely to be inherent because the data are generated for a specific purpose. This contrasts with Big Data, which is ‘found’ data, such as transactional data available through supermarket storecards.

The example of Bounts’ physical activity app data, where users elect to share their data and the scale of data available is vast, demonstrated the potential of Big Data in public health. Here the generation of data is intuitive, the data themselves are arguably more objective because they are captured directly from the source rather than biased by faulty recall, and the burden of generation sits with businesses. Dr Morris’s specific example from 2016 Bounts data also illustrated some of the potential pitfalls in interpreting this kind of Big Data in isolation: peaks and troughs in activity were deemed to be congruent with changes in British Daylight Saving Time, whereas Bounts suggested other promotional and contractual reasons for the peaks and troughs evident in the data. Expanding on some of the challenges that must be addressed when using Big Data, Dr Morris pointed out that supermarket storecard data can frustrate attempts to isolate the consumer patterns of individuals because of unintentional aggregation – for example, one household’s shopping captured on one storecard. Nonetheless, the potential of engaging with the businesses and consumers that generate Big Data, and of opening a dialogue with them about data use, was plain and compelling, especially in the age of new Big Data.

By asking the room a series of direct questions designed to make us think about our attitudes to sharing our own data for public health research, Dr Morris succeeded in pointing out that the collection of data and its applications in public health research are issues which affect us all, and about which we are bound to have an opinion. Perhaps the most interesting question was whether the audience felt that Big Data could be used exclusively to populate all of the domains identified on the Obesity System Map:

Whole Systems Approach map illustrating the factors impacting on obesity.

The feeling in the room was weighted in favour of ‘no’ in response to this question. Dr Morris explained that, whilst Big Data cannot populate all of these specific nodes, it can populate the majority (as per findings, paper forthcoming, from the Obesity Strategic Network, of which Dr Morris is Director). She concluded that, when used in conjunction with existing traditional data, Big Data is a significant contributor to answering questions of great moment in public health research. Finally, two PhD projects, by Emma Wilkins and Rachel Oldroyd respectively, further demonstrated how consumer-generated Big Data can be timelier, richer in metadata and more representative than traditional data.

Having heard of the great potential of Big Data in public health, we in the audience then enjoyed an exploration of the legal and ethical issues surrounding public health data from Dr Fistein, who trained as a medical doctor and barrister and currently sits on the Independent Group that Advises NHS Digital on the Release of Data (IGARD). He agreed that there is more potential to link data in the era of Big Data (or new forms of data, irrespective of size). Because Big Data is ‘found not made’, the ethical and legal challenge for public health research is to determine whether new uses of such data are within the expectations of those to whom it relates when they originally ‘provided’ it.

After outlining some of the broad issues of privacy and consent in the digital age, Dr Fistein invited the audience to give their opinion on whether public health research is in the ‘public interest’, which would provide a legal gateway for the use of data for public health. The response was tentative at first, with people wondering whether this was a question designed to catch them out, but there was general agreement that it was. Dr Fistein pointed out that, unfortunately, many legal definitions of ‘public interest’ do not include public health. He observed that there is therefore a tension between the legal position and that of public health practitioners, many of whom argue that public health activities are generally in the public interest and that this should enable data to be used. An example of such uses is illustrated by the Learning Healthcare System in the following PDF.

The Learning Healthcare System

He noted that one proposed solution to the issue of using data for public health is to anonymise it (as this could potentially make the data use lawful). However, he pointed out several issues that make this approach problematic. One is data quality: it may be impossible to reliably link datasets together once they have been ‘anonymised’, or it may be necessary to use identifiable data to ensure reliable linkage. He also described the challenges related to the sense of ownership individuals feel towards data generated about them. Citing the well-known example of the Catholic woman who uses a contraceptive pill to treat a health issue unrelated to contraception (from Pattinson, S., Medical Law and Ethics, Sweet and Maxwell), he pointed out that she might be aggrieved were she later to learn that data about her usage of that pill were being used in a study on how to improve contraceptive treatments. He related this to the concept of ‘context collapse’ (as described by the Wellcome Trust) – when the patient receives care in one explicit context and then finds that data about them are being used for another purpose which has not been made explicit. In respect of this, Dr Fistein reminded everyone of Dame Fiona Caldicott’s principle that “There should be no surprises in data use.” Dame Fiona also said in the 2017 National Data Guardian Report that “the most praiseworthy attempts at innovation falter if they lose public trust.”

Taking this further, Dr Fistein concluded by saying that trust is meaningless without a notion of ‘trustworthiness’ (Onora O’Neill https://www.youtube.com/watch?v=XWwTYy9k5nc) and that public health researchers have an obligation to demonstrate trustworthiness, for example by being transparent about the work they are doing and its benefits. On a practical note, he pointed out that as a bare minimum, public health practitioners and researchers should also know the law and be able to defend their position in relation to it. He recommended that they should consult experts early when they are considering data use, in order to anticipate and address any potential ethical or legal issues.

The two presentations were followed by an animated Q&A session with many of the audience members (a significant number of whom were public health registrars) remarking on how engaging they’d found the speakers and the subjects.