2016 was an eventful year. The narrow votes in favour of Brexit in the UK and Trump in the US were a shock to many. You’ve probably heard commentators remark on the underlying causes of how people voted. A familiar caricature is of blue-collar disaffection (Leave and Trump) versus liberal, metropolitan values and relative affluence (Remain and Clinton). But is this true of the entirety of the UK and US?
The CDRC is pleased to launch a new training short course at Leeds this year, “Explaining Brexit and Trump with Tidy data graphics”, to tantalise the analytical imaginations of researchers interested in using data to examine the influences on human decision-making within the social sciences. Taking place on the 2nd May, this course will be delivered by Dr Roger Beecham, who will lead an exploration of the story told by the data behind the EU referendum vote and American presidential election. You’ll learn how to develop a family of data graphics (in R), each of which will reveal a bit more of the data puzzle behind the UK referendum and US election results.
A lot of theories have been ventured in the popular media as to why both votes went the way they did – some ‘clickbait’, others more intriguing – but in this one-day course you’ll have the opportunity to explore the hard data behind the votes; to look at these data in the specific context of socio-demographic variables; and to evaluate area-level variation in the votes. The course is designed to elucidate and lead you toward more data-grounded answers to the questions of what happened in 2016 and why people voted the way they did.
In addition to understanding a little more about the political phenomena, you will:
learn how to wrangle, reshape and curate tidy data in R
appreciate some key principles of good data visualization design
confidently generate data graphics using a consistent vocabulary (ggplot2)
develop an intuitive understanding of statistical modelling procedures
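To give a flavour of the ‘tidy data’ idea at the heart of the course, here is a minimal sketch in R (the counties and vote shares below are invented purely for illustration): an ‘untidy’ table with one column per candidate is reshaped so that each row holds a single observation, the form the tidyverse tools expect.

```r
library(tidyr)  # tidyverse package for reshaping data

# An 'untidy' results table: one column per candidate (figures invented)
results <- data.frame(
  county  = c("A", "B"),
  clinton = c(51.2, 33.4),
  trump   = c(43.1, 60.9)
)

# Tidy form: one row per county-candidate observation
tidy_results <- pivot_longer(
  results,
  cols      = c(clinton, trump),
  names_to  = "candidate",
  values_to = "vote_share"
)
tidy_results  # four rows: two counties x two candidates
```

In this shape, each ggplot2 aesthetic can be mapped onto a single column (`candidate` to colour, `vote_share` to position), which is why tidy data and the ggplot2 vocabulary are taught together.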
Sound like fun? You can find out more and book on the course here.
Want to take part in a Data Challenge on the subject of Brexit? The CDRC is currently challenging those planning to attend the GISRUK 2018 conference to use CDRC datasets to submit responses to the theory launched in The Economist article, “The immigration paradox: Explaining the Brexit vote” – find out more here.
We’ve been seeing some interesting trends developing in researchers’ preferences for certain programming software over others.
When it comes to spatial modelling, building maps and visualising data, ArcGIS has long been considered the front runner. Its intuitive ‘point and click’ interface means that you can build professional looking outputs from scratch without the need for any programming experience.
Some of the stand-out merits of ArcGIS include:
Its ability to handle a large variety of spatial and non-spatial data formats
The layout view which allows creation of professional outputs
The ‘Add data’ button which recognises tables, rasters and all GIS formats
It’s a great tool for easy visualisation
The table joins in ArcGIS are intuitive, enabling the linking of spatial and other data
It handles Coordinate Reference Systems in a user friendly way
It gives you access to ArcGIS Online (AGOL): a great resource for sourcing a whole range of GIS datasets
A basic licence in ArcGIS still gives you access to a large number of tools
It has a variety of useful purchasable plug-ins, such as Crime Analyst
You’re a shoo-in for creating beautiful maps, with titles and legends that are far easier to add
The ArcMap splash screen (i.e. start-up screen) displays all your latest documents
Of course, it is subscription software, and access to the full ArcGIS feature set is determined by the level of your licence. Although the subscription fee can be expensive, ESRI offers award-winning support services and extensive help documentation. At the end of the day, its map-building and visualisation potential is vast and user-friendly.
But could there be another software contending in the race for slick visualisation and professional outputs?
R has been called a “statistical powerhouse programming language” that combines integrated processing, analysis and modelling in a single framework. It’s well suited to the researcher, as all outputs are easily reproducible, and it is seemingly the most popular tool for data mining in business and academia. However, its command-driven set-up has a steep learning curve and may not be intuitive for non-programmers, though this is immeasurably helped by the use of RStudio.
Some of the stand-out merits of R include:
It’s open source and therefore supported by a vast online community of users happy to share their wisdom for free
It’s a superior data analysis tool as it is able to handle large amounts of data
Its base package includes all standard statistical tests, models and analyses
It’s versatile in allowing you to manipulate data
It’s designed to be used with spatial and statistical data
You can use R to solve complex data science, machine learning and statistical problems
Geographically weighted regression and spatial interaction models can be custom built around your spatial data in R
The key difference between R and ArcGIS when it comes to spatial visualisation and mapping is that R is command-centred: visualisations can only be created and edited by changing code. R does have an extensive graphics library, but creating a professional output can be very time-consuming for a beginner. Once you get the hang of it this is great, but it is not as immediately intuitive as ArcGIS, and there is no dynamic canvas with which to pan and zoom. On the other hand, it is free!
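To illustrate what ‘command-centred’ means in practice, here is a minimal mapping sketch in R using the sf and ggplot2 packages (both assumed to be installed) and the North Carolina demo shapefile that ships with sf. Every element of the map that ArcMap offers via point-and-click is instead specified as code.

```r
library(sf)       # reads and handles spatial data
library(ggplot2)  # grammar-of-graphics plotting

# Demo shapefile bundled with the sf package (North Carolina counties)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Every design choice (fill variable, legend title, theme) is a command
ggplot(nc) +
  geom_sf(aes(fill = BIR74)) +
  scale_fill_viridis_c(name = "Births, 1974") +
  labs(title = "A choropleth built entirely from code") +
  theme_void()
```

The upside of this approach is reproducibility: re-running the script regenerates an identical map, something a point-and-click workflow cannot guarantee.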
Of course, you can combine R with ArcGIS and get the best of both worlds to become an absolute spatial wizard! There is now the Esri R-ArcGIS bridge which enables you to increase the capabilities of analyses across different disciplines. In real terms it allows you to transfer your data between ArcGIS and R without losing any of the functionality and formatting. Learn more here. Or you might prefer to use QGIS as an alternative to ArcGIS – learn more about the relative merits of ArcGIS and QGIS here.
The Consumer Data Research Centre (CDRC) is proud to announce that we will be supporting the Financial Conduct Authority (FCA) in an upcoming Data-Sprint event. Spread across two days, on the 20th–21st March 2018, the Data-Sprint will see teams of individuals with varying skill sets generate innovative solutions for tackling organisational challenges. Data for the event will be provided by the FCA through the CDRC data platform and made available to participants the week before.
The Data-Sprint will focus on cases relating to the recent Financial Lives survey. For the published report, questionnaire and data tables, click here.
The FCA has collated an extraordinary volume of consumer information and will experiment with it in order to present findings that are insightful and/or creative. Some of the key questions to be tackled during the event are likely to be:
What tools can the FCA use to best visualise these data and make the survey findings both insightful and easy to use?
How can the FCA create insights to inform specific organisational decisions? This part of the Data-Sprint will include a focus on the wealth of data the survey provides on the financial products consumers hold.
Can the Financial Lives data be linked with, or reported alongside, other data, to create enhanced insight? These other data could be FCA proprietary data, other proprietary data or data in the public domain.
Who can attend? The FCA would like to invite you along to participate in the Data-Sprint. This will be a great opportunity to help shape the way the FCA thinks about data and collaborates with external experts.
How to apply? If you’d like to attend, please email Ed Towers and Jimmy Galloway at ‘theanalyticscommunity@fca.org.uk’ by 3rd March 2018, including a brief description of your current role, your technical capability and what it is about the Data-Sprint that appeals to you. Places are limited and will be allocated based on skills and availability.
For further information about the event click here.
The Waterloo Institute for Complexity & Innovation (WICI) has announced the theme of this year’s Specialist Conference: “Modelling complex urban environments”. The conference takes place on the 21st-22nd July 2018 at the University of Waterloo (Ontario, Canada) with the intention of bringing together researchers from multiple disciplines with experience and interest in modelling complex environments, from smart cities to urban planning.
The call for abstracts is now open with a deadline of 1st March 2018 (see below for specific guidelines).
Integrating “big data” and “smart cities” data with urban modelling
This theme is designed to interrogate the role that increasingly available, so-called ‘Big’ data has to play in altering the way spatial, statistical and geographical analysis is conducted. It will look at the new methodologies that the arrival of Big data has fostered for better understanding how urban systems and infrastructure behave, with an emphasis on how they may be useful in the drive towards sustainability and efficiency. The goal of the session is to bring together research on the topic in order to produce a journal issue.
Within this theme, WICI are particularly interested in papers that engage with the following:
Integrating urban analytics and agent/individual-based modelling
Machine learning for urban analytics
Innovations in consumer data analytics for understanding urban systems
Real-time model calibration and data assimilation
Spatio-temporal data analysis
New data, case studies, demonstrators, and tools for urban analytics
Geographic data mining and visualisation
Frequentist and Bayesian approaches to modelling cities
Paper proposals should include:
a short but descriptive title
a list of all contributing authors and their affiliations
an abstract of no more than 250 words
a list of 3-5 keywords
and an identification of the theme to which the proposal is submitted, if applicable.
Session proposals should include:
a short but descriptive title
a session abstract of no more than 250 words
a list of organizers and their affiliations
3-5 keywords
and a list of potential paper contributions, following the format from above.
All proposals should be directed to Noelle Hakim (noelle.valeriotehakim@uwaterloo.ca). For full details on the conference and all proposed themes, see here.
The call for abstracts is now open with a deadline of 30th April 2018 (see below for specific guidelines).
With this year’s focus on sustainability and change, the conference especially encourages papers that challenge the status quo in corporate attitudes towards sustainability. It poses the question: how do we get businesses and consumers truly engaged in addressing the grand challenges of the present ecological and sociocultural crisis? Two sub-themes, chaired by researchers from our very own Leeds Institute for Data Analytics, may be of particular interest:
Sustainability and Big Data: Dr Phani Kumar Chintakayala, University of Leeds
From the CRRC:
“The conference is looking for theoretically informed and practically relevant papers on business and consumer involvement for sustainable change. It welcomes contributions from different disciplines and fields of study, including literatures on corporate responsibility, corporate sustainability, sustainable consumption, sustainable development, business and society, business ethics, ethical consumption, sustainable entrepreneurship, and organisation and the environment.”
The requirements for initial abstracts are as follows:
Files should be sent in MS Word format, and the file name should be the first author’s surname. Please include the names, affiliations and contact details of all authors.
Please use a maximum of 500 words, answering the following questions:
Research Question: What is the research question that the submission aims to answer?
Theoretical Framework: What are the main concepts, models or theories used in the paper? Include 3-4 central references.
Method: Which method is used for the research work?
Findings: What are the main outcomes and results of the paper?
Which sub-theme is your paper aimed at or is it for the open call?
Abstracts will be reviewed and selected by the scientific committee of the conference, and authors will be notified of acceptance by 15th May, when conference registration opens.
Huge numbers of networked sensors have appeared in cities across the world in recent years. These include cameras and sensors that count passers-by, devices that sense air quality, traffic flow detectors, and even beehive monitors. There is also a large amount of information about how people use cities on social media services such as Twitter and Foursquare.
Citizens are even making their own sensors – often using smart phones – to monitor their environment and share the information with others; for example, crowd-sourced noise pollution maps are becoming popular. All this information can be used by city leaders to create policies, with the aim of making cities “smarter” and more sustainable.
But these data only tell half the story. While sensors can provide a rich picture of the physical city, they don’t tell us much about the social city: how people move around and use the spaces, what they think about their cities, why they prefer some areas over others, and so on. For instance, while sensors can collect data from travel cards to measure how many people travel into a city every day, they cannot reveal the purpose of their trip, or their experience of the city.
With a better understanding of both social and physical data, researchers could begin to answer tough questions about why some communities end up segregated, how areas become deprived, and where traffic congestion is likely to occur.
Difficult questions
Determining how and why such patterns emerge is extremely difficult. Traffic congestion happens as a result of personal decisions about how to get from A to B, based on factors such as your stage of life, your distance from the workplace, school or shops, your level of income, your knowledge of the roads and so on.
Congestion can build locally at pinch points, placing certain sections of the city’s transport networks under severe strain. This can lead to high levels of air pollution, which in turn has a severe impact on the health of the population. For city leaders, the big question is which actions (imposing congestion charges, pedestrianising areas or improving local infrastructure) would lead to the biggest improvements in both congestion and public health.
The irony is, although modern technology has the power to collect vast amounts of data, it doesn’t always provide the means to analyse it. This means that scientists don’t have the tools they need to understand how different factors influence the way cities function and grow. Here, the technique of agent-based modelling could come to the rescue.
The simulated city
Agent-based modelling is a type of computer simulation, which models the behaviour of individual people as they move around and interact inside a virtual world. An agent-based model of a city could include virtual commuters, pedestrians, taxi drivers, shoppers and so on. Each of these individuals has their own characteristics and “rules”, programmed by researchers, based on theories and data about how people behave.
After combining vast urban datasets with an agent-based model of people, scientists will have the capacity to tweak and re-run the model until they reproduce the phenomena they want to study, whether traffic jams or social segregation. When they eventually get the model right, they’ll be able to look back at the characteristics and rules of their virtual citizens, to better understand why some of these problems emerge, and hopefully begin to find ways to resolve them.
For example, scientists might use urban data in an agent-based model to better understand the characteristics of the people who contribute to traffic jams – where they have come from, why they are travelling, what other modes of transport they might be willing to take. From there, they might be able to identify some effective ways of encouraging people to take different routes or modes of transport.
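The commuting example above can be sketched as a toy agent-based model in R. Everything here is invented for illustration (200 agents, two routes, a simple congestion rule and learning rate); the point is only to show how individual rules plus feedback produce an aggregate pattern.

```r
set.seed(42)
n_agents <- 200

# Each agent holds a belief about travel time on routes 1 and 2,
# made slightly noisy so that agents are not identical
belief <- matrix(30 + runif(n_agents * 2), nrow = n_agents, ncol = 2)

for (day in 1:100) {
  choice <- max.col(-belief)              # take the route believed to be faster
  load   <- tabulate(choice, nbins = 2)   # how many agents are on each route
  time   <- 20 + 0.1 * load               # congestion: travel time grows with load
  for (r in 1:2) {                        # agents update beliefs from experience
    on_r <- choice == r
    belief[on_r, r] <- 0.8 * belief[on_r, r] + 0.2 * time[r]
  }
}
load  # the emergent split of commuters across the two routes
```

Interrogating `choice` and `belief` after the run is the “looking back at the virtual citizens” step described above: the modeller can ask which rules produced the congestion, then test an intervention (say, changing the congestion term) simply by re-running the simulation.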
Seeing the future
If the model works well for the present, it might also be able to produce short-term forecasts. This would allow scientists to develop ways of reacting to changes in cities in real time. Using live urban data to simulate the city in real time could help to inform the managers of key services during periods of major disruption, such as severe weather, infrastructure failure or evacuation.
Using real-time data adds another layer of complexity. But fortunately, other scientific disciplines have been making advances in this area. Over decades, the field of meteorology has developed cutting-edge mathematical methods, known as data assimilation, which allow weather and climate models to respond to new observations as they arrive in real time.
There’s a lot more work to be done before these methods from meteorology can be adapted to work for agent-based models of cities. But if they’re successful, these advancements will allow scientists to build city simulations which are driven by people – and not just the data they produce.
The University of Leeds currently has a number of PhD opportunities, in collaboration with The Alan Turing Institute, which are supervised by Prof Alison Heppenstall.
Project 1: Understanding the inner-workings of city-level agent-based models
Project 2: Uncovering hidden patterns and processes in social systems