CDRC co-investigator awarded a Gold Medal

Professor Michael Batty, co-investigator of the Consumer Data Research Centre (CDRC), was awarded the Royal Town Planning Institute’s (RTPI) highest honour – the Gold Medal.

The Gold Medal is only awarded once every 2 years to individuals who have made an outstanding contribution to the field of planning.

Mike was presented with the Gold Medal on 8 November by the RTPI Vice President, Stephen Wilkinson, after delivering the annual Nathaniel Lichfield Lecture.  Mike’s lecture was entitled “The Planning Balance Sheet 60 Years On: Evaluation in the Digital Age”. Click here for a downloadable version of the lecture.

Big Data Week 2016

We hosted and participated in a series of events for Big Data Week 2016 (24-29 October). Events varied from public exhibitions to lectures and were attended by a range of local, national and international attendees.

Big Data Here
CDRC UCL kicked off their week with ‘Big Data Here’, a week-long exhibition of live data providing an alternative, digital view of the physical space around UCL. Data sources included transit feeds from Transport for London (TfL) and demographic information from the Office for National Statistics (ONS). Each day a different live data feed was projected through the window of UCL’s North Lodge, giving passers-by an opportunity to view the visualisations generated from the feeds.

The most popular big screen visualisation proved to be the near live traffic camera feeds from TfL. Members of the public were free to interact with the exhibition each afternoon with a CDRC team member on hand to provide a more detailed overview of the concept behind Big Data Here. This saw an excellent turnout of local, national and international attendees keen to learn more about the exhibits.

Big Data Here was led by Oliver O’Brien, creator of CDRC Maps. Read his summary of Big Data Here.

Big Data for Marketers
The CDRC team, along with a number of students from our MSc Consumer Analytics and Marketing Strategy course, participated in the ‘Big Data for Marketers’ event in Leeds.  Hosted by Alscient, the city sponsor for Leeds Big Data Week, the event focused on the importance of Big Data in Marketing, taking stock of current analytical practices in the industry and looking at ways to improve in the future.  A common theme in the presentations and subsequent discussion was the need for graduates to have solid analytics skills and an understanding of Big Data when entering the industry.

24 Hour Climathon – How can we deliver domestic carbon reduction in an age of austerity?
CDRC students took part in a 24 hour Climathon event in Leeds which aimed to develop new solutions to reduce carbon emissions. The event, held by the Priestley International Centre for Climate, enabled students and stakeholders in Leeds to join citizens in world cities such as Paris and Shanghai simultaneously taking part in the hackathon-style Climathon event.  In Leeds, 28 participants took part from 12 noon on 28 October until noon the following day and focused on an energy efficiency challenge set by Leeds City Council. Read more about the event.

Empty Housing Innovation Lab
The final event of the week took place at the University of Liverpool in London, in conjunction with Student Data Labs. The event, titled ‘Empty Housing Innovation Lab’, took an interactive approach to tackling the housing crisis in the UK, using big data. The event had an excellent turnout of students from local universities including London School of Economics (LSE), Imperial College and UCL. CDRC project manager Sarah Sheppard provided an opening talk, with emphasis on some of the data sources already available to students (and others) via our datastore that can be utilised for research; this was followed by Anastasia Ushakova, CDRC PhD student, providing a lightning talk on her own research on energy consumption and how this ties in with the housing crisis. A range of interactive activities followed, with the R workshop proving extremely popular amongst attendees.

So you want to be a data scientist?
The momentum continued beyond Big Data Week in Leeds, with the ‘So you want to be a data scientist?’ event falling on the 2nd November.  Organised by Alscient, and attended by students from Leeds Data Science Society and the MSc Consumer Analytics and Marketing Strategy course, the session offered practical advice and mentoring from industry based data scientists, as well as the opportunity to take part in a mini analytics challenge.

Masters Research Dissertation Programme 2017: Call for Industry Projects

Following another successful year of the Consumer Data Research Centre (CDRC) Masters Research Dissertation Programme, we are now seeking proposals from businesses for new projects due to commence in spring 2017.

The CDRC are aiming to open the application process for masters students in January, and thereby to encourage applicants from a wider breadth of disciplines and institutions. The application process will be facilitated on the CDRC website and businesses are encouraged to interview/select successful applicants in early 2017.

This presents a great opportunity to get a Masters student to help you to make progress with:

  • Major current issues, such as multi-channel marketing, customer insight, store networks, transport, surveys, social media, brand insight, predictive modelling and many others.
  • ‘horizon scanning’ projects that are not of the highest priority in the day-to-day work schedules
  • Working with data – both your own in-house data, and also maximising the value to be obtained from Open Data or data from government or administrative sources (available through the CDRC)

It also publicises your company’s interest in students with data skills. Previous experience illustrates the main features of the initiative:

If you have a project in mind, please complete a project proposal form 2017 and email it to Guy Lansley at g.lansley@ucl.ac.uk by the start of December 2016.

All proposals need to be approved before they can be publicised to ensure that the students can maximise the academic potential from their dissertations. Alternatively, feel free to email Guy informally if you would rather discuss rough ideas for projects at this stage.

For more details please visit the Information for Retailers page.

Founder and President of Esri at UCL

The Consumer Data Research Centre (CDRC) are pleased to announce that Jack Dangermond, Founder and President of Esri, will be visiting University College London (UCL) and giving a talk on his vision for how GIS and Geography can help address the global challenges we face today to shape the future for our world. Esri, founded in 1969, are a global leader in GIS software sales and consulting, operating across various public, private and third sectors. Esri’s solutions encompass not only desktop GIS but are continually expanding in web and mobile GIS applications, enabling new applications of GIS in research and beyond.

The talk will take place on 9 November 2016 at 4:15pm at UCL (Pearson Building G22) with a short presentation from key UCL research groups (CDRC, SpaceTimeLab, ExCiteS, CASA) on some of the high profile projects that Esri have helped to support, followed by Jack’s presentation on Esri’s vision as a company. Afterwards, a small reception, sponsored by Esri, will be held in the Pearson Building G07, where people can meet Esri staff and UCL students to continue discussions on future possibilities with GIS.

Places are limited so please book early and if you find you are unable to attend please let us know so your place can be given to someone else.

To make a booking click here.

 

Demographic User Group Conference – Presentations now available

On 13th October 2016 the CDRC sponsored and supported the 13th annual Demographics User Group Conference (DUG), hosted at the Royal Society.

The conference focussed upon “Empowerment, Enquiry and Empathy – Reducing the soft skills gap” and brought together people from DUG’s 15 member companies, along with invited guests from government and universities, to spread knowledge and stimulate new ideas.

There were some great presentations and stimulating audience discussion over the course of the day around the conference’s key questions:

1. ‘What difference would it make to you, if analysts had clear paths along which they could develop their careers?’
2. ‘In your own context, what is the current balance between organisation and individual needs when considering analytical delivery?’

Presentations

The presentations are now available to watch online:

Chair’s Introduction
Tim Drye, Director of Demographic User Group

Data Science and real science: Narrative and decision making in the academy and the ‘real world’
Seth Spielman, Associate Professor of Geography, University of Colorado

Update on the Consumer Data Research Centre and Master’s Dissertation Prize Giving
Paul Longley, Professor of Geography, UCL

Fishbowls & Fabulous Failures: Are you curious?
Neil Wooding, Director of Strategic Planning and Performance, ONS

A snowball of failures – Workshop
Neil Wooding, Director of Strategic Planning and Performance, ONS

Democracy, Meaning and Negotiation: Empowering analysts both amateur and professional
James Morgan, Director of MI, British Gas

Team building by doing data for good
Fran Bennett, Trustee of DataKind & Pete Williams, Head of Enterprise Analytics, M&S

Panel Discussion: The Opportunities and Challenges of an analytical career within the commercial sector

Chair’s Final Reflections and the DUG Award

2016 Masters Research Dissertation Programme Winners Announced

The winners of the CDRC’s 2016 Masters Research Dissertation Programme were recently announced at the Demographics User Group Conference.

The CDRC led programme provides the opportunity for students to work directly with an industrial partner and links students’ research to important retail and ‘open data’ sources.  Once again the standard of the projects was extremely high this year, with students working with a range of partners, including Sainsbury’s, Shop Direct, Boots and E.ON.

The Winners

Prize winner: Luis Francisco Mejia and Movement Strategies

Luis’ research used temporal geodata collected from the mobile phones of attendees at a festival to model movements across a festival site. In particular, he looked at using complex machine learning techniques such as artificial neural networks to model and predict when each participant is likely to visit catering facilities across the festival site. His models were then tested with a random selection of data which were not used in the original analysis and were found to be very successful.

The judges felt that this was a well-executed study with very clear aims and objectives. They also felt that the commercial relevance of the work is well communicated.

View Project Abstract

Runner up: Ffion Carney and E.On

Ffion’s study aimed to identify areas that contain a high proportion of vulnerable households that should be targeted as part of the Energy Company Obligation (ECO), by taking into account demographic and property characteristics alongside average annual energy consumption data.

The judges highlighted that this project tackles a very interesting research area and commented that Ffion had devised an appropriate methodology which could clearly address the research questions.

View Project Abstract

Runner up: Mariflor Vega and Sainsbury’s 

The aim of Mariflor’s study was to develop a means to understand the different types of customers based purely on the content of their baskets. She used a range of text mining techniques to harvest the data and group the customers.

The judges felt that this was a comprehensive analysis which was completed, explained, interpreted and presented well.

View Project Abstract

Other projects completed this year included:

  • Modelling Multi-Channel Adoption at Sainsbury’s – Sainsbury’s
  • An investigation of what triggers customer activation of credit facilities – Shop Direct
  • An analysis of Argos concession store performance located in Homebase and Sainsbury’s stores across the UK – Argos
  • How does competitor presence influence the performance of click and collect sites? – Sainsbury’s
  • Identifying drivers of full price sales of clothing and footwear for an online retailer – Shop Direct
  • The performance of Argos concessions in other stores – Argos
  • Can interactive data visualizations enable a retailer to identify new insights about customer purchase behaviour? – Sainsbury’s
  • Youths Spending & Geodemographics – goHenry
  • Topic extraction and document classification on textual survey data with unsupervised modelling techniques. – CACI
  • An empirical study into Co-op On-the-Go Stores’ turn-in rate using a scorecard approach – The Co-operative Food
  • An investigation into the potential of Bluetooth Beacons to monitor the movement of people on public transport: A preliminary case study of the Norwich Bus Network – Movement Strategies
  • Customer Segmentation using spatio-temporal data – Boots

 View all previous projects

2017 Retail Masters Dissertation Programme

We are now seeking proposals from businesses for new projects due to commence next spring (2017). Further information.

We hope to advertise the 2017 opportunities towards the end of the year.

Should you have any queries relating to the programme, please contact Guy Lansley.

Demographic User Group Conference 2016

On 13th October 2016 the CDRC sponsored and supported the 13th annual Demographics User Group Conference (DUG), hosted at the Royal Society.

The conference focussed upon “Empowerment, Enquiry and Empathy – Reducing the soft skills gap” and brought together people from DUG’s 15 member companies, along with invited guests from government and universities, to spread knowledge and stimulate new ideas.

There were some great presentations and stimulating audience discussion over the course of the day around the conference’s key questions:

1. ‘What difference would it make to you, if analysts had clear paths along which they could develop their careers?’
2. ‘In your own context, what is the current balance between organisation and individual needs when considering analytical delivery?’

Presentations included:

  • Data Science and real science: Narrative and decision making in the academy and the ‘real world’ – Seth Spielman, Associate Professor of Geography, University of Colorado
  • Update on the Consumer Data Research Centre – Prof Paul Longley, CDRC Director
  • Fishbowls & Fabulous Failures: Are you curious? – Neil Wooding, Director of Strategic Planning and Performance, ONS
  • Democracy, Meaning and Negotiation: Empowering analysts both amateur and professional – James Morgan, Director of MI, British Gas
  • Team building by doing data for good – Fran Bennett, Trustee of DataKind & Pete Williams, Head of Enterprise Analytics, M&S

Prof Paul Longley also announced the winners of the CDRC’s 2016 Masters Research Dissertation Programme:

Prize winner: Luis Francisco Mejia and Movement Strategies
Runner up: Ffion Carney and E.On
Runner up: Mariflor Vega and Sainsbury’s

Find out more about the Masters Research Dissertation Programme.

If you missed the conference, we will be making the videos available shortly and in the meantime you can view our Storify for an overview of the day.

Blog: Geostat2016, Albacete

In late September, CDRC Research Fellow Robin Lovelace attended Geostat2016 in Albacete. He provided the write-up below on his return.

In late September I went to GEOSTAT 2016. Given the amount of fun had at GEOSTAT 2015, expectations were high. The local organisers did not disappoint, with a week of lectures, workshops, spatial data competitions and of course lots of Geostatistics. It would be unwise to try to systematically document such a diverse range of activities, and the GEOSTAT website provides much further info. Instead this ‘miniwriteup’ is designed to summarise some of my memories from the event, and encourage you to get involved for GEOSTAT 2017.

To put things in context, the first session was a brief overview of the history of GEOSTAT. This is the 12th GEOSTAT summer school. In some ways GEOSTAT can be seen as a physical manifestation of the lively R-SIG-GEO email list. That may not sound very exciting. But there is a strong community spirit at the event and, unlike other academic conferences, the focus is on practical learning rather than transmitting research findings or theories. And the event was so much more than that.

There were 5 action packed days covering many topics within the broad field of Geostatistics. What follows is an overview of each that I went to (there were 2 streams), with links to the source material. It is hoped that this will be of use to people who were not present in person.

Day 1

After an introduction to the course and spatial data by Tom Hengl, Roger Bivand delivered a technical and applied webinar on bridges between R and other GIS software. With a focus on GRASS, we learned how R could be used as a ‘front end’ to other programs. An example using the famous ‘Cholera pump’ data mapped by John Snow was used to demonstrate the potential benefits of ‘bridging’ to other software. The data can be downloaded and partially plotted in R as follows:

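# download and unzip the course data into the current working directory
u = "http://geostat-course.org/system/files/data_0.zip"
download.file(u, "data_0.zip")
unzip("data_0.zip")
# switch to the author's local project folder (adjust this path to wherever
# the data/ folder lives on your machine), remembering the old directory
old = setwd("~/repos/geostat2016-rl/")
library(raster)
## Loading required package: sp
# read each shapefile into a Spatial* object
bbo = shapefile("data/bbo.shp")
buildings = shapefile("data/buildings.shp")
deaths = shapefile("data/deaths.shp")
b_pump = shapefile("data/b_pump.shp")
nb_pump = shapefile("data/nb_pump.shp")
plot(buildings)

# restore the original working directory
setwd(old)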

In the afternoon Robert Hijmans gave a high level overview of software for spatial data analysis, with a discussion of the Diva GIS software he developed and why he now uses R for most of his geospatial analysis.

The talk touched on the gdistance package, and many others. Robert showcased the power of R for understanding major civilisational problems such as the impacts of climate change on agriculture. His animated global maps of agricultural productivity and precipitation showed how R can scale to tackle large datasets, up to the global level involving spatial and temporal data simultaneously.

There were a few political asides. Robert mentioned how agrotech giant Monsanto paid almost $1 billion for a weather prediction company. He detoured deftly through a discussion of ‘big data’, making the observation that often ensembles of models can provide better predictions than any single model working on its own, with political analogies about the importance of democracy.
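As a rough illustration of the ensemble point (this is not code from the talk), the base R sketch below simulates five hypothetical models and compares the error of their averaged prediction with the error of each model on its own. The "models" are simply the true signal plus independent random noise. That independence is an assumption that real models rarely satisfy fully, and it is the main reason the average does so well here.

```r
set.seed(42)
n = 200
# the quantity we are trying to predict (an arbitrary smooth signal)
truth = sin(seq(0, 2 * pi, length.out = n))

# five hypothetical models: each column is truth plus independent noise
k = 5
preds = sapply(1:k, function(i) truth + rnorm(n, sd = 1))

# root mean squared error of a prediction vector against the truth
rmse = function(p) sqrt(mean((p - truth)^2))

member_rmse = apply(preds, 2, rmse)   # error of each model on its own
ensemble_rmse = rmse(rowMeans(preds)) # error of the averaged prediction

member_rmse
ensemble_rmse
```

The averaged prediction can never have a higher RMSE than the mean of the individual RMSEs. Whether it beats the best single model depends on how correlated the models' errors are.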

More examples included health and estimates of dietary deficiencies at high levels of geographic resolution. A paper showing fish and fruit consumption across Rwanda illustrated how map making in R, used intelligently, can save lives.

It was revealing to learn how Robert got into R: he started while working at the International Rice Research Institute. “It forces you to write scripts,” he said. This is good for ensuring reproducibility, a critical component of scientific research. It encourages you to focus on and understand the data primarily, rather than visualising it. On the other hand, R is not always the fastest way to do things, although “people often worry too much about this”. Your time is more important than your computer’s, so setting an analysis running is fine. Plus there are ways to make things run faster, as mentioned in a book that I’m working on, Efficient R Programming.

R is great if you use it every day, but if you use it less than once a week it becomes difficult.

If you just want a one-off spatial analysis data program, Robert recommended QGIS. After a brief overview of spatial data in R, Robert moved on to talk about the raster package, which he developed. This package was developed to overcome some of the limitations with sp, the foundational package for spatial data in R.

A final resource that Robert promoted was RSpatial.org, a free online resource for teaching R as a command line GIS.

Edzer Pebesma delivered the final session of the first day, on Free and Open Source Software (FOSS) for Geoinformatics and Geosciences. After the highly technical final C++ examples from the previous talk, I was expecting a high level overview of the landscape. Instead Edzer went straight in to talk about source code, the raw material that defines all software. The fundamental feature of open source software is that its source code is free, and will remain free.

Day 2

The second day of the course was divided in two: stream A focussed on environmental modelling and stream B compositional data. I attended the environmental modelling course taught by Robert Hijmans. The course was based on his teaching material at rspatial.org and can be found online.

We started off by looking at the fundamental data structures underlying spatial data in R. Why? It’s useful to be able to create simple example datasets from scratch, to understand them.

library(sp)
x <- c(4,7,3,8)
y <- c(9,6,12,11)
xy <- data.frame(x, y)
SpatialPoints(xy)
## class       : SpatialPoints 
## features    : 4 
## extent      : 3, 8, 6, 12  (xmin, xmax, ymin, ymax)
## coord. ref. : NA
d = data.frame(v1 = 1:4, v2 = LETTERS[1:4])
spd = SpatialPointsDataFrame(coords = xy, data = d)
plot(spd)

The basic functions of the raster package are similar.

library(raster)
r = raster(nc = 10, nr = 10)
values(r) = 1:ncell(r)
plot(r)

as.matrix(r)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    2    3    4    5    6    7    8    9    10
##  [2,]   11   12   13   14   15   16   17   18   19    20
##  [3,]   21   22   23   24   25   26   27   28   29    30
##  [4,]   31   32   33   34   35   36   37   38   39    40
##  [5,]   41   42   43   44   45   46   47   48   49    50
##  [6,]   51   52   53   54   55   56   57   58   59    60
##  [7,]   61   62   63   64   65   66   67   68   69    70
##  [8,]   71   72   73   74   75   76   77   78   79    80
##  [9,]   81   82   83   84   85   86   87   88   89    90
## [10,]   91   92   93   94   95   96   97   98   99   100
q = sqrt(r)
plot(q)

x = q + r
s = stack(r, q, x)
ss = s * r # r is recycled so each layer is multiplied by r
1:3 * 2 # here 2 is recycled
## [1] 2 4 6

Raster also provides simple yet powerful functions for manipulating and analysing raster data, including crop() and merge() for manipulation, and predict(), focal() and distance() for analysis. predict() is particularly interesting as it allows raster values to be estimated using any of R’s powerful statistical methods.
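A minimal sketch of predict() in use, assuming the raster package is installed. It is not taken from the course material: the layer names, the sampled cells and the simulated response y are all invented for illustration. A linear model is fitted to a sample of cells and then applied to every cell of the stack.

```r
library(raster)

# two predictor layers, built the same way as in the examples above
r = raster(nc = 10, nr = 10)
values(r) = 1:ncell(r)
q = sqrt(r)
s = stack(r, q)
names(s) = c("r", "q") # layer names must match the model's variable names

# hypothetical 'field sample': 30 random cells with a simulated response
set.seed(1)
cells = sample(ncell(r), 30)
d = as.data.frame(s)[cells, ]
d$y = 2 * d$r - 3 * d$q + rnorm(30, sd = 0.1)

# fit an ordinary linear model, then predict a value for every cell
m = lm(y ~ r + q, data = d)
p = predict(s, m)
plot(p)
```

Any model with a working predict() method can in principle be swapped in for lm(), which is what makes this approach flexible.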

library(dismo)
g = gmap("Albacete, Spain", scale = T, lonlat = T)
## Loading required namespace: XML
plot(g, interpolate = T)
dismo::geocode("Universidad Castilla la Mancha")
##                    originalPlace
## 1 Universidad Castilla la Mancha
##                                          interpretedPlace longitude
## 1 Paseo Universidad, 13005 Ciudad Real, Cdad. Real, Spain -3.921711
##   latitude      xmin      xmax     ymin     ymax uncertainty
## 1 38.99035 -3.922007 -3.919309 38.98919 38.99189         131

Day 3

The third day started with a live R demo by Edzer Pebesma on space-time data. Refreshingly for a conference primarily on spatial data, it started with an in-depth discussion of time. While base R natively supports temporal units (knowing the difference between days and seconds, for example) it does not know the difference between metres and miles.

This led to the creation of the units library, a taster of which is shown below:

install.packages("units")
library(units)
m = with(ud_units,  m)
s = with(ud_units,  s)
km = with(ud_units, km)
h = with(ud_units,  h)
x = 1:3 * m/s
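To illustrate the contrast drawn above between time and distance in base R, here is a small sketch using only base functions. The dates are arbitrary and the variable names are made up for the example.

```r
# two arbitrary time stamps
t1 = as.POSIXct("2016-09-19 09:00:00", tz = "UTC")
t2 = as.POSIXct("2016-09-23 17:30:00", tz = "UTC")

# difftime objects carry their unit and can be converted between units
d = difftime(t2, t1, units = "days")
d
hours = as.numeric(d, units = "hours")
mins = as.numeric(d, units = "mins")

# distances, by contrast, are plain numbers: base R will happily add
# a value meant as metres to one meant as miles without complaint
dist_metres = 1000
dist_miles = 1
nonsense = dist_metres + dist_miles
```

Here `nonsense` is computed without any warning, which is the kind of silent error that attaching units to numbers is meant to prevent.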

The rest of the day was spent analysing a range of spatio-temporal datasets using spacetime, trajectories and rgl for interactive 3d spacetime plots.

In the parallel session there were sessions on CARTO and the R gvSIG bridge.

Day 4

Day 4 was a highlight for me as I’ve wanted to learn how to use the INLA package for ages. It was explained lucidly by Marta Blangiardo and Michela Cameletti, who have written an excellent book on the subject, which has a website that I recommend checking out. Their materials can be found here: http://geostat-course.org/node/1330 .

In parallel to this there was a session on spatial and spatiotemporal point process analysis in R by Virgilio Gomez Rubio and one on automated spatial prediction and visualisation by Tom Hengl.

Day 5

After all that intense geospatial analysis and programming activity, and a night out in Albacete for some participants, we were relieved to learn that this final day of learning was more relaxed. Furthermore, by tradition, it was largely participant-led. I gave a talk on Efficient R Programming, a book I’ve written in collaboration with Colin Gillespie; Teresa Rojos gave a fascinating talk about her research into the spatial distribution of cancer rates in Peru; and S.J. Norder gave us the low-down on the Biogeography of islands with R.

One of the most exciting sessions was the revelation of the results of the spatial prediction game. Interestingly, a team using a relatively simple approach with randomForestSRC (and ggRandomForests for visualisation) won against others who had spent hours training complex multi-level models.

Summary

Overall it was an amazing event and inspiring to spend time with so many researchers using open geospatial software for tackling pressing real world issues. Furthermore, it was great fun. I strongly recommend that people dipping their toes in the sea of spatial capabilities provided by R check out the GEOSTAT website, not least for the excellent video resources to be found there.

I look forward to hearing plans for future GEOSTATs and recommend the event, and associated materials, to researchers interested in using free geospatial software for the greater good.

Find out more about Robin’s work at http://robinlovelace.net/

 

Using Big Data To Map A City’s ‘Heartbeat’

CDRC researcher Oliver O’Brien recently developed the Tube Heartbeat, which shows the London Underground pulsing as passengers make their way around the city over the course of a typical weekday.  The visualisation combines the power of the HERE Maps API for JavaScript with data from Transport for London.

 

Nearly five million journeys were tracked in a single day to create the data visualisation, but users can also select individual Tube stations to see how passenger traffic varies from place to place.  O’Brien highlights a number of interesting examples:

  • Peak time at Leicester Square is after 10pm – the tube is a popular way to get back to homes and hotels after a night at the theatre.
  • Closing museums cause an early peak in South Kensington, while shoppers on Oxford Street can also be seen in the stats.
  • School kids cause spikes in usage across certain quieter stations, particularly in outer London.
  • West Ham’s morning peak entry is an hour before everyone else.  Other stations have two morning peaks.
  • Some places are changing character.  Stratford now has almost as many people arriving as leaving in the morning peak.

Articles on the Tube Heartbeat can be found on Forbes, Wired and the TfL Digital Blog.

CDRC Intern Shortlisted for Information is Beautiful Award

CDRC Data Visualisation Intern, Herwig Scherabon, has been shortlisted for two categories in the KANTAR Information is Beautiful Awards.  The visualisations which Herwig developed, whilst studying for an MDes in Graphic Design at Glasgow School of Art, have been nominated for awards in the ‘Data Visualisation’ and ‘Interactive Visualisation’ Categories.

Affordability Explorer

See visualisation and vote

The Affordability Explorer is an interactive app that maps data about housing affordability in 584 cities. The main intention is to inform users about the relationship between house price, income and affordability.

Income Inequality in Los Angeles and Chicago 

See visualisation and vote

The two large prints (150x75cm) are visualizations of income inequality in Los Angeles and Chicago.  The images are abstract diagrams of these cities and show a high resolution matrix of blocks. The height of these blocks corresponds to the income in the respective output area.

Herwig is currently completing an internship at the CDRC and Leeds Institute for Data Analytics.  We will be sharing more of his visualisations over the coming months.