
Founder and President of Esri at UCL

The Consumer Data Research Centre (CDRC) are pleased to announce that Jack Dangermond, Founder and President of Esri, will be visiting University College London (UCL) and giving a talk on his vision for how GIS and Geography can help address the global challenges we face today and shape the future of our world. Esri, founded in 1969, are a global leader in GIS software sales and consulting, operating across the public, private and third sectors. Esri’s solutions encompass not only desktop GIS but also a continually expanding range of web and mobile GIS applications, enabling new uses of GIS in research and beyond.

The talk will take place on 9 November 2016 at 4:15pm at UCL (Pearson Building G22). It will open with a short presentation from key UCL research groups (CDRC, SpaceTimeLab, ExCiteS, CASA) on some of the high-profile projects that Esri have helped to support, followed by Jack’s presentation on Esri’s vision as a company. Afterwards, a small reception, sponsored by Esri, will be held in the Pearson Building G07, where people can meet Esri staff and UCL students to continue discussions on future possibilities with GIS.

Places are limited, so please book early. If you find you are unable to attend, please let us know so your place can be given to someone else.

To make a booking click here.

 

Demographics User Group Conference – Presentations now available

On 13th October 2016 the CDRC sponsored and supported the 13th annual Demographics User Group Conference (DUG), hosted at the Royal Society.

The conference focussed upon “Empowerment, Enquiry and Empathy – Reducing the soft skills gap” and brought together people from DUG’s 15 member companies, with special guests from government and universities, to spread knowledge and stimulate new ideas.

There were some great presentations and stimulating audience discussion over the course of the day around the conference’s key questions:

1. ‘What difference would it make to you, if analysts had clear paths along which they could develop their careers?’
2. ‘In your own context what is the current balance between organisation and individual needs when considering analytical delivery?’

Presentations

The presentations are now available to watch online:

Chair’s Introduction
Tim Drye, Director of the Demographics User Group

Data Science and real science: Narrative and decision making in the academy and the ‘real world’
Seth Spielman, Associate Professor of Geography, University of Colorado

Update on the Consumer Data Research Centre and Master’s Dissertation Prize Giving
Paul Longley, Professor of Geography, UCL

Fishbowls & Fabulous Failures: Are you curious?
Neil Wooding, Director of Strategic Planning and Performance, ONS

A snowball of failures – Workshop
Neil Wooding, Director of Strategic Planning and Performance, ONS

Democracy, Meaning and Negotiation: Empowering analysts both amateur and professional
James Morgan, Director of MI, British Gas

Team building by doing data for good
Fran Bennett, Trustee of DataKind & Pete Williams, Head of Enterprise Analytics, M&S

Panel Discussion: The Opportunities and Challenges of an analytical career within the commercial sector

Chair’s Final Reflections and the DUG Award

2016 Masters Research Dissertation Programme Winners Announced

The winners of the CDRC’s 2016 Masters Research Dissertation Programme were recently announced at the Demographics User Group Conference.

The CDRC-led programme provides the opportunity for students to work directly with an industrial partner and links students’ research to important retail and ‘open data’ sources. Once again the standard of the projects was extremely high this year, with students working with a range of partners, including Sainsbury’s, Shop Direct, Boots and E.ON.

The Winners

Prize winner: Luis Francisco Mejia and Movement Strategies

Luis’ research used temporal geodata collected from the mobile phones of attendees at a festival to model movements across a festival site. In particular, he looked at using complex machine learning techniques such as artificial neural networks to model and predict when each participant is likely to visit catering facilities across the festival site. His models were then tested on a random selection of data which had not been used in the original analysis and were found to be very successful.

The judges felt that this was a well-executed study with very clear aims and objectives. They also felt that the commercial relevance of the work is well communicated.

View Project Abstract

Runner up: Ffion Carney and E.ON

Ffion’s study aimed to identify areas that contain a high proportion of vulnerable households that should be targeted as part of the Energy Company Obligation (ECO), by taking into account demographic and property characteristics alongside average annual energy consumption data.

The judges highlighted that this project tackles a very interesting research area and commented that Ffion had devised an appropriate methodology which could clearly address the research questions.

View Project Abstract

Runner up: Mariflor Vega and Sainsbury’s 

The aim of Mariflor’s study was to develop a means to understand the different types of customers based purely on the content of their baskets. She used a range of text mining techniques to harvest the data and group the customers.

The judges felt that this was a comprehensive analysis which was completed, explained, interpreted and presented well.

View Project Abstract

Other projects completed this year included:

  • Modelling Multi-Channel Adoption at Sainsbury’s – Sainsbury’s
  • An investigation of what triggers customer activation of credit facilities – Shop Direct
  • An analysis of Argos concession store performance located in Homebase and Sainsbury’s stores across the UK – Argos
  • How does competitor presence influence the performance of click and collect sites? – Sainsbury’s
  • Identifying drivers of full price sales of clothing and footwear for an online retailer – Shop Direct
  • The performance of Argos concessions in other stores – Argos
  • Can interactive data visualizations enable a retailer to identify new insights about customer purchase behaviour? – Sainsbury’s
  • Youths Spending & Geodemographics – goHenry
  • Topic extraction and document classification on textual survey data with unsupervised modelling techniques – CACI
  • An empirical study into Co-op On-the-Go Stores’ turn-in rate using a scorecard approach – The Co-operative Food
  • An investigation into the potential of Bluetooth Beacons to monitor the movement of people on public transport: A preliminary case study of the Norwich Bus Network – Movement Strategies
  • Customer Segmentation using spatio-temporal data – Boots

 View all previous projects

2017 Retail Masters Dissertation Programme

We are now seeking proposals from businesses for new projects due to commence next spring (2017). Further information.

We hope to advertise the 2017 opportunities towards the end of the year.

Should you have any queries relating to the programme, please contact Guy Lansley.

Demographics User Group Conference 2016

On 13th October 2016 the CDRC sponsored and supported the 13th annual Demographics User Group Conference (DUG), hosted at the Royal Society.

The conference focussed upon “Empowerment, Enquiry and Empathy – Reducing the soft skills gap” and brought together people from DUG’s 15 member companies, with special guests from government and universities, to spread knowledge and stimulate new ideas.

There were some great presentations and stimulating audience discussion over the course of the day around the conference’s key questions:

1. ‘What difference would it make to you, if analysts had clear paths along which they could develop their careers?’
2. ‘In your own context what is the current balance between organisation and individual needs when considering analytical delivery?’

Presentations included:

  • Data Science and real science: Narrative and decision making in the academy and the ‘real world’ – Seth Spielman, Associate Professor of Geography, University of Colorado
  • Update on the Consumer Data Research Centre – Prof Paul Longley, CDRC Director
  • Fishbowls & Fabulous Failures: Are you curious? – Neil Wooding, Director of Strategic Planning and Performance, ONS
  • Democracy, Meaning and Negotiation: Empowering analysts both amateur and professional – James Morgan, Director of MI, British Gas
  • Team building by doing data for good – Fran Bennett, Trustee of DataKind & Pete Williams, Head of Enterprise Analytics, M&S

Prof Paul Longley also announced the winners of the CDRC’s 2016 Masters Research Dissertation Programme:

Prize winner: Luis Francisco Mejia and Movement Strategies
Runner up: Ffion Carney and E.ON
Runner up: Mariflor Vega and Sainsbury’s

Find out more about the Masters Research Dissertation Programme.

If you missed the conference, we will be making the videos available shortly and in the meantime you can view our Storify for an overview of the day.

Blog: Geostat2016, Albacete

In late September, CDRC Research Fellow Robin Lovelace attended Geostat2016 in Albacete. He provided the write-up below on his return.

In late September I went to GEOSTAT 2016. Given the amount of fun had at GEOSTAT 2015, expectations were high. The local organisers did not disappoint, with a week of lectures, workshops, spatial data competitions and of course lots of Geostatistics. It would be unwise to try to systematically document such a diverse range of activities, and the GEOSTAT website provides much further info. Instead this ‘miniwriteup’ is designed to summarise some of my memories from the event, and encourage you to get involved for GEOSTAT 2017.

To put things in context, the first session was a brief overview of the history of GEOSTAT. This is the 12th GEOSTAT summer school. In some ways GEOSTAT can be seen as a physical manifestation of the lively R-SIG-GEO email list. That may not sound very exciting. But there is a strong community spirit at the event and, unlike other academic conferences, the focus is on practical learning rather than transmitting research findings or theories. And the event was so much more than that.

There were 5 action-packed days covering many topics within the broad field of Geostatistics. What follows is an overview of the sessions that I went to on each day (there were 2 streams), with links to the source material. It is hoped that this will be of use to people who were not present in person.

Day 1

After an introduction to the course and spatial data by Tom Hengl, Roger Bivand delivered a technical and applied webinar on bridges between R and other GIS software. With a focus on GRASS, we learned how R could be used as a ‘front end’ to other programs. An example using the famous ‘Cholera pump’ data mapped by John Snow was used to demonstrate the potential benefits of ‘bridging’ to other software. The data can be downloaded and partially plotted in R as follows:

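# Download and unzip the course data
u = "http://geostat-course.org/system/files/data_0.zip"
download.file(u, "data_0.zip")
unzip("data_0.zip")
# This is the author's local project folder: adjust it so that the
# data/ paths below resolve on your machine
old = setwd("~/repos/geostat2016-rl/")
library(raster)
## Loading required package: sp
# Read each shapefile layer and plot the buildings
bbo = shapefile("data/bbo.shp")
buildings = shapefile("data/buildings.shp")
deaths = shapefile("data/deaths.shp")
b_pump = shapefile("data/b_pump.shp")
nb_pump = shapefile("data/nb_pump.shp")
plot(buildings)

setwd(old) # restore the original working directory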

In the afternoon Robert Hijmans gave a high-level overview of software for spatial data analysis, with a discussion of the DIVA-GIS software he developed and why he now uses R for most of his geospatial analysis.

The talk touched on the gdistance package, and many others. Robert showcased the power of R for understanding major civilisational problems such as the impacts of climate change on agriculture. His animated global maps of agricultural productivity and precipitation showed how R can scale to tackle large datasets, up to the global level involving spatial and temporal data simultaneously.

There were a few political asides. Robert mentioned how agrotech giant Monsanto paid almost $1 billion for a weather prediction company. He detoured deftly through a discussion of ‘big data’, making the observation that often ensembles of models can provide better predictions than any single model working on its own, with political analogies about the importance of democracy.
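The point about ensembles can be illustrated with a small simulation. The sketch below is not from Robert’s talk: it assumes five hypothetical models whose prediction errors are independent and equally noisy, which is the most favourable case for averaging, and all names and numbers in it are made up for illustration.

```r
set.seed(2016)                           # make the simulation reproducible
n = 200                                  # number of values to predict
truth = sin(seq(0, 10, length.out = n))  # the "true" signal (made up)

# Assumption: each model's prediction = truth + its own independent noise
n_models = 5
preds = sapply(1:n_models, function(i) truth + rnorm(n, sd = 0.5))

# Root mean squared error of a vector of predictions
rmse = function(p) sqrt(mean((p - truth)^2))

single_rmse = apply(preds, 2, rmse)      # error of each model on its own
ensemble_rmse = rmse(rowMeans(preds))    # error of the averaged prediction

round(single_rmse, 2)
round(ensemble_rmse, 2)
```

Under these assumptions the averaged prediction should have a clearly lower error than any individual model. If the models’ errors were correlated, the benefit would be smaller.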

More examples included health and estimates of dietary deficiencies at high levels of geographic resolution. A paper showing fish and fruit consumption across Rwanda illustrated how map making in R, used intelligently, can save lives.

It was revealing to learn how Robert got into R, while he was working at the International Rice Research Institute: “It forces you to write scripts.” This is good for ensuring reproducibility, a critical component of scientific research. It also encourages you to focus first on understanding the data, rather than on visualising it. On the other hand, R is not always the fastest way to do things, although “people often worry too much about this”. Your time is more important than your computer’s, so setting an analysis running is fine. Plus there are ways to make things run faster, as mentioned in a book that I’m working on, Efficient R Programming.

R is great if you use it every day, but if you use it less than once a week it becomes difficult.

If you just want a program for one-off spatial data analysis, Robert recommended QGIS. After a brief overview of spatial data in R, Robert moved on to talk about the raster package, which he developed. This package was developed to overcome some of the limitations of sp, the foundational package for spatial data in R.

A final resource that Robert promoted was RSpatial.org, a free online resource for teaching R as a command line GIS.

Edzer Pebesma delivered the final session of the first day, on Free and Open Source Software (FOSS) for Geoinformatics and Geosciences. After the highly technical final C++ examples from the previous talk, I was expecting a high-level overview of the landscape. Instead Edzer went straight in to talk about source code, the raw material that defines all software. The fundamental feature of open source software is that its source code is free, and will remain free.

Day 2

The second day of the course was divided in two: stream A focussed on environmental modelling and stream B on compositional data. I attended the environmental modelling course taught by Robert Hijmans. The course was based on his teaching material at rspatial.org and can be found online.

We started off by looking at the fundamental data structures underlying spatial data in R. Why? It’s useful to be able to create simple example datasets from scratch, to understand them.

library(sp)
# Coordinates of four example points
x <- c(4,7,3,8)
y <- c(9,6,12,11)
xy <- data.frame(x, y)
# Points only: geometry with no attribute data
SpatialPoints(xy)
## class       : SpatialPoints 
## features    : 4 
## extent      : 3, 8, 6, 12  (xmin, xmax, ymin, ymax)
## coord. ref. : NA
# Attach one row of attributes to each point
d = data.frame(v1 = 1:4, v2 = LETTERS[1:4])
spd = SpatialPointsDataFrame(coords = xy, data = d)
plot(spd)

The basic functions of the raster package are similar.

library(raster)
r = raster(ncols = 10, nrows = 10) # a 10 by 10 raster with no values yet
values(r) = 1:ncell(r)
plot(r)

as.matrix(r)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    2    3    4    5    6    7    8    9    10
##  [2,]   11   12   13   14   15   16   17   18   19    20
##  [3,]   21   22   23   24   25   26   27   28   29    30
##  [4,]   31   32   33   34   35   36   37   38   39    40
##  [5,]   41   42   43   44   45   46   47   48   49    50
##  [6,]   51   52   53   54   55   56   57   58   59    60
##  [7,]   61   62   63   64   65   66   67   68   69    70
##  [8,]   71   72   73   74   75   76   77   78   79    80
##  [9,]   81   82   83   84   85   86   87   88   89    90
## [10,]   91   92   93   94   95   96   97   98   99   100
q = sqrt(r)
plot(q)

x = q + r
s = stack(r, q, x)
ss = s * r # r is recycled so each layer is multiplied by r
1:3 * 2 # here 2 is recycled
## [1] 2 4 6

Raster also provides simple yet powerful functions for manipulating and analysing raster data, including crop() and merge() for manipulation, and predict(), focal() and distance() for analysis. predict() is particularly interesting as it allows raster values to be estimated using any of R’s powerful statistical methods.
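As a rough illustration of predict(), the sketch below fits a linear model to cell values and then predicts it back over the raster. It is not taken from the course material: the response variable y is simulated, the layer names r and q are arbitrary, and lm() merely stands in for whichever statistical model is of interest.

```r
library(raster)

# Two small predictor layers, built as in the examples above
r = raster(ncols = 10, nrows = 10)
values(r) = 1:ncell(r)
q = sqrt(r)
s = stack(r, q)
names(s) = c("r", "q")   # layer names must match the model's variable names

# Simulated response: a linear function of the two layers plus noise
set.seed(1)
d = as.data.frame(values(s))
d$y = 2 + 0.5 * d$r - 3 * d$q + rnorm(nrow(d), sd = 0.1)

m = lm(y ~ r + q, data = d)  # fit the model on the cell values
p = predict(s, m)            # a RasterLayer with one predicted value per cell
plot(p)
```

In a real analysis the model would normally be fitted to observations at sampled locations and then predicted over the full raster, rather than being fitted to every cell as here.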

library(dismo)
g = gmap("Albacete, Spain", scale = T, lonlat = T)
## Loading required namespace: XML
plot(g, interpolate = T)
dismo::geocode("Universidad Castilla la Mancha")
##                    originalPlace
## 1 Universidad Castilla la Mancha
##                                          interpretedPlace longitude
## 1 Paseo Universidad, 13005 Ciudad Real, Cdad. Real, Spain -3.921711
##   latitude      xmin      xmax     ymin     ymax uncertainty
## 1 38.99035 -3.922007 -3.919309 38.98919 38.99189         131

Day 3

The third day started with a live R demo by Edzer Pebesma on space-time data. Refreshingly for a conference primarily on spatial data, it started with an in-depth discussion of time. While base R natively supports temporal units (knowing the difference between days and seconds, for example) it does not know the difference between metres and miles.
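A minimal sketch of that built-in time support, using only base R, is given here. The timestamps are arbitrary and the variable names are illustrative; this is not code from the demo.

```r
start = as.POSIXct("2016-01-01 09:00:00", tz = "UTC")
end   = as.POSIXct("2016-01-05 17:00:00", tz = "UTC")

elapsed = end - start        # a difftime object that remembers its unit
units(elapsed)               # the unit R chose automatically

units(elapsed) = "hours"     # change the unit; the value is rescaled
as.numeric(elapsed)

as.numeric(elapsed, units = "mins")  # or request another unit directly
```

A length, by contrast, is just a bare number in base R: nothing stops a value meant as metres being added to one meant as miles.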

This led to the creation of the units package, a taster of which is shown below:

install.packages("units")
library(units)
m = with(ud_units,  m)
s = with(ud_units,  s)
km = with(ud_units, km)
h = with(ud_units,  h)
x = 1:3 * m/s
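To show what such unit-aware vectors make possible, the sketch below converts speeds from metres per second to kilometres per hour. It uses set_units() from the units package rather than the ud_units list in the taster, the variable names are illustrative, and the exact behaviour may differ between package versions.

```r
library(units)

speed = set_units(1:3, m/s)          # attach the unit metres per second
speed_kmh = set_units(speed, km/h)   # convert; the values are rescaled
speed_kmh

# Incompatible units are refused rather than silently combined
res = try(speed + set_units(1, kg), silent = TRUE)
inherits(res, "try-error")
```

The second part is the practical benefit: a mistake such as adding a speed to a mass raises an error instead of producing a meaningless number.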

The rest of the day was spent analysing a range of spatio-temporal datasets using spacetime, trajectories and rgl for interactive 3d spacetime plots.

The parallel stream had sessions on CARTO and the R gvSIG bridge.

Day 4

Day 4 was a highlight for me as I’ve wanted to learn how to use the INLA package for ages. It was explained lucidly by Marta Blangiardo and Michela Cameletti, who have written an excellent book on the subject, which has a website that I recommend checking out. Their materials can be found here: http://geostat-course.org/node/1330 .

In parallel to this there was a session on spatial and spatio-temporal point process analysis in R by Virgilio Gomez Rubio and one on automated spatial prediction and visualisation by Tom Hengl.

Day 5

After all that intense geospatial analysis and programming activity, and a night out in Albacete for some participants, we were relieved to learn that this final day of learning was more relaxed. Furthermore, by tradition, it was largely participant-led. I gave a talk on Efficient R Programming, a book I’ve written in collaboration with Colin Gillespie; Teresa Rojos gave a fascinating talk about her research into the spatial distribution of cancer rates in Peru; and S.J. Norder gave us the low-down on the Biogeography of islands with R.

One of the most exciting sessions was the revelation of the results of the spatial prediction game. Interestingly, a team using a relatively simple approach with randomForestSRC (and ggRandomForests for visualisation) won against others who had spent hours training complex multi-level models.

Summary

Overall it was an amazing event and inspiring to spend time with so many researchers using open geospatial software for tackling pressing real world issues. Furthermore, it was great fun. I strongly recommend that people dipping their toes in the sea of spatial capabilities provided by R check out the GEOSTAT website, not least for the excellent video resources to be found there.

I look forward to hearing plans for future GEOSTATs and recommend the event, and associated materials, to researchers interested in using free geospatial software for the greater good.

Find out more about Robin’s work at http://robinlovelace.net/

 

Using Big Data To Map A City’s ‘Heartbeat’

CDRC researcher Oliver O’Brien recently developed the Tube Heartbeat, which shows the London Underground pulsing as passengers make their way around the city over the course of a typical weekday.  The visualisation combines the power of the HERE Maps API for JavaScript with data from Transport for London.

 

Nearly five million journeys were tracked in a single day to create the data visualisation, but users can also select individual Tube stations to see how passenger traffic varies from place to place.  O’Brien highlights a number of interesting examples:

  • Peak time at Leicester Square is after 10pm – the Tube is a popular way to get back to homes and hotels after a night at the theatre.
  • Closing museums cause an early peak in South Kensington, while shoppers on Oxford Street can also be seen in the stats.
  • School kids cause spikes in usage across certain quieter stations, particularly in outer London.
  • West Ham’s morning peak entry is an hour before everyone else’s. Other stations have two morning peaks.
  • Some places are changing character. Stratford now has almost as many people arriving as leaving in the morning peak.

Articles on the Tube Heartbeat can be found on Forbes, Wired and the TfL Digital Blog.