
ArcGIS and R – Uneasy Bedfellows?

We’ve been seeing some interesting trends developing in researchers’ preferences for certain software packages over others.

When it comes to spatial modelling, building maps and visualising data, ArcGIS has long been considered the front-runner. Its intuitive ‘point and click’ interface means that you can build professional-looking outputs from scratch without any programming experience.

Some of the stand-out merits of ArcGIS include:

  • Its ability to handle a large variety of spatial and non-spatial data formats
  • The layout view which allows creation of professional outputs
  • The ‘Add data’ button which recognises tables, rasters and all GIS formats
  • It’s a great tool for easy visualisation
  • The table joins in ArcGIS are intuitive, enabling the linking of spatial and other data
  • It handles Coordinate Reference Systems in a user-friendly way
  • It gives you access to ArcGIS Online (AGOL): a great resource for sourcing a whole range of GIS datasets
  • A basic licence in ArcGIS still gives you access to a large number of tools
  • It has a variety of useful purchasable plug-ins, such as Crime Analyst
  • You’re a shoo-in for creating beautiful maps, as titles and legends are far easier to add
  • The ArcMap splash screen (i.e. start-up screen) displays all your latest documents

Of course, it is subscription software, and access to the full range of ArcGIS features is determined by the level of your licence. Although the subscription fee can be expensive, ESRI offers award-winning support services and extensive help documentation. At the end of the day, its map-building and visualisation potential is vast and user-friendly.

But could there be another software contending in the race for slick visualisation and professional outputs?

R has been called “a statistical powerhouse programming language”* which combines a framework of integrated processing, analysis and modelling. It’s perfect for the researcher, as all outputs are easily reproducible, and it is seemingly the most popular tool for data mining in business and academia. However, it has a steep learning curve: its command-driven set-up may not be intuitive for non-programmers, though this is helped immeasurably by the use of RStudio.

Some of the hands-down merits of R include:

  • It’s open source and therefore supported by a vast online community of users happy to share their wisdom for free
  • It’s a superior data analysis tool as it is able to handle large amounts of data
  • Its base package includes all standard statistical tests, models and analyses
  • It’s versatile in allowing you to manipulate data
  • It’s designed to be used with spatial and statistical data
  • You can use R to solve complex data science, machine learning and statistical problems
  • Geographically weighted regression and spatial interaction models can be custom built around your spatial data in R

The key difference between R and ArcGIS when it comes to spatial visualisation and mapping, though, is that operations in R are command-driven, so visualisations can only be created and edited by altering code. R does have an extensive graphics library, but creating a professional output can be very time-consuming for a beginner. Once you get the hang of it, this is great, but it is not necessarily as intuitive as ArcGIS. There’s also no dynamic canvas with which to pan and zoom. But on the other hand, it is free!

Of course, you can combine R with ArcGIS and get the best of both worlds to become an absolute spatial wizard! There is now the Esri R-ArcGIS bridge which enables you to increase the capabilities of analyses across different disciplines. In real terms it allows you to transfer your data between ArcGIS and R without losing any of the functionality and formatting. Learn more here. Or you might prefer to use QGIS as an alternative to ArcGIS – learn more about the relative merits of ArcGIS and QGIS here.

Co-authored by Rachel Oldroyd

 

Want to learn more about ArcGIS? Book on our training short course on 19th March with expert Rachel Oldroyd to find out more: https://www.cdrc.ac.uk/events/19725/

Want to learn more about R? Book on our training short course on 16th April with expert Richard Hodgett to find out more: https://www.cdrc.ac.uk/events/introduction-to-r-2/

Continue the debate with us! Let us know what you think @CDRC with the hashtag #ArcGISvsR

 

*citation: https://blogs.esri.com/esri/arcgis/2016/07/21/put-the-r-in-arcgis-2/

Data-Sprint 2018

The Consumer Data Research Centre (CDRC) is proud to announce that we will be supporting the Financial Conduct Authority (FCA) in an upcoming Data-Sprint event. Spread across two days on the 20th – 21st March 2018, the Data-Sprint will see teams of individuals with varying skill sets generate innovative solutions for tackling organisational challenges. Data for the event will be provided by the FCA through the CDRC data platform and made available to participants the week before.

The Data-Sprint will focus on cases relating to the recent Financial Lives survey. For the published report, questionnaire and data tables, click here.

The FCA has collated an extraordinary volume of consumer information and will experiment with this in order to present findings that are insightful and/or creative. Some of the key questions to be tackled during the event are likely to be:

  • What tools can the FCA use to best visualise these data and make the survey findings both insightful and easy to use?
  • How can the FCA create insights to inform specific organisational decisions? This part of the Data-Sprint will include a focus on the wealth of data the survey provides on the financial products consumers hold.
  • Can the Financial Lives data be linked with, or reported alongside, other data to create enhanced insight? These other data could be FCA proprietary data, other proprietary data or data in the public domain.

Who can attend?
The FCA would like to invite you along to participate in the Data-Sprint. This will be a great opportunity to help shape the way the FCA thinks about data and collaborates with external experts.

How to apply?
If you’d like to attend, please email Ed Towers and Jimmy Galloway at ‘[email protected]’ by 3rd March 2018, including a brief description of your current role, your technical capability and what it is about the Data-Sprint that appeals to you.
Places are limited and will be allocated based on skills and availability.

For further information about the event click here.

WICI Specialist Conference: Call for Abstracts Currently Open

The Waterloo Institute for Complexity & Innovation (WICI) has announced the theme of this year’s Specialist Conference: “Modelling complex urban environments”. The conference takes place on the 21st-22nd July 2018 at the University of Waterloo (Ontario, Canada) with the intention of bringing together researchers from multiple disciplines with experience and interest in modelling complex environments, from smart cities to urban planning.

The call for abstracts is now open with a deadline of 1st March 2018 (see below for specific guidelines).

One theme may be of particular interest, organised and led by Alison Heppenstall of the Leeds Institute for Data Analytics:

Integrating “big data” and “smart cities” data with urban modelling

This theme is designed to interrogate the role that increasingly available so-called ‘Big’ data has to play in altering the ways in which spatial, statistical and geographical analysis is conducted. It will look at the ways in which the arrival of Big data has fostered new methodologies for better understanding how urban systems and infrastructure behave, with an emphasis on how these may be useful in the drive towards sustainability and efficiency. The goal of the session is to bring together research on the topic in order to produce a journal issue.

Within this theme, WICI are particularly interested in papers that engage with the following:

  • Integrating urban analytics and agent/individual-based modelling
  • Machine learning for urban analytics
  • Innovations in consumer data analytics for understanding urban systems
  • Real-time model calibration and data assimilation
  • Spatio-temporal data analysis
  • New data, case studies, demonstrators, and tools for urban analytics
  • Geographic data mining and visualisation
  • Frequentist and Bayesian approaches to modelling cities.

Paper proposals should include:

  • a short but descriptive title
  • a list of all contributing authors and their affiliations
  • an abstract of no more than 250 words
  • a list of 3-5 keywords
  • and an identification of the theme to which the proposal is submitted, if applicable.

Session proposals should include:

  • a short but descriptive title
  • a session abstract of no more than 250 words
  • a list of organizers and their affiliations
  • 3-5 keywords
  • and a list of potential paper contributions, following the format from above.

All proposals should be directed to Noelle Hakim ([email protected]). For full details on the conference and all proposed themes, see here.

Corporate Responsibility Research Conference: Call for Abstracts Currently Open

The 13th annual Corporate Responsibility & Research Conference takes place on the 11th-12th September 2018 with the theme “Engaging Business and Consumers for Sustainable Change”. The conference is being hosted by the Sustainability Research Institute (SRI) and Business and Organisations for Sustainable Societies research group (BOSS) at the University of Leeds and promises a rigorously dynamic environment in which to experiment with new ideas, test theories and challenge perceived norms in the fields of corporate responsibility and research ethics.

The call for abstracts is now open with a deadline of 30th April 2018 (see below for specific guidelines).

With this year’s focus on sustainability and change, the conference especially encourages papers that will challenge the status quo in corporate attitudes towards sustainability and poses the question: how do we get business and consumers truly engaged in addressing the grand challenges of the present ecological and sociocultural crisis? Two sub-themes may be of particular interest, chaired by researchers from our very own Leeds Institute for Data Analytics:

From the CRRC:

“The conference is looking for theoretically informed and practically relevant papers on business and consumer involvement for sustainable change. It welcomes contributions from different disciplines and fields of study, including literatures on corporate responsibility, corporate sustainability, sustainable consumption, sustainable development, business and society, business ethics, ethical consumption, sustainable entrepreneurship, and organisation and the environment.”

The requirements for initial abstracts are as follows:

  • Files should be sent in MS Word format, and the file name should be first author’s surname. Please include names, affiliations and contact details of all authors.
  • Please use a maximum of 500 words, answering the following questions:
      • Research Question: What is the research question that the submission aims to answer?
      • Theoretical Framework: What are the main concepts, models or theories used in the paper? Include 3-4 central references.
      • Method: Which method is used for the research work?
      • Findings: What are the main outcomes and results of the paper?
      • Which sub-theme is your paper aimed at, or is it for the open call?
  • Abstracts should be emailed to [email protected]
Abstracts will be reviewed and selected by the scientific committee of the conference, and authors will be notified of acceptance by 15th May, when conference registration opens.
For details on conference costs, see here.

Smart cities need to be more human, so we’re creating Sims-style virtual worlds

Nick Malleson, University of Leeds and Alison Heppenstall, University of Leeds

Huge numbers of networked sensors have appeared in cities across the world in recent years. These include cameras and sensors that count passers-by, devices to sense air quality, traffic flow detectors, and even beehive monitors. There are also large amounts of information about how people use cities on social media services such as Twitter and Foursquare.

Citizens are even making their own sensors – often using smart phones – to monitor their environment and share the information with others; for example, crowd-sourced noise pollution maps are becoming popular. All this information can be used by city leaders to create policies, with the aim of making cities “smarter” and more sustainable.

But these data only tell half the story. While sensors can provide a rich picture of the physical city, they don’t tell us much about the social city: how people move around and use the spaces, what they think about their cities, why they prefer some areas over others, and so on. For instance, while sensors can collect data from travel cards to measure how many people travel into a city every day, they cannot reveal the purpose of their trip, or their experience of the city.

With a better understanding of both social and physical data, researchers could begin to answer tough questions about why some communities end up segregated, how areas become deprived, and where traffic congestion is likely to occur.

Difficult questions

Determining how and why such patterns will emerge is extremely difficult. Traffic congestion happens as a result of personal decisions about how to get from A to B, based on factors such as your stage of life, your distance from the workplace, school or shops, your level of income, your knowledge of the roads and so on.

Congestion can build locally at pinch points, placing certain sections of the city’s transport networks under severe strain. This can lead to high levels of air pollution, which in turn has a severe impact on the health of the population. For city leaders, the big question is: which actions – imposing congestion charges, pedestrianising areas or improving local infrastructure – would lead to the biggest improvements in both congestion and public health?

We know where – but why?
worldoflard/Flickr, CC BY-NC

The irony is, although modern technology has the power to collect vast amounts of data, it doesn’t always provide the means to analyse it. This means that scientists don’t have the tools they need to understand how different factors influence the way cities function and grow. Here, the technique of agent-based modelling could come to the rescue.

The simulated city

Agent-based modelling is a type of computer simulation, which models the behaviour of individual people as they move around and interact inside a virtual world. An agent-based model of a city could include virtual commuters, pedestrians, taxi drivers, shoppers and so on. Each of these individuals has their own characteristics and “rules”, programmed by researchers, based on theories and data about how people behave.
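To make the idea concrete, here is a deliberately toy sketch in Python. This is not the authors’ actual model – the two routes, the belief-updating rule and all the numbers are invented for illustration – but it shows the essential ingredients: individual agents, simple behavioural rules, and a system-level pattern (congestion) that emerges from their interactions.

```python
import random

random.seed(42)

class Commuter:
    """An agent with one simple rule: take whichever of two routes it
    believes is faster, occasionally exploring the other one."""

    def __init__(self):
        # Believed travel times in minutes (invented starting values)
        self.beliefs = {"A": 30.0, "B": 30.0}

    def choose(self):
        if random.random() < 0.1:  # 10% of the time, explore at random
            return random.choice(["A", "B"])
        return min(self.beliefs, key=self.beliefs.get)

    def learn(self, route, observed_time):
        # Blend today's experience into the belief about the route taken
        self.beliefs[route] = 0.5 * self.beliefs[route] + 0.5 * observed_time

def travel_time(load):
    """Congestion rule: every extra car on a route slows everyone down."""
    return 20.0 + 0.05 * load

commuters = [Commuter() for _ in range(1000)]
for day in range(30):
    choices = [c.choose() for c in commuters]
    loads = {route: choices.count(route) for route in ("A", "B")}
    times = {route: travel_time(loads[route]) for route in ("A", "B")}
    for commuter, route in zip(commuters, choices):
        commuter.learn(route, times[route])

print(loads)  # how the 1,000 commuters split across the two routes
```

No single agent is told about congestion as a whole, yet the pattern of route loads emerges from their individual choices – which is exactly the property that makes agent-based models useful for asking “why” questions about cities.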

After combining vast urban datasets with an agent-based model of people, scientists will have the capacity to tweak and re-run the model until they detect the phenomena they want to study – whether traffic jams or social segregation. When they eventually get the model right, they’ll be able to look back at the characteristics and rules of their virtual citizens to better understand why some of these problems emerge, and hopefully begin to find ways to resolve them.

For example, scientists might use urban data in an agent-based model to better understand the characteristics of the people who contribute to traffic jams – where they have come from, why they are travelling, what other modes of transport they might be willing to take. From there, they might be able to identify some effective ways of encouraging people to take different routes or modes of transport.

Seeing the future

Also, if the model works well in the present time, then it might be able to produce short-term forecasts. This would allow scientists to develop ways of reacting to changes in cities, in real time. Using live urban data to simulate the city in real-time could help to inform the managers of key services during periods of major disruption, such as severe weather, infrastructure failure or evacuation.

Using real-time data adds another layer of complexity. But fortunately, other scientific disciplines have also been making advances in this area. Over decades, the field of meteorology has developed cutting-edge mathematical methods which allow weather and climate models to respond to new weather data as they arrive, in real time.

There’s a lot more work to be done before these methods from meteorology can be adapted to work for agent-based models of cities. But if they’re successful, these advancements will allow scientists to build city simulations which are driven by people – and not just the data they produce.

Nick Malleson, Associate Professor of Geographical Information Systems, University of Leeds and Alison Heppenstall, Professor in Geocomputation, University of Leeds

This article was originally published on The Conversation. Read the original article.

 

The University of Leeds currently has a number of PhD opportunities, in collaboration with The Alan Turing Institute, which are supervised by Prof Alison Heppenstall.

Project 1: Understanding the inner-workings of city-level agent-based models

Project 2: Uncovering hidden patterns and processes in social systems

Find out more and apply online.

Tube Creature – an interactive data map

CDRC researcher Oliver O’Brien recently launched ‘Tube Creature’. This interactive map is based around London’s tube network. The map has proved extremely popular across social media channels, so we have republished Oliver’s blog post below, detailing its nuances. Read on for a unique insight into ‘Tube Creature’!

Railway Station Numbers

The Office of Rail and Road (ORR) publishes station entry/exit numbers annually, on a “best guess” basis, using ticket sales, gate information and modelling. The data is split by ticket type – full fare, reduced fare (off-peak tickets, tickets bought with railcards, advance tickets, child tickets etc.) and season tickets. They make this data available as an Excel spreadsheet, so I’ve crunched it and produced a couple of maps based on it. I have also consolidated the total counts and ticket type counts data on CDRC Data.

The first shows the total numbers of entries/exits across the last year for which data is available (2016-7), with a blended colour whose red/green/blue strengths are proportional to the percentages of season tickets (red), full fare (blue) and reduced fare (green) entering/exiting National Rail services at that station. The area of the circle is proportional to the total numbers, combined across the ticket types. I’m using a minimum circle size, as otherwise some stations would be practically invisible on the map – some can see days go by without any passengers, or trains.
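The colour blending can be sketched as a small function. This is an illustrative reconstruction in Python, not the code actually behind the map: the three ticket-type shares are mapped directly onto RGB channels.

```python
def blend_colour(season, full, reduced):
    """Map ticket-type shares onto RGB channels: season tickets drive
    red, full fare drives blue and reduced fare drives green."""
    total = season + full + reduced
    r = round(255 * season / total)
    g = round(255 * reduced / total)
    b = round(255 * full / total)
    return "#{:02x}{:02x}{:02x}".format(r, g, b)

# A commuter-dominated station skews red; an airport station, where
# most passengers pay full fare, skews blue.
print(blend_colour(0.8, 0.1, 0.1))
print(blend_colour(0.1, 0.8, 0.1))
```

Because the channels are proportional to the shares, mixed stations naturally come out as the intermediate colours described below – purples (red + blue), browns (red + green + a little blue) and so on.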

Some interesting patterns – blues for many of the airport stations, where off peak tickets generally aren’t available, and most people don’t think to get advance tickets, such as to/from Stansted:

…and almost no-one pays full fare for some of the remotest stations:

Purples on the Welsh valleys lines, showing mainly commuters and peak time users:

Bright greens for stations serving major destinations where advance tickets are readily available, such as Newcastle upon Tyne:

…popular tourist places, where many people will be visiting outside of the rush hours and at weekends, such as Oxford and Bicester Village retail outlet:

…and areas well covered by discounted travelcards, like Liverpool’s Merseyrail:

Reds where the season ticket holders dominate, such as Chelmsford and Colchester to the north-east of London:

Browns showing an “urban mix” of season ticket commuters and travelcard local journey makers, like in Stratford, London:

See this map on TubeCreature.

The second map looks at the change in numbers between 2015/6 and 2016/7 (a major methodological change means I cannot use data from earlier years for a more complete time series). You can view the absolute numbers for both years, but the changes are of more interest. The circle fill colour is the % change (full green for a doubling of numbers, full red for a halving). The area of the circle represents the absolute change in numbers. The border colour emphasises whether the change is an increase or decrease. Stations with little change show up as small circles. The biggest trends are the new lines to Oxford via Bicester, and from Edinburgh to Tweedbank. In both cases, the lines were only open for part of the first year, so an increase would be expected even if the day-by-day numbers were flat:

Big drops show in parts of London – the Goblin line having been closed for much of 2016/7 due to a bungled overhead line installation:

There is also a big drop at Kensington Olympia; however, the source report says this is due to a methodological change – i.e. it may not have actually been a significant drop at all. This is somewhat puzzling, as there are ticket gates at this station, so in/out numbers should be pretty solid, but it may relate to many fewer people than previously thought transferring in-barrier to the sparse District line services at this station. When they do this, they are no longer considered to be National Rail passengers and so have “exited” the station here, from a National Rail perspective.

Most parts of the country see a steady increase (light greens):

The big exception is the area served by Southern trains – with them on strike for much of the second year, the fall in numbers in this region is almost universal:

See this map on TubeCreature. You can also download all the total counts and ticket type counts data from CDRC Data.

Launched now! Masters Research Dissertation Programme 2018

We’ve launched this year’s Masters Research Dissertation Programme.

A host of retailers, businesses and organisations have provided details of projects and are now inviting applications from potential Masters students to carry out research on a range of exciting topics.

The programme offers an excellent opportunity to work directly with an industrial partner and to link students’ research to important retail and ‘open data’ sources. The project titles are devised by retailers and are open to students from a wide range of disciplines. In previous years, we have worked with students from Geography (and GIS), Computer Science, Business Analytics, Economics and Statistics, but projects are by no means limited to these areas.

All students will be in with a chance to present their research at an academic conference (date tbc) with three projects selected to win prizes.

Further details about projects and the application process can be found here.

Co-funded PhD studentships available now!

ESRC UCL, Bloomsbury, East London Doctoral Training Partnership Co-funded PhD studentships at the Consumer Data Research Centre (CDRC)

The Consumer Data Research Centre has two co-funded PhD studentships in quantitative social science, based in UCL’s Department of Geography. The awards will be administered through the UBEL Doctoral Training Partnership. Projects are available working with ARUP and Kantar Worldpanel, commencing September 2018.

These awards are open to applicants with backgrounds in quantitative social science and related disciplines, such as geography, social statistics, political science, economics, applied mathematics, computer science, planning, psychology or sociology. Students will be expected to work with consumer data as part of an exciting multidisciplinary research centre.

The studentships will cover:

  • Tuition fees per year – for either three years (Ph.D. only) or 1+3 years (including a preparatory year’s Masters study in a quantitative social science course).
  • Annual maintenance stipend full-time: the stipend for 2017/18 was £16,553.

Further details about both studentships are available here. Alternatively, please contact Sarah Sheppard ([email protected]) for further information.

Building your skill set; building your future

The CDRC offers a range of training courses aimed at enhancing capacity in data analytics and data visualisation methods. We have a number of courses coming up over the next few months in Leeds and London. If you’re looking to grow your skill set in these areas, consider booking now, as places are filling fast.

Tableau Workshop

22nd February 2018 @ 9:30 am – 4:30 pm
Leeds Institute for Data Analytics, University of Leeds 
Fancy learning about data visualisation best practices and receiving hands-on training delivered by Tableau experts? Then this is the course for you, with its mix of practical demonstrations and instruction. (Please note that only email addresses ending in .ac.uk can be accepted for registration on this course.)

Introduction to ArcGIS

19th March 2018 @ 9:30 am – 4:30 pm
LIDA, University of Leeds
This course provides an introduction to Geographical Information Systems (GIS) using ESRI’s ArcGIS version 10.2 software. It will give you the opportunity to familiarise yourself with using and navigating the software, as well as focussing on the skills of data entry, data manipulation, editing, analysis and mapping.

Introduction to R

16th April 2018 @ 1:00 pm – 4:00 pm
Leeds Institute for Data Analytics, University of Leeds 
Always wanted to learn about the programming language R? During the course you will learn about the benefits of R, how R handles different data types, and how you can begin to use R to solve complex data science, machine learning and statistical problems.

Introduction to Spatial Data & Using R as a GIS – London

23rd April 2018 @ 10:00 am – 4:30 pm
University of Liverpool (London Campus)
The course will cover an introduction to R, how to load and manage spatial data and how to create maps using R and RStudio. We will show you appropriate ways of using classifications for choropleth maps, using loops in R to create multiple maps and some basic spatial analysis.

Confident Spatial Analysis and Statistics in R & GeoDa – London

24th April 2018 @ 10:00 am – 4:30 pm
University of Liverpool (London Campus)
In this course you will cover how to prepare and analyse spatial data in RStudio & GeoDa. You will also use RStudio to perform spatial overlay techniques (such as union, intersection and buffers). By the end of the course you will understand how RStudio manages spatial data and be able to use it for a range of spatial analyses.
Keen to find out more or book? Follow this link to access our training page.