Transport Data Science with R

Date(s) - 05/04/2019
9:00 am - 4:30 pm

This course teaches two skill sets that are fundamental to modern transport research: programming and data analytics. Combining them enables powerful transport planning and analysis workflows that can tackle a wide range of questions, including:

  • How can you find, download and import a range of transport datasets?
  • How can you develop automated and reproducible transport planning workflows?
  • How can increasingly available datasets on air quality, traffic and active travel be used to inform policy?
  • How can you visualise results attractively, including in online and interactive formats?

This course will provide tools, code, data and, above all, face-to-face teaching to answer these questions and more, with the statistical programming language R. The data science approach opens a world of possibilities for generating insight from your transport datasets. The course is suitable for researchers in the public sector, academia and industry.

Learning objectives

By the end of the course you will be able to:

  • Find, download and import a variety of transport datasets, including from OpenStreetMap and government data portals
  • Work with, analyse and model transport data with spatial, temporal and demographic attributes
  • Work with air pollution data in R and compare it with transport behaviour
  • Generate and analyse route networks for transport planning with reference to:
    • Origin-destination (OD) data
    • Geographic desire lines
    • Route allocation using different routing services
    • Route network generation and analysis

Course tutors

Robin Lovelace is a researcher at the Leeds Institute for Transport Studies (ITS) and the Leeds Institute for Data Analytics (LIDA). Robin has many years of experience using R for academic research and has taught numerous R courses at all levels. He has developed popular R resources, including the books Efficient R Programming (Gillespie and Lovelace 2016), Spatial Microsimulation with R (Lovelace and Dumont 2016), and Geocomputation with R (Lovelace et al. 2019).

These skills have been applied on a number of projects with real-world applications, including the Propensity to Cycle Tool, a nationally scalable interactive online mapping application, and the stplanr package.

James Tate is a vehicle emissions and air quality expert focussing on the impacts of road transport on the environment. He has developed and deployed new approaches to survey and model the emission performance of the UK/EU road transport fleet. James has used R as the primary tool in his data analysis workflow for a decade and has developed popular modules teaching R to Master’s students in ITS.

Prerequisites

Prior experience with transport datasets is a prerequisite for the course. Attendees are expected to:

  • Be comfortable with R and use it for everyday data analysis tasks (you should find DataCamp’s free Introduction to R easy)
  • Have experience with transport datasets and understand their structure (you should be familiar with the contents of the Transport chapter in Geocomputation with R)

Participants are expected to brush up on their knowledge before the course, for example by completing the exercises linked to in the bullet points above.

Computers with RStudio installed will be available for course attendees. However, for maximum benefit, we recommend that participants bring their own laptops with a recent version of R installed (3.5.0 or later). Steps to set up a suitable R/RStudio environment are described in sections 2.3 and 2.5 of the book Efficient R Programming. The following packages should be installed prior to attending the course:
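
The package list itself does not appear on this page. As a rough sketch only, a pre-course check on your own laptop might look like the code below. The 3.5.0 minimum comes from the text above; the package names in `wanted` are an assumption, taken from packages mentioned elsewhere on this page (stplanr and openair), and are not the official course list.

```r
# Minimum R version stated in the course description
min_r_version <- "3.5.0"
r_ok <- getRversion() >= min_r_version

# ASSUMPTION: illustrative names only (packages mentioned on this page),
# not the official list of required packages
wanted <- c("stplanr", "openair")

# Which of the wanted packages cannot be found in the local library?
is_installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- wanted[!is_installed]

if (!r_ok) {
  message("R ", getRversion(), " is older than ", min_r_version, "; please update R")
}
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install the missing packages from CRAN:
  # install.packages(missing_pkgs)
} else {
  message("All checked packages are installed")
}
```

Running the check before the day leaves time to resolve installation problems, which can be slow for packages with system dependencies.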

Course agenda

  • Registration and refreshments (09:00 – 09:20)
  • Getting set up in the cluster (09:20 – 09:30)
  • Finding, downloading, importing transport data (09:30 – 11:00)
    • An overview of data portals
    • Origin-destination data
    • OpenStreetMap data
    • Other data sources

Coffee break: 11:00 – 11:10

  • Working with spatio-temporal data (11:10 – 12:30)
    • Introduction to STATS19
    • Temporal analysis
    • Spatial analysis
    • Analysis and modelling

Lunch: 12:30 – 13:30

  • Traffic data and pollution analysis with R (13:30 – 15:30, delivered by Dr James Tate)
    • An introduction to the openair package
    • Traffic count data
    • Meteorological data
    • Air pollution data: daily, weekly and seasonal variability
    • Visualising air pollution data and next steps

Refreshments: 15:30 – 15:45

  • From desire lines to route networks (15:45 – 16:45)
    • Handling OD data
    • Creating ‘desire lines’ from OD and zone data
    • Route allocation and route network creation
    • Route network analysis (comparing with other datasets)
  • Discussion and applying the methods to your data (16:00 onwards)
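
To give a flavour of the final session, the step from OD data to desire lines can be sketched in base R. This is a minimal illustration under stated assumptions, not the course material: the zone names, coordinates and trip counts are invented, and coordinates are assumed to be projected (in metres). For real spatial data, stplanr provides `od2line()`, which returns geographic line objects instead of a plain table.

```r
# HYPOTHETICAL zone centroids in projected coordinates (metres)
zones <- data.frame(
  zone = c("A", "B", "C"),
  x = c(0, 3000, 0),
  y = c(0, 4000, 1000)
)

# HYPOTHETICAL origin-destination (OD) data: one row per zone pair
od <- data.frame(
  origin = c("A", "A", "B"),
  destination = c("B", "C", "C"),
  trips = c(120, 45, 30)
)

# Look up the centroid of each origin and destination zone
o <- zones[match(od$origin, zones$zone), c("x", "y")]
d <- zones[match(od$destination, zones$zone), c("x", "y")]

# A desire line is the straight line joining the two centroids;
# here it is stored as start and end coordinates plus its length
desire_lines <- data.frame(
  od,
  x_o = o$x, y_o = o$y,
  x_d = d$x, y_d = d$y
)
desire_lines$length_m <- sqrt((d$x - o$x)^2 + (d$y - o$y)^2)

print(desire_lines)
```

Route allocation then replaces each straight line with a path along the transport network, which requires a routing service and is not shown here.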

Background reading

It would be useful if participants could acquaint themselves with the following resources.

  • Efficient R Programming (ERP for short, with section numbers linked, e.g. ERP 1.5.2): a book and online resource (at github.io/efficientR) on using R effectively (Gillespie and Lovelace 2016).
  • Introducing stplanr: an introductory vignette on stplanr, accessible via the following command (assuming stplanr is installed):


  • R for Data Science (R4DS): A book and online resource we use to teach R objects (also of wider interest): http://r4ds.had.co.nz
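
The command for opening the stplanr vignette is not reproduced on this page. As a hedged alternative that does not depend on the vignette's exact name, base R's `vignette()` function can list whatever vignettes ship with your installed version of the package:

```r
# Check whether stplanr is available before asking for its vignettes
has_stplanr <- requireNamespace("stplanr", quietly = TRUE)

if (has_stplanr) {
  # vignette(package = ...) returns a listing; $results is a character matrix
  vigs <- vignette(package = "stplanr")
  print(vigs$results[, c("Item", "Title"), drop = FALSE])
} else {
  message("stplanr is not installed; install.packages(\"stplanr\") fetches it from CRAN")
}
```

Any entry in the `Item` column can then be opened by passing it to `vignette()` together with `package = "stplanr"`.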

Further information

The course will be held in the Leeds Institute for Data Analytics, computer cluster 11.06. It is open to students, academic staff and external delegates. Please note the fee includes learning materials, lunch and refreshments during the course.

The course is also available as bespoke or in-company training.

For enquiries please contact Kylie Norman.


Early bird prices (valid until 1st March)

Student: £200
Academic, public sector and charitable sector: £300
External: £400

Price (valid 1st March – 3rd April)

Student: £250
Academic, public sector and charitable sector: £350
External: £450