How and why do General Practice registers and ONS population estimates for Leeds differ?

In 2017, GP registers for Leeds counted 60,000 more people than the Office for National Statistics estimated to be living in the city. This project assessed where these discrepancies occur and improved understanding of why they occur.

Project overview

Since the 2011 Census, population counts for Leeds derived from GP registrations have differed substantially from the Mid-Year Estimates (MYEs) published by the Office for National Statistics (ONS). Large discrepancies in particular areas of Leeds have implications for many aspects of city planning, including health planning, transport planning and election preparation. A single agreed version of the population estimates is therefore highly desirable.

This project used Geographical Information Systems (GIS) and classification methods to assess where the discrepancies exist within Leeds and to understand why they occur. The results indicate potential recent changes in the population composition of Leeds that the MYEs do not account for.

Data and methods

A classification of the UK was built from variables derived from the 2011 Census outputs, to identify demographic patterns across the UK and how these influence the disparity between population estimates from MYEs and GP registers.

Variables were selected to reflect themes including age, ethnic group, UK migration and social grade. Highly correlated variables were removed to reduce multicollinearity. Counts were converted to percentages for each Lower Super Output Area (LSOA).
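The variable preparation described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the variable names, the counts and the 0.8 correlation threshold are all assumptions chosen for the example, and the greedy "keep the first of any correlated pair" rule is only one of several ways to reduce multicollinearity.

```python
from math import sqrt


def to_percentages(counts, totals):
    """Convert per-LSOA counts to percentages of each LSOA's total population."""
    return [100.0 * c / t if t else 0.0 for c, t in zip(counts, totals)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable carries no correlation information
    return cov / sqrt(var_x * var_y)


def drop_correlated(variables, threshold=0.8):
    """Greedily keep variables whose |r| with every kept variable is below threshold."""
    kept = {}
    for name, values in variables.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical counts for four LSOAs (illustrative values only).
totals = [1500, 1600, 1400, 1700]
counts = {
    "age_20_24": [150, 480, 70, 340],
    "full_time_students": [120, 448, 56, 306],
    "social_grade_ab": [450, 320, 280, 510],
}

percentages = {name: to_percentages(vals, totals) for name, vals in counts.items()}
kept = drop_correlated(percentages)
print(list(kept))
```

In this toy data the student variable tracks the 20 to 24 age variable almost exactly, so it is dropped, while the social grade variable is retained.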

K-means clustering was performed on the 10 selected variables to produce 7 clusters. The optimal number of clusters for the final model was determined using the elbow method. The city of London was excluded from the final classification model because its unique attributes, which occur only in London, were influencing the other clusters.
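The clustering step can be sketched as follows. This is a minimal, pure-Python illustration rather than the project's code: the two-variable toy points are made up, centroids are initialised with a farthest-point rule to keep the run deterministic, and the elbow is chosen as the k with the largest second difference in within-cluster sum of squares. That rule is one heuristic among several; the elbow is often read from a plot instead.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Farthest-point initialisation (an assumption made for determinism)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Toy data: three well-separated groups of hypothetical LSOAs.
points = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (1, 10), (1, 11), (2, 10), (2, 11),
]

# Within-cluster sum of squares for each candidate number of clusters.
inertias = {k: kmeans(points, k)[2] for k in range(1, 7)}

# Elbow heuristic: the k where the decrease in inertia slows down the most.
elbow = max(
    range(2, 6),
    key=lambda k: inertias[k - 1] - 2 * inertias[k] + inertias[k + 1],
)
print(elbow)
```

For real data with 10 variables, a library implementation such as scikit-learn's `KMeans` would usually be preferable; the hand-written version here only keeps the example free of dependencies.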

Geographical Information Systems were used to map cluster locations across the UK at LSOA level. GIS was also used to analyse patterns in the percentage difference between the two population estimates across Leeds.
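The percentage difference mapped for each LSOA could be computed along these lines. The source does not state the exact formula, so this sketch assumes the difference is expressed relative to the MYE; the LSOA labels and counts are hypothetical.

```python
def pct_difference(gp_count, mye_count):
    """Percentage by which the GP register count exceeds the MYE (negative if lower).

    Assumption: the MYE is used as the denominator.
    """
    if mye_count == 0:
        raise ValueError("MYE count must be non-zero")
    return 100.0 * (gp_count - mye_count) / mye_count


# Hypothetical (GP register count, MYE) pairs for three LSOAs.
counts = {
    "LSOA_A": (1650, 1500),
    "LSOA_B": (1584, 1600),
    "LSOA_C": (2100, 1400),
}

diffs = {code: pct_difference(gp, mye) for code, (gp, mye) in counts.items()}
for code, diff in diffs.items():
    print(f"{code}: {diff:+.1f}%")
```

A positive value means more people are registered with GPs than the MYE suggests live in the area.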


The k-means classification produced 7 clusters, which were visualised using heatmaps of the percentage values of each variable. Each cluster displayed distinct population characteristics.
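The grid of values behind such a heatmap can be sketched as the mean of each variable within each cluster. This is an illustrative assumption about how the profiles might be summarised, using made-up rows and labels; the source does not say which summary statistic was plotted.

```python
from collections import defaultdict


def cluster_profiles(rows, labels):
    """Mean of each variable within each cluster: the grid a heatmap would colour."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in sorted(grouped.items())
    }


# Hypothetical percentages for two variables across three LSOAs.
rows = [[10.0, 30.0], [20.0, 20.0], [40.0, 10.0]]
labels = [0, 0, 1]

profiles = cluster_profiles(rows, labels)
print(profiles)
```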

The classification of the UK presented here has highlighted a recurring pattern of GP counts exceeding MYEs across the UK, which appears to be more pronounced in diverse clusters. This indicates that differences between population estimates are a wider problem occurring across the UK.

Leeds, however, differs from the rest of the UK in that it has a higher frequency of LSOAs whose demographics could be driving the disparity between MYEs and GP registration counts, suggesting that the ONS methods for producing population estimates in certain areas require review. Because the distribution of clusters across Leeds reflects the pattern of discrepancies between the population estimates, there is some indication that particular groups may have a larger influence on population estimates.

Figure 1 – Geographic locations of clusters in Leeds

Figure 2 – Outlier Percentage difference between the 2017 MYE and 2017 GP across the whole population of Leeds

Value of research

Understanding where the differences in population estimates occur can contribute to establishing the single agreed version of population estimates that city planning services require.

Research Team

Rizwana Uddin, Leeds Institute for Data Analytics, University of Leeds
Dr Nik Lomax, CDRC and School of Geography, University of Leeds
Dr Nick Hood, CDRC and School of Geography, University of Leeds


Leeds City Council
The Alan Turing Institute

This project was undertaken as part of the LIDA Data Scientist Internship Programme.

This work was supported by Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC Grant EP/T001569/1, particularly the “Digital Twins: Urban Analytics” theme within that grant & The Alan Turing Institute.