Diverse data scientists make for more representative data science 

CDRC is a key funder of the LIDA Data Scientist Development Programme (DSDP), which gives early career data scientists the opportunity to use real-world data to solve real-world data challenges. By contributing to projects through funding, data and expertise, CDRC meets its objective of creating highly skilled data scientists with an understanding of, and ethos for, open research, open methods and the current landscape of data science. In a recent case study about the Programme’s award-winning work on building a research culture which promotes Equality, Diversity and Inclusion, CDRC’s Kylie Norman discusses how more representative recruitment in data science leads to more representative data science insights.

The DSDP’s mission statement is ‘data science for public good’, and this mission also underpins the ethos of CDRC in creating innovative research which has a clear social good function, and which breaks down barriers, whether geographic, socio-economic, between subject disciplines, or between the advantaged and the disadvantaged. For example, this year’s cohort of data scientists are working with CDRC researchers to improve the representation of accessible pedestrian networks in OpenStreetMap; to reduce waste in supermarket supply chain stock flow; and to better understand the geodemographic factors affecting urgent cancer referrals in the NHS.

Through its three pillars of practice – research innovation, building data science capacity, and sharing data for collaborative research through its Data Service – CDRC is able to support the DSDP, not only through funding projects for social good, but also by providing the in-house and stakeholder expertise to build the essential skills of an early career data scientist. This reflects CDRC’s shared commitment to building resilience, robust methodologies and an understanding of ways of working in its early career data scientists. During their induction period, each cohort of LIDA data scientists receives valuable contact time with CDRC staff through lectures, seminars and workshops on hard and soft skills, including: reproducible coding practices and data management; GIS and spatial analysis for beginners; how to conduct data science for the greatest engagement and impact; how to work with partners; understanding their own ways of working (WoW) and the WoW of those in their teams; the pitfalls inherent in the practice of data science; and, perhaps most importantly, how to fail well.

“We all want to do good work, but deep down, I believe we also want our work to have meaning and to drive positive change.”

Eric Wanjau Muriithi, LIDA Data Scientist 2021-22 

CDRC has seen the benefit of this holistic approach to developing a diversity of data science talent in more representative data insights. For example, Simon Leech, data scientist 2020-21, worked with local authorities and other stakeholders on CDRC Local Data Spaces to determine the optimal locations for COVID-19 testing sites in Liverpool, based on analyses of geodemographic factors affecting susceptibility to the virus.

CDRC has also been able to better visualise and tell the data story of the gaps in provision of government support services to those hit hardest by the cost of living crisis, e.g. in Alex Dalton and Tom Albone’s Free School Meals Uptake blog series. Ifeanyi Chukwu’s work with CDRC’s Nik Lomax on the Impact of COVID-19 on Cancer Referrals concluded that considering geodemographic contexts “helps to identify the segments of patients who were most vulnerable and thus may require more attention in receiving faster cancer referrals and improved prognosis.” The more representative our data scientists, the more representative the data science.

Three of the Programme’s data scientists working on CDRC research projects, Alex Dalton, Diogo Ann Onuselogu and Rosalind Martin, were nominated for the University’s Engaged for Impact Awards 2022, with two winning in their award categories, and one coming in runner-up position.  

In a research culture which asks the question, “but why?”, “who do these census data overlook?”, or “what biases are inherent in these data?”, data insights are bound to be more representative because the data scientists asking those questions often have first-hand experience of falling into the so-called ‘Data Gap’. Read on to find out more about how LIDA is improving diversity and representation in its DSDP, and how CDRC is supporting this work.