Harnessing the Power of Census Data for Advanced Smart Data Research
Census Data and its Importance
Every decade, the UK conducts the Census, a meticulous process that collates data about the population's characteristics, their professions and their living environments. This article explores how this rich data repository enhances Smart Data Research.
For more information about the major challenges of harnessing consumer data to produce valid spatial representations of the population at large, see the paper we wrote a couple of years ago:
Lansley G, Cheshire J (2018) Challenges to Representing the Population from New Forms of Consumer Data. Geography Compass. https://doi.org/10.1111/gec3.12374
Open Census Geodemographic Classifications: Scaffolding for Smart Data Research
For many years, geodemographic classifications have been constructed from Census data. Collaborations between the Consumer Data Research Centre (CDRC), the Office for National Statistics, and the Greater London Authority have led to the creation of comprehensive UK and Greater London classifications for 2011 and 2021.
A key use of geodemographics is that they can be appended to smart data to provide both national and local estimates for attributes of interest. For example, last month we launched the 2021 London Output Area Classification (LOAC), which was used to profile CDRC-ULO data from PDV. Pairing these data with the LOAC enables detailed local-level maps to be created. An illustrative example is the geographic distribution of preferences for Aldi versus Waitrose.
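As a rough illustration of what appending a classification to smart data involves, the sketch below joins records to a geodemographic group by Output Area code and computes an index score for each group (100 means the group matches the overall average). The area codes, group labels, counts and the function name `profile_by_group` are all hypothetical; this is not the CDRC workflow, only a minimal sketch of the general idea.

```python
from collections import defaultdict

# Hypothetical lookup: Output Area code -> geodemographic group label.
OA_TO_GROUP = {
    "E00000001": "A",
    "E00000002": "A",
    "E00000003": "B",
    "E00000004": "B",
}

# Hypothetical smart data records:
# (Output Area code, customers observed, customers preferring retailer X).
RECORDS = [
    ("E00000001", 100, 60),
    ("E00000002", 100, 40),
    ("E00000003", 200, 40),
    ("E00000004", 100, 20),
]


def profile_by_group(records, oa_to_group):
    """Return an index score per group: 100 * group rate / overall rate."""
    totals = defaultdict(lambda: [0, 0])  # group -> [observed, preferring]
    for oa_code, observed, preferring in records:
        group = oa_to_group.get(oa_code)
        if group is None:
            continue  # skip records that cannot be matched to a group
        totals[group][0] += observed
        totals[group][1] += preferring

    all_observed = sum(t[0] for t in totals.values())
    all_preferring = sum(t[1] for t in totals.values())
    if all_observed == 0 or all_preferring == 0:
        return {}  # no basis for an overall rate

    overall_rate = all_preferring / all_observed
    return {
        group: 100 * (preferring / observed) / overall_rate
        for group, (observed, preferring) in totals.items()
        if observed > 0
    }


scores = profile_by_group(RECORDS, OA_TO_GROUP)
```

With these made-up numbers, group A scores well above 100 and group B well below, which is the kind of contrast a local-level map would then display.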
You can read all about how we built OAC and LOAC in our published papers:
Wyszomierski J, Longley PA, Singleton AD, Gale C, O'Brien O (2023) A Neighbourhood Output Area Classification from the 2021 and 2022 UK Censuses. Geographical Journal. https://doi.org/10.1111/geoj.12550
Singleton, A. D., & Longley, P. (2015). The internal structure of Greater London: A comparison of national and regional geodemographic models. Geo: Geography and Environment, 2(1), 69–87. https://doi.org/10.1002/geo2.7
Census Data Extending the Value of Smart Data Through Linkage and Modelling
The Internet User Classification is a bespoke geodemographic created by CDRC-ULO that describes the geography of internet user behaviour. Although Census data aren’t a direct input to the classification, they are used in a machine learning model that enables attributes from commercial smart data surveys to be estimated for all local areas. These estimates are then input into the geodemographic model.
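To make the estimation step more concrete, here is a deliberately simplified sketch. It swaps the machine learning model for a one-variable least-squares line: a survey attribute is known only for some areas, a Census variable is known for every area, and the fitted relationship is used to fill in the rest. All area names, values and function names are hypothetical, and the published classification uses a far richer model than this.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical Census variable available for every area
# (for instance, the share of residents aged 65 and over).
CENSUS_SHARE = {"OA1": 0.10, "OA2": 0.20, "OA3": 0.30, "OA4": 0.40}

# Hypothetical survey attribute observed in only some areas
# (for instance, the share of respondents who shop online).
SURVEY_SHARE = {"OA1": 0.80, "OA2": 0.70, "OA3": 0.60}


def estimate_all_areas(census, survey):
    """Fit on surveyed areas, then predict the attribute for every area."""
    areas = sorted(survey)
    slope, intercept = fit_line(
        [census[a] for a in areas], [survey[a] for a in areas]
    )
    # Clamp predictions to [0, 1] because the attribute is a share.
    return {
        area: min(1.0, max(0.0, slope * x + intercept))
        for area, x in census.items()
    }


estimates = estimate_all_areas(CENSUS_SHARE, SURVEY_SHARE)
```

Here `estimates["OA4"]` is a modelled value for an area with no survey responses at all, which is the role Census data play in extending survey coverage to every local area.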
You can read all about how we built this classification in our paper:
Singleton, A., Alexiou, A., & Savani, R. (2020). Mapping the geodemographics of digital inequality in Great Britain: An integration of machine learning into small area estimation. Computers, Environment and Urban Systems, 82, 101486. https://doi.org/10.1016/j.compenvurbsys.2020.101486
Census Data for Testing the Assumptions of Smart Data
Smart Data are often collected for or generated by administrative purposes in public and private sector organisations. As Smart Data Research UK has shown, they have wide-ranging uses beyond the purpose for which they were originally collected. However, there need to be robust ways of telling how representative they are of different groups of people or places. CDRC-ULO use Census data to provide such benchmarks, which feed into the metadata that are supplied with each analysis-ready data product.
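One simple way to picture such a benchmark is to compare the share of smart data records falling in each population group with the corresponding Census share. The sketch below does this with hypothetical age bands and counts, and summarises the mismatch with an index of dissimilarity (0 means identical distributions). The function name and figures are illustrative assumptions, not the metadata CDRC-ULO actually publish.

```python
def representativeness(sample_counts, census_counts):
    """Compare sample shares with Census shares for each group.

    Returns per-group shares and ratios, plus an index of dissimilarity:
    half the sum of absolute differences between the two sets of shares.
    Assumes every Census group has a non-zero count.
    """
    sample_total = sum(sample_counts.values())
    census_total = sum(census_counts.values())
    rows = {}
    for group, census_n in census_counts.items():
        sample_share = sample_counts.get(group, 0) / sample_total
        census_share = census_n / census_total
        rows[group] = {
            "sample_share": sample_share,
            "census_share": census_share,
            # A ratio above 1 means the group is over-represented.
            "ratio": sample_share / census_share,
        }
    dissimilarity = 0.5 * sum(
        abs(r["sample_share"] - r["census_share"]) for r in rows.values()
    )
    return rows, dissimilarity


# Hypothetical counts by age band.
CENSUS = {"16-34": 300, "35-64": 500, "65+": 200}
SAMPLE = {"16-34": 50, "35-64": 40, "65+": 10}

rows, dissimilarity = representativeness(SAMPLE, CENSUS)
```

In this made-up case the youngest band is over-represented and the oldest under-represented, the sort of skew that benchmark metadata would flag to users of a data product.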
For example, recent research used data from the 2021 England and Wales Census as a foundational benchmark for understanding shifts in working from home (WFH) patterns. This allowed smart data, such as Google Community Mobility Reports, to be validated and calibrated so that they reflect WFH trends accurately.