ukcensus is an R package designed to simplify the retrieval of Census 2021 data for England and Wales. It lets you build custom census datasets, which are returned as tibbles.
You can install the development version of ukcensus from GitHub using the devtools package:
# Install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Install ukcensus from GitHub
devtools::install_github("dataknit/ukcensus")
ukcensus is built around the create_custom_dataset() function. For example, to create a dataset of the age of all usual residents by MSOA, you could write:
library(ukcensus)
create_custom_dataset("msoa", "resident_age_6a") |>
  head()
#> 7264 out of 7264 areas available
#> # A tibble: 6 × 3
#>   msoa               resident_age_6a             n
#>   <chr>              <chr>                   <int>
#> 1 City of London 001 Aged 15 years and under   579
#> 2 City of London 001 Aged 16 to 24 years      1149
#> 3 City of London 001 Aged 25 to 34 years      2215
#> 4 City of London 001 Aged 35 to 49 years      1817
#> 5 City of London 001 Aged 50 to 64 years      1617
#> 6 City of London 001 Aged 65 years and over   1206
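Because the result is an ordinary tibble, it can be summarised with standard R tools. The following is a minimal sketch, assuming the column names shown in the output above (msoa, resident_age_6a, n). The six rows are copied from that output as a stand-in so the example runs without network access, and msoa_total and share are names introduced here for illustration only.

```r
# Stand-in for the six rows printed above. A plain data frame is used here;
# the same code should work on the tibble returned by create_custom_dataset().
age_by_msoa <- data.frame(
  msoa = rep("City of London 001", 6),
  resident_age_6a = c(
    "Aged 15 years and under", "Aged 16 to 24 years", "Aged 25 to 34 years",
    "Aged 35 to 49 years", "Aged 50 to 64 years", "Aged 65 years and over"
  ),
  n = c(579L, 1149L, 2215L, 1817L, 1617L, 1206L),
  stringsAsFactors = FALSE
)

# ave() returns each MSOA's total repeated on every row of that MSOA.
msoa_total <- ave(age_by_msoa$n, age_by_msoa$msoa, FUN = sum)

# Share of each age band within its MSOA.
age_by_msoa$share <- age_by_msoa$n / msoa_total

print(age_by_msoa[, c("resident_age_6a", "n", "share")])
```

Grouping by msoa means the same lines would also apply to the full dataset, where every MSOA contributes its own six rows.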
But how do you know which values to pass to create_custom_dataset()? That’s where the get_available_ family of functions comes in. You can use them to list the available:
- populations: the group being counted (e.g. households or usual residents)
- area types: the level of geographic specificity (e.g. country, Local Authority and LSOA)
- areas: the areas available for a given area type
- variables: the sociodemographic variables for which data is available (e.g. age, English language proficiency, disability status)
- categorisations: for a given variable, the ways its values can be grouped (e.g. age in 101 categories or in 2)
Each function outputs the available options as a tibble.
get_available_variables()
#> # A tibble: 44 × 5
#>    id                       label description total_count quality_statement_text
#>    <chr>                    <chr> <chr>             <int> <chr>
#>  1 activity_last_week       Econ… "This vari…           8 "As Census 2021 was d…
#>  2 age_arrival_uk_23a       Age … "The date …          23 ""
#>  3 alternative_address_ind… Seco… "An addres…           3 "The true number of p…
#>  4 country_of_birth_190a    Coun… "The count…         190 ""
#>  5 country_of_birth_60a     Coun… "The count…          60 ""
#>  6 disability               Disa… "People wh…           5 ""
#>  7 economic_activity_statu… Econ… "People ag…          12 "As Census 2021 was d…
#>  8 english_proficiency      Prof… "How well …           6 ""
#>  9 ethnic_group_tb_20b      Ethn… "The ethni…          20 ""
#> 10 has_ever_worked          Empl… "Classifie…           4 "As Census 2021 was d…
#> # ℹ 34 more rows
A typical workflow is to work through the get_available_ functions to determine which values to pass to create_custom_dataset().
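One step of that workflow, choosing a variable id from the table of available variables, can be sketched offline. The rows below are copied from the get_available_variables() listing above (only id and total_count are reproduced). pick_coarsest() is a hypothetical helper, not part of ukcensus, and the sketch assumes that total_count is the number of categories, as the 190a and 60a suffixes suggest. Whether ids from this table are accepted directly as the second argument of create_custom_dataset() is also an assumption, so the final call is left commented out.

```r
# Stand-in for a few rows of get_available_variables() output.
variables <- data.frame(
  id = c("country_of_birth_190a", "country_of_birth_60a",
         "disability", "english_proficiency"),
  total_count = c(190L, 60L, 5L, 6L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of ukcensus): among the variables whose id
# contains `pattern`, return the id with the smallest total_count.
pick_coarsest <- function(variables, pattern) {
  matches <- variables[grepl(pattern, variables$id, fixed = TRUE), ]
  if (nrow(matches) == 0) {
    stop("no variable id contains '", pattern, "'")
  }
  matches$id[which.min(matches$total_count)]
}

variable_id <- pick_coarsest(variables, "country_of_birth")
print(variable_id)

# With ukcensus installed and network access, the chosen id could then be
# tried in the call shown earlier (assumption: the id is accepted as-is):
# library(ukcensus)
# create_custom_dataset("msoa", variable_id)
```

In practice the stand-in data frame would be replaced by the tibble returned from get_available_variables(), which has the same id and total_count columns.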
We welcome contributions to the ukcensus package. If you have suggestions, bug reports, or want to contribute code, please open an issue or submit a pull request on the GitHub repository.
The ukcensus package is open-source and distributed under the MIT License. See the LICENSE file for more details.
The data accessed through the ukcensus package is sourced from the Office for National Statistics (ONS) and is made available under the Open Government Licence (OGL).
The ukcensus package is made possible by the work of the Office for National Statistics (ONS) in designing and running the census in England and Wales and making the results available through an open API. This package uses the Create a Custom Dataset functionality.