An R package to work with ACS data from {tidycensus}

elipousson/getACS

getACS

Lifecycle: experimental | License: MIT | Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

The goal of getACS is to make it easier to work with American Community Survey data from the tidycensus package by Kyle Walker and others.

This package includes:

  • Functions that extend tidycensus::get_acs() to support multiple tables, geographies, or years
  • Functions for creating formatted tables from ACS data using the gt package

As of April 2024, this package uses a development version of {tigris}, available at https://github.com/elipousson/tigris.

Installation

You can install the development version of getACS from GitHub with:

# install.packages("pak")
pak::pkg_install("elipousson/getACS")

Usage

library(getACS)
library(gt)
library(ggplot2)

The main feature of {getACS} is support for returning multiple tables, geographies, and years.

acs_data <- get_acs_geographies(
  geography = c("county", "state"),
  county = "Baltimore city",
  state = "MD",
  table = "B08134",
  year = 2022,
  quiet = TRUE
)
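
As a rough illustration of what "multiple geographies" means in practice, the sketch below calls a fetch function once per geography and stacks the results. It is not how getACS is implemented: get_acs_by_geography() and its fetch argument are hypothetical names introduced here, and the default fetch assumes {tidycensus} is installed with a Census API key configured.

```r
# Hypothetical helper (not part of getACS): call a fetch function once per
# geography and stack the results, recording which geography each row came from.
get_acs_by_geography <- function(geographies,
                                 ...,
                                 county = NULL,
                                 fetch = tidycensus::get_acs) {
  results <- lapply(geographies, function(geography) {
    args <- list(geography = geography, ...)
    # Assumption: `county` is only meaningful below the state level,
    # so it is dropped for the "state" geography.
    if (!is.null(county) && geography != "state") {
      args$county <- county
    }
    data <- do.call(fetch, args)
    data$geography <- geography
    data
  })
  # All results share the same columns, so they can be row-bound
  do.call(rbind, results)
}

# Example call (requires a Census API key, so it is left commented out):
# acs_raw <- get_acs_by_geography(
#   c("county", "state"),
#   table = "B08134", year = 2022, state = "MD", county = "Baltimore city"
# )
```

Passing the fetch function as an argument keeps the sketch easy to try with a stub in place of a live API call.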

The package also includes utility functions for filtering data and selecting columns to support the creation of tables using the {gt} package:

tbl_data <- filter_acs(acs_data, indent == 1, line_number <= 10)
tbl_data <- select_acs(tbl_data)

commute_tbl <- gt_acs(
  tbl_data,
  groupname_col = "NAME",
  column_title_label = "Commute time",
  table = "B08134"
)

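as_raw_html(commute_tbl)

| Geography | Commute time | Est. | % share |
| --- | --- | --- | --- |
| Baltimore city, Maryland | Less than 10 minutes | 16,140 ± 1,096 | 7% ± 0% |
| Baltimore city, Maryland | 10 to 14 minutes | 20,798 ± 1,312 | 9% ± 1% |
| Baltimore city, Maryland | 15 to 19 minutes | 36,667 ± 1,740 | 16% ± 1% |
| Baltimore city, Maryland | 20 to 24 minutes | 39,803 ± 1,834 | 17% ± 1% |
| Baltimore city, Maryland | 25 to 29 minutes | 16,404 ± 970 | 7% ± 0% |
| Baltimore city, Maryland | 30 to 34 minutes | 40,744 ± 1,695 | 17% ± 1% |
| Baltimore city, Maryland | 35 to 44 minutes | 16,880 ± 1,160 | 7% ± 0% |
| Baltimore city, Maryland | 45 to 59 minutes | 20,318 ± 1,296 | 9% ± 1% |
| Baltimore city, Maryland | 60 or more minutes | 26,662 ± 1,207 | 11% ± 0% |
| Maryland | Less than 10 minutes | 203,738 ± 3,511 | 8% ± 0% |
| Maryland | 10 to 14 minutes | 255,052 ± 4,240 | 10% ± 0% |
| Maryland | 15 to 19 minutes | 333,717 ± 5,269 | 13% ± 0% |
| Maryland | 20 to 24 minutes | 342,189 ± 4,777 | 13% ± 0% |
| Maryland | 25 to 29 minutes | 177,597 ± 3,129 | 7% ± 0% |
| Maryland | 30 to 34 minutes | 400,919 ± 5,980 | 15% ± 0% |
| Maryland | 35 to 44 minutes | 249,413 ± 4,443 | 9% ± 0% |
| Maryland | 45 to 59 minutes | 312,390 ± 4,394 | 12% ± 0% |
| Maryland | 60 or more minutes | 371,252 ± 4,828 | 14% ± 0% |

Source: 2018-2022 ACS 5-year Estimates, Table B08134.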

The gt_acs_compare() function also allows side-by-side comparison of geographies:

commute_tbl_compare <- gt_acs_compare(
  data = tbl_data,
  id_cols = "column_title",
  column_title_label = "Commute time",
  table = "B08134"
)

as_raw_html(commute_tbl_compare)

| Commute time | Baltimore city, Maryland: Est. | Baltimore city, Maryland: % share | Maryland: Est. | Maryland: % share |
| --- | --- | --- | --- | --- |
| Less than 10 minutes | 16,140 ± 1,096 | 7% ± 0% | 203,738 ± 3,511 | 8% ± 0% |
| 10 to 14 minutes | 20,798 ± 1,312 | 9% ± 1% | 255,052 ± 4,240 | 10% ± 0% |
| 15 to 19 minutes | 36,667 ± 1,740 | 16% ± 1% | 333,717 ± 5,269 | 13% ± 0% |
| 20 to 24 minutes | 39,803 ± 1,834 | 17% ± 1% | 342,189 ± 4,777 | 13% ± 0% |
| 25 to 29 minutes | 16,404 ± 970 | 7% ± 0% | 177,597 ± 3,129 | 7% ± 0% |
| 30 to 34 minutes | 40,744 ± 1,695 | 17% ± 1% | 400,919 ± 5,980 | 15% ± 0% |
| 35 to 44 minutes | 16,880 ± 1,160 | 7% ± 0% | 249,413 ± 4,443 | 9% ± 0% |
| 45 to 59 minutes | 20,318 ± 1,296 | 9% ± 1% | 312,390 ± 4,394 | 12% ± 0% |
| 60 or more minutes | 26,662 ± 1,207 | 11% ± 0% | 371,252 ± 4,828 | 14% ± 0% |

Source: 2018-2022 ACS 5-year Estimates, Table B08134.

gt_acs_compare_vars() is a variant of gt_acs_compare() whose default values place variables in columns and geographic areas in rows:

commute_tbl_compare_vars <- acs_data |>
  filter_acs(indent == 1, line_number > 10) |>
  gt_acs_compare_vars(
    table = acs_data$table_id
  )

as_raw_html(commute_tbl_compare_vars)

| NAME | Car, truck, or van | Public transportation (excluding taxicab) | Walked | Taxicab, motorcycle, bicycle, or other means |
| --- | --- | --- | --- | --- |
| Baltimore city, Maryland | 176,543 ± 2,817 | 34,640 ± 1,637 | 14,954 ± 1,007 | 8,279 ± 901 |
| Maryland | 2,357,924 ± 11,085 | 171,785 ± 3,655 | 59,507 ± 1,858 | 57,051 ± 2,213 |

Source: 2018-2022 ACS 5-year Estimates, Table B08134.

The package also includes several functions to support creating plots with the {ggplot2} package, including geom_acs_col() and labs_acs_survey():

plot_data <- acs_data |>
  filter_acs(indent == 1, line_number > 10) |>
  select_acs() |>
  fmt_acs_county(state = "Maryland")

plot_data |>
  ggplot() +
  geom_acs_col(
    fill = "NAME",
    position = "dodge",
    color = NA,
    alpha = 0.75,
    perc = TRUE,
    errorbar_params = list(position = "dodge", linewidth = 0.25)
  ) +
  scale_y_discrete("Means of transportation to work") +
  scale_fill_viridis_d("Geography") +
  labs_acs_survey(
    .data = acs_data
  ) +
  theme_minimal()

The geom_acs_col() function calls geom_acs_errorbar() (passing the errorbar_params argument as additional parameters) and either scale_x_acs() or scale_y_acs(), depending on whether orientation is "y" or the default value of NA.
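
The same idea of a column layer paired with a margin-of-error layer can be sketched in plain {ggplot2}. This is only an approximation of the pattern described above, not the package's implementation: the data frame is typed in by hand from the Baltimore city row of the table above, and the column names (column_title, estimate, moe) are assumed to mirror what getACS returns.

```r
library(ggplot2)

# Baltimore city estimates and margins of error, copied by hand from the
# gt_acs_compare_vars() output above (Table B08134)
commute <- data.frame(
  column_title = c(
    "Car, truck, or van",
    "Public transportation (excluding taxicab)",
    "Walked",
    "Taxicab, motorcycle, bicycle, or other means"
  ),
  estimate = c(176543, 34640, 14954, 8279),
  moe = c(2817, 1637, 1007, 901)
)

p <- ggplot(commute, aes(x = estimate, y = column_title)) +
  # Column layer for the estimates
  geom_col(alpha = 0.75) +
  # Error bar layer spanning estimate - moe to estimate + moe
  geom_errorbar(
    aes(xmin = estimate - moe, xmax = estimate + moe),
    orientation = "y",
    width = 0.2,
    linewidth = 0.25
  ) +
  # Comma-formatted axis labels for the estimate axis
  scale_x_continuous("Estimate", labels = scales::label_comma()) +
  labs(y = "Means of transportation to work") +
  theme_minimal()

p
```

Bundling both layers behind one function, as geom_acs_col() does, means the estimate and its margin of error always share the same orientation and position settings.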

For more information on working with Census data in R, see the book Analyzing US Census Data: Methods, Maps, and Models in R (February 2023).

Related projects

Related R packages and analysis projects

  • {easycensus}: Quickly Extract and Marginalize U.S. Census Tables
  • {cwi}: Functions to speed up and standardize Census ACS data analysis for multiple staff people at DataHaven, preview trends and patterns, and get data in more layperson-friendly
  • {camiller}: A set of convenience functions, including functions for working with ACS data via {tidycensus}
  • {psrccensus}: A set of tools developed for PSRC (Puget Sound Regional Council) staff to pull, process, and visualize Census Data for geographies in the Central Puget Sound Region.
  • {CTPPr}: An R package for loading and working with the US Census CTPP survey data.
  • {lehdr}: A package to grab LEHD data in support of city and regional planning, economic, and transportation analysis
  • {mapreliability}: An R package providing a map classification reliability calculator
  • Studying Neighborhoods With Uncertain Census Data: Code to create and visualize demographic clusters for the US with data from the American Community Survey

Related Python libraries

  • census-data-aggregator: A Python library from the L.A. Times data desk to help “combine U.S. census data responsibly”
  • census-table-metadata: Tools for generating metadata about tables and fields in a Census release based on sequence lookup and table shell files. (Note: the pre-computed data from this repository is used to label ACS data by label_acs_metadata())
