# County and State FIPS codes
Kieran Healy (`@kjhealy`)
## Summary
- Three CSV files with basic FIPS identifying information for US States (`state_fips_master.csv`), Counties (`county_fips_master.csv`), and both together (`state_and_county_fips_master.csv`). I got sick of constantly having to write code to match on one or another of these identifiers in order to merge data files (e.g. for maps). So this can serve as a basis for harmonizing files that use one, some, or some variant of these identifiers. For example, sometimes leading zeros are omitted in the FIPS code, sometimes not; sometimes the FIPS code is stored as one number, sometimes as a character string of digits, sometimes as two separate state and county numbers, and so on. The Census also has its own supra-state units (regions and divisions). These files make it easier to merge and match to data indexed in one or another of these ways.
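
The variants described above (dropped leading zeros, numbers versus digit strings, separate state and county parts) can be reduced to one canonical form before merging. Here is a minimal sketch in base R. The helpers `pad_fips()` and `make_fips()` are hypothetical names used for illustration and are not part of these files; the sketch assumes the canonical form you want is a zero-padded character string (two digits for states, five for counties).

```r
# Hypothetical helper: coerce a FIPS code stored as a number, a digit
# string, or a factor to a zero-padded character string of fixed width.
pad_fips <- function(x, width = 5) {
  x_int <- as.integer(as.character(x))
  out <- sprintf(paste0("%0", width, "d"), x_int)
  out[is.na(x_int)] <- NA_character_  # keep missing codes missing
  out
}

# Hypothetical helper: build a five-digit county FIPS from separate
# state (two-digit) and county (three-digit) parts.
make_fips <- function(state, county) {
  paste0(pad_fips(state, 2), pad_fips(county, 3))
}

pad_fips(1001)       # leading zero restored: "01001"
pad_fips("4", 2)     # state code as a two-digit string: "04"
make_fips(11, 1)     # District of Columbia: "11001"
```

Applying the same helper to the key column of every file before a merge means all of them agree on both type and width.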
```{r, echo=FALSE, message=FALSE}
county <- readr::read_csv("county_fips_master.csv")
state <- readr::read_csv("state_fips_master.csv")
state_and_county <- readr::read_csv("state_and_county_fips_master.csv")
```
## Files
The `county_fips_master.csv` file looks like this:
```{r, echo=FALSE, message=FALSE}
knitr::kable(head(county))
```
- `crosswalk` is just a concatenation of region, division, state, and county identifiers, sometimes useful for merging with files that do not include a full numeric FIPS code, but instead split it up by state and county.
The `state_fips_master.csv` file looks like this:
```{r, echo=FALSE, message=FALSE}
knitr::kable(head(state))
```
And the `state_and_county_fips_master.csv` file looks like this:
```{r, echo=FALSE, message=FALSE}
knitr::kable(head(state_and_county))
```
- This file includes the US overall FIPS code (00000) and each of the state codes as five-digit numbers, in order. The country's name and the names of the states are in all caps.
## Some Gotchas
When merging and filtering county-level data, there are a few things that might bite you, as they have bitten me.
- Beware of confusion that may arise when trying to merge FIPS encoded as a character (e.g. State code "04") and FIPS encoded as an integer (4).
- Counties or Parishes with "Saint" or "St" in their names are sometimes inconsistently spelled or abbreviated.
- Some places have "La" in their name. It's LaSalle Parish (LA) and LaSalle County (IL), but La Salle County (TX). These are sometimes written incorrectly.
- Doña Ana County, NM, has the only "ñ" in America's county names. It can get mangled in the character encoding of text files (especially old ones), or converted to a mere "n".
- It's "District of Columbia", not "Washington DC".
- The District of Columbia has a State-like FIPS code (11) *and* a County-like one (11001).
In any case, you are never trying to match on character strings when merging data that should have unique numerical identifiers. Right?
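
To illustrate the first gotcha, here is a small sketch in base R of what can happen when one file stores state FIPS codes as integers and another stores them as zero-padded strings. The data frames and their column names (`state_fips`, `pop`, `state_name`) are made up for the example and do not reflect the columns of the CSV files in this repository.

```r
# Toy data (hypothetical): integer codes in one source,
# zero-padded character codes in the other.
pop <- data.frame(state_fips = c(4L, 11L, 35L),
                  pop = c(100, 200, 300))
names_df <- data.frame(state_fips = c("04", "11", "35"),
                       state_name = c("Arizona", "District of Columbia",
                                      "New Mexico"),
                       stringsAsFactors = FALSE)

# Naive merge: the integer 4 is compared as "4", which is not "04",
# so single-digit state codes drop out without any warning.
naive <- merge(pop, names_df, by = "state_fips")

# Fix: put both keys in the same zero-padded character form first.
pop_fixed <- pop
pop_fixed$state_fips <- sprintf("%02d", pop_fixed$state_fips)
fixed <- merge(pop_fixed, names_df, by = "state_fips")

# Sanity check: list any codes that still fail to match.
unmatched <- setdiff(pop_fixed$state_fips, names_df$state_fips)

nrow(naive) < nrow(pop)    # TRUE: rows were silently lost
nrow(fixed) == nrow(pop)   # TRUE: every row matched
length(unmatched)          # 0
```

Comparing row counts before and after a merge, as in the last three lines, is a cheap way to catch this kind of silent loss.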