Obfuscation of true zeros #21

Closed
meghutch opened this issue Feb 5, 2021 · 2 comments

Comments

meghutch commented Feb 5, 2021

The HHS dataset is such a nice resource along with the FAQ here. However, I was wondering how obfuscation is applied. Is obfuscation only applied for counts 1-3? Is it possible that counts of 0 are ever obfuscated? Do sites themselves determine what should be considered obfuscated, or does this process happen automatically once data is aggregated?

Thank you - Meg

Contributor

ftrotter commented Mar 8, 2021

One standard obfuscation is applied, as a single consistent rule, when the data is released. The data that comes from the hospitals contains the specific, unobfuscated number.

Generally, there is a tremendous amount of focus on ensuring that single patients are very hard to identify in the data releases (not to say it's impossible, but it's close) while ensuring that the data is still useful to analysts. If you replace 1-3 with '1' because you know it was at least one, it will introduce a slight underestimation bias into your analysis. You could go with 2 or 3 instead, but that simply introduces a different bias.
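As a rough illustration of the bias described above, the sketch below sums a series three times, filling every obfuscated value with 1, 2, or 3. The sentinel `SUPPRESSED`, the helper `total_with_fill`, and the example counts are all hypothetical; check the dataset documentation for how obfuscated values are actually encoded.

```python
# Assumed sentinel for an obfuscated 1-3 count; verify against the dataset documentation.
SUPPRESSED = -999999


def total_with_fill(counts, fill):
    """Sum counts, substituting `fill` for every suppressed value."""
    return sum(fill if c == SUPPRESSED else c for c in counts)


# Made-up example values, not real HHS data.
weekly_counts = [12, SUPPRESSED, 0, 7, SUPPRESSED, SUPPRESSED]

low = total_with_fill(weekly_counts, 1)   # lower bound: every suppressed cell was really 1
mid = total_with_fill(weekly_counts, 2)   # midpoint guess
high = total_with_fill(weekly_counts, 3)  # upper bound: every suppressed cell was really 3

print(low, mid, high)
```

The gap between `low` and `high` grows by 2 for every suppressed cell, which gives a quick sense of how much the choice of fill value can move an aggregate.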

If you wanted to remove as much bias as possible, you would estimate a '1' for facilities/regions/etc. where low counts and 'true zeros' are common, and estimate a '3' for facilities/regions/etc. where there were lots of patients reported in previous weeks and lots of patients reported in subsequent weeks, and so on.
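One way to sketch this context-based estimate is to look at the known counts in the weeks around each obfuscated value. Everything here is an assumption made for illustration: the `SUPPRESSED` sentinel, the function name `impute_from_neighbors`, the window size, both cutoffs, and the fallback of 2 for in-between cases. It is not a method endorsed by the dataset maintainers.

```python
# Assumed sentinel for an obfuscated 1-3 count; verify against the dataset documentation.
SUPPRESSED = -999999


def impute_from_neighbors(series, window=2, low_cutoff=1.0, high_cutoff=10.0):
    """Replace suppressed values using the mean of nearby known counts.

    series: list of weekly counts for one facility, in time order.
    Returns a new list; known values are left unchanged.
    """
    result = []
    for i, value in enumerate(series):
        if value != SUPPRESSED:
            result.append(value)
            continue
        # Collect known counts from up to `window` weeks before and after.
        start = max(0, i - window)
        nearby = series[start:i] + series[i + 1:i + 1 + window]
        known = [v for v in nearby if v != SUPPRESSED]
        if not known:
            result.append(2)  # no context available: fall back to the midpoint
            continue
        avg = sum(known) / len(known)
        if avg <= low_cutoff:
            result.append(1)  # quiet period, true zeros likely common
        elif avg >= high_cutoff:
            result.append(3)  # busy weeks on both sides
        else:
            result.append(2)  # in between (an added assumption)
    return result


# Made-up example series, not real HHS data.
quiet_facility = [0, 0, SUPPRESSED, 0, 1]
busy_facility = [14, 18, SUPPRESSED, 22, 16]

print(impute_from_neighbors(quiet_facility))
print(impute_from_neighbors(busy_facility))
```

The cutoffs would need tuning per measure, and a run of adjacent suppressed weeks is only handled by ignoring the suppressed neighbors.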

Whether a specific dataset has 'true zeros' should always be clarified in the data documentation that accompanies the dataset. If it is not, open a new ticket here and we will run it down.

-FT

@meghutch
Author

Thank you, FT, for the response! I could not find any information on whether the dataset has "true zeros". It may be helpful for other researchers if your note here, along with information about true zeros, were included in the documentation.

@ftrotter ftrotter changed the title Obfuscation Obfuscation of true zeros Dec 8, 2021
@ftrotter ftrotter closed this as completed Dec 8, 2021