Obfuscation of true zeros #21
There is one standard obfuscation, applied as a single consistent rule when data is released. The data that comes from the hospitals has a specific number inside. Generally, there is a tremendous amount of focus on ensuring that single patients are very hard to identify in the data releases (not to say it's impossible, but it's close) while ensuring that the data is still useful to analysts. If you replace 1-3 with '1' because you know it was at least one, you will introduce a slight underestimation bias into your analysis... or you could go with 2 or 3, but then you simply introduce a different bias. If you want to remove as much bias as possible, estimate a '1' for facilities/regions/etc. where low counts and 'true zeros' are common, and estimate a '3' for facilities/regions/etc. where lots of patients were reported in previous weeks and lots of patients were reported in subsequent weeks, and so on. Whether a specific dataset has 'true zeros' should always be clarified in the data documentation that accompanies the dataset... and if it is not, open a new ticket here and we will run it down... -FT
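For anyone who wants to try the neighbour-based estimate described above, here is a minimal sketch, not an official method. It assumes each facility's weekly counts are in a plain list and that suppressed 1-3 values have already been mapped to a marker (`SUPPRESSED`, a hypothetical name; check the data documentation for how your dataset actually encodes them). The function name, the `window` parameter, and the fallback of 2 when no neighbouring weeks are known are all illustrative choices.

```python
from statistics import mean

# Hypothetical marker for a suppressed 1-3 count; map your dataset's
# actual suppression code to this before calling the function.
SUPPRESSED = None


def impute_suppressed(weekly_counts, window=2):
    """Replace suppressed entries with 1, 2, or 3 based on nearby weeks.

    Weeks surrounded by zeros and low counts get 1; weeks surrounded by
    high counts get 3; anything in between gets the rounded local mean.
    """
    imputed = []
    for i, value in enumerate(weekly_counts):
        if value is not SUPPRESSED:
            imputed.append(value)
            continue
        # Known (unsuppressed) counts from up to `window` weeks either side.
        start = max(0, i - window)
        stop = i + window + 1
        neighbours = [v for v in weekly_counts[start:stop] if v is not SUPPRESSED]
        if not neighbours:
            imputed.append(2)  # assumption: no context, so use the midpoint
            continue
        # The true value is known to lie in 1-3, so clamp to that range.
        estimate = min(3, max(1, round(mean(neighbours))))
        imputed.append(estimate)
    return imputed
```

With this sketch, `impute_suppressed([0, 0, None, 1, 0])` gives `[0, 0, 1, 1, 0]`, while `impute_suppressed([12, 9, None, 15, 11])` gives `[12, 9, 3, 15, 11]`. Any fixed choice still carries some bias; this only tries to point the guess in the direction the surrounding weeks suggest.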
Thank you FT for the response! I could not find any information regarding whether or not the dataset has "true zeros". It may be helpful for other researchers if your helpful note here and information regarding true zeros is included in the documentation.
The HHS dataset is such a nice resource along with the FAQ here. However, I was wondering how obfuscation is applied. Is obfuscation only applied for counts 1-3? Is it possible that counts of 0 are ever obfuscated? Do sites themselves determine what should be considered obfuscated, or does this process happen automatically once data is aggregated?
Thank you - Meg