COHD KP
Homepage Link: https://cohd.io/about.html
Columbia Open Health Data (COHD) provides access to counts and patient prevalence (i.e., prevalence from electronic health records) of conditions, procedures, drug exposures, and patient demographics, and the co-occurrence frequencies between them. Count and frequency data were derived from the Columbia University Irving Medical Center's OHDSI database including inpatient and outpatient data. Counts are the number of patients with the concept, e.g., diagnosed with a condition, exposed to a drug, or who had a procedure. Frequencies are the number of patients with the concept divided by the total number of patients in the dataset. Clinical concepts (e.g., conditions, procedures, drugs) are coded by their standard concept ID in the OMOP Common Data Model. To protect patient privacy, all concepts and pairs of concepts where the count ≤ 10 were excluded, and counts were randomized by the Poisson distribution.
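The count, frequency, and privacy steps described above can be sketched as follows. This is an illustrative sketch only, not the COHD pipeline: the function names, the input dictionary, and the order of operations (threshold first, then Poisson perturbation) are assumptions, and the Poisson sampler is a simple standard-library stand-in.

```python
import random

MIN_COUNT = 10  # per the text, counts <= 10 are excluded


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) by counting exponential inter-arrival
    times (rate lam) that fit into one unit of time."""
    n = 0
    t = rng.expovariate(lam)
    while t <= 1.0:
        n += 1
        t += rng.expovariate(lam)
    return n


def release_counts(true_counts, total_patients, seed=None):
    """Hypothetical helper: drop small counts, perturb the rest, and
    derive frequencies as count / total number of patients."""
    rng = random.Random(seed)
    released = {}
    for concept_id, count in true_counts.items():
        if count <= MIN_COUNT:
            continue  # excluded to protect patient privacy
        noisy = sample_poisson(count, rng)  # assumed: mean = true count
        released[concept_id] = {
            "count": noisy,
            "frequency": noisy / total_patients,
        }
    return released


# Made-up counts keyed by placeholder concept IDs
example = release_counts({1: 5, 2: 10, 3: 500}, total_patients=1000, seed=0)
```

In this sketch, concepts 1 and 2 are dropped because their counts do not exceed 10, and concept 3 is reported with a perturbed count near 500.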
Outpatient and inpatient EHR data extracted from Columbia University Irving Medical Center's clinical data warehouse
COHD provides the following association metrics and their statistical measures of significance, captured inside biolink:StudyResult structures:
- Raw counts of each concept and concept pair co-occurrence - biolink:ConceptCountAnalysisResult
- Chi-squared analysis (Bonferroni-adjusted p-value) - biolink:ChiSquaredAnalysisResult
- Relative frequency (99% confidence interval) - biolink:RelativeFrequencyAnalysisResult
- Observed-expected frequency ratio (99% confidence interval) - biolink:ObservedExpectedFrequencyAnalysisResult
Example values:

| Strength of association | Condition | Drug | ln ratio | 99% CI |
| --- | --- | --- | --- | --- |
| Strong positive | type 2 diabetes | metformin | 2.570 | 2.554, 2.586 |
| Weak or no association | Sprain of knee | pneumococcal polysaccharide vaccine | 0.078 | -0.094, 0.231 |
| Negative | birth | isotretinoin | -1.337 | -2.590, -0.644 |
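To make the metrics above more concrete, here is a minimal sketch of how such values are commonly computed and read. The formulas are assumptions, since this page does not spell them out: the expected co-occurrence count is taken as `count_1 * count_2 / total` (independence), the relative frequency as the pair count divided by the count of one concept, and the Bonferroni adjustment as the p-value multiplied by the number of tests. All function names are hypothetical.

```python
import math


def relative_frequency(pair_count, base_count):
    """Assumed definition: share of patients with the base concept
    who also have the other concept."""
    return pair_count / base_count


def ln_obs_exp_ratio(pair_count, count_1, count_2, total):
    """Assumed definition: natural log of the observed co-occurrence
    count over the count expected if the concepts were independent."""
    expected = count_1 * count_2 / total
    return math.log(pair_count / expected)


def bonferroni(p_value, n_tests):
    """Standard Bonferroni adjustment, capped at 1."""
    return min(1.0, p_value * n_tests)


def read_interval(ci_low, ci_high):
    """Coarse reading of a confidence interval on the ln ratio:
    an interval that contains 0 suggests weak or no association."""
    if ci_low > 0:
        return "positive"
    if ci_high < 0:
        return "negative"
    return "weak or no association"


# Made-up counts: 20 co-occurrences where 5 would be expected
ratio = ln_obs_exp_ratio(pair_count=20, count_1=100, count_2=50, total=1000)
```

Applied to the example intervals, `read_interval(2.554, 2.586)` reads as positive, `read_interval(-0.094, 0.231)` as weak or no association, and `read_interval(-2.590, -0.644)` as negative.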
COHD contains the following data sets:
- 5-year non-hierarchical dataset: Includes clinical data from 2013-2017
- lifetime non-hierarchical dataset: Includes clinical data from all dates
- 5-year hierarchical dataset: Counts for each concept include patients from descendant concepts. Includes clinical data from 2013-2017.
- Temporal beta: Quantifies temporal relations between all concept pairs. Includes clinical data from all dates.
While the lifetime dataset captures a larger patient population and range of concepts, the 5-year dataset has better underlying data consistency. In the 5-year hierarchical dataset, the counts for each concept include the patients from all descendant concepts. For example, the count for ibuprofen (ID 1177480) includes patients with Ibuprofen 600 MG Oral Tablet (ID 19019073), Ibuprofen 400 MG Oral Tablet (ID 19019072), Ibuprofen 20 MG/ML Oral Suspension (ID 19019050), etc. The COHD KP automatically chooses the most appropriate COHD dataset depending on the concepts being queried.
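The hierarchical roll-up can be illustrated with a small sketch. The concept IDs are the ones from the ibuprofen example; the patient IDs, the `children` map, and the helper names are made up, and counting each patient once across descendants is an assumption based on counts being numbers of patients.

```python
def descendants(concept_id, children):
    """Collect all descendant concept IDs by walking the child map."""
    seen = set()
    stack = [concept_id]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def hierarchical_count(concept_id, children, patients_by_concept):
    """Number of distinct patients with the concept or any descendant."""
    patients = set(patients_by_concept.get(concept_id, ()))
    for descendant in descendants(concept_id, children):
        patients |= set(patients_by_concept.get(descendant, ()))
    return len(patients)


# Hypothetical data: ibuprofen (1177480) and three descendant products
children = {1177480: [19019073, 19019072, 19019050]}
patients_by_concept = {
    1177480: {"p1"},
    19019073: {"p1", "p2"},
    19019072: {"p3"},
    19019050: {"p2", "p4"},
}

flat = len(patients_by_concept[1177480])  # non-hierarchical count: 1
rolled_up = hierarchical_count(1177480, children, patients_by_concept)  # 4
```

With this made-up data, the non-hierarchical count for ibuprofen is 1, while the hierarchical count is 4 because patients recorded only under the specific products are included.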
More details about the COHD dataset can be found in the Clinical Data Provider prototype Kick-off presentation PDF
Columbia Open Health Data for COVID-19 Research (COHD-COVID) is similar to COHD but adjusts the analysis and cohorts to facilitate COVID-19 research. COHD-COVID provides access to counts and visit prevalence (i.e., prevalence from electronic health records) of conditions, procedures, drug exposures, and the co-occurrence frequencies between them. Count and frequency data were derived from the Columbia University Irving Medical Center's OHDSI database including inpatient data. Counts are the number of visits with the concept, e.g., visits in which a condition was diagnosed, a drug exposure was recorded, or a procedure was performed. Frequencies are the number of visits with the concept divided by the total number of visits in the dataset. Clinical concepts (e.g., conditions, procedures, drugs) are coded by their standard concept ID in the OMOP Common Data Model. To protect patient privacy, all concepts and pairs of concepts where the count ≤ 10 were excluded, and counts were randomized by the Poisson distribution.
Inpatient EHR data extracted from Columbia University Irving Medical Center's clinical data warehouse
The same association metrics described above are provided.
Datasets from three primary cohorts are available:
- COVID-19: Hospitalized patients aged 18 or older with a COVID-19 related condition diagnosis and/or a confirmed positive COVID-19 test during their hospitalization period or within the prior 21 days. Date range: March 1, 2020 to September 1, 2020. This cohort is also further stratified by sex (male and female) and age (adult: 18-64, senior: 65+).
- General inpatient: All hospitalized patients aged 18 or older. Date range: January 1, 2014 to December 31, 2019.
- Influenza: Hospitalized patients aged 18 or older who had at least one occurrence of influenza conditions or pre-coordinated positive measurements or positive influenza testing in the prior 21 days or during their hospitalization period. Date range: January 1, 2014 to December 31, 2019.
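The 21-day look-back window and the age strata used in the cohort definitions above can be sketched as follows. This is a simplified illustration, not the cohort-building code: the function names are hypothetical, and treating both ends of the window as inclusive is an assumption.

```python
from datetime import date, timedelta


def evidence_in_window(admit, discharge, evidence_date, lookback_days=21):
    """True if a diagnosis or positive test falls within the
    hospitalization period or the prior `lookback_days` days."""
    window_start = admit - timedelta(days=lookback_days)
    return window_start <= evidence_date <= discharge


def age_stratum(age):
    """Age strata from the COVID-19 cohort: adult 18-64, senior 65+.
    Patients under 18 are outside all three cohorts."""
    if age < 18:
        return None
    return "adult" if age <= 64 else "senior"


# Hypothetical hospitalization: admitted April 10, discharged April 20, 2020
admit, discharge = date(2020, 4, 10), date(2020, 4, 20)
qualifies = evidence_in_window(admit, discharge, date(2020, 3, 25))
```

Here a positive test on March 25, 2020 falls inside the 21-day look-back window before the April 10 admission, so `qualifies` is `True`.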
Mode of Access
- Columbia Open Health Data
  - Translator Reasoner API, registered on SmartAPI at http://cohd.smart-api.info
  - Bespoke OpenAPI at http://cohd.io/api
- Columbia Open Health Data for COVID-19 Research
  - Translator Reasoner API, registered on SmartAPI at http://cohdcovid.smart-api.info
  - Bespoke OpenAPI at http://covid.cohd.io/api
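As a rough sketch of calling the bespoke OpenAPI endpoints listed above, the snippet below builds a request URL and defines a small JSON fetch helper using only the Python standard library. The base URLs come from this page; the endpoint path `/frequencies/singleConceptFreq` and the parameter names `dataset_id` and `q` are assumptions that should be checked against the OpenAPI documentation, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URLs as listed under "Mode of Access"
BASE_URLS = {
    "cohd": "http://cohd.io/api",
    "cohd_covid": "http://covid.cohd.io/api",
}


def build_url(service, path, **params):
    """Join a base URL, an endpoint path, and query parameters."""
    base = BASE_URLS[service].rstrip("/")
    url = f"{base}/{path.lstrip('/')}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def fetch_json(url, timeout=30):
    """Perform a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Assumed endpoint and parameters; 1177480 is the ibuprofen concept ID
url = build_url("cohd", "/frequencies/singleConceptFreq",
                dataset_id=1, q=1177480)
# data = fetch_json(url)  # uncomment to send the request
```

Keeping URL construction separate from the network call makes it easy to inspect the request before sending it and to reuse the same helper for both services.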
Use Cases
- Notebook Examples:
Knowledge Sources Accessed
Source Code
- COHD API: https://github.com/WengLab-InformaticsResearch/cohd_api
- COHD-COVID API (branch of COHD API): https://github.com/WengLab-InformaticsResearch/cohd_api/tree/cohd_covid
References
- Ta CN, Dumontier M, Hripcsak G, Tatonetti NP, Weng C. Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records. Scientific Data. 5:180273; 2018. doi:10.1038/sdata.2018.273
- Lee J, Kim JH, Liu C, Hripcsak G, Natarajan K, Ta CN, Weng C. Columbia Open Health Data for COVID-19 Research: Database Analysis. Journal of Medical Internet Research. 23(9):e31122; 2021. doi:10.2196/31122
External Documentation
- https://github.com/WengLab-InformaticsResearch/cohd_api
- https://cohd.io/about.html
- https://covid.cohd.io/about.html
- https://research.columbia.edu/covid/devices/openhealth
Contact
- The COHD KP is maintained by the Clinical Data Knowledge Provider
- Issues can be consulted and created in the cohd_api GitHub repository: https://github.com/WengLab-InformaticsResearch/cohd_api/issues
- Contact Casey Ta for other questions