Update data flow #20

Merged
merged 1 commit into from
Nov 17, 2022
12 changes: 7 additions & 5 deletions README.md
@@ -15,6 +15,8 @@ Data and licenses should be acquired from [UMLS Terminology Services](https://ut
## Table of Contents
- [Download pre-made Dutch MedCAT models](#download-pre-made-dutch-medcat-models)
- [Folder structure](#folder-structure)
- [Output format](#output-format)
- [Data-flow](#data-flow)
- [Generate UMLS concept table](#generate-umls-concept-table)
- [1. Obtain license and download complete UMLS](#1-obtain-license-and-download-complete-umls)
- [2. Decompress and install MetamorphoSys](#2-decompress-and-install-metamorphosys)
@@ -39,11 +41,7 @@ dutch-medical-concepts
└───05_CustomChanges
```

## Generate UMLS concept table
![Data Flow](data-flow.png)

Output CSV format will look like this:

## Output format
| cui | name | ontologies | name_status | type_ids |
|----------|--------------------------|----------------------|-------------|----------|
| C0000001 | kanker | ONTOLOGY1\|ONTOLOGY2 | P | T001 |
@@ -59,6 +57,10 @@ See https://github.com/CogStack/MedCAT/tree/master/examples for a detailed expla

I'm not sure whether the UMLS license allows for publishing snippets of UMLS data for demonstration purposes, so this repository uses mock data in the examples.
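
As an illustration of the output format above, here is a minimal sketch of reading such a file in Python. It assumes the CSV is comma-delimited and that multiple ontologies are joined with a plain `|` (the `\|` in the table is only Markdown escaping); neither detail is confirmed by this README. The helper name `read_concepts` and the inline mock data are hypothetical.

```python
import csv
import io

# Mock data in the shape of the README table; not real UMLS content.
# Assumption: comma-delimited file with a header row.
MOCK_CSV = """cui,name,ontologies,name_status,type_ids
C0000001,kanker,ONTOLOGY1|ONTOLOGY2,P,T001
"""


def read_concepts(handle):
    """Yield one dict per row, with `ontologies` split on '|' into a list."""
    for row in csv.DictReader(handle):
        # Assumption: ontologies are separated by a plain pipe character.
        row["ontologies"] = row["ontologies"].split("|")
        yield row


concepts = list(read_concepts(io.StringIO(MOCK_CSV)))
print(concepts[0]["cui"], concepts[0]["ontologies"])
```

To read a real file, the `io.StringIO` object could be swapped for an opened file handle, for example `open(path, newline="", encoding="utf-8")`.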

## Data-flow
![Data-flow](data-flow.jpg)

## Generate UMLS concept table
### 1. Obtain license and download complete UMLS
To download UMLS, visit the [NIH National Library of Medicine website](https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html). You'll have to apply for a license before you can download the files. In the following description I downloaded `Full Release (umls-2022AB-full.zip)`. The advantage over `UMLS Metathesaurus Full Subset` is that the Full Release includes MetamorphoSys, which makes it possible to create a subset of UMLS prior to loading the data into a SQL database. This significantly decreases the required disk space and processing time.

Binary file added data-flow.jpg
Binary file removed data-flow.png
Binary file not shown.