Merge pull request #20 from umcu/update-data-flow
Update data flow
sandertan authored Nov 17, 2022
2 parents c55aa32 + 01a99a1 commit 5d105f7
Showing 3 changed files with 7 additions and 5 deletions.
12 changes: 7 additions & 5 deletions README.md
@@ -15,6 +15,8 @@ Data and licenses should be acquired from [UMLS Terminology Services](https://ut
## Table of Contents
- [Download pre-made Dutch MedCAT models](#download-pre-made-dutch-medcat-models)
- [Folder structure](#folder-structure)
+- [Output format](#output-format)
+- [Data-flow](#data-flow)
- [Generate UMLS concept table](#generate-umls-concept-table)
- [1. Obtain license and download complete UMLS](#1-obtain-license-and-download-complete-umls)
- [2. Decompress and install MetamorphoSys](#2-decompress-and-install-metamorphosys)
@@ -39,11 +41,7 @@ dutch-medical-concepts
└───05_CustomChanges
```

-## Generate UMLS concept table
-![Data Flow](data-flow.png)
-
-Output CSV format will look like this:
-
+## Output format
| cui | name | ontologies | name_status | type_ids |
|----------|--------------------------|----------------------|-------------|----------|
| C0000001 | kanker | ONTOLOGY1\|ONTOLOGY2 | P | T001 |
@@ -59,6 +57,10 @@ See https://github.com/CogStack/MedCAT/tree/master/examples for a detailed expla

I'm not sure whether the UMLS license allows for publishing snippets of UMLS data for demonstration purposes, so this repository uses mock data in the examples.
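
As an illustration of the output format, here is a minimal Python sketch that reads such a file. It uses only mock data mirroring the example row in the table above, and it assumes the file is comma-delimited and that multiple ontologies are joined with `|` as in that row. The function name `read_concepts` is hypothetical and not part of this repository.

```python
import csv
import io

# Mock data only, mirroring the example row in the table above.
# Assumption: comma-delimited file with a header row.
MOCK_CSV = """cui,name,ontologies,name_status,type_ids
C0000001,kanker,ONTOLOGY1|ONTOLOGY2,P,T001
"""


def read_concepts(handle):
    """Yield one dict per row, splitting the pipe-delimited ontologies column."""
    for row in csv.DictReader(handle):
        row["ontologies"] = row["ontologies"].split("|")
        yield row


concepts = list(read_concepts(io.StringIO(MOCK_CSV)))
print(concepts[0]["cui"], concepts[0]["ontologies"])
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.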

+## Data-flow
+![Data-flow](data-flow.jpg)
+
+## Generate UMLS concept table
### 1. Obtain license and download complete UMLS
To download UMLS, visit the [NIH National Library of Medicine website](https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html). You'll have to apply for a license before you can download the files. For the steps below, I downloaded `Full Release (umls-2022AB-full.zip)`. The advantage over `UMLS Metathesaurus Full Subset` is that the Full Release includes MetamorphoSys, which makes it possible to create a subset of UMLS prior to loading the data into a SQL database. This significantly decreases the required disk space and processing time.

Binary file added data-flow.jpg
Binary file removed data-flow.png
Binary file not shown.
