KG-COVID19Publications

Publications about COVID-19 are growing rapidly. To give researchers an efficient and easy way to stay up to date and get an overview of current publications, I created a knowledge graph of COVID-19 publications. The knowledge graph is updated every week.

As of 20-04-2020, the KG contains 299,652 triples (based on 12,116 publications).

Data resource

The publication dataset (an xlsx file) comes from Dimensions. The data file contains all relevant publications, datasets, and clinical trials in Dimensions that are related to COVID-19. Please download the dataset from its original source (Dimensions); this repo does not provide any data files.
Dimensions is updated once every 24 hours, so the latest research can be viewed alongside existing information.
Content is identified in Dimensions with the search query:

PUBLICATION YEAR: 2020; FULL DATA SEARCH: "2019-nCoV" OR "COVID-19" OR "SARS-CoV-2" OR (("coronavirus" OR "corona virus") AND (Wuhan OR China))

Select key information of publications

The dataset of COVID-19 publications has many missing values. The following picture shows the completeness of the dataset (as of 14-04-2020), computed with the check_missing function in publicationsData.ipynb.

We keep only the key information when converting the xlsx file to RDF using publicationsData.ipynb:
- Title
- DOI
- Authors
- Abstract
- Language
- Publication Type (Book, article, preprint)
- Publication Date
- Source title (the name of the article's source)
- Publisher
- Paper URL in Dimensions

Extract keywords from titles

Keywords play an important role in finding relevant publications efficiently. Unfortunately, keywords are not included in the Dimensions dataset, so I decided to extract keywords from titles.
The title is the most direct and concise indication of an article's topic, and it is one of the most complete attributes (95%) in the Dimensions dataset.
I used the NLTK Python toolkit to:
- Lemmatize titles (treats, treating, treated --> treat)
- Tokenize words
- Tag parts of speech (noun, verb, adjective)
So far, 92,460 keywords have been extracted from the titles of 12,116 publications.
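
The three NLTK steps can be sketched as follows. This is an illustration, not the notebook's code: the function name `extract_keywords`, the choice to keep only nouns, verbs, and adjectives, and the order (tokenize, tag, then lemmatize so the lemmatizer knows the part of speech) are assumptions. The NLTK calls are `nltk.word_tokenize`, `nltk.pos_tag`, and `WordNetLemmatizer.lemmatize`; they need the tokenizer, tagger, and WordNet resources fetched once with `nltk.download(...)`, and the exact resource names depend on the NLTK version.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# First two letters of Penn Treebank tags mapped to WordNet POS codes
# (noun, verb, adjective). Every other tag is dropped.
KEPT_TAGS = {"NN": "n", "VB": "v", "JJ": "a"}


def extract_keywords(
    title: str,
    tokenize: Optional[Callable[[str], List[str]]] = None,
    pos_tag: Optional[Callable[[Sequence[str]], List[Tuple[str, str]]]] = None,
    lemmatize: Optional[Callable[[str, str], str]] = None,
) -> List[str]:
    """Return lemmatized nouns, verbs, and adjectives from a title, in order."""
    if tokenize is None or pos_tag is None or lemmatize is None:
        # Fall back to NLTK only when a step was not supplied by the caller.
        import nltk
        from nltk.stem import WordNetLemmatizer

        tokenize = tokenize or nltk.word_tokenize
        pos_tag = pos_tag or nltk.pos_tag
        lemmatize = lemmatize or WordNetLemmatizer().lemmatize

    keywords: List[str] = []
    for word, tag in pos_tag(tokenize(title)):
        wordnet_pos = KEPT_TAGS.get(tag[:2])
        # Skip other parts of speech and pure punctuation tokens.
        if wordnet_pos is None or not any(ch.isalnum() for ch in word):
            continue
        lemma = lemmatize(word.lower(), wordnet_pos)
        if lemma not in keywords:  # de-duplicate while keeping order
            keywords.append(lemma)
    return keywords
```

Calling `extract_keywords(title)` with no extra arguments uses NLTK for every step. The three optional parameters exist only so that a step can be swapped out or stubbed.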

Similarity of keywords

To find links between publications efficiently, I used the similarity between keywords, calculated with Gensim.
For example:
- COVID-19 links to its similar keywords sars-cov-2, 2019-ncov, coronavirus
- Surgery links to operation, surgeon, surgical, treatment, precaution
- Fatality links to ascertainment, mortality, comorbidity
- Alzheimer links to memory, noncommunicable, parkinson, proteostasis
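
Which Gensim model produced these links is not stated here. Assuming a word-embedding model, Gensim's `most_similar` ranks neighbours by cosine similarity between keyword vectors. The dependency-free sketch below shows that ranking on made-up two-dimensional vectors; the vectors and function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    keyword: str, vectors: Dict[str, Sequence[float]], topn: int = 3
) -> List[Tuple[str, float]]:
    """Return the topn other keywords, ranked by cosine similarity to keyword."""
    target = vectors[keyword]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in vectors.items()
        if other != keyword
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy vectors invented for this example; real embeddings have many more dimensions.
toy_vectors = {
    "covid-19": [1.0, 0.0],
    "sars-cov-2": [0.9, 0.1],
    "surgery": [0.0, 1.0],
    "operation": [0.1, 0.9],
}
nearest = most_similar("covid-19", toy_vectors, topn=1)
```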

Convert CSV to RDF

Dublin Core Metadata Initiative (DCMI) Metadata Terms is the main vocabulary used.
RML (rmlmapper.jar) is the mapping tool used to convert the CSV file to RDF. To learn the conversion, please follow the Knowledge Graph Course Material provided by the Institute of Data Science at Maastricht University (GitHub repo).
The converted RDF data are uploaded to a GraphDB triple store.
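
The actual conversion is done by the RML mapping, which is not shown here. To give an idea of the output, the sketch below builds N-Triples for one CSV row in plain Python. The predicates (dc:title, dc:subject, pav:authoredBy) and the keyword namespace are the ones used in the SPARQL examples in this README. The paper and author namespaces, the column names, and the ";" author separator are assumptions made for illustration.

```python
from typing import Dict, List
from urllib.parse import quote

DC = "http://purl.org/dc/elements/1.1/"
PAV = "http://purl.org/pav/"
KEYWORD_NS = "http://covid19publication.org/keyword/"
PAPER_NS = "http://covid19publication.org/paper/"    # hypothetical namespace
AUTHOR_NS = "http://covid19publication.org/author/"  # hypothetical namespace


def escape_literal(text: str) -> str:
    """Escape characters that are not allowed raw inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(row: Dict[str, str], keywords: List[str]) -> List[str]:
    """Turn one publication row and its keywords into N-Triples lines."""
    paper = "<" + PAPER_NS + quote(row["DOI"], safe="") + ">"
    title = escape_literal(row["Title"])
    triples = [f'{paper} <{DC}title> "{title}" .']
    # Assumes authors are stored in one cell, separated by ";".
    for author in row["Authors"].split(";"):
        author = author.strip()
        if author:
            triples.append(
                f"{paper} <{PAV}authoredBy> <{AUTHOR_NS}{quote(author, safe='')}> ."
            )
    for keyword in keywords:
        triples.append(
            f"{paper} <{DC}subject> <{KEYWORD_NS}{quote(keyword, safe='')}> ."
        )
    return triples


# Made-up row for illustration.
example_row = {
    "DOI": "10.1000/xyz",
    "Title": 'A "novel" virus',
    "Authors": "Doe, Jane; Roe, Richard",
}
example_triples = row_to_ntriples(example_row, ["diabetes", "heart"])
```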

Query publications using SPARQL

Example 1: Query the authors ordered by the number of their publications related to COVID-19

PREFIX pav: <http://purl.org/pav/>
SELECT ?author (COUNT(DISTINCT ?paper) AS ?count) WHERE {
    ?paper pav:authoredBy ?author .
}
GROUP BY ?author
ORDER BY DESC(?count)

Example 2: Query all COVID-19 publications related to both diabetes and heart disease

PREFIX covidPub: <http://covid19publication.org/keyword/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?paper ?title WHERE {
    ?paper dc:subject covidPub:diabetes .
    ?paper dc:subject covidPub:heart .
    ?paper dc:title ?title .
}
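
Queries like these can also be sent from Python over the standard SPARQL 1.1 Protocol. The sketch below uses only the standard library. The endpoint URL is an assumption: it follows GraphDB's usual `/repositories/<id>` pattern on the default port, and the repository id `covid19` is hypothetical.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed local GraphDB endpoint; "covid19" is a hypothetical repository id.
ENDPOINT = "http://localhost:7200/repositories/covid19"

AUTHOR_COUNT_QUERY = """
PREFIX pav: <http://purl.org/pav/>
SELECT ?author (COUNT(DISTINCT ?paper) AS ?count) WHERE {
    ?paper pav:authoredBy ?author .
}
GROUP BY ?author
ORDER BY DESC(?count)
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol POST request that asks for JSON results."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def run_select(endpoint: str, query: str) -> List[Dict[str, str]]:
    """Send a SELECT query and return one {variable: value} dict per result row."""
    with urlopen(build_sparql_request(endpoint, query)) as response:
        payload = json.load(response)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


# Building the request needs no network access; run_select does.
request = build_sparql_request(ENDPOINT, AUTHOR_COUNT_QUERY)
```

With a running triple store, `run_select(ENDPOINT, AUTHOR_COUNT_QUERY)` would return rows with `author` and `count` keys.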
