FAERS-DataPreprocesser

This repository contains scripts for cleansing the FAERS ASCII data. We use the cleansed dataset to build iADRs (http://iadr.csie.nuk.edu.tw/), a web-based analytical system for detecting and analyzing suspected signals of adverse drug reactions and drug interactions.

Our changes to the original data

We process not only the FAERS data files but also the older AERS data. All records in these files are delimited by a newline (\n), and attributes by a dollar sign ($), as illustrated in the following example snapshot. However, we found two peculiar cases that need special care: newline characters inside a record and an abnormal attribute delimiter.

Newline character in a record

Some attribute values, such as a drug name or PT value, contain newline characters (\n), presumably because FAERS does not validate every end-user report. Such a character wrongly splits a single record into two rows. We therefore check that each record has the correct number of attributes and repair the records broken by stray newline characters.
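The check described above can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: the function name `merge_split_records`, the `joiner` parameter, and the choice to rejoin fragments with a space are all assumptions. It also assumes the stray newline does not fall inside the last attribute of a row, since in that case the first fragment already carries the full number of delimiters.

```python
def merge_split_records(lines, n_fields, delimiter="$", joiner=" "):
    """Rejoin rows that a stray newline split into several physical lines.

    Hypothetical helper: n_fields is the number of attributes in the header.
    """
    records = []
    buffer = ""
    for line in lines:
        line = line.rstrip("\r\n")
        # Glue the fragment onto whatever is still pending.
        buffer = line if not buffer else buffer + joiner + line
        # A complete record holds n_fields values, hence n_fields - 1 delimiters.
        if buffer.count(delimiter) >= n_fields - 1:
            records.append(buffer)
            buffer = ""
    if buffer:
        # Keep a trailing incomplete fragment so it can be reviewed by hand.
        records.append(buffer)
    return records
```

For example, with three attributes per row, the fragments `101$IBU` and `PROFEN$ORAL` would be rejoined into a single record before the next complete row is read.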

Abnormal attribute delimiter

According to the FAERS data format, the first row defines the attribute names. Consider the following example, which contains two attributes (PRIMARYID and PT): every row should hold two values separated by exactly one delimiter ($). In the older AERS data (04Q1~12Q3), however, every record except those in the INDI table also ends with a delimiter ($). To remove this inconsistency, we delete the extra dollar signs so that the older files match the new FAERS format.
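A minimal sketch of this cleanup, assuming the number of attributes is known from the header row. The function name `strip_trailing_delimiter` is hypothetical and not taken from the repository's scripts. Counting delimiters before stripping avoids damaging a new-format row whose last value is simply empty.

```python
def strip_trailing_delimiter(line, n_fields, delimiter="$"):
    """Drop the extra end-of-row delimiter found in the older AERS files."""
    line = line.rstrip("\r\n")
    # n_fields values need n_fields - 1 delimiters; exactly one more,
    # placed at the very end, indicates the old AERS layout.
    if line.endswith(delimiter) and line.count(delimiter) == n_fields:
        return line[: -len(delimiter)]
    return line
```

With two attributes, `123$HEADACHE$` becomes `123$HEADACHE`, while `123$` (an empty second value) is left untouched.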

Attribute changes

FAERS attributes have changed several times; the changes are summarized in the following table. An empty cell means the attribute is absent in that period. For example, CASEVERSION was not introduced until 2012Q4, so it is absent before then.

The summary also shows how attribute names have evolved; renamed attributes are highlighted in red. For example, CASE has been renamed CASEID since 12Q3, and ISR has been renamed PRIMARYID.

For attributes that are still “active” in the current release, we always use the newest name, regardless of when the attribute was introduced or renamed. “Inactive” attributes, i.e., those no longer used by FAERS, are kept but filled with missing values.
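The naming policy can be illustrated with a short sketch. It is an assumption-laden example rather than the repository's code: `RENAMES` lists only the two renames named in this README, `harmonize_row` and `target_columns` are hypothetical names, and an empty string stands in for a missing value.

```python
# Old name -> newest name (only the renames mentioned in this README).
RENAMES = {"ISR": "PRIMARYID", "CASE": "CASEID"}


def harmonize_row(header, row, target_columns, renames=RENAMES, missing=""):
    """Map one row onto a fixed column list using the newest attribute names.

    target_columns is a hypothetical unified schema; attributes absent from
    this quarter's header are filled with the `missing` placeholder.
    """
    newest_names = [renames.get(name, name) for name in header]
    values = dict(zip(newest_names, row))
    return [values.get(column, missing) for column in target_columns]
```

A row from an older quarter with header `ISR`, `CASE` would then line up with the columns `PRIMARYID`, `CASEID`, `CASEVERSION`, with the last one left empty.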

Another issue is that some attribute names collide with SQL keywords. We append an underscore to distinguish them from the keywords. For example, the "ROUTE" attribute in the DRUG table is renamed "ROUTE_".
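As a rough sketch of this renaming rule: the keyword set below contains only the ROUTE example from this README and is an assumption, as is the function name `sql_safe_name`; the actual list of colliding names depends on the target database.

```python
# Hypothetical keyword set; only ROUTE is confirmed by this README.
SQL_KEYWORDS = {"ROUTE"}


def sql_safe_name(name, keywords=SQL_KEYWORDS):
    """Append an underscore when an attribute name collides with a SQL keyword."""
    return name + "_" if name.upper() in keywords else name
```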

You can find the database metadata in scripts/metadata.py.

Newly introduced attributes

We introduced several new attributes in the DEMO and DRUG tables.

DEMO (DEMOGRAPHIC)
- WT_KG: The weight in kilograms, calculated from WT and WT_COD.
- AGE_TYPE: A discretization of the AGE and AGE_COD attributes into 10 tags, based on the age groups in MeSH. See this table for details.
DRUG
- RXCUI: The RxCUI code to which the DRUGNAME attribute is mapped.
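The WT_KG derivation above could look roughly like the sketch below. It is not the repository's implementation: the function name `to_wt_kg` is hypothetical, and the unit codes KG, LBS, and GMS are an assumption about the values found in WT_COD that should be checked against the FAERS documentation for the quarters being processed. Unknown units and unparsable weights yield `None`.

```python
# Assumed WT_COD codes and their conversion factors to kilograms.
KG_PER_UNIT = {"KG": 1.0, "LBS": 0.45359237, "GMS": 0.001}


def to_wt_kg(wt, wt_cod):
    """Convert a WT / WT_COD pair to kilograms, or None if not convertible."""
    try:
        value = float(wt)
    except (TypeError, ValueError):
        return None  # missing or non-numeric weight
    factor = KG_PER_UNIT.get((wt_cod or "").strip().upper())
    if factor is None:
        return None  # missing or unrecognized unit code
    return value * factor
```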

Directory Structure

Each quarter directory contains 7 files: DEMO (DEMOGRAPHIC), DRUG (DRUG), REAC (REACTION), OUTC (OUTCOME), RPSR (REPORT), THER (THERAPY), and INDI (INDICATIONS). For example,

.\data
    |>2004Q1
        |>DEMO04Q1.TXT
        |>DRUG04Q1.TXT
        |> ...
    |>2004Q2
    |>2004Q3
    |> ...
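To locate the seven files of a quarter programmatically, a small helper along these lines may be useful. The naming pattern (table prefix, two-digit year, quarter, `.TXT`) is inferred from the two file names shown in the layout above and is an assumption for other quarters; `quarter_files` is a hypothetical name.

```python
from pathlib import Path

# Table prefixes listed in this README.
TABLES = ("DEMO", "DRUG", "REAC", "OUTC", "RPSR", "THER", "INDI")


def quarter_files(root, year, quarter):
    """Return the expected paths of the seven table files for one quarter."""
    folder = Path(root) / f"{year}Q{quarter}"
    suffix = f"{year % 100:02d}Q{quarter}"  # e.g. 2004, 1 -> "04Q1"
    return [folder / f"{table}{suffix}.TXT" for table in TABLES]
```

Calling `quarter_files("data", 2004, 1)` yields paths such as `data/2004Q1/DEMO04Q1.TXT`; checking each path with `Path.exists()` before reading is advisable, since file name casing may vary between releases.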
