This repository contains code, data, and documentation for a project that analyzes adverse event data from the OpenFDA API.
The project focuses on reports from the past ten years and examines the relationship between drug ingredients (both active and inactive) and the reporting of adverse events. The main objective is to understand whether, and how, certain ingredients might increase the risk of adverse events.
- Data Acquisition and Preprocessing: Extract data from the OpenFDA API and, if needed, from other relevant sources such as the CDC or WHO. Clean and preprocess the data into a format suitable for analysis.
- Exploratory Data Analysis: Analyze the data to understand its characteristics, including the distribution of adverse events across different drugs, and identify any apparent relationships between ingredients and adverse events.
- Ingredient-Event Relationship Analysis: Investigate the relationship between the presence of specific ingredients and the likelihood of an adverse event, using statistical analyses or machine learning models to identify the most influential ingredients.
- Adverse Event Prediction: Develop a machine learning model that predicts the likelihood of an adverse event from a drug's ingredients.
- Visualization and Communication: Communicate the findings through intuitive, interactive Tableau visualizations and dashboards, for example showing the frequency of adverse events, the ingredients most associated with adverse events, and the performance of the prediction model.
- Documentation: Document all analysis steps, methodologies, results, and conclusions, including both technical documentation (code, etc.) and non-technical documentation (interpretations, conclusions, etc.).
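
For the Data Acquisition and Preprocessing step, here is a minimal sketch using only the Python standard library. It assumes the public drug adverse event endpoint (`https://api.fda.gov/drug/event.json`) and the field names `safetyreportid`, `patient.drug[].activesubstance.activesubstancename`, and `patient.reaction[].reactionmeddrapt`; check these against the OpenFDA field reference before relying on them. The helper names `build_query_url`, `fetch_reports`, and `flatten` are hypothetical, and the sketch does not handle paging, API keys, or retries. Adverse event reports do not appear to record inactive ingredients, so those would likely have to be joined from another source, such as the inactive ingredient section of the OpenFDA drug label endpoint.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public OpenFDA drug adverse event endpoint (assumed; verify against the API docs).
BASE_URL = "https://api.fda.gov/drug/event.json"


def build_query_url(start: str, end: str, limit: int = 100) -> str:
    """Build a query URL for reports received between two YYYYMMDD dates, inclusive."""
    params = {"search": f"receivedate:[{start} TO {end}]", "limit": limit}
    # Keep ':' and the brackets literal; spaces are encoded as '+'.
    return f"{BASE_URL}?{urlencode(params, safe=':[]')}"


def fetch_reports(url: str) -> list:
    """Download one page of results. No paging, API key, or retry handling."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp).get("results", [])


def flatten(report: dict) -> list:
    """Reduce one report to sorted (report id, active ingredient, reaction) rows.

    Field names are assumptions about the OpenFDA drug event schema.
    """
    patient = report.get("patient", {})
    ingredients = {
        drug["activesubstance"]["activesubstancename"].lower()
        for drug in patient.get("drug", [])
        if "activesubstancename" in drug.get("activesubstance", {})
    }
    reactions = {
        rx["reactionmeddrapt"].lower()
        for rx in patient.get("reaction", [])
        if "reactionmeddrapt" in rx
    }
    report_id = report.get("safetyreportid")
    return [(report_id, ing, rx) for ing in sorted(ingredients) for rx in sorted(reactions)]
```

For example, applying `flatten` to each result of `fetch_reports(build_query_url("20140101", "20231231"))` (an illustrative ten-year window) would yield (report ID, ingredient, reaction) rows that can be loaded into Spark for analysis. That call needs network access, and OpenFDA applies rate limits to requests.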
- Data Acquisition and Preprocessing: Python, OpenFDA API
- Data Analysis and Modeling: Python, Apache Spark, PySpark MLlib
- Data Visualization: Tableau
- Version Control: Git, GitHub
- Documentation: Jupyter Notebook
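
For the Ingredient-Event Relationship Analysis step, one common starting point in pharmacovigilance is a disproportionality measure such as the reporting odds ratio (ROR). Because OpenFDA adverse event data contains only submitted reports, the ROR compares how often a given reaction appears in reports that mention an ingredient with how often it appears in all other reports. The sketch below is plain Python under assumed inputs: the `Report` type is hypothetical, each report is assumed to be already reduced to sets of ingredient and reaction names, and the interval uses the usual log-scale normal approximation. It is one possible technique, not the project's chosen method; at scale, the same 2x2 counts could be computed with PySpark.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical reduced report: sets of ingredient and reaction names."""
    ingredients: frozenset
    reactions: frozenset


def reporting_odds_ratio(reports, ingredient, reaction, z=1.96):
    """Return (ror, lower, upper) for one ingredient/reaction pair.

    Counts over reports form a 2x2 table:
      a: ingredient and reaction   b: ingredient, no reaction
      c: reaction, no ingredient   d: neither
    ROR = (a*d) / (b*c); the interval is exp(ln ROR +/- z * SE),
    with SE = sqrt(1/a + 1/b + 1/c + 1/d). z=1.96 gives roughly 95%.
    """
    a = b = c = d = 0
    for r in reports:
        has_ing = ingredient in r.ingredients
        has_rx = reaction in r.reactions
        if has_ing and has_rx:
            a += 1
        elif has_ing:
            b += 1
        elif has_rx:
            c += 1
        else:
            d += 1
    if 0 in (a, b, c, d):
        # The log-scale interval is undefined when any cell is empty.
        raise ValueError(f"empty cell in 2x2 table: a={a}, b={b}, c={c}, d={d}")
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ror, math.exp(math.log(ror) - z * se), math.exp(math.log(ror) + z * se)


# Toy reports (not real data): a=4, b=2, c=1, d=8, so ROR = (4*8)/(2*1) = 16.0.
toy = (
    [Report(frozenset({"ingredient_a"}), frozenset({"reaction_1"}))] * 4
    + [Report(frozenset({"ingredient_a"}), frozenset({"reaction_2"}))] * 2
    + [Report(frozenset({"ingredient_b"}), frozenset({"reaction_1"}))] * 1
    + [Report(frozenset({"ingredient_b"}), frozenset({"reaction_2"}))] * 8
)

if __name__ == "__main__":
    print(reporting_odds_ratio(toy, "ingredient_a", "reaction_1"))
```

An ROR whose lower bound exceeds 1 is usually treated as a signal worth investigating further, not as evidence that the ingredient causes the reaction.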
Instructions for setting up the project, including environment setup, data acquisition, and initial data analysis, will be provided here.
This project is maintained by Jason Robinson. For any questions or concerns, please reach out at Email.