Dashbirth - Data Shack 2024

Team Members

Yuqin (Bailey) Bai, Luca Bottani, Sara Merengo, and Li Yao

Project Objectives

This project uses language models to analyze postpartum surveys completed by thousands of mothers. The aim is to derive actionable insights that help improve the TeamBirth program, with a focus on reducing maternal mortality and addressing mistreatment and disparities in maternal health.

To deploy language models in summarizing and analyzing postpartum surveys from mothers who have participated in the TeamBirth program.
To identify areas where TeamBirth can be refined to further reduce maternal mortality and mistreatment.
To develop targeted interventions based on the insights gleaned from the survey analysis.

Patient Data Privacy

Following the Ethical Principles & Guidelines for Research Involving Human Subjects, we treat the confidentiality of patient data as paramount. When using language models to analyze surveys, we ensure that no patient-related data is sent to external APIs, including ChatGPT. We understand the risks associated with data privacy and provide the deployment options described below to safeguard this information.

To address different data storage and usage needs while ensuring data privacy, we offer the following solutions:

Local Hosting

For a straightforward setup, we provide the option to locally host the dashboard. This setup includes all modules except for the open feedback and chatbot features.

We strongly recommend storing any patient-related data in a dedicated folder outside the Git repository. This prevents confidential data from being accidentally committed to version control, where it could be exposed to unauthorized access.
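As a rough guard for the recommendation above, a deployment script could refuse to start when the data folder sits inside a Git work tree. The following is a minimal sketch in Python, not code from this repository: the helper name `is_inside_git_repo` is hypothetical, and it only looks for a `.git` entry in the folder or one of its parents.

```python
from pathlib import Path


def is_inside_git_repo(data_dir: str) -> bool:
    """Return True if data_dir or any parent directory contains a .git entry."""
    path = Path(data_dir).resolve()
    # Check the folder itself, then walk up towards the filesystem root.
    for candidate in (path, *path.parents):
        if (candidate / ".git").exists():
            return True
    return False
```

A caller might run `is_inside_git_repo("/absolute/path/to/your/data")` before mounting the folder into a container and stop with an error if it returns `True`.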

Setup Instructions

Ensure you have Docker installed on your system.
Have your data stored securely as suggested, outside of your project’s Git repository.
Follow the instructions in the Frontend Simple Container section below to deploy locally.

Advanced Deployment with Google Cloud Platform

For enhanced security and control, we recommend deploying our solution using the Google Cloud Platform (GCP). This approach utilizes Google Cloud's robust security measures, including isolated storage and controlled access.

Why Choose Google Cloud Storage (GCS) and GCP Virtual Machine (VM)?

GCS buckets offer advanced security features that make them preferable to options like Google Drive. GCS provides fine-grained access controls and is designed for high durability and availability.
GCP VM instances provide a secure environment for hosting our custom language models, eliminating the risks associated with using external APIs.

Proposed Solution

An interactive dashboard website that TeamBirth can use to visualize the data and share it with hospitals.

Prerequisites

Have Docker installed

Frontend Simple Container

Store and configure your data files in </absolute/path/to/your/data> following the instructions

Build and run the container with the commands below, substituting the absolute path to your data folder

cd src/frontend_simple

docker build -t frontend_simple -f Dockerfile .
docker run -d -p 5001:5000 -v </absolute/path/to/your/data>:/data frontend_simple

Open http://localhost:5001/ in your browser
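If the page does not load, it can help to confirm that the published port is accepting connections at all. The sketch below is one way to do that in Python; the helper `port_is_open` is hypothetical and not part of this repository, and it only tests that a TCP connection can be opened, not that the dashboard itself is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

With the container started as shown above, `port_is_open("localhost", 5001)` should return `True` once the server is listening.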

GCS Buckets

  ├── data_ma
  ├── data_nj
  ├── data_ok
  └── data_wa

Local Secrets Folder

Create a local src/secrets folder; it is not tracked in Git because we keep no credentials in the repository. Rename your GCP service account private key to data-server-account.json and place it in this folder.
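Before starting the containers that depend on this key, a quick sanity check can catch a missing or wrong file. The sketch below is an illustration only, assuming the standard JSON layout of a GCP service account key (`type`, `project_id`, `private_key`, `client_email`); the function name `check_service_account_key` is hypothetical and not part of this repository. It never prints the key material.

```python
import json
from pathlib import Path

# Fields found in a standard GCP service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key_path: str) -> str:
    """Validate the key file and return its project_id; raise ValueError otherwise."""
    path = Path(key_path)
    if not path.is_file():
        raise ValueError(f"key file not found: {path}")
    try:
        key = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"{path} is not valid JSON") from exc
    if not isinstance(key, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"{path} is not a service account key")
    return key["project_id"]
```

For example, `check_service_account_key("src/secrets/data-server-account.json")` returns the project ID when the file looks valid.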

Vector Database Container

The Vector DB container is designed to handle vector embeddings for different states, processing data from designated GCS buckets.

generate_embeddings.py - This script contains the primary logic for generating vector embeddings from data files.

--name (-n): Specifies the name of the GCS bucket from which the script retrieves data. Default is set to WA.
--clean_up (-c): Controls whether the script deletes the local metadata directories after processing is complete.
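To make the two flags concrete, here is a minimal `argparse` sketch of how such an interface could be declared. It is not the actual contents of generate_embeddings.py. Two things are assumptions: that the WA default corresponds to the `data_wa` bucket, and that `--clean_up` defaults to true and takes an explicit value (as the `-c False` form used in the Dockerfile suggests). The helpers `str_to_bool` and `build_parser` are hypothetical names.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret strings such as "False" or "true" as booleans."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Generate vector embeddings from files in a GCS bucket."
    )
    # Assumption: the documented WA default maps to the data_wa bucket.
    parser.add_argument("-n", "--name", default="data_wa",
                        help="name of the GCS bucket to read from")
    # Assumption: clean-up is enabled unless "-c False" is passed.
    parser.add_argument("-c", "--clean_up", type=str_to_bool, default=True,
                        help="delete local metadata directories after processing")
    return parser


# Example: select the NJ bucket and keep the local metadata.
args = build_parser().parse_args(["-n", "data_nj", "-c", "False"])
print(f"bucket={args.name} clean_up={args.clean_up}")  # bucket=data_nj clean_up=False
```

Parsing the boolean through a small converter avoids the common pitfall where `type=bool` treats any non-empty string, including "False", as true.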

Navigate to the Vector DB directory
```
cd src/vec_db
```
Configure the Dockerfile for state selection: modify the final CMD line to set the bucket passed to the --name flag. For example, replace CMD ["data_wa"] with CMD ["data_nj"] for New Jersey, or use CMD ["data_nj", "-c", "False"] to preserve the local metadata.
Build and run the container
```
./docker-shell.sh
```

Frontend Chatbot Container

Change
Build & run container
```
docker-compose build
```

Retrieval Container

Large Language Model Container

This container is configured to serve a Large Language Model (LLM), specifically Llama2-7B-chat, on a Google Cloud Platform (GCP) Virtual Machine (VM). We opted to deploy the LLM in a cloud environment to address data privacy concerns and because hosting it locally is challenging. The model requires 15GB of memory and a T4 GPU to function optimally. Below, you will find instructions on how to set up a similar server on your own.

Build Docker Image

Navigate to the LLM directory in your terminal and run the following commands to build the Docker image:
```
cd src/llm_server

docker build -t teambirth-llm-server .
```
Tag the Docker Image

After the build is complete, tag your Docker image with the Google Container Registry (GCR) path:
```
docker tag teambirth-llm-server gcr.io/<project_id>/teambirth-llm-server
```
Replace <project_id> with your Google Cloud project ID.
Authenticate Docker to GCR

Authenticate Docker to the Google Container Registry using the following command:
```
gcloud auth configure-docker
```
Push Docker Image to GCR

Push the Docker image to the Google Container Registry:
```
docker push gcr.io/<project_id>/teambirth-llm-server
```
Create a Google Cloud VM

Navigate to the Google Cloud Console and create a new Virtual Machine instance. Make sure to select the appropriate region, machine type, and other settings according to your requirements.

For example: Region: us-east1, Zone: us-east1-c, GPU type: NVIDIA T4, Machine type: n1-standard-4 (4 vCPU, 2 core, 15 GB memory), VM provisioning model: Spot (saves money at the cost of some stability, since Spot VMs can be preempted)
Install Docker on the VM

SSH into your VM and install Docker:
```
sudo apt-get update
sudo apt-get install docker.io
```
Pull Docker Image from GCR

Pull the Docker image from the Google Container Registry to your VM:
```
docker pull gcr.io/<project_id>/teambirth-llm-server
```
Next, install the additional packages and drivers needed to communicate with the GPU.
Install GPU drivers on VM

Follow the instructions on the Google Cloud website (https://cloud.google.com/compute/docs/gpus/install-drivers-gpu) to install GPU drivers on the VM; the GPU cannot be used without them. Verify the driver installation with this command:
```
sudo nvidia-smi
```
Download the NVIDIA CUDA Toolkit

Follow the instructions on the NVIDIA website to download the NVIDIA CUDA Toolkit (section 2.7): https://docs.nvidia.com/cuda/cuda-installation-guide-linux/
Run Docker Container on VM

Finally, run the Docker container on your VM:
```
docker run -d -p <host_port>:<container_port> gcr.io/<project_id>/teambirth-llm-server
```
Replace <host_port> and <container_port> with the ports you want to expose on the host and container, respectively.

Docker Cleanup

Make sure no containers are running and clear out any unused images.

Run docker container ls
Stop any container that is running
Run docker system prune
Run docker image ls
