
Running PDCi2b2

PDCi2b2 is a client-server application that must be deployed on two distinct and independent cloud providers. The first cloud provider, the storage cloud provider (SCP), is responsible for storing the encrypted aggregate-level i2b2 data. The second cloud provider, the proxy cloud provider (PCP), is responsible for helping the SCP during the re-encryption phase, so that the query result can be decrypted only by the client that issued the query.

Cloud Servers Setup

  1. Depending on the architecture and OS of the servers being used (the default is amd64), execute:

./compileLinux.sh or ./compileMac.sh or ./compileWindows.sh

  2. Copy the executable to each cloud server (both SCP and PCP):

scp $GOPATH/src/github.com/JLRgithub/PDCi2b2/app/PDCi2b2 .
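
The command above copies the binary into the current directory. To push it to the remote SCP and PCP machines you would typically use scp with each server's address; the usernames and hostnames below are placeholders:

scp $GOPATH/src/github.com/JLRgithub/PDCi2b2/app/PDCi2b2 user@scp.example.org:~/
scp $GOPATH/src/github.com/JLRgithub/PDCi2b2/app/PDCi2b2 user@pcp.example.org:~/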

  3. Copy the database configuration (db.toml) to the SCP server and modify it according to your own database settings (a sketch of such a configuration follows the command below):

scp $GOPATH/src/github.com/JLRgithub/PDCi2b2/app/db.toml .
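
The exact keys expected in db.toml depend on the PDCi2b2 release, so the fragment below is only a hypothetical sketch of a PostgreSQL connection configuration; keep the structure of the db.toml shipped with the repository and change only the values:

# hypothetical db.toml sketch -- key names are assumptions, values are examples
host = "localhost"
port = 5432
user = "i2b2"
password = "changeme"
dbname = "i2b2demodata"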

  4. For each server, run a "server setup" and follow the installation guide; this produces the server's private.toml and public.toml configuration files:

./PDCi2b2 server setup

  5. On your client machine, create a group.toml file and append to it the content of all the public.toml files created during each server setup (a sketch of the resulting file is shown below).
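
PDCi2b2 servers follow the ONet/cothority configuration conventions used by UnLynx, so the concatenated group file typically contains one [[servers]] entry per conode; the addresses, keys, and descriptions below are placeholders, and in practice the entries are exactly what each server's public.toml already contains:

[[servers]]
  Address = "tcp://scp.example.org:2000"
  Public = "<public key copied from the SCP's public.toml>"
  Description = "SCP"
[[servers]]
  Address = "tcp://pcp.example.org:2000"
  Public = "<public key copied from the PCP's public.toml>"
  Description = "PCP"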

  6. Start each cloud server:

./PDCi2b2 server -c private.toml

or, in debug mode:

./PDCi2b2 -d 1 server -c private.toml

Extraction, Transformation and Loading (ETL) Phase

  1. Extract a CSV file from an i2b2 instance or generate a new one with the following columns (e.g., testdata.csv); a few illustrative rows are sketched after this list:
  • location_cd: the code of the location X at which the aggregate-level data has been collected;
  • time: the time unit Y of the data collection, which can be set to any interval (e.g., month, quarter, year) as long as the unit is consistent across the dataset;
  • concept_cd: the ontology code Z, e.g., a clinical condition, a medication, a lab result, a procedure, etc.;
  • totalnum: the number of patients at location X and time unit Y who have an observation specified by the ontology concept Z.
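
The column names above are the ones PDCi2b2 expects; the rows below are purely illustrative (hypothetical location and ICD-10 codes, made-up counts), and whether a header row is required should be checked against the testdata.csv shipped with the repository:

location_cd,time,concept_cd,totalnum
CHUV,2017-01,ICD10:I50,42
CHUV,2017-02,ICD10:I50,38
HUG,2017-01,ICD10:I50,57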
  2. Encrypt the extracted (or generated) CSV file (input.csv) in order to obtain an encrypted CSV file as output (output.csv):

./PDCi2b2 encryptCSV -f group.toml -a totalnum -i input.csv -o output.csv

  3. Load the encrypted CSV file into the PostgreSQL database at the storage cloud provider (SCP); a hypothetical loading command is sketched below.
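
No loading command is given here, and the target table depends on the PDCi2b2 database schema; the table name i2b2_agg_data below is therefore only a placeholder, but a generic psql \copy along these lines can be adapted to the real schema:

psql -h scp.example.org -U i2b2 -d i2b2db \
  -c "\copy i2b2_agg_data(location_cd, time, concept_cd, totalnum) FROM 'output.csv' WITH (FORMAT csv, HEADER true)"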

Run Query


  1. For each server, run a "server setup" and follow the installation guide:

./unlynx server setup

  2. On your client machine, create a group.toml file and append to it the content of all the public.toml files created during each setup.

  3. Start each UnLynx conode:

./unlynx server -c private.toml

  4. Run a query, for example:

./unlynx -d 1 -f group.toml -s "{s0, s1}" -w "{w0, 1, w1, 1}" -p "(v0 == v1 && v2 == v3)" -g "{g0, g1, g2}"

What each flag stands for:

  • -d: debug level
  • -f: group definition file
  • -s: select attributes
  • -w: where attributes and their values
  • -p: query predicate
  • -g: group-by attributes
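
Reading the example above with the UnLynx query syntax in mind, the -w list is a sequence of attribute/value pairs, and the predicate variables v0, v1, v2, v3 refer to those entries in order (the value of w0, the constant 1, the value of w1, the constant 1). A purely hypothetical instantiation against the aggregate i2b2 columns, with placeholder numeric codes that are not a confirmed PDCi2b2 encoding, could look like:

./unlynx -d 1 -f group.toml -s "{totalnum}" -w "{concept_cd, 101, location_cd, 3}" -p "(v0 == v1 && v2 == v3)" -g "{time}"

i.e., retrieve the aggregated totalnum for records whose concept_cd equals code 101 and whose location_cd equals code 3, grouped by time.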
