-
Notifications
You must be signed in to change notification settings - Fork 1
Running PDCi2b2
PDCi2b2 is a client-server application that requires to be deployed on two distinct and independent cloud providers. The first cloud provider (also storage cloud provider - SCP) is responsible for storing encrypted aggregate-level i2b2 data. The second cloud provider (also proxy cloud provider - PCP) is responsible for helping SCP during the re-encryption phase so that the query result can only be decrypted by the user. An example of a deployment (local) can be found at test_local_deployment.
- Depending on the architecture and OS of the servers being used (default is amd64) build the PDCi2b2 executable:
./compileLinux.sh
or ./compileMac.sh
or ./compileWindows.sh
- Copy the executable to each cloud server (both SCP and PCP):
scp $GOPATH/src/github.com/JLRgithub/PDCi2b2/app/PDCi2b2 .
- Copy the database configuration (db.toml) to the SCP sever and update the parameters' values according to your own database settings:
scp $GOPATH/src/github.com/JLRgithub/PDCi2b2/app/db.toml .
- For each server run a "server setup" and follow the installation guide:
./PDCi2b2 server setup
-
Create in your client machine (the one that will act as your client) a group.toml and append the content of all the public.toml files created during each setup.
-
Start each cloud server:
./PDCi2b2 server -c private.toml
or..
./PDCi2b2 -d 1 server -c private.toml
(in debug mode)
- Extract a CSV file from an i2b2 instance or generate a new one with the following columns (e.g., testdata.csv):
- location_cd: this column stores the code of the location X at which aggregate-level data has been collected
- time: this column stores the time unit Y of the data collection, which can be set at any interval (i.e., month, quarter, year) as long as the unit is consistent across the dataset;
- concept_cd: this column stores the ontology code Z, e.g., a clinical condition, a medication, a lab result, a procedure, etc.;
- totalnum: this column stores the number of patients at location X and time unit Y who have an observation specified by the ontology concept Z.
- Encrypt the extracted (or generated) CSV file (input.csv) in order to obtain another csv file as output (output.csv):
./PDCi2b2 encryptCSV -f group.toml -a totalnum -i input.csv -o output.csv
- Load the encrypted CSV file in the PostgreSQL database at the SCP server.
PDCi2b2 allows to run SQL aggregate queries of the form SELECT SUM(totalnum) FROM table WHERE {location_cd}* AND {time}* AND {concept_cd}* GROUP BY *
on encrypted data stored in the database at the SCP server.
For example, to do so, execute on the client machine:
./PDCi2b2 -d 1 run -f group.toml -l location1,location2,.. -t time1,time2,.. -c concept1,concept2,.. -g group1,group2,... -o result.csv
What each flag stands for:
*d = debug level;
*f = servers' definition FILE (default: "group.toml");
*l = location codes in the SQL-WHERE clause (a comma-separated list of location codes, e.g., location1,location2
, is equivalent to the Boolean expression location1 OR location2
);
*t = time frames in the SQL-WHERE clause (a comma-separated list of time frames, e.g., time1,time2
, is equivalent to the Boolean expression time1 OR time2
);
*c = concept codes in the SQL-WHERE clause (a comma-separated list of concept codes, e.g., concept1,concept2
, is equivalent to the Boolean expression concept1 OR concept2
);
*g = attributes in the SQL-GROUPBY clause. Possible values: 'location_cd', 'concept_cd', 'time'
*o = output CSV FILE containing the query result