Tony Pan committed Sep 9, 2024
2 parents d9b3be5 + 403e92d commit 47b4ab1
Showing 1 changed file with 142 additions and 10 deletions.
| MAYO | LOCAL - Windows |


## Installation

The Upload Tool is a Python package that requires Python 3.7 or later. It is installed with `flit`, which is itself installed using pip. A virtual environment (venv or conda) is strongly recommended.

1. Create and configure a conda environment:
```
conda create --name chorus python=3.10.14
conda activate chorus
pip install flit
```

or, alternatively, with a Python virtual environment:
```
python -m venv {venv_directory}
source {venv_directory}/bin/activate
pip install flit
```
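Before installing, you can confirm that the interpreter meets the Python 3.7 minimum and that a virtual environment is active. The following is a minimal sketch and is not part of the Upload Tool. The conda check is a heuristic that assumes the `CONDA_PREFIX` and `CONDA_DEFAULT_ENV` variables set by `conda activate`; a plain venv is detected by comparing `sys.prefix` with `sys.base_prefix`.

```python
import os
import sys

# Minimum Python version stated in this README.
MIN_PYTHON = (3, 7)


def python_version_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def in_virtual_environment(environ=os.environ):
    """Heuristic check for an active venv or (non-base) conda environment.

    - venv: sys.prefix differs from sys.base_prefix.
    - conda: `conda activate` sets CONDA_PREFIX; the base env is named "base".
    """
    if sys.prefix != sys.base_prefix:
        return True
    return bool(environ.get("CONDA_PREFIX")) and environ.get("CONDA_DEFAULT_ENV") != "base"


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("Inside a virtual environment:", in_virtual_environment())
```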

2. Get the software:
```
git clone https://github.com/chorus-ai/chorus-extract-upload
cd chorus-extract-upload
```

3. Install the software and its dependencies from the root of the cloned repository:
```
flit install
```

NOTE: developers can instead run
```
flit install --symlink
```
which installs the package as a symlink to the source tree, so that changes in the code directory are immediately reflected in the Python environment.

4. Configure `/etc/hosts`
You need to modify the `/etc/hosts` file on the system from which you will run the upload tool.

Root access is required to edit this file. Add the following line, where `xxx.xxx.xxx.xxx` is a placeholder for the IP address to use for the storage account:
```
xxx.xxx.xxx.xxx choruspilotstorage.blob.core.windows.net
```

If this is not configured, you may see an error like the following from the AZ CLI:

```
The request may be blocked by network rules of storage account. Please check network rule set using 'az storage account show -n accountname --query networkRuleSet'.
If you want to change the default action to apply when no rule matches, please use 'az storage account update'.
```

or, with the built-in Azure Python library:
```
HttpResponseError: Operation returned an invalid status 'This request is not authorized to perform this operation.'
ErrorCode:AuthorizationFailure
```


On Windows, the hosts file is located at `C:\Windows\system32\drivers\etc\hosts` instead of `/etc/hosts`. Administrator privileges are needed to edit this file.
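To check whether the hosts entry is in place, a small script can read the hosts file at the platform-specific location given above and look up the storage hostname. This is a minimal sketch, not part of the Upload Tool; it only parses the file and does not test network connectivity.

```python
import sys
from pathlib import Path

STORAGE_HOST = "choruspilotstorage.blob.core.windows.net"


def hosts_file_path(platform=sys.platform):
    """Location of the hosts file, as described in this README."""
    if platform.startswith("win"):
        return Path(r"C:\Windows\system32\drivers\etc\hosts")
    return Path("/etc/hosts")


def find_host_entry(hosts_text, hostname=STORAGE_HOST):
    """Return the IP address mapped to `hostname`, or None if absent.

    Each non-comment line has the form: <ip> <name> [<alias> ...]
    """
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2 and hostname.lower() in (n.lower() for n in fields[1:]):
            return fields[0]
    return None


if __name__ == "__main__":
    path = hosts_file_path()
    text = path.read_text(encoding="utf-8", errors="replace")
    ip = find_host_entry(text)
    print(f"{STORAGE_HOST} -> {ip}" if ip else f"No entry for {STORAGE_HOST} in {path}")
```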

5. AZ CLI installation
The tool can be configured to upload files to the CHoRUS central cloud either with the AZ CLI or with the built-in Azure library (see `upload_method` in the configuration file). If you will be using the AZ CLI, please install it, along with AzCopy, according to Microsoft's instructions:

- [Install Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli)
- [Install Azcopy](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10?tabs=dnf)

Also, please make sure that the following environment variables are set:

Windows
```
set AZURE_CLI_DISABLE_CONNECTION_VERIFICATION=1
set ADAL_PYTHON_SSL_NO_VERIFY=1
```

Linux
```
export AZURE_CLI_DISABLE_CONNECTION_VERIFICATION=1
export ADAL_PYTHON_SSL_NO_VERIFY=1
```
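To catch a missing variable before running the tool, a check like the one below can be run from the same shell session. This is a minimal illustrative sketch, not part of the Upload Tool; it only verifies that each variable listed above is set to `1`.

```python
import os

# Environment variables this README asks you to set to "1".
REQUIRED_ENV = ("AZURE_CLI_DISABLE_CONNECTION_VERIFICATION", "ADAL_PYTHON_SSL_NO_VERIFY")


def missing_env_vars(environ=os.environ, required=REQUIRED_ENV):
    """Return the required variables that are unset or not equal to "1"."""
    return [name for name in required if environ.get(name) != "1"]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Please set to 1:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```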


## Configuration File
A `config.toml` file needs to be customized for each DGS.

```
[configuration]
# upload method can be one of "azcli" or "builtin"
upload_method = "builtin"
[journal]
path = "az://container/journal.db" # specify the journal file name. defaults to cloud storage
azure_account_name = "account_name"
azure_sas_token = "sastoken"
[central_path]
# specify the central (target) container/path, to which files are uploaded.
# This is also the default location for the journal file
path = "az://container/"
azure_account_name = "account_name"
azure_sas_token = "sastoken"
[site_path]
[site_path.default]
# specify the default site (source) path
path = "/mnt/data/site"
# each data-type-specific path below can have its own access credentials
[site_path.OMOP]
# optional: specific root paths for omop data
path = "/mnt/data/site"
[site_path.Images]
# optional: specific root paths for images
path = "s3://container/path"
aws_access_key_id = "access_key_id"
aws_secret_access_key = "secret_access"
[site_path.Waveforms]
path = "/mnt/another_datadir/site"
```
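A quick way to sanity-check an edited `config.toml` is to parse it and confirm that `upload_method` is one of the two documented values. The sketch below also shows one plausible way a data type could be mapped to a source path; the fallback from `[site_path.<type>]` to `[site_path.default]` is an assumption inferred from the comments above, and `load_config` and `site_path_for` are hypothetical helpers, not the Upload Tool's API. `tomllib` is only in the standard library from Python 3.11, so the sketch falls back to the third-party `tomli` package on older versions such as the 3.10 environment above.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # older Pythons, e.g. the 3.10 conda env above
    import tomli as tomllib  # third-party backport: pip install tomli

# Values documented in the [configuration] section of config.toml.
VALID_UPLOAD_METHODS = {"azcli", "builtin"}


def load_config(text):
    """Hypothetical helper: parse config.toml contents and check upload_method."""
    config = tomllib.loads(text)
    method = config.get("configuration", {}).get("upload_method")
    if method is not None and method not in VALID_UPLOAD_METHODS:
        raise ValueError(f"upload_method must be one of {sorted(VALID_UPLOAD_METHODS)}, got {method!r}")
    return config


def site_path_for(config, data_type):
    """Hypothetical helper: use [site_path.<data_type>] if present, else [site_path.default]."""
    site_paths = config.get("site_path", {})
    section = site_paths.get(data_type) or site_paths["default"]
    return section["path"]


if __name__ == "__main__":
    with open("config.toml", encoding="utf-8") as f:
        cfg = load_config(f.read())
    for data_type in ("OMOP", "Images", "Waveforms"):
        print(data_type, "->", site_path_for(cfg, data_type))
```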


## Usage

The Upload Tool is `chorus_upload_journal/upload_tools`, located in the `upload_manifest` subdirectory of `chorus-extract-upload`. It can be run as:

```
python chorus_upload_journal/upload_tools [params] <subcommand> [subcommand params]
```

The `-h` parameter displays help information for the tool or for each subcommand. Supported subcommands include `update`, `upload`, `usage`, and `verify`.

A different `config.toml` file can be specified with the `-c` parameter.
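When scripting the Upload Tool (for example, to run it against several configuration files), the command line can be assembled as a list and passed to `subprocess.run`. This is a minimal sketch that assumes `-c` is a global parameter placed before the subcommand, following the `[params] <subcommand> [subcommand params]` pattern above; `build_command` is a hypothetical helper, and `sys.executable` is used so the tool runs under the currently active environment.

```python
import shlex
import sys

TOOL = "chorus_upload_journal/upload_tools"  # path as given in this README


def build_command(subcommand, config_path=None, extra_args=()):
    """Hypothetical helper: assemble an Upload Tool command line as a list.

    Assumes global options such as -c go before the subcommand.
    """
    cmd = [sys.executable, TOOL]
    if config_path is not None:
        cmd += ["-c", config_path]
    cmd.append(subcommand)
    cmd.extend(extra_args)
    return cmd


if __name__ == "__main__":
    cmd = build_command("verify", config_path="my_site_config.toml")
    # Pass `cmd` to subprocess.run(cmd, check=True) to actually run the tool.
    print(shlex.join(cmd))
```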



### Setting Azure credentials

From the Azure Portal, navigate to `Storage Account` / `Containers` and select your DGS container. Note the account name (which should be `choruspilotstorage`) and the container name (which should be a short name for your institution). In the left menu, select `Settings` / `Shared access tokens`. Create a new SAS token with the `Read`, `Add`, `Create`, `Write`, and `List` permissions enabled, and optionally `Delete` if you intend to use the same SAS token for deletion later. `Immutable Storage` is not needed. Copy the SAS token string and save it in a secure location; the Upload Tool will use it.
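Before adding a SAS token to `config.toml`, you can check that it grants the permissions listed above and has not expired. A SAS token is a URL query string; its `sp` field holds the signed permission letters (`r` read, `a` add, `c` create, `w` write, `d` delete, `l` list) and its `se` field holds the expiry time. This is a minimal sketch, not part of the Upload Tool; `check_sas_token` is a hypothetical helper and it does not validate the signature.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs

# Permissions this README asks for: Read, Add, Create, Write, List.
REQUIRED_PERMISSIONS = set("racwl")


def check_sas_token(token, now=None):
    """Return a list of problems found in a SAS token string (empty if none).

    Inspects only the signed permissions (sp) and signed expiry (se) fields.
    """
    params = parse_qs(token.lstrip("?"))  # also URL-decodes values such as %3A
    problems = []

    granted = set(params.get("sp", [""])[0])
    missing = REQUIRED_PERMISSIONS - granted
    if missing:
        problems.append("missing permissions: " + "".join(sorted(missing)))

    expiry = params.get("se", [None])[0]
    if expiry:
        expires = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires.tzinfo is None:  # date-only expiry; SAS times are UTC
            expires = expires.replace(tzinfo=timezone.utc)
        if expires <= (now or datetime.now(timezone.utc)):
            problems.append(f"token expired at {expiry}")
    else:
        problems.append("no expiry (se) field found")
    return problems
```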

If you are transferring files from a cloud account to CHoRUS, please refer to your institution's documentation to retrieve credentials for other storage clouds. For a list of supported authentication mechanisms for each tested cloud provider, please see the `config.toml.template` file.

### Create or Update Manifest
To create or update a manifest, the required parameters are a manifest name and a `site-path`, plus the cloud credentials if `site-path` is a cloud storage path. Optionally, the type of data (`OMOP`, `Images`, or `Waveforms`) used to update the manifest may be specified. Multiple manifest updates may be performed before a data submission.

```
python chorus_upload_journal/upload_tools update-journal
```



### Upload files

File upload follows the same pattern as manifest update.

From the local file system:
```
python chorus_upload_journal/upload_tools upload-files
```
