[Risk]: Processing large data volumes with CWL and Docker #14

Open
LucaCinquini opened this issue Apr 16, 2023 · 2 comments
Assignees: LucaCinquini
Labels: enhancement (New feature or request), U-SPS
Milestone: 23.2

@LucaCinquini (Collaborator)

Who: U-SPS
When: April 2023
What: Copying data from the PCM to the Docker container might cause issues: large data volumes require a lot of storage and time for data transfer, which might in turn cause issues with CWL.

@LucaCinquini LucaCinquini added the enhancement New feature or request label Apr 16, 2023
@LucaCinquini LucaCinquini added this to the 23.2 milestone Apr 16, 2023
@LucaCinquini LucaCinquini self-assigned this Apr 16, 2023
@LucaCinquini (Collaborator, Author)

TL;DR: when executing multiple CWL steps via Docker containers, data does not appear to be copied from one container to the next; instead it is referenced by bind-mounting volumes into the successive containers. So when executing the CHIRP workflow we should make sure that:
a) The EKS node has enough storage to hold all input and output data (a single copy only).
b) The CWL steps do NOT use the "staging" option, which would cause the input data to be copied into the current working directory. In other words, do NOT do something like this (a non-staging alternative is sketched after the snippet):
requirements:
  InitialWorkDirRequirement:
    listing:
      - $(inputs.src)
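
For contrast, a minimal sketch of the non-staging pattern, using a hypothetical tool (the baseCommand, image, and input name are illustrative, not taken from the actual workflow): declaring the input as a plain Directory and passing only its path lets cwl-runner bind-mount the data read-only into the container instead of copying it into the working directory.

cwlVersion: v1.2
class: CommandLineTool
baseCommand: ls           # placeholder command; any tool that reads the path works
requirements:
  DockerRequirement:
    dockerPull: ubuntu:22.04   # illustrative image
inputs:
  src:
    type: Directory       # bind-mounted read-only at /var/lib/cwl/stg.../, not copied
    inputBinding:
      position: 1         # the directory's in-container path is passed as an argument
outputs: []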

@LucaCinquini (Collaborator, Author)

The evidence: I ran the L1A workflow, which downloads data from DAPA and uses ancillary data stored on EFS. The detailed steps of each Docker execution show the volumes being mounted onto successive containers.

cwl-runner ssips_L1a_workflow.cwl ssips_L1a_workflow_mcp_test.yml
....
INFO [job l1a-stage-in-2] /tmp/hrenmt53$ docker
run
-i
--mount=type=bind,source=/tmp/hrenmt53,target=/NIGNgi
--mount=type=bind,source=/tmp/60tmw3l0,target=/tmp
--workdir=/NIGNgi
--read-only=true
--log-driver=none
--user=1000:1000
--rm
--cidfile=/tmp/a0fzvbt4/20230420171306-696452.cid
--env=TMPDIR=/tmp
--env=HOME=/NIGNgi
--env=AWS_REGION=us-west-2
'--env=CLIENT_ID=(secret-537db025-591d-4397-a20a-405a57b025da)'
--env=COGNITO_URL=https://cognito-idp.us-west-2.amazonaws.com
--env=COLLECTION_ID=L0_SNPP_ATMS_SCIENCE___1
--env=DAPA_API=https://58nbcawrvb.execute-api.us-west-2.amazonaws.com/test
--env=DATE_FROM=2016-01-14T08:00:00Z
--env=DATE_TO=2016-01-14T11:59:59Z
--env=DOWNLOAD_DIR=/NIGNgi/atms_science
--env=LIMITS=100
--env=LOG_LEVEL=20
'--env=PASSWORD=(secret-4af20f79-7640-4c15-b158-39846b7c8680)'
--env=PASSWORD_TYPE=PARAM_STORE
'--env=USERNAME=(secret-f62c6163-fa2a-4d0b-8b4e-6cc82dc0f0c1)'
--env=VERIFY_SSL=FALSE
ghcr.io/unity-sds/unity-data-services:1.10.1
download > /tmp/hrenmt53/stdout_dapa_download.txt 2> /tmp/hrenmt53/stderr_dapa_download.txt
INFO [job l1a-stage-in-2] Max memory used: 0MiB
INFO [job l1a-stage-in-2] completed success
INFO [step l1a-stage-in-2] completed success
INFO [workflow ] starting step l1a-stage-in-1
INFO [step l1a-stage-in-1] start
INFO [job l1a-stage-in-1] /tmp/6pktu7pp$ docker
run
-i
--mount=type=bind,source=/tmp/6pktu7pp,target=/NIGNgi
--mount=type=bind,source=/tmp/q1mpumql,target=/tmp
--workdir=/NIGNgi
--read-only=true
--log-driver=none
--user=1000:1000
--rm
--cidfile=/tmp/4ssdpgrz/20230420171336-064311.cid
--env=TMPDIR=/tmp
--env=HOME=/NIGNgi
--env=AWS_REGION=us-west-2
'--env=CLIENT_ID=(secret-537db025-591d-4397-a20a-405a57b025da)'
--env=COGNITO_URL=https://cognito-idp.us-west-2.amazonaws.com
--env=COLLECTION_ID=L0_SNPP_EphAtt___1
--env=DAPA_API=https://58nbcawrvb.execute-api.us-west-2.amazonaws.com/test
--env=DATE_FROM=2016-01-14T08:00:00Z
--env=DATE_TO=2016-01-14T11:59:59Z
--env=DOWNLOAD_DIR=/NIGNgi/ephatt
--env=LIMITS=100
--env=LOG_LEVEL=20
'--env=PASSWORD=(secret-4af20f79-7640-4c15-b158-39846b7c8680)'
--env=PASSWORD_TYPE=PARAM_STORE
'--env=USERNAME=(secret-f62c6163-fa2a-4d0b-8b4e-6cc82dc0f0c1)'
--env=VERIFY_SSL=FALSE
ghcr.io/unity-sds/unity-data-services:1.10.1
download > /tmp/6pktu7pp/stdout_dapa_download.txt 2> /tmp/6pktu7pp/stderr_dapa_download.txt
INFO [job l1a-stage-in-1] Max memory used: 67MiB
INFO [job l1a-stage-in-1] completed success
INFO [step l1a-stage-in-1] completed success
INFO [workflow ] starting step l1a-run-pge
INFO [step l1a-run-pge] start
INFO [workflow l1a-run-pge] start
INFO [workflow l1a-run-pge] starting step l1a_process
INFO [step l1a_process] start
INFO ['docker', 'pull', 'public.ecr.aws/unity-ads/sounder_sips_l1a_pge:r0.2.0']
r0.2.0: Pulling from unity-ads/sounder_sips_l1a_pge
d7bfe07ed847: Pull complete
2e8eaf67b67e: Pull complete
732644f00cd7: Pull complete
4f4fb700ef54: Pull complete
d7413cb7e953: Pull complete
f5006e242035: Pull complete
4f57eff15618: Pull complete
035e8fad77be: Pull complete
d36fd955f407: Pull complete
d6d9af327181: Pull complete
2e34d8491065: Pull complete
28f635eb91af: Pull complete
9bd91e81ff3d: Pull complete
bccf2a8cadca: Pull complete
af54cd59bb64: Pull complete
f4618619ba24: Pull complete
199c46d5f1ec: Pull complete
bfaf7925739b: Pull complete
8a74aa4320c7: Pull complete
fefe7a6488d5: Pull complete
Digest: sha256:2079775e5581d693908f0b56b475898f9bfe7ce35f9177ab090ab7d733eef32a
Status: Downloaded newer image for public.ecr.aws/unity-ads/sounder_sips_l1a_pge:r0.2.0
INFO [job l1a_process] /tmp/55klcu9h$ docker
run
-i
--mount=type=bind,source=/tmp/55klcu9h,target=/NIGNgi
--mount=type=bind,source=/tmp/mmkno7ym,target=/tmp
--mount=type=bind,source=/tmp/6pktu7pp/ephatt,target=/var/lib/cwl/stgad090a1d-ff63-4d1a-bb4b-d11c7b6c8f94/ephatt,readonly
--mount=type=bind,source=/tmp/hrenmt53/atms_science,target=/var/lib/cwl/stgf9a7818a-9b37-45b7-8e11-21be8d3f2081/atms_science,readonly
--mount=type=bind,source=/tmp/SOUNDER_SIPS/STATIC_DATA,target=/var/lib/cwl/stg683ed8e2-c96b-4ab7-bd27-24a6c384722d/STATIC_DATA,readonly
--workdir=/NIGNgi
--read-only=true
--log-driver=none
--user=1000:1000
--rm
--cidfile=/tmp/3iinmgmz/20230420171453-362624.cid
--env=TMPDIR=/tmp
--env=HOME=/NIGNgi
public.ecr.aws/unity-ads/sounder_sips_l1a_pge:r0.2.0
/NIGNgi/processed_notebook.ipynb
-p
input_ephatt_path
/var/lib/cwl/stgad090a1d-ff63-4d1a-bb4b-d11c7b6c8f94/ephatt
-p
input_science_path
/var/lib/cwl/stgf9a7818a-9b37-45b7-8e11-21be8d3f2081/atms_science
-p
output_path
/NIGNgi
-p
data_static_path
/var/lib/cwl/stg683ed8e2-c96b-4ab7-bd27-24a6c384722d/STATIC_DATA
-p
start_datetime
2016-01-14T08:00:00Z
-p
end_datetime
2016-01-14T11:59:59Z > /tmp/55klcu9h/l1a_pge_stdout.txt 2> /tmp/55klcu9h/l1a_pge_stderr.txt
INFO [job l1a_process] Max memory used: 0MiB
INFO [job l1a_process] completed success
INFO [step l1a_process] completed success
INFO [workflow l1a-run-pge] completed success
INFO [step l1a-run-pge] completed success
INFO [workflow ] starting step l1a-stage-out
INFO [step l1a-stage-out] start
INFO [job l1a-stage-out] /tmp/fsk1wpdq$ docker
run
-i
--mount=type=bind,source=/tmp/fsk1wpdq,target=/NIGNgi
--mount=type=bind,source=/tmp/omxw_wcr,target=/tmp
--mount=type=bind,source=/tmp/55klcu9h,target=/NIGNgi/55klcu9h,readonly
--workdir=/NIGNgi
--read-only=true
--log-driver=none
--user=1000:1000
--rm
--cidfile=/tmp/cibjn0lo/20230420171810-262195.cid
--env=TMPDIR=/tmp
--env=HOME=/NIGNgi
--env=AWS_REGION=us-west-2
'--env=CLIENT_ID=(secret-537db025-591d-4397-a20a-405a57b025da)'
--env=COGNITO_URL=https://cognito-idp.us-west-2.amazonaws.com
--env=COLLECTION_ID=SNDR_SNPP_ATMS_L1A_OUTPUT___1
--env=DAPA_API=https://58nbcawrvb.execute-api.us-west-2.amazonaws.com/test
--env=DELETE_FILES=FALSE
--env=LOG_LEVEL=20
'--env=PASSWORD=(secret-4af20f79-7640-4c15-b158-39846b7c8680)'
--env=PASSWORD_TYPE=PARAM_STORE
--env=PROVIDER_ID=SNPP
--env=STAGING_BUCKET=uds-test-cumulus-staging
--env=UPLOAD_DIR=/NIGNgi/55klcu9h
'--env=USERNAME=(secret-f62c6163-fa2a-4d0b-8b4e-6cc82dc0f0c1)'
--env=VERIFY_SSL=FALSE
ghcr.io/unity-sds/unity-data-services:1.10.1
upload > /tmp/fsk1wpdq/stdout_dapa_upload.txt 2> /tmp/fsk1wpdq/stderr_dapa_upload.txt
INFO [job l1a-stage-out] Max memory used: 68MiB
INFO [job l1a-stage-out] completed success
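
The read-only /var/lib/cwl/stg... bind mounts in the l1a_process command above are what cwl-runner generates when a step declares plain Directory inputs and passes only their paths as parameters. A minimal sketch of what such a step declaration could look like, assuming a papermill-style entrypoint (the baseCommand, notebook paths, and output binding are assumptions; only the image tag and parameter names come from the log):

cwlVersion: v1.2
class: CommandLineTool
requirements:
  DockerRequirement:
    dockerPull: public.ecr.aws/unity-ads/sounder_sips_l1a_pge:r0.2.0
inputs:
  # Directory inputs are bind-mounted read-only by cwl-runner
  # (the /var/lib/cwl/stg... targets seen above); no data is copied.
  input_ephatt_path:
    type: Directory
  input_science_path:
    type: Directory
baseCommand: papermill      # assumed entrypoint; the real image may bake this in
arguments:
  - notebook.ipynb            # assumed input notebook shipped inside the image
  - processed_notebook.ipynb  # executed copy written to the working directory
  - "-p"
  - input_ephatt_path
  - $(inputs.input_ephatt_path.path)
  - "-p"
  - input_science_path
  - $(inputs.input_science_path.path)
outputs:
  processed_notebook:
    type: File
    outputBinding:
      glob: processed_notebook.ipynb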
