
feat(s3): add session token support for S3 access #384

Merged


@mstysk (Contributor) commented Aug 20, 2024

Why

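Connecting to the S3 server with temporary AWS credentials failed with `InvalidAccessKeyId`. The CLI did not pass `DATACONTRACT_S3_SESSION_TOKEN` on to the S3 client, so the session token was ignored even when it was set (see the error log below).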

How

Added support for AWS session tokens in the S3 file system initialization. The session token is read from the environment variable `DATACONTRACT_S3_SESSION_TOKEN`.
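A minimal sketch of what this change amounts to, assuming s3fs is the S3 client (as the traceback below suggests). The helper names `s3_credentials_from_env` and `make_s3_filesystem` are hypothetical and are not the CLI's actual functions; passing the endpoint through `client_kwargs` is also an assumption. The session token goes to `s3fs.S3FileSystem` through its `token` parameter, next to `key` and `secret`.

```python
import os
from typing import Mapping, Optional


def s3_credentials_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Map DATACONTRACT_S3_* variables to s3fs.S3FileSystem keyword arguments.

    Hypothetical helper for illustration, not the CLI's real function.
    Unset variables map to None, so s3fs falls back to its default
    credential resolution for them.
    """
    env = os.environ if env is None else env
    return {
        "key": env.get("DATACONTRACT_S3_ACCESS_KEY_ID"),
        "secret": env.get("DATACONTRACT_S3_SECRET_ACCESS_KEY"),
        # The part this PR adds: temporary (STS) credentials are only valid
        # together with their session token.
        "token": env.get("DATACONTRACT_S3_SESSION_TOKEN"),
    }


def make_s3_filesystem(endpoint_url: Optional[str] = None):
    """Create an s3fs file system from the credentials above (hypothetical)."""
    import s3fs  # imported lazily so the helper above has no hard dependency

    # Assumption: a custom endpoint is forwarded to the underlying botocore client.
    client_kwargs = {"endpoint_url": endpoint_url} if endpoint_url else {}
    return s3fs.S3FileSystem(client_kwargs=client_kwargs, **s3_credentials_from_env())
```

With the three variables set, `make_s3_filesystem().glob("s3://bucket/path/*.json")` would list objects using the temporary credentials; the bucket path here is a placeholder.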

Log

Error Log

docker run --rm    -v '/path/to/data-contract:/app'  -e DATACONTRACT_S3_REGION=ap-northeast-1 -e DATACONTRACT_S3_ACCESS_KEY_ID=**** -e DATACONTRACT_S3_SECRET_ACCESS_KEY=**** -e DATACONTRACT_S3_SESSION_TOKEN=****  -w /app   datacontract/cli:latest test --server production
Testing datacontract.yaml
ERROR:root:Exception occurred
Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/s3fs/core.py", line 723, in _lsdir
    async for c in self._iterdir(
  File "/opt/venv/lib/python3.11/site-packages/s3fs/core.py", line 773, in _iterdir
    async for i in it:
  File "/opt/venv/lib/python3.11/site-packages/aiobotocore/paginate.py", line 30, in __anext__
    response = await self._make_request(current_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/aiobotocore/client.py", line 411, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the ListObjectsV2 operation: The AWS Access Key Id you provided does not exist in our records.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/datacontract/data_contract.py", line 196, in test
    check_jsonschema(run, data_contract, server)
  File "/opt/venv/lib/python3.11/site-packages/datacontract/engines/fastjsonschema/check_jsonschema.py", line 160, in check_jsonschema
    process_s3_file(server, model_name, validate)
  File "/opt/venv/lib/python3.11/site-packages/datacontract/engines/fastjsonschema/check_jsonschema.py", line 104, in process_s3_file
    for file_content in yield_s3_files(s3_endpoint_url, s3_location):
  File "/opt/venv/lib/python3.11/site-packages/datacontract/engines/fastjsonschema/s3/s3_read_files.py", line 9, in yield_s3_files
    files = fs.glob(s3_location)
            ^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/opt/venv/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/s3fs/core.py", line 802, in _glob
    return await super()._glob(path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/fsspec/asyn.py", line 804, in _glob
    allpaths = await self._find(
               ^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/s3fs/core.py", line 832, in _find
    return await super()._find(
           ^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/fsspec/asyn.py", line 846, in _find
    if withdirs and path != "" and await self._isdir(path):
                                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/s3fs/core.py", line 1483, in _isdir
    return bool(await self._lsdir(path))
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/s3fs/core.py", line 736, in _lsdir
    raise translate_boto_error(e)
PermissionError: The AWS Access Key Id you provided does not exist in our records.
ERROR:root:The AWS Access Key Id you provided does not exist in our records.
╭────────┬────────────────────┬───────┬────────────────────────────────────────╮
│ Result │ Check              │ Field │ Details                                │
├────────┼────────────────────┼───────┼────────────────────────────────────────┤
│ error  │ Test Data Contract │       │ The AWS Access Key Id you provided     │
│        │                    │       │ does not exist in our records.         │
╰────────┴────────────────────┴───────┴────────────────────────────────────────╯
🔴 data contract is invalid, found the following errors:
1) The AWS Access Key Id you provided does not exist in our records.
task: Failed to run task "contract": exit status 1

Resolved Log

docker run --rm    -v '/path/to/data-contract:/app'  -e DATACONTRACT_S3_REGION=ap-northeast-1 -e DATACONTRACT_S3_ACCESS_KEY_ID=****  -e DATACONTRACT_S3_SECRET_ACCESS_KEY=**** -e DATACONTRACT_S3_SESSION_TOKEN=**** -w /app   docker.io/library/datacontract-cli:local test --server production
Testing datacontract.yaml
╭────────┬───────────────────────────────┬───────┬─────────────────────────────╮
│ Result │ Check                         │ Field │ Details                     │
├────────┼───────────────────────────────┼───────┼─────────────────────────────┤
│ passed │ Check that JSON has valid     │       │ All JSON entries are valid. │
│        │ schema                        │       │                             │
│ passed │ Check that field id is        │       │                             │
│        │ present                       │       │                             │
│ passed │ Check that field title is     │       │                             │
│        │ present                       │       │                             │
╰────────┴───────────────────────────────┴───────┴─────────────────────────────╯
🟢 data contract is valid. Run 3 checks. Took 8.491452 seconds.
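
Both commands above pass the region, access key, secret key, and session token as separate `-e` variables. A small Python sketch that builds the same `docker run` invocation, assuming the credentials come from boto3's default credential chain. The helpers `docker_test_command` and `run_with_default_credentials` are hypothetical; the image tag and `/path/to/data-contract` are the placeholders from the logs.

```python
import subprocess
from typing import List, Optional


def docker_test_command(
    project_dir: str,
    access_key: str,
    secret_key: str,
    session_token: Optional[str],
    region: str,
    image: str = "datacontract/cli:latest",
    server: str = "production",
) -> List[str]:
    """Build the `docker run ... test --server <server>` argument list (hypothetical helper)."""
    env = {
        "DATACONTRACT_S3_REGION": region,
        "DATACONTRACT_S3_ACCESS_KEY_ID": access_key,
        "DATACONTRACT_S3_SECRET_ACCESS_KEY": secret_key,
    }
    if session_token:
        # Needed for temporary credentials; without it S3 rejects the key.
        env["DATACONTRACT_S3_SESSION_TOKEN"] = session_token
    cmd = ["docker", "run", "--rm", "-v", f"{project_dir}:/app"]
    for name, value in env.items():
        cmd += ["-e", f"{name}={value}"]
    cmd += ["-w", "/app", image, "test", "--server", server]
    return cmd


def run_with_default_credentials(project_dir: str, region: str) -> None:
    """Resolve credentials via boto3 and run the test (assumes boto3 is installed)."""
    import boto3  # assumption: credentials are configured for the default session

    creds = boto3.Session().get_credentials().get_frozen_credentials()
    cmd = docker_test_command(project_dir, creds.access_key, creds.secret_key, creds.token, region)
    subprocess.run(cmd, check=True)
```

For example, `run_with_default_credentials("/path/to/data-contract", "ap-northeast-1")` mirrors the commands above. Passing `-e NAME` without a value would instead forward the variable from the host environment and keep the secret out of the argument list.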

  • Tests pass
  • ruff format
  • README.md updated (if relevant)
  • CHANGELOG.md entry added

@simonharrer (Contributor)

Great contribution. Can you document the variable in the README, and add a CHANGELOG entry? Thanks!

@mstysk (Contributor, Author) commented Aug 20, 2024

OK! Please wait a moment.

@mstysk (Contributor, Author) commented Aug 20, 2024

The `DATACONTRACT_S3_SESSION_TOKEN` variable is already described in the README, so I only added a CHANGELOG entry.

Is this okay?

@jochenchrist (Contributor)

Thanks for your contribution! LGTM. Happy to merge.

@jochenchrist merged commit 896520e into datacontract:main on Sep 3, 2024.
3 checks passed