[dagster-aws] add Pipes cloudwatch message reader #23353
This looks great overall. It needs more tests and a more fleshed-out test plan.
Can you also provide a more thorough "summary and motivation" section explaining why we're doing this and the tradeoffs involved? If memory serves, there are real cost tradeoffs here, and it would be good to understand and document them.
Side note: I found this issue (getmoto/moto#1941) describing a similar situation. Apparently, at least some credentials have to be configured. Should we automatically provide testing credentials for all our pytest tests?
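For context, the usual workaround (along the lines of what moto's documentation suggests) is to export dummy credentials before any boto3 client is created. Below is a minimal, stdlib-only sketch. The helper name `fake_aws_credentials` and the exact set of variables are illustrative assumptions, not something this PR adds:

```python
import os
from contextlib import contextmanager
from typing import Iterator

# Dummy values: moto intercepts the AWS calls, but botocore still needs
# *some* credentials and a region before it will build and sign a request.
FAKE_AWS_ENV = {
    "AWS_ACCESS_KEY_ID": "testing",
    "AWS_SECRET_ACCESS_KEY": "testing",
    "AWS_SECURITY_TOKEN": "testing",
    "AWS_SESSION_TOKEN": "testing",
    "AWS_DEFAULT_REGION": "us-east-1",
}


@contextmanager
def fake_aws_credentials() -> Iterator[None]:
    """Temporarily set dummy AWS credentials, restoring the previous values afterwards."""
    previous = {key: os.environ.get(key) for key in FAKE_AWS_ENV}
    os.environ.update(FAKE_AWS_ENV)
    try:
        yield
    finally:
        for key, value in previous.items():
            if value is None:
                # The variable was unset before; remove it again.
                os.environ.pop(key, None)
            else:
                os.environ[key] = value
```

In a `conftest.py`, an autouse pytest fixture could simply `yield` inside `with fake_aws_credentials():` so every test gets the dummy values. Restoring the previous environment keeps the fixture from leaking into tests that deliberately use real credentials.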
@schrockn hey, please take a look at […]. I had to manually create CloudWatch logs (since […]). I'm planning to move the examples/docs changes into a separate PR tomorrow.
examples/docs_snippets/docs_snippets/guides/dagster/dagster_pipes/glue/dagster_code.py
I have some minor inline comments.
My big question here is what will appear in the Dagster UI. We are now getting all the CloudWatch logs. Are they all going to show up in stdout? Where does stderr emitted from a Glue job end up?
Paint me a picture of the end-to-end logging experience in Dagster in this case.
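@schrockn As I suspected, AWS Glue routes both `stdout` and `stderr` to the same `/output` log group. The following code:

```python
import sys

import boto3
from dagster_pipes import (
    PipesCliArgsParamsLoader,
    PipesS3ContextLoader,
    open_dagster_pipes,
)

with open_dagster_pipes(
    params_loader=PipesCliArgsParamsLoader(),
    context_loader=PipesS3ContextLoader(client=boto3.client("s3")),
) as pipes:
    # Structured log message, delivered to Dagster over the Pipes message channel
    pipes.log.info("Hello from external process!")
    pipes.report_asset_materialization(
        metadata={
            "some_metric": {"raw_value": 0, "type": "int"}
        },
        data_version="alpha",
    )
    # Plain prints: Glue sends both of these to the same log group
    print("hello from stdout")
    print("hello from stderr", file=sys.stderr)
```

produces this: [screenshot]. I don't think we can (or should) do anything about it. This is just how Glue is.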
Yup, 100% agree. Can you document this behavior both in the PR summary and in a docblock?
In terms of eliminating the Glue-specific wrapper reader/writer classes, can you do a separate PR stacked on this one, with Colton included, so we can discuss it separately?
request for docs
I believe this PR should be ready.
python_modules/libraries/dagster-aws/dagster_aws_tests/pipes_tests/test_pipes.py
## Summary & Motivation

Deleting dummy Pipes classes from AWS Pipes. These classes didn't provide any functionality, and their introduction was questionable from the start.

Context:
- #23353 (comment)
- #22968

I decided to keep `PipesGlueLambdaEventContextInjector` because, unlike the other classes, it actually serves a unique purpose: injecting variables into the Lambda event input. (It might be a bit confusing, because it inherits from `PipesEnvContextInjector` but doesn't actually do anything with environment variables.) @schrockn, let me know if you think it makes sense.

## How I Tested These Changes

Nothing was really using these classes.
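To make "injecting variables into the Lambda event input" concrete: instead of being set as environment variables, the Pipes bootstrap parameters travel inside the event payload. Below is a minimal sketch. It assumes the same key names Pipes uses for its environment variables and a JSON, then zlib, then base64 encoding like the one Pipes applies to env var values. `inject_into_lambda_event` and the other helpers are hypothetical and are not the `PipesGlueLambdaEventContextInjector` code:

```python
import base64
import json
import zlib
from typing import Any, Dict, Mapping

# Assumed key names, mirroring the environment variables Dagster Pipes
# reads its bootstrap parameters from.
CONTEXT_KEY = "DAGSTER_PIPES_CONTEXT"
MESSAGES_KEY = "DAGSTER_PIPES_MESSAGES"


def encode_param(value: Mapping[str, Any]) -> str:
    """Serialize a params dict as JSON, zlib-compress it, and base64-encode it."""
    return base64.b64encode(zlib.compress(json.dumps(value).encode("utf-8"))).decode("utf-8")


def decode_param(value: str) -> Dict[str, Any]:
    """Inverse of encode_param, as the code inside the Lambda would apply it."""
    return json.loads(zlib.decompress(base64.b64decode(value)).decode("utf-8"))


def inject_into_lambda_event(
    event: Mapping[str, Any],
    context_params: Mapping[str, Any],
    messages_params: Mapping[str, Any],
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the event payload with the Pipes
    bootstrap params added; the caller's event is left unmodified."""
    return {
        **event,
        CONTEXT_KEY: encode_param(context_params),
        MESSAGES_KEY: encode_param(messages_params),
    }
```

Inside the Lambda handler, a params loader that reads from a mapping (such as `dagster_pipes.PipesMappingParamsLoader(event)`) can then pick these values up from the event rather than from `os.environ`.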
Summary & Motivation
Resolves #23056
Adds `PipesCloudWatchMessageReader`, which can be used with various AWS services.
It's a solid default message reader, since many AWS services already emit CloudWatch logs by default.
Right now, it reads the full CloudWatch log stream specified by the user and forwards it to Dagster's stdout. That's the minimal functionality Glue Pipes needs.
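The "read the full log stream and forward it to stdout" behavior can be sketched with the CloudWatch Logs `get_log_events` API as exposed by boto3 (`logGroupName`, `logStreamName`, `startFromHead`, `nextToken`, and the `nextForwardToken` response field). This is a minimal sketch under those assumptions: the function names are hypothetical, it is not the `PipesCloudWatchMessageReader` implementation, and it does not pick Pipes messages out of the stream:

```python
import sys
from typing import Any, Dict, Iterator, Optional, TextIO


def iter_log_stream_messages(client: Any, log_group: str, log_stream: str) -> Iterator[str]:
    """Yield every message in a CloudWatch log stream, oldest first.

    `client` is assumed to behave like `boto3.client("logs")`.
    """
    next_token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {
            "logGroupName": log_group,
            "logStreamName": log_stream,
            "startFromHead": True,  # read from the beginning of the stream
        }
        if next_token is not None:
            kwargs["nextToken"] = next_token  # omitted on the first request
        response = client.get_log_events(**kwargs)
        for event in response["events"]:
            yield event["message"]
        # CloudWatch signals the end of the stream by handing back the same
        # forward token that was sent in the request.
        new_token = response["nextForwardToken"]
        if new_token == next_token:
            return
        next_token = new_token


def forward_log_stream_to_stdout(
    client: Any, log_group: str, log_stream: str, out: TextIO = sys.stdout
) -> int:
    """Copy the whole log stream to `out` line by line; return the line count."""
    count = 0
    for message in iter_log_stream_messages(client, log_group, log_stream):
        out.write(message.rstrip("\n") + "\n")
        count += 1
    return count
```

With a real client, this would be called as `forward_log_stream_to_stdout(boto3.client("logs"), log_group, log_stream)`. A Pipes message reader additionally has to recognize Pipes messages among these lines and hand them to Dagster. This sketch only covers the forwarding half.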
In the future, we can think about changing this behavior, mainly:
It's worth mentioning that Glue routes both `stdout` and `stderr` to the `/output` log group. Thus, we only need to read a single stream (the one with the job run ID) from this group to receive all the necessary information.

How I Tested These Changes