Fix bug in task logs when using AWS CloudWatch. Do not set start_time #33673

Merged
merged 2 commits into apache:main from vincbeck/fix_logs
Aug 24, 2023

vincbeck

Resolves #33634.

A bug was introduced in #33231: when a task is rerun, the task instance refers only to the latest run. As a result, logs from the first run are not shown in the UI, because task_instance.start_date is later than the first run. Not setting start_time should fix the issue while keeping the optimization in place, because the optimization relies entirely on end_time.
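To illustrate the failure mode described above, here is a minimal sketch in plain Python. It does not use Airflow or AWS code; the event data, the `select_events` helper, and the dates are all hypothetical and only model a time-window filter over log events.

```python
from datetime import datetime, timezone

UTC = timezone.utc

# Hypothetical log events written by the FIRST run of a task.
first_run_events = [
    {"timestamp": datetime(2023, 8, 20, 10, 0, tzinfo=UTC), "message": "run 1: started"},
    {"timestamp": datetime(2023, 8, 20, 10, 5, tzinfo=UTC), "message": "run 1: failed"},
]

# After a rerun, the task instance only carries the dates of the LATEST run.
latest_start_date = datetime(2023, 8, 21, 9, 0, tzinfo=UTC)
latest_end_date = datetime(2023, 8, 21, 9, 3, tzinfo=UTC)


def select_events(events, start_time=None, end_time=None):
    """Keep events inside the optional [start_time, end_time] window."""
    return [
        e
        for e in events
        if (start_time is None or e["timestamp"] >= start_time)
        and (end_time is None or e["timestamp"] <= end_time)
    ]


# With start_time taken from the latest run, the first run's logs are filtered out.
with_start = select_events(first_run_events, latest_start_date, latest_end_date)

# With only end_time, the first run's logs are still returned.
without_start = select_events(first_run_events, end_time=latest_end_date)

print(len(with_start), len(without_start))
```

In this model, dropping the lower bound is what makes the earlier events visible again, while the upper bound still limits how far the query has to read.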



@o-nikolas

Without start time, wouldn't we run into the issue again if someone reran a task several times over a long range of time or reran an old task? Or is a new log stream/file created each time?

@vincbeck

vincbeck commented Aug 23, 2023

No, I think logs might take more time to load for this specific use case, but I don't have a solution for that :(. Providing start_time would not help either: it would be faster, but it would not return all the logs the user expects.

In the end, providing start_time does not make much sense, because we want logs from the moment the task instance is created, which is when the log stream is created.
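As a rough sketch of what "end_time only" could look like, the helper below builds query arguments that set an upper bound but no lower bound. The function names `to_epoch_ms` and `build_log_query` are hypothetical and not taken from the Airflow code base; the dictionary keys follow the parameter names of the boto3 CloudWatch Logs `get_log_events` call.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def to_epoch_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


def build_log_query(
    log_group: str,
    log_stream: str,
    end_date: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a log query with no lower time bound.

    startTime is deliberately never set, so events are read from the
    beginning of the stream. endTime is set only when the task has finished.
    """
    kwargs: Dict[str, Any] = {
        "logGroupName": log_group,
        "logStreamName": log_stream,
        "startFromHead": True,
    }
    if end_date is not None:
        kwargs["endTime"] = to_epoch_ms(end_date)
    return kwargs


# Hypothetical group and stream names, for illustration only.
finished = build_log_query(
    "example-group",
    "example-stream",
    end_date=datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc),
)
running = build_log_query("example-group", "example-stream")
```

The resulting dictionary could then be passed to a boto3 `logs` client as `client.get_log_events(**kwargs)`. For a task that is still running, no `endTime` is set, so new events keep being returned.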

@o-nikolas

Okay, fair enough. The code looks good otherwise, but there are some failing tests that need fixing.

@potiuk potiuk merged commit 53a8973 into apache:main Aug 24, 2023
@vincbeck vincbeck deleted the vincbeck/fix_logs branch August 24, 2023 14:31
Successfully merging this pull request may close these issues.

Unable to fetch CloudWatch Logs of previous run attempts