Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Redis task handler #31855

Merged
merged 2 commits into from
Jul 23, 2023
Merged

Add Redis task handler #31855

merged 2 commits into from
Jul 23, 2023

Conversation

michalc
Copy link
Contributor

@michalc michalc commented Jun 12, 2023

This stores log lines in Redis up to a configured maximum log lines, always keeping the most recent, up to a configured TTL.

This deviates from other existing task handlers in that it accepts a connection ID. This allows it to be used in addition to other handlers, and so allows a graceful/reversible transition from one logging system to another.

This is particularly useful in situations that use Redis as a message broker, where additional infrastructure isn't desired.

closes: #31834

@boring-cyborg
Copy link

boring-cyborg bot commented Jun 12, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@michalc michalc force-pushed the redis-task-handler branch 14 times, most recently from 994a902 to 9e5e40f Compare June 12, 2023 17:47
@michalc michalc marked this pull request as ready for review June 12, 2023 18:09
@michalc michalc force-pushed the redis-task-handler branch 3 times, most recently from d289b21 to b675ebc Compare June 12, 2023 20:00
@michalc
Copy link
Contributor Author

michalc commented Jun 12, 2023

@hussein-awala This is a PR for #31834, and is ready for a first pass at a review I hope.

(I'm guessing at a good number of things...)

If omitted, this is the equivalent of 28 days.
:param conn_id:
Airflow connection ID for the Redis hook to use
If omitted or None, conf.get("logging", "REMOTE_LOG_CONN_ID") is used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conf.get("logging", "REMOTE_LOG_CONN_ID") is this supposed to be here?
If you intend to specify that airflow's logging config will be used, would be better to mention that explicitly. This probably won't make sense to users, unless they are aware of internals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have changed it to logging.remote_log_conn_id, which is how the documentation refers to the setting at https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#logging

Comment on lines 76 to 78
log_str = b"\n".join(self.conn.lrange(self._render_filename(ti, try_number), start=0, end=-1)).decode(
"utf-8"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log_str = b"\n".join(self.conn.lrange(self._render_filename(ti, try_number), start=0, end=-1)).decode(
"utf-8"
)
log_str = b"\n".join(self.conn.lrange(self._render_filename(ti, try_number), start=0, end=-1)).decode()

default encoding format is utf-8 ref - https://docs.python.org/3/library/stdtypes.html#bytes.decode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now removed

@@ -147,7 +147,7 @@ class FileTaskHandler(logging.Handler):

def __init__(self, base_log_folder: str, filename_template: str | None = None):
super().__init__()
self.handler: logging.FileHandler | None = None
self.handler: logging.Handler | None = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change look out of place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true - but it allows the lower level handler, _RedisHandler to not inherit from logging.FileHandler, which I think is appropriate since there is no file involved(?)

It looks like the codebase already doesn't enforce that the lower level handler extends from logging.FileHandler -

self.handler = watchtower.CloudWatchLogHandler(
which is defined in https://github.com/kislyuk/watchtower/blob/698ff2241befb6586269b6bbce0e305032c2d45c/watchtower/__init__.py#L139

This touches on the bits that I'm unsure about - it looks like FileTaskHandler has a lot of file-related things, but also it looks like it's expected to extend from it when not logging to a file. Not quite sure how to tackle that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have been more specific initially. :)

What I meant was that maybe this change is not required by this PR? since you have anyway overridden the self.handler in class RedisTaskHandler. Also, FileTaskHandler is used in many places, and changing it here might affect those implementations.

About inheriting from FileTaskHandler I'm not sure either, maybe @bolkedebruin might have more context.

Copy link
Contributor Author

@michalc michalc Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant was that maybe this change is not required by this PR? since you have anyway overridden the self.handler in class RedisTaskHandler

Ah - so if I change it back to how it was before this PR, I get a type error:

airflow/providers/redis/log/redis_task_handler.py:61: error: Incompatible types
in assignment (expression has type "Optional[_RedisHandler]", base class
"FileTaskHandler" defined the type as "Optional[FileHandler]")  [assignment]

So something else would have to change too. The easiest way that I've come up with is to de-facto disable type checking on self.handler by removing type annotations in a few places, to make it more like the Cloudwatch handler, but that seems a bit... cheat-y.

_RedisHandler could really inherit from logging.FileHandler to satisfy type checking, but its constructor seems to require a path to a file, which seems odd in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is OK to me since the base class only really needs to inner handler to be a logging.Handler. It could be moved to a separate PR but since this is where the change becomes necessary (instead of just a generally good thing to do) it’s fine to be included here.

Copy link
Contributor

@eladkal eladkal Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this specific change is not tightly coupled to this PR I would suggest to do it in a separated PR.
this is because utils/log goes to core release cycle while the other files goes to provider release cycle.
@michalc Is it possible to separate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michalc can you check my above comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eladkal Sounds reasonable to separate to another PR. Will do that now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eladkal Done in #32773

@michalc michalc force-pushed the redis-task-handler branch 2 times, most recently from d069378 to 7707b2c Compare June 13, 2023 05:23
@michalc michalc force-pushed the redis-task-handler branch 3 times, most recently from a281203 to 807a638 Compare July 22, 2023 15:38
@michalc michalc force-pushed the redis-task-handler branch 3 times, most recently from 2e43225 to 21f5a7d Compare July 22, 2023 17:07
@michalc
Copy link
Contributor Author

michalc commented Jul 22, 2023

@michalc can you rebase and resolve the error?

@eladkal Now done (hopefully...)

@eladkal
Copy link
Contributor

eladkal commented Jul 22, 2023

@eladkal Now done (hopefully...)

Cool. Running CI... lets wait to make sure its green

FileTaskHandler is the base class for logging handlers, including those that
don't log to files via delegating to logging.FileHandler, e.g. in the
CloudwatchTaskHandler at
https://github.com/apache/airflow/blob/2940b9fa55a6a72c60c2162e541631addec3d6b8/airflow/providers/amazon/aws/log/cloudwatch_task_handler.py#L67

It is suspected that type checking is not enabled in this part of the
CloudwatchTaskHandler otherwise it would have already been failing.

This change adjusts the base class so if type checking is enabled in the task
handler, if it delegates to a logging.Handler that is not a logging.FileHandler
as the CloudWatchHandler, then the type checking should pass.

This was originally part of apache#31855 and
split out.

related: apache#31834
This stores log lines in Redis up to a configured maximum log lines, always
keeping the most recent, up to a configured TTL.

This deviates from other existing task handlers in that it accepts a connection
ID. This allows it to be used in addition to other handlers, and so allows a
graceful/reversible transition from one logging system to another.

This is particularly useful in situations that use Redis as a message broker,
where additional infrastructure isn't desired.

closes: apache#31834
@michalc
Copy link
Contributor Author

michalc commented Jul 22, 2023

@eladkal I saw a failing test that I didn't think is related to changes here, so have rebased on main (and rebased #32773)

@eladkal eladkal merged commit 42b4b43 into apache:main Jul 23, 2023
42 checks passed
@boring-cyborg
Copy link

boring-cyborg bot commented Jul 23, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@michalc michalc deleted the redis-task-handler branch July 23, 2023 06:50
michalc added a commit to uktrade/airflow that referenced this pull request Jul 23, 2023
FileTaskHandler is the base class for logging handlers, including those that
don't log to files via delegating to logging.FileHandler, e.g. in the
CloudwatchTaskHandler at
https://github.com/apache/airflow/blob/2940b9fa55a6a72c60c2162e541631addec3d6b8/airflow/providers/amazon/aws/log/cloudwatch_task_handler.py#L67

It is suspected that type checking is not enabled in this part of the
CloudwatchTaskHandler otherwise it would have already been failing.

This change adjusts the base class so if type checking is enabled in the task
handler, if it delegates to a logging.Handler that is not a logging.FileHandler
as the CloudWatchHandler, then the type checking should pass.

This was originally part of apache#31855 and
split out.

related: apache#31834
@eladkal eladkal added this to the Airflow 2.6.4 milestone Jul 23, 2023
@eladkal eladkal added the type:misc/internal Changelog: Misc changes that should appear in change log label Jul 23, 2023
@eladkal eladkal modified the milestones: Airflow 2.6.4, Airflow 2.7.0 Aug 1, 2023
ahidalgob pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 15, 2024
* Allow FileTaskHandler to delegate to instances of logging.Handler

FileTaskHandler is the base class for logging handlers, including those that
don't log to files via delegating to logging.FileHandler, e.g. in the
CloudwatchTaskHandler at
https://github.com/apache/airflow/blob/2940b9fa55a6a72c60c2162e541631addec3d6b8/airflow/providers/amazon/aws/log/cloudwatch_task_handler.py#L67

It is suspected that type checking is not enabled in this part of the
CloudwatchTaskHandler otherwise it would have already been failing.

This change adjusts the base class so if type checking is enabled in the task
handler, if it delegates to a logging.Handler that is not a logging.FileHandler
as the CloudWatchHandler, then the type checking should pass.

This was originally part of apache/airflow#31855 and
split out.

related: apache/airflow#31834

* Add Redis task handler

This stores log lines in Redis up to a configured maximum log lines, always
keeping the most recent, up to a configured TTL.

This deviates from other existing task handlers in that it accepts a connection
ID. This allows it to be used in addition to other handlers, and so allows a
graceful/reversible transition from one logging system to another.

This is particularly useful in situations that use Redis as a message broker,
where additional infrastructure isn't desired.

closes: apache/airflow#31834
GitOrigin-RevId: 42b4b43c4c2ccf0b6e7eaa105c982df495768d01
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 19, 2024
* Allow FileTaskHandler to delegate to instances of logging.Handler

FileTaskHandler is the base class for logging handlers, including those that
don't log to files via delegating to logging.FileHandler, e.g. in the
CloudwatchTaskHandler at
https://github.com/apache/airflow/blob/2940b9fa55a6a72c60c2162e541631addec3d6b8/airflow/providers/amazon/aws/log/cloudwatch_task_handler.py#L67

It is suspected that type checking is not enabled in this part of the
CloudwatchTaskHandler otherwise it would have already been failing.

This change adjusts the base class so if type checking is enabled in the task
handler, if it delegates to a logging.Handler that is not a logging.FileHandler
as the CloudWatchHandler, then the type checking should pass.

This was originally part of apache/airflow#31855 and
split out.

related: apache/airflow#31834

* Add Redis task handler

This stores log lines in Redis up to a configured maximum log lines, always
keeping the most recent, up to a configured TTL.

This deviates from other existing task handlers in that it accepts a connection
ID. This allows it to be used in addition to other handlers, and so allows a
graceful/reversible transition from one logging system to another.

This is particularly useful in situations that use Redis as a message broker,
where additional infrastructure isn't desired.

closes: apache/airflow#31834
GitOrigin-RevId: 42b4b43c4c2ccf0b6e7eaa105c982df495768d01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logging area:providers provider:redis type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Redis task handler for logs
6 participants