-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Redis task handler #31855
Add Redis task handler #31855
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
994a902
to
9e5e40f
Compare
d289b21
to
b675ebc
Compare
@hussein-awala This is a PR for #31834, and is ready for a first pass at a review I hope. (I'm guessing at a good number of things...) |
b675ebc
to
f110f0c
Compare
If omitted, this is the equivalent of 28 days. | ||
:param conn_id: | ||
Airflow connection ID for the Redis hook to use | ||
If omitted or None, conf.get("logging", "REMOTE_LOG_CONN_ID") is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conf.get("logging", "REMOTE_LOG_CONN_ID")
is this supposed to be here?
If you intend to specify that airflow's logging config will be used, would be better to mention that explicitly. This probably won't make sense to users, unless they are aware of internals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have changed it to logging.remote_log_conn_id
, which is how the documentation refers to the setting at https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#logging
log_str = b"\n".join(self.conn.lrange(self._render_filename(ti, try_number), start=0, end=-1)).decode( | ||
"utf-8" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log_str = b"\n".join(self.conn.lrange(self._render_filename(ti, try_number), start=0, end=-1)).decode( | |
"utf-8" | |
) | |
log_str = b"\n".join(self.conn.lrange(self._render_filename(ti, try_number), start=0, end=-1)).decode() |
default encoding format is utf-8
ref - https://docs.python.org/3/library/stdtypes.html#bytes.decode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now removed
@@ -147,7 +147,7 @@ class FileTaskHandler(logging.Handler): | |||
|
|||
def __init__(self, base_log_folder: str, filename_template: str | None = None): | |||
super().__init__() | |||
self.handler: logging.FileHandler | None = None | |||
self.handler: logging.Handler | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change look out of place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true - but it allows the lower level handler, _RedisHandler
to not inherit from logging.FileHandler
, which I think is appropriate since there is no file involved(?)
It looks like the codebase already doesn't enforce that the lower level handler extends from logging.FileHandler
-
self.handler = watchtower.CloudWatchLogHandler( |
This touches on the bits that I'm unsure about - it looks like FileTaskHandler
has a lot of file-related things, but also it looks like it's expected to extend from it when not logging to a file. Not quite sure how to tackle that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have been more specific initially. :)
What I meant was that maybe this change is not required by this PR? since you have anyway overridden the self.handler
in class RedisTaskHandler. Also, FileTaskHandler
is used in many places, and changing it here might affect those implementations.
About inheriting from FileTaskHandler
I'm not sure either, maybe @bolkedebruin might have more context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was that maybe this change is not required by this PR? since you have anyway overridden the self.handler in class RedisTaskHandler
Ah - so if I change it back to how it was before this PR, I get a type error:
airflow/providers/redis/log/redis_task_handler.py:61: error: Incompatible types
in assignment (expression has type "Optional[_RedisHandler]", base class
"FileTaskHandler" defined the type as "Optional[FileHandler]") [assignment]
So something else would have to change too. The easiest way that I've come up with is to de-facto disable type checking on self.handler
by removing type annotations in a few places, to make it more like the Cloudwatch handler, but that seems a bit... cheat-y.
_RedisHandler
could really inherit from logging.FileHandler
to satisfy type checking, but its constructor seems to require a path to a file, which seems odd in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is OK to me since the base class only really needs to inner handler to be a logging.Handler
. It could be moved to a separate PR but since this is where the change becomes necessary (instead of just a generally good thing to do) it’s fine to be included here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this specific change is not tightly coupled to this PR I would suggest to do it in a separated PR.
this is because utils/log goes to core release cycle while the other files goes to provider release cycle.
@michalc Is it possible to separate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michalc can you check my above comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eladkal Sounds reasonable to separate to another PR. Will do that now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d069378
to
7707b2c
Compare
a281203
to
807a638
Compare
2e43225
to
21f5a7d
Compare
Cool. Running CI... lets wait to make sure its green |
FileTaskHandler is the base class for logging handlers, including those that don't log to files via delegating to logging.FileHandler, e.g. in the CloudwatchTaskHandler at https://github.com/apache/airflow/blob/2940b9fa55a6a72c60c2162e541631addec3d6b8/airflow/providers/amazon/aws/log/cloudwatch_task_handler.py#L67 It is suspected that type checking is not enabled in this part of the CloudwatchTaskHandler otherwise it would have already been failing. This change adjusts the base class so if type checking is enabled in the task handler, if it delegates to a logging.Handler that is not a logging.FileHandler as the CloudWatchHandler, then the type checking should pass. This was originally part of apache#31855 and split out. related: apache#31834
This stores log lines in Redis up to a configured maximum log lines, always keeping the most recent, up to a configured TTL. This deviates from other existing task handlers in that it accepts a connection ID. This allows it to be used in addition to other handlers, and so allows a graceful/reversible transition from one logging system to another. This is particularly useful in situations that use Redis as a message broker, where additional infrastructure isn't desired. closes: apache#31834
21f5a7d
to
9ca7921
Compare
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
FileTaskHandler is the base class for logging handlers, including those that don't log to files via delegating to logging.FileHandler, e.g. in the CloudwatchTaskHandler at https://github.com/apache/airflow/blob/2940b9fa55a6a72c60c2162e541631addec3d6b8/airflow/providers/amazon/aws/log/cloudwatch_task_handler.py#L67 It is suspected that type checking is not enabled in this part of the CloudwatchTaskHandler otherwise it would have already been failing. This change adjusts the base class so if type checking is enabled in the task handler, if it delegates to a logging.Handler that is not a logging.FileHandler as the CloudWatchHandler, then the type checking should pass. This was originally part of apache#31855 and split out. related: apache#31834
* Allow FileTaskHandler to delegate to instances of logging.Handler FileTaskHandler is the base class for logging handlers, including those that don't log to files via delegating to logging.FileHandler, e.g. in the CloudwatchTaskHandler at https://github.com/apache/airflow/blob/2940b9fa55a6a72c60c2162e541631addec3d6b8/airflow/providers/amazon/aws/log/cloudwatch_task_handler.py#L67 It is suspected that type checking is not enabled in this part of the CloudwatchTaskHandler otherwise it would have already been failing. This change adjusts the base class so if type checking is enabled in the task handler, if it delegates to a logging.Handler that is not a logging.FileHandler as the CloudWatchHandler, then the type checking should pass. This was originally part of apache/airflow#31855 and split out. related: apache/airflow#31834 * Add Redis task handler This stores log lines in Redis up to a configured maximum log lines, always keeping the most recent, up to a configured TTL. This deviates from other existing task handlers in that it accepts a connection ID. This allows it to be used in addition to other handlers, and so allows a graceful/reversible transition from one logging system to another. This is particularly useful in situations that use Redis as a message broker, where additional infrastructure isn't desired. closes: apache/airflow#31834 GitOrigin-RevId: 42b4b43c4c2ccf0b6e7eaa105c982df495768d01
* Allow FileTaskHandler to delegate to instances of logging.Handler FileTaskHandler is the base class for logging handlers, including those that don't log to files via delegating to logging.FileHandler, e.g. in the CloudwatchTaskHandler at https://github.com/apache/airflow/blob/2940b9fa55a6a72c60c2162e541631addec3d6b8/airflow/providers/amazon/aws/log/cloudwatch_task_handler.py#L67 It is suspected that type checking is not enabled in this part of the CloudwatchTaskHandler otherwise it would have already been failing. This change adjusts the base class so if type checking is enabled in the task handler, if it delegates to a logging.Handler that is not a logging.FileHandler as the CloudWatchHandler, then the type checking should pass. This was originally part of apache/airflow#31855 and split out. related: apache/airflow#31834 * Add Redis task handler This stores log lines in Redis up to a configured maximum log lines, always keeping the most recent, up to a configured TTL. This deviates from other existing task handlers in that it accepts a connection ID. This allows it to be used in addition to other handlers, and so allows a graceful/reversible transition from one logging system to another. This is particularly useful in situations that use Redis as a message broker, where additional infrastructure isn't desired. closes: apache/airflow#31834 GitOrigin-RevId: 42b4b43c4c2ccf0b6e7eaa105c982df495768d01
This stores log lines in Redis up to a configured maximum log lines, always keeping the most recent, up to a configured TTL.
This deviates from other existing task handlers in that it accepts a connection ID. This allows it to be used in addition to other handlers, and so allows a graceful/reversible transition from one logging system to another.
This is particularly useful in situations that use Redis as a message broker, where additional infrastructure isn't desired.
closes: #31834