feat(GAQ): Add Redis Sentinel Support for Global Async Queries #29912
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##           master   #29912       +/-   ##
===========================================
+ Coverage   60.48%   83.64%   +23.15%
===========================================
  Files        1931      529     -1402
  Lines       76236    38268    -37968
  Branches     8568        0     -8568
===========================================
- Hits        46114    32009    -14105
+ Misses      28017     6259    -21758
+ Partials     2105        0     -2105
Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Hi @villebro, could you please review this PR? It was created based on the idea discussed in #28839. However, I recently noticed that there is a new SIP in the design phase to modernize the framework around the same topic: #29839. I'm not sure if this PR will still add value before the new proposal is rolled out, but I would appreciate your feedback.
@nsivarajan thanks for the PR - I think this PR is welcome even if there's ongoing work to modernize the async task framework. Looks generally good, but let me do a proper review shortly (either today or tomorrow).
@villebro Thanks for the feedback! Looking forward to your review.
Hi @villebro, I wanted to follow up on this PR. I understand you're busy, but I was hoping to get your review when you have a moment. Please let me know if there's any additional information I can provide or if there are any specific areas you'd like me to address. Thanks again for your time and feedback!
@nsivarajan thank you for your patience in the meantime... good to see that CI is passing and there are still no conflicts, at least! :D
Thanks so much @nsivarajan for this beautiful PR (the test coverage here is great)! Sorry for the long review time - there was a lot to take in, and I may need to do a new round after I get some feedback on these comments. Please let me know what you think, and I'll do my best to do a second (and hopefully final) pass shortly.
@@ -16,18 +16,27 @@
 # under the License.
 import logging
 import uuid
-from typing import Any, Literal, Optional
+from typing import Any, Dict, Literal, Optional, Union
nit: we can now use dict and | in type annotations as long as we do the following import at the beginning:
from __future__ import annotations
Thank you for pointing that out. I'll make sure to update the code accordingly.
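A minimal sketch of the suggestion above: with `from __future__ import annotations` at the top of a module, annotations are not evaluated when the function is defined, so the built-in `dict` and the `|` union syntax can stand in for `Dict`, `Optional`, and `Union` from `typing`. The function `describe_backend` and its parameters are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from typing import Any


def describe_backend(config: dict[str, Any], prefix: str | None = None) -> str | None:
    """Return the configured cache type, optionally prefixed, or None if unset."""
    cache_type = config.get("CACHE_TYPE")
    if cache_type is None:
        return None
    return f"{prefix}{cache_type}" if prefix else str(cache_type)


print(describe_backend({"CACHE_TYPE": "RedisCache"}, prefix="backend="))
```

`Any` still comes from `typing`; only the container and union spellings change.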
# Decode bytes to strings, decode_responses is not supported at RedisCache and RedisSentinelCache
if isinstance(self._cache, (RedisSentinelCacheBackend, RedisCacheBackend)):
    decoded_results = [
        (
            event_id.decode("utf-8"),
            {
                key.decode("utf-8"): value.decode("utf-8")
                for key, value in event_data.items()
            },
        )
        for event_id, event_data in results
    ]
    return (
        [] if not decoded_results else list(map(parse_event, decoded_results))
    )
I'm not sure this part was mentioned in the description. Can you elaborate on the need for this change?
RedisCache and RedisSentinelCache typically don't support built-in decoding like decode_responses. They return data in bytes, which needs to be manually decoded into the desired formats (e.g., strings).
Got it, I hadn't noticed we were using redis.Redis, I thought we were using RedisCache from flask_caching here. Thanks for clarifying.
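To illustrate the point in this thread: a Redis client that was not created with `decode_responses=True` returns stream entries as bytes, so the IDs, field names, and values have to be decoded by hand. The sketch below runs without a Redis server; the helper name `decode_events` and the sample entry are made up for illustration, and the `(event_id, fields)` shape is assumed from the snippet under review.

```python
from __future__ import annotations


def decode_events(
    results: list[tuple[bytes, dict[bytes, bytes]]],
) -> list[tuple[str, dict[str, str]]]:
    """Decode the IDs, field names, and values of raw stream entries to str."""
    return [
        (
            event_id.decode("utf-8"),
            {key.decode("utf-8"): value.decode("utf-8") for key, value in fields.items()},
        )
        for event_id, fields in results
    ]


# Sample data is invented for this example.
raw = [(b"1607477697866-0", {b"data": b'{"job_id": "abc", "status": "done"}'})]
print(decode_events(raw))
```

An empty result list passes through unchanged, so the caller does not need a separate empty check before decoding.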
# Define the cache backends once as mocks
cache_backends = {
    "RedisCacheBackend": mock.Mock(spec=RedisCacheBackend),
    "RedisSentinelCacheBackend": mock.Mock(spec=RedisSentinelCacheBackend),
    "redis.Redis": mock.Mock(spec=redis.Redis),
}
Could this loop be replaced by pytest.mark.parametrize? It feels more idiomatic (unless I'm missing some reason why it needs to be implemented like this...)
Yes, that's a great suggestion. I'll update the code with pytest.mark.parametrize and commit the changes.
Using pytest.mark.parametrize with mock from unittest caused compatibility issues. Switching to parameterized.expand resolved these issues and successfully covered the test cases, leading to a successful CI run. I'll revisit this case later for further investigation.
@nsivarajan I believe it's generally suggested to use pytest_mock with pytest test cases. I don't want to drag out this review process more than necessary, but I think the cleanest solution would be one with pytest.mark.parametrize and pytest_mock. LMKWYT, but in the meantime let me re-review the functional parts.
Thanks, @villebro, for the review. I agree that using the standard approach is ideal. However, I've encountered continuous failures when combining pytest.mark.parametrize with mock, which suggests there might be compatibility issues. After some research, I found references that indicate potential conflicts when using pytest features with unittest.TestCase subclasses and mock (see pytest documentation, SeleniumBase issue, and pytest issue).
I noticed that our repository already uses parameterized.expand in similar cases, so I opted for it here to ensure the tests run smoothly.
Would it be possible to proceed with this PR using parameterized.expand? I'll continue investigating the issues with pytest.mark.parametrize and mock and plan to submit a follow-up PR if I find a viable solution.
Thanks for putting in the extra effort on these tests @nsivarajan! Let's not let this derail your PR - I can also take a stab at migrating the remaining components to pytest (seems like a fun coffee break challenge).
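For reference, a minimal sketch of the pytest.mark.parametrize approach discussed in this thread, written as a plain module-level test function rather than a unittest.TestCase method. The pytest documentation lists parametrization among the features that do not work inside unittest.TestCase subclasses, which may be related to the failures described above. The two backend classes and `read_event_ids` are hypothetical stand-ins, not Superset code, and the sketch uses `unittest.mock` directly rather than pytest_mock.

```python
from unittest import mock

import pytest


class FakeRedisCacheBackend:
    """Hypothetical stand-in for a Redis-backed cache class."""

    def xrange(self, stream, start="-", end="+", count=None):
        raise NotImplementedError


class FakeRedisSentinelCacheBackend(FakeRedisCacheBackend):
    """Hypothetical stand-in for a Sentinel-backed cache class."""


def read_event_ids(backend, stream):
    """Return the decoded IDs of all entries the backend reports for a stream."""
    return [event_id.decode("utf-8") for event_id, _ in backend.xrange(stream)]


@pytest.mark.parametrize(
    "backend_cls",
    [FakeRedisCacheBackend, FakeRedisSentinelCacheBackend],
    ids=lambda cls: cls.__name__,
)
def test_read_event_ids(backend_cls):
    # spec= restricts the mock to attributes that exist on the given class
    backend = mock.Mock(spec=backend_cls)
    backend.xrange.return_value = [(b"1-0", {b"data": b"{}"})]

    assert read_event_ids(backend, "events") == ["1-0"]
    backend.xrange.assert_called_once_with("events")
```

Creating the mock inside the test body, instead of sharing one mock per backend across tests, keeps call counts from leaking between parametrized cases.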
A few minor observations during second review round
# Define the cache backends once
cache_backends = {
    "RedisCacheBackend": mock.Mock(spec=RedisCacheBackend),
    "RedisSentinelCacheBackend": mock.Mock(spec=RedisSentinelCacheBackend),
    "redis.Redis": mock.Mock(spec=redis.Redis),
}
I believe these are now redundant as we have the parametrized tests?
# Define the cache backends once
cache_backends = {
    "RedisCacheBackend": mock.Mock(spec=RedisCacheBackend),
    "RedisSentinelCacheBackend": mock.Mock(spec=RedisSentinelCacheBackend),
    "redis.Redis": mock.Mock(spec=redis.Redis),
}
Thanks, I'll get this removed.
Handled in last commit
# Define the cache backends once as mocks
# cache_backends = {
#     "RedisCacheBackend": mock.Mock(spec=RedisCacheBackend),
#     "RedisSentinelCacheBackend": mock.Mock(spec=RedisSentinelCacheBackend),
#     "redis.Redis": mock.Mock(spec=redis.Redis),
# }
Same here:
# Define the cache backends once as mocks
# cache_backends = {
#     "RedisCacheBackend": mock.Mock(spec=RedisCacheBackend),
#     "RedisSentinelCacheBackend": mock.Mock(spec=RedisSentinelCacheBackend),
#     "redis.Redis": mock.Mock(spec=redis.Redis),
# }
Thanks, I'll get this removed.
Updated in last commit
LGTM, thanks for all the hard work here! I've left a few extra type annotation tips, but they're not blocking (we've already done enough iterating here). I'll add a new task to our 5.0 project board to remove the old config. If you feel up for it, please open a PR with the change that removes the old config option so we can merge it when the 5.0 breaking window is opened up.
def get_cache_backend(
    config: dict[str, Any],
) -> Union[RedisCacheBackend, RedisSentinelCacheBackend, redis.Redis]:  # type: ignore
nit: the new type annotations support this simpler format:
-) -> Union[RedisCacheBackend, RedisSentinelCacheBackend, redis.Redis]:  # type: ignore
+) -> RedisCacheBackend | RedisSentinelCacheBackend | redis.Redis:  # type: ignore
@@ -73,7 +103,7 @@ class AsyncQueryManager:

    def __init__(self) -> None:
        super().__init__()
        self._redis: redis.Redis  # type: ignore
        self._cache: Optional[BaseCache] = None
Same here:
-        self._cache: Optional[BaseCache] = None
+        self._cache: BaseCache | None = None
Thanks, @villebro, for the feedback and type annotation tips! I appreciate your guidance throughout this process. I'll go ahead and open a new PR to remove the old config option and will reference the task once it's created on the project board, so it'll be ready when the 5.0 breaking window opens up. Looking forward to wrapping this up and moving on to the next steps!
Thanks @nsivarajan, I really appreciate the help! Here's the project card: https://github.com/orgs/apache/projects/345?pane=issue&itemId=78081192
SUMMARY
This PR introduces a feature that allows the Async Query Manager to use a configured cache backend through the new GLOBAL_ASYNC_QUERIES_CACHE_BACKEND setting. To maintain backward compatibility, the existing GLOBAL_ASYNC_QUERIES_REDIS_CONFIG setting is still supported but will be deprecated in the future. Additionally, this update introduces support for Redis Sentinel caching, which eliminates the single point of failure associated with the previous standalone Redis configuration.
Based on the idea discussed in #28839.
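The selection logic described above could look roughly like the following sketch. The three backend classes are hypothetical stand-ins so the example runs without Redis, and `select_backend` is an illustrative name rather than the PR's actual implementation. Raising `ValueError` for an unrecognized `CACHE_TYPE` is also an assumption; the description does not say how that case is handled.

```python
from __future__ import annotations

from typing import Any


class StubRedisCache:
    """Hypothetical stand-in for the standalone Redis cache backend."""


class StubRedisSentinelCache:
    """Hypothetical stand-in for the Redis Sentinel cache backend."""


class StubLegacyRedis:
    """Hypothetical stand-in for a client built from GLOBAL_ASYNC_QUERIES_REDIS_CONFIG."""


def select_backend(
    cache_config: dict[str, Any],
) -> StubRedisCache | StubRedisSentinelCache | StubLegacyRedis:
    """Pick a backend from CACHE_TYPE, falling back to the legacy Redis config."""
    cache_type = cache_config.get("CACHE_TYPE")
    if cache_type == "RedisCache":
        return StubRedisCache()
    if cache_type == "RedisSentinelCache":
        return StubRedisSentinelCache()
    if cache_type is None:
        # Backward-compatible path: use the older Redis settings
        return StubLegacyRedis()
    # Assumed behavior for anything else
    raise ValueError(f"Unsupported CACHE_TYPE: {cache_type!r}")


print(type(select_backend({"CACHE_TYPE": "RedisSentinelCache"})).__name__)
```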
TESTING INSTRUCTIONS
Configure GLOBAL_ASYNC_QUERIES_CACHE_BACKEND in config.py or superset_config.py with the appropriate properties:
Set CACHE_TYPE to "RedisCache" for Redis cache backend.
Set CACHE_TYPE to "RedisSentinelCache" for Redis Sentinel cache backend.
Set CACHE_TYPE to None to fall back on the previous GLOBAL_ASYNC_QUERIES_REDIS_CONFIG.
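As a sketch of what the three options above might look like in superset_config.py: only `GLOBAL_ASYNC_QUERIES_CACHE_BACKEND`, `CACHE_TYPE`, and its three values come from the description. The remaining keys follow Flask-Caching's naming for Redis settings and are assumptions about what this setting accepts; hosts, ports, and the master name are placeholders.

```python
# superset_config.py (sketch; keys other than CACHE_TYPE are assumed)

# Standalone Redis
REDIS_BACKEND = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_HOST": "localhost",  # placeholder
    "CACHE_REDIS_PORT": 6379,
    "CACHE_REDIS_DB": 0,
}

# Redis Sentinel
SENTINEL_BACKEND = {
    "CACHE_TYPE": "RedisSentinelCache",
    "CACHE_REDIS_SENTINELS": [("localhost", 26379)],  # placeholder (host, port) pairs
    "CACHE_REDIS_SENTINEL_MASTER": "mymaster",  # placeholder master name
}

# Fall back to the existing GLOBAL_ASYNC_QUERIES_REDIS_CONFIG
LEGACY_FALLBACK = {"CACHE_TYPE": None}

# Pick exactly one of the three dictionaries above
GLOBAL_ASYNC_QUERIES_CACHE_BACKEND = SENTINEL_BACKEND
```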
ADDITIONAL INFORMATION