Upgrade Elasticsearch to 8 #33135
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,7 +30,7 @@ | |
# Using `from elasticsearch import *` would break elasticsearch mocking used in unit test. | ||
import elasticsearch | ||
import pendulum | ||
from elasticsearch.exceptions import ElasticsearchException, NotFoundError | ||
from elasticsearch.exceptions import NotFoundError | ||
|
||
from airflow.configuration import conf | ||
from airflow.exceptions import AirflowProviderDeprecationWarning | ||
|
@@ -89,7 +89,7 @@ def __init__( | |
json_fields: str, | ||
host_field: str = "host", | ||
offset_field: str = "offset", | ||
host: str = "localhost:9200", | ||
host: str = "http://localhost:9200", | ||
frontend: str = "localhost:5601", | ||
index_patterns: str | None = conf.get("elasticsearch", "index_patterns", fallback="_all"), | ||
es_kwargs: dict | None = conf.getsection("elasticsearch_configs"), | ||
|
@@ -101,8 +101,8 @@ def __init__( | |
super().__init__(base_log_folder, filename_template) | ||
self.closed = False | ||
|
||
self.client = elasticsearch.Elasticsearch(host.split(";"), **es_kwargs) # type: ignore[attr-defined] | ||
|
||
self.client = elasticsearch.Elasticsearch(host, **es_kwargs) # type: ignore[attr-defined] | ||
# in airflow.cfg, host of elasticsearch has to be http://dockerhostXxxx:9200 | ||
May I know what error we see if the protocol is not included in the set value? |
||
if USE_PER_RUN_LOG_ID and log_id_template is not None: | ||
warnings.warn( | ||
"Passing log_id_template to ElasticsearchTaskHandler is deprecated and has no effect", | ||
|
@@ -292,27 +292,24 @@ def es_read(self, log_id: str, offset: int | str, metadata: dict) -> list | Elas | |
} | ||
|
||
try: | ||
max_log_line = self.client.count(index=self.index_patterns, body=query)["count"] | ||
max_log_line = self.client.count(index=self.index_patterns, body=query)["count"] # type: ignore | ||
Why do we have a type ignore here?
So, if we look at the official ES package, the […]. But the body parameter is still accepted because there's a decorator at the beginning, which modifies the function to accept […]. Therefore, without type ignore, the pre-commit job […] |
||
except NotFoundError as e: | ||
self.log.exception("The target index pattern %s does not exist", self.index_patterns) | ||
raise e | ||
except ElasticsearchException as e: | ||
self.log.exception("Could not get current log size with log_id: %s", log_id) | ||
raise e | ||
|
||
logs: list[Any] | ElasticSearchResponse = [] | ||
if max_log_line != 0: | ||
try: | ||
query.update({"sort": [self.offset_field]}) | ||
res = self.client.search( | ||
res = self.client.search( # type: ignore | ||
Same question regarding the type ignore. |
||
index=self.index_patterns, | ||
body=query, | ||
size=self.MAX_LINE_PER_PAGE, | ||
from_=self.MAX_LINE_PER_PAGE * self.PAGE, | ||
) | ||
logs = ElasticSearchResponse(self, res) | ||
except elasticsearch.exceptions.ElasticsearchException: | ||
self.log.exception("Could not read log with log_id: %s", log_id) | ||
except Exception as err: | ||
Can we not narrow down the exception we catch? Is the previous exception no longer present? If so, have they added any other similar class that we can use? Having such a broad exception catch without re-raising might lead to silent failures.
Yes, the exception […] https://github.com/elastic/elasticsearch-py/blob/main/elasticsearch/exceptions.py And I feel like all those errors can occur when calling the ES API. So maybe we should raise the exception after logging it to the error log? |
||
self.log.exception("Could not read log with log_id: %s. Exception: %s", log_id, err) | ||
|
||
return logs | ||
|
||
|
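The review thread on the broad `except Exception` leans toward logging the failure and then re-raising it, so that errors do not pass silently. A minimal sketch of that pattern follows. `FlakyClient` and `read_logs` are hypothetical stand-ins made up for illustration; they are not the real Elasticsearch client or the handler's actual method.

```python
import logging

log = logging.getLogger(__name__)


class FlakyClient:
    """Hypothetical stand-in for an Elasticsearch client; not the real API."""

    def search(self, **kwargs):
        raise RuntimeError("simulated transport failure")


def read_logs(client, log_id):
    """Run a search; on failure, log with context and re-raise to the caller."""
    try:
        return client.search(index="_all", log_id=log_id)
    except Exception:
        # log.exception records the traceback; the bare `raise` keeps it intact.
        log.exception("Could not read log with log_id: %s", log_id)
        raise
```

With this shape the caller still decides how to recover, while the error log keeps the `log_id` context. Whether re-raising is right for the task handler depends on how its callers treat a failed read, which this sketch does not settle.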
Why this change?
In elasticsearch 7, `use_ssl` is an accepted parameter when constructing the ElasticSearch client. See the following source code: https://github.com/elastic/elasticsearch-py/blob/7.14/elasticsearch/client/__init__.py#L113
However, in elasticsearch 8, it no longer accepts the `use_ssl` parameter. See the following source code: https://github.com/elastic/elasticsearch-py/blob/8.9/elasticsearch/_sync/client/__init__.py#L129
Therefore, to make the test suite work with ES 8, I use `http_compress` as the argument (which is one of the accepted arguments for constructing ES client
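
Two changes in this PR interact: the default `host` now carries an explicit `http://` scheme, and `use_ssl` is no longer accepted by the client. A rough sketch of how an ES 7 style setting could be translated into scheme-qualified URLs is shown below. `to_es8_hosts` is a hypothetical helper written for illustration and is not part of this PR or of the Elasticsearch package; splitting on `;` mirrors the `host.split(";")` call removed in the diff.

```python
from __future__ import annotations


def to_es8_hosts(hosts: str, use_ssl: bool = False) -> list[str]:
    """Turn "a:9200;b:9200" into scheme-qualified URLs (hypothetical helper)."""
    scheme = "https" if use_ssl else "http"
    result = []
    for host in hosts.split(";"):
        host = host.strip()
        if not host:
            continue  # skip empty entries such as a trailing ";"
        if "://" not in host:
            # No scheme given: derive it from the old-style use_ssl flag.
            host = f"{scheme}://{host}"
        result.append(host)
    return result


print(to_es8_hosts("localhost:9200"))  # ['http://localhost:9200']
```

An entry that already includes a scheme is passed through unchanged, so an explicit `http://` or `https://` in `airflow.cfg` takes precedence over the flag.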