Upgrade Elasticsearch to 8 #33135
```diff
@@ -30,7 +30,7 @@
 # Using `from elasticsearch import *` would break elasticsearch mocking used in unit test.
 import elasticsearch
 import pendulum
-from elasticsearch.exceptions import ElasticsearchException, NotFoundError
+from elasticsearch.exceptions import NotFoundError

 from airflow.configuration import conf
 from airflow.exceptions import AirflowProviderDeprecationWarning
```
```diff
@@ -89,7 +89,7 @@ def __init__(
         json_fields: str,
         host_field: str = "host",
         offset_field: str = "offset",
-        host: str = "localhost:9200",
+        host: str = "http://localhost:9200",
         frontend: str = "localhost:5601",
         index_patterns: str | None = conf.get("elasticsearch", "index_patterns", fallback="_all"),
         es_kwargs: dict | None = conf.getsection("elasticsearch_configs"),
```
```diff
@@ -101,8 +101,8 @@ def __init__(
         super().__init__(base_log_folder, filename_template)
         self.closed = False

-        self.client = elasticsearch.Elasticsearch(host.split(";"), **es_kwargs)  # type: ignore[attr-defined]
+        self.client = elasticsearch.Elasticsearch(host, **es_kwargs)  # type: ignore[attr-defined]
+        # in airflow.cfg, host of elasticsearch has to be http://dockerhostXxxx:9200
```
Review comment: May I know what error we see if the protocol is not included in the set value?
```diff
         if USE_PER_RUN_LOG_ID and log_id_template is not None:
             warnings.warn(
                 "Passing log_id_template to ElasticsearchTaskHandler is deprecated and has no effect",
```
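Regarding the protocol question above: the elasticsearch-py 8 client expects each host to be a full URL including the scheme, and (as far as I can tell) raises a `ValueError` for bare `host:port` strings, which is why the default was changed to `http://localhost:9200`. A minimal, hypothetical helper (not part of this PR) sketching the normalization:

```python
def ensure_scheme(host: str, default_scheme: str = "http") -> str:
    """Prepend a scheme if the configured host lacks one.

    elasticsearch-py 8 expects full URLs such as "http://localhost:9200";
    bare "host:port" values are rejected by the client constructor.
    """
    if "://" not in host:
        return f"{default_scheme}://{host}"
    return host

print(ensure_scheme("localhost:9200"))        # http://localhost:9200
print(ensure_scheme("https://es.prod:9200"))  # https://es.prod:9200
```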
```diff
@@ -292,27 +292,24 @@ def es_read(self, log_id: str, offset: int | str, metadata: dict) -> list | Elas
         }

         try:
-            max_log_line = self.client.count(index=self.index_patterns, body=query)["count"]
+            max_log_line = self.client.count(index=self.index_patterns, body=query)["count"]  # type: ignore
```
Review comment: Why do we have a type ignore here?

Reply: If we look at the official ES package, the typed signature of the method no longer declares `body`. But the `body` parameter is still accepted because there's a decorator at the beginning which modifies the function to accept it. Therefore, without the type ignore, the pre-commit job fails.
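To illustrate the reply above, here is a toy reimplementation (hypothetical names, not elasticsearch-py's actual code) of how a parameter-rewriting decorator can keep accepting `body` at runtime even though the typed signature no longer declares it, which is what forces the `# type: ignore` for the static checker:

```python
import functools


def rewrite_body(fn):
    # Stand-in for elasticsearch-py 8's internal parameter-rewriting
    # decorator: unpack a legacy ``body`` dict into keyword arguments.
    @functools.wraps(fn)
    def wrapper(self, *, body=None, **kwargs):
        if body is not None:
            kwargs.update(body)
        return fn(self, **kwargs)
    return wrapper


class ToyClient:
    @rewrite_body
    def count(self, index=None, query=None):
        # The typed signature has no ``body``; a type checker therefore
        # rejects ``count(..., body=...)`` even though it works at runtime.
        return {"count": 0, "index": index, "query": query}


result = ToyClient().count(index="_all", body={"query": {"match_all": {}}})
print(result)
```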
```diff
         except NotFoundError as e:
             self.log.exception("The target index pattern %s does not exist", self.index_patterns)
             raise e
-        except ElasticsearchException as e:
-            self.log.exception("Could not get current log size with log_id: %s", log_id)
-            raise e

         logs: list[Any] | ElasticSearchResponse = []
         if max_log_line != 0:
             try:
                 query.update({"sort": [self.offset_field]})
-                res = self.client.search(
+                res = self.client.search(  # type: ignore
```
Review comment: Same question regarding the type ignore.
```diff
                     index=self.index_patterns,
                     body=query,
                     size=self.MAX_LINE_PER_PAGE,
                     from_=self.MAX_LINE_PER_PAGE * self.PAGE,
                 )
                 logs = ElasticSearchResponse(self, res)
-            except elasticsearch.exceptions.ElasticsearchException:
-                self.log.exception("Could not read log with log_id: %s", log_id)
+            except Exception as err:
```
Review comment: Can we not narrow down the exception we catch? Is the previous exception no longer present? If so, have they added any other similar class we can use? Having such a broad exception catch without re-raising might lead to silent failures.

Reply: Yes, the exception no longer exists in elasticsearch 8; see https://github.com/elastic/elasticsearch-py/blob/main/elasticsearch/exceptions.py. And I feel like all of those errors can occur when calling the ES API. So maybe we should raise the exception after logging to the error log?
```diff
                 self.log.exception("Could not read log with log_id: %s. Exception: %s", log_id, err)

         return logs
```
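On the exception-narrowing thread above: in elasticsearch-py 8 the removed `ElasticsearchException` is roughly replaced by narrower base classes such as `ApiError` in `elasticsearch.exceptions` (worth verifying against the installed version). A self-contained sketch, using a stub class in place of the real one, of catching narrowly and re-raising after logging so failures are not silent:

```python
import logging

log = logging.getLogger("es_task_handler")


# Stub standing in for elasticsearch 8's ApiError so this runs standalone;
# the real class lives in elasticsearch.exceptions.
class ApiError(Exception):
    pass


def read_logs(search, log_id):
    """Log the failure with context, then re-raise instead of swallowing it."""
    try:
        return search()
    except ApiError:
        log.exception("Could not read log with log_id: %s", log_id)
        raise


def failing_search():
    raise ApiError("search failed")


try:
    read_logs(failing_search, "dag-task-run")
except ApiError as err:
    print(f"re-raised: {err}")  # re-raised: search failed
```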
```diff
@@ -53,7 +53,7 @@ versions:
 dependencies:
   - apache-airflow>=2.4.0
   - apache-airflow-providers-common-sql>=1.3.1
-  - elasticsearch>7,<7.15.0
+  - elasticsearch>8,<9

 integrations:
   - integration-name: Elasticsearch
```
```diff
@@ -72,3 +72,97 @@ connection-types:

 logging:
   - airflow.providers.elasticsearch.log.es_task_handler.ElasticsearchTaskHandler
```
Review comment: On top of also moving the configuration here: when we have more of those, we might want to do it automatically, but for now we need to add it "manually".
```yaml
config:
  elasticsearch:
    description: ~
    options:
      host:
        description: |
          Elasticsearch host
        version_added: 1.10.4
        type: string
        example: ~
        default: ""
      log_id_template:
        description: |
          Format of the log_id, which is used to query for a given tasks logs
        version_added: 1.10.4
        type: string
        example: ~
        is_template: true
        default: "{dag_id}-{task_id}-{run_id}-{map_index}-{try_number}"
      end_of_log_mark:
        description: |
          Used to mark the end of a log stream for a task
        version_added: 1.10.4
        type: string
        example: ~
        default: "end_of_log"
      frontend:
        description: |
          Qualified URL for an elasticsearch frontend (like Kibana) with a template argument for log_id
          Code will construct log_id using the log_id template from the argument above.
          NOTE: scheme will default to https if one is not provided
        version_added: 1.10.4
        type: string
        example: "http://localhost:5601/app/kibana#/discover\
          ?_a=(columns:!(message),query:(language:kuery,query:'log_id: \"{log_id}\"'),sort:!(log.offset,asc))"
        default: ""
      write_stdout:
        description: |
          Write the task logs to the stdout of the worker, rather than the default files
        version_added: 1.10.4
        type: string
        example: ~
        default: "False"
      json_format:
        description: |
          Instead of the default log formatter, write the log lines as JSON
        version_added: 1.10.4
        type: string
        example: ~
        default: "False"
      json_fields:
        description: |
          Log fields to also attach to the json output, if enabled
        version_added: 1.10.4
        type: string
        example: ~
        default: "asctime, filename, lineno, levelname, message"
      host_field:
        description: |
          The field where host name is stored (normally either `host` or `host.name`)
        version_added: 2.1.1
        type: string
        example: ~
        default: "host"
      offset_field:
        description: |
          The field where offset is stored (normally either `offset` or `log.offset`)
        version_added: 2.1.1
        type: string
        example: ~
        default: "offset"
      index_patterns:
        description: |
          Comma separated list of index patterns to use when searching for logs (default: `_all`).
        version_added: 2.6.0
        type: string
        example: something-*
        default: "_all"
  elasticsearch_configs:
    description: ~
    options:
      http_compress:
        description: ~
        version_added: 1.10.5
        type: string
        example: ~
        default: "False"
      verify_certs:
        description: ~
        version_added: 1.10.5
        type: string
        example: ~
        default: "True"
```
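The options declared above surface in `airflow.cfg` under sections of the same names. An illustrative fragment (the values are examples, not content of this PR; note the scheme on the host, per the default change in the handler):

```ini
[elasticsearch]
host = http://localhost:9200
log_id_template = {dag_id}-{task_id}-{run_id}-{map_index}-{try_number}
end_of_log_mark = end_of_log
write_stdout = False
json_format = False
index_patterns = _all

[elasticsearch_configs]
http_compress = False
verify_certs = True
```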
@@ -0,0 +1,18 @@
```rst
.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.

.. include:: ../exts/includes/providers-configurations-ref.rst
```
Review comment: Why this change?

Reply: In elasticsearch 7, `use_ssl` is an accepted parameter when constructing the `Elasticsearch` client; see https://github.com/elastic/elasticsearch-py/blob/7.14/elasticsearch/client/__init__.py#L113. However, elasticsearch 8 no longer accepts the `use_ssl` parameter; see https://github.com/elastic/elasticsearch-py/blob/8.9/elasticsearch/_sync/client/__init__.py#L129. Therefore, to make the test suite pass with ES 8, I use `http_compress` as the argument (which is one of the accepted arguments for constructing the ES client).
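One way to illustrate the incompatibility described above (a hypothetical helper, not something this PR adds): strip client kwargs that the elasticsearch 8 constructor no longer accepts before building the client. In 8.x, TLS is implied by the scheme of the host URL, so `use_ssl` was dropped, while `http_compress` remains valid.

```python
def adapt_es_kwargs(es_kwargs: dict) -> dict:
    """Return a copy of the kwargs with elasticsearch 7-only options removed.

    ``use_ssl`` is rejected by the 8.x ``Elasticsearch`` constructor because
    the scheme of the host URL now decides whether TLS is used.
    """
    cleaned = dict(es_kwargs)
    cleaned.pop("use_ssl", None)  # ES 7 only; gone in ES 8
    return cleaned


print(adapt_es_kwargs({"use_ssl": True, "http_compress": True}))
# {'http_compress': True}
```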