Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust test for Airflow-2.9 #2149

Merged
merged 8 commits into from
May 16, 2024
Merged

Adjust test for Airflow-2.9 #2149

merged 8 commits into from
May 16, 2024

Conversation

pankajastro
Copy link
Contributor

@pankajastro pankajastro commented May 3, 2024

  • pin google-cloud-bigquery
  • Remove conditional query building for different Airflow version
  • Adjust dataset and cleanoperator test
  • clean checkoperator test

@pankajastro pankajastro force-pushed the fix-af-2.9-test branch 4 times, most recently from 87a8dbb to 650957f Compare May 3, 2024 12:37
@pankajastro pankajastro changed the title Fix AF-2.9 Test Adjust test for Airflow-2.9 May 3, 2024
[
(SequentialExecutor(), "LocalExecutor", True),
(LocalExecutor(), "LocalExecutor", False),
(None, "LocalExecutor", False),
(None, "SequentialExecutor", True),
(LocalExecutor(), "LocalExecutor", False),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of None default executor getting assigned

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think line 111 and 112 are duplicate.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know which executor was getting used when None was passed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked a bit deeper into the issue and discovered that it's not actually the default value causing the problem, but rather caching.

We eventually call https://github.com/apache/airflow/blob/main/airflow/executors/executor_loader.py#L90 when we are doing job.executor in _get_executor_from_job_id which first time load the value from next call it returns value from https://github.com/apache/airflow/blob/main/airflow/executors/executor_loader.py#L98 in case we have none value for executor_in_job. This behavior leads to the last test consistently failing. For instance,
if test sequence is

(None, "LocalExecutor", False),
(None, "SequentialExecutor", True),

(None, "SequentialExecutor", True) fail

for sequence

(None, "SequentialExecutor", True),
(None, "LocalExecutor", False),

(None, "LocalExecutor", False) fail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the code PTAL commit: e3d3239

@pankajastro
Copy link
Contributor Author

pre-commit failure is looking like because of recent pre-commit hook update will handle that in separate PR

@@ -57,7 +57,8 @@ google = [
"protobuf",
"apache-airflow-providers-google>=10.15.0",
"sqlalchemy-bigquery>=1.3.0",
"smart-open[gcs]>=5.2.1,<7.0.0"
"smart-open[gcs]>=5.2.1,<7.0.0",
"google-cloud-bigquery<3.21.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like recent optimization in biquery causing 404 error for bigquery job. will create an issue with finding

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe PR googleapis/python-bigquery#1900 but need bit more debugging

Copy link

codecov bot commented May 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.12%. Comparing base (162b0ab) to head (aea0ff6).
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2149      +/-   ##
==========================================
+ Coverage   89.75%   91.12%   +1.37%     
==========================================
  Files          75       75              
  Lines        4322     4316       -6     
  Branches      539      538       -1     
==========================================
+ Hits         3879     3933      +54     
+ Misses        343      287      -56     
+ Partials      100       96       -4     
Flag Coverage Δ
PythonSDK 91.12% <100.00%> (+1.37%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -177,8 +177,8 @@ def test_single_worker_mode_backfill_airflow_2_5(executor_in_job, executor_in_cf
[
(SequentialExecutor(), "LocalExecutor", True),
(LocalExecutor(), "LocalExecutor", False),
(None, "LocalExecutor", False),
(None, "SequentialExecutor", True),
(LocalExecutor(), "LocalExecutor", False),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate tests, line 179 and 180.

@@ -22,7 +21,6 @@
{
"database": Database.BIGQUERY,
"file": File(path=str(CWD) + "/../../../data/homes_main.csv"),
"table": Table(conn_id="gcp_conn_project"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we don't need it now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This connection was only utilized in these two tests, which appear to be unnecessary. However, I'm unsure why we included it.

Comment on lines 227 to 228
if job.executor_class is None and job.executor:
return type(job.executor).__name__
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using job.executor in this context appears incorrect because it will not only retrieve the executor from the job but also consider the executor conf.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should update our test case instead.

@kaxil
Copy link
Collaborator

kaxil commented May 14, 2024

Why this in review state since 5 days?

@pankajastro
Copy link
Contributor Author

Why this in review state since 5 days?

Will connect with the codeowners to conclude on this.

@pankajastro
Copy link
Contributor Author

some tests are failing because of Databricks related infra change

@pankajastro
Copy link
Contributor Author

some tests are failing because of Databricks related infra change

Fixed in PR #2161 and #2159

@pankajastro pankajastro merged commit 149dd9e into main May 16, 2024
31 of 33 checks passed
@pankajastro pankajastro deleted the fix-af-2.9-test branch May 16, 2024 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants