Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pre-commit hook to validate provider spec #33608

Closed
wants to merge 2 commits into from

Conversation

uranusjr
Copy link
Member

Inspired by #33601. This adds a pre-commit hook to ensure the provider.yaml specifications are valid when its surrounding provider changes. Two errors are found and fixed.

Since this obviously also needs to change the Pinot provider, this will unfortunately cause merge conflicts when either this or the other PR is merged. We’ll just need to resolve it when it happens.

The `provider.yaml` for `airflow.providers.apache.pinot` provider was referencing a non-existing hook class of `PinotHook`, causing a warning during `providers_manager`'s sanity check. This corrects it to list both hooks of this provider.
Also fix errors found by the new check.
@uranusjr uranusjr force-pushed the check-provider-references branch from 63c6f61 to 5389595 Compare August 22, 2023 10:32
@uranusjr uranusjr changed the title Referencing correct hooks for Apache Pinot Add pre-commit hook to validate provider spec Aug 22, 2023
@potiuk
Copy link
Member

potiuk commented Aug 22, 2023

Hmm ... we already have pre_commit_check_provider_yaml_files.py which does a lot more check on various provider classes and even verifies whether all the classes and packages specified are properly named, imports them etc. etc.

Basically what it does is that it runs

python3 /opt/airflow/scripts/in_container/run_provider_yaml_files_check.py inside breeze and it runs a series of checks there:

    check_integration_duplicates(all_parsed_yaml_files)
    check_duplicates_in_integrations_names_of_hooks_sensors_operators(all_parsed_yaml_files)
    check_completeness_of_list_of_transfers(all_parsed_yaml_files)
    check_duplicates_in_list_of_transfers(all_parsed_yaml_files)
    check_hook_connection_classes(all_parsed_yaml_files)
    check_plugin_classes(all_parsed_yaml_files)
    check_extra_link_classes(all_parsed_yaml_files)
    check_correctness_of_list_of_sensors_operators_hook_modules(all_parsed_yaml_files)
    check_unique_provider_name(all_parsed_yaml_files)
    check_providers_have_all_documentation_files(all_parsed_yaml_files)

So I'd say either the check should be added there, or maybe some of the existing check should be improved there.

@potiuk
Copy link
Member

potiuk commented Aug 22, 2023

Or if you prefer to reimplement the checks in a simpler way - maybe - then it's worth to replace the other checks with new ones.

BTW. The thing why it uses breeze is that it provides the consistent set of libraries and system level packages and imports that are always the same for everyone - so you can consistently run all kinds of imports and they are guaranteed to work as breeze has all the dependencies installed already.

@uranusjr
Copy link
Member Author

I wonder if that existing hook already covers the same topics but is not currently triggered correctly.

@potiuk
Copy link
Member

potiuk commented Aug 22, 2023

I wonder if that existing hook already covers the same topics but is not currently triggered correctly.

It should be triggred - especially that we are running it with --all-files in main. So I guess this must be either missing or buggy check there.

@uranusjr
Copy link
Member Author

But we don’t run it with --all-files in CI, which causes the CI to be incorrectly green in PRs changing relevant files, and stay green in main after those changes are merged.

@potiuk
Copy link
Member

potiuk commented Aug 23, 2023

But we don’t run it with --all-files in CI, which causes the CI to be incorrectly green in PRs changing relevant files, and stay green in main after those changes are merged.

We do. All the runs in CI are alwas with --all-files for all PRs.

https://github.com/apache/airflow/blob/main/.github/workflows/ci.yml#L537

@potiuk
Copy link
Member

potiuk commented Aug 23, 2023

The problem is that we do not check deprecation warnigs there. Both

  • airflow.providers.apache.pinot.hooks.pinot.PinotHook
  • airflow.providers.slack.notifications.slack_notifier.SlackNotifier

were importable with deprecation warning - for backwards compatibility - but in the checks we have we have not checkded if deprecation warnings are raised for them.

@potiuk
Copy link
Member

potiuk commented Aug 23, 2023

And we missed notifier checks in there.

@potiuk
Copy link
Member

potiuk commented Aug 23, 2023

Improve provider verification script here: #33640

There were quite a few other errors detected by checking Deprecated warnings.

@potiuk potiuk closed this Aug 23, 2023
potiuk added a commit to potiuk/airflow that referenced this pull request Aug 23, 2023
The Slack Notification listed in provider.yaml for slack has
been still using the deprecated version of it. Since the
provider verification of ours did not check neither notifications
nor warnings, it was not found before.

Found in the attempt of adding new verification in apache#33608 then
also detected by apache#33640 that improved existing verification.
potiuk added a commit that referenced this pull request Aug 23, 2023
…33643)

The Slack Notification listed in provider.yaml for slack has
been still using the deprecated version of it. Since the
provider verification of ours did not check neither notifications
nor warnings, it was not found before.

Found in the attempt of adding new verification in #33608 then
also detected by #33640 that improved existing verification.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants