Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Referencing correct hooks for Apache Pinot #33601

Merged
merged 2 commits into from
Aug 23, 2023

Conversation

alexbegg
Copy link
Contributor

The provider.yaml for airflow.providers.apache.pinot provider was referencing a non-existing hook class of PinotHook, causing a warning during providers_manager's sanity check. This corrects it to list both hooks of this provider.

closes #33596


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@Taragolis
Copy link
Contributor

@alexbegg thanks for contribution to Pinot Provider. Could you also add missing class attributes to PinotDbApiHook and PinotAdminHook

  • conn_name_attr: name of attribute which use for connection_id
  • default_conn_name: default connection_id
  • conn_type: type of connection from provider.yaml
  • hook_name: Name in Connection dropdown list

Examples:

conn_name_attr = "postgres_conn_id"
default_conn_name = "postgres_default"
conn_type = "postgres"
hook_name = "Postgres"

conn_name_attr = "gcp_conn_id"
default_conn_name = "google_cloud_default"
conn_type = "google_cloud_platform"
hook_name = "Google Cloud"

@alexbegg
Copy link
Contributor Author

alexbegg commented Aug 23, 2023

I have added those missing args, however I noticed some oddities with this provider:

  • The example DAG is seeming to confuse the use of an Operator and of a Hook. If you see in this example, a task_id is defined for a hook, which is not correct (this argument will just be ignored). The hook's params are used almost as if they were operator params...
    @task
    def pinot_dbi_api():
    PinotDbApiHook(
    task_id="run_example_pinot_script",
    pinot="ls /;",
    pinot_options="-x local",
    )
  • The "Operators" documentation page for the provider, https://airflow.apache.org/docs/apache-airflow-providers-apache-pinot/stable/operators.html, shows how to use the hooks in TaskFlow-decorated task functions, instead of actual operators as the page name would suggest.
  • In the above example code, initializing the hook class is used to execute a command, however shouldn't the hook just be used to initialize the class and then methods of that hook are used to execute commands? If that is done then we can even add in operators that make use of that hook's method.

Should this provider get actual operators, to avoid this confusion?

@potiuk
Copy link
Member

potiuk commented Aug 23, 2023

Should this provider get actual operators, to avoid this confusion?

The DBAPI- derived Hooks in provider are used by SQL*Operator family from Common SQL - that's why there is no "PinotSQL" operator - you just use common.sql SQL* operators and specify Pinot connection, the Hook is then created automatically and used by them

Maybe worth-while if someone (You ?) can make it clearer in the documentation if it is not clear enough.

The `provider.yaml` for `airflow.providers.apache.pinot` provider was referencing a non-existing hook class of `PinotHook`, causing a warning during `providers_manager`'s sanity check. This corrects it to list both hooks of this provider.
@alexbegg
Copy link
Contributor Author

alexbegg commented Aug 23, 2023

I am not a regular user of Pinot so I am probably not the best to improve documentation for it. I attempted, but then stopped because of my unfamiliarity. I believe if the main point of this PR is to fix this bug then it will be good to go as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Apache Pinot provider.yaml references missing PinotHook class
3 participants