Support not publicly trusted certificates in built-in component catalog connectors #2912

Merged

Conversation

ptitzler
Member

@ptitzler ptitzler commented Aug 30, 2022

This PR enables administrators/users to use the HTTP-based built-in catalog connectors (URL catalog, Apache Airflow package connector, Apache Airflow provider package connector) in secured environments where SSL server authenticity can only be validated using certificates based on a private public key infrastructure (PKI) with root and optionally intermediate certificate authorities (CAs) that are not publicly trusted. To accomplish this, a new environment variable, TRUSTED_CA_BUNDLE_PATH, is introduced, which the built-in catalog connectors use as input. See #2797 for motivation.

  • Other URL-based connectors such as the Artifactory connector and the MLX connector can be enabled in the future by adding the lines shown at https://github.com/elyra-ai/elyra/pull/2912/files#diff-c1e598e1c49c4e714cfc557209f7b1a2ac60d0e5c18e2a0e6528a1bdda1f805dR36 and https://github.com/elyra-ai/elyra/pull/2912/files#diff-c1e598e1c49c4e714cfc557209f7b1a2ac60d0e5c18e2a0e6528a1bdda1f805dR115.
  • This PR intentionally does not impose how the environment variable is defined because the best approach depends on how JupyterLab/Elyra is deployed.
  • This PR does not support defining custom certificate locations for individual catalog connector instances (meaning, for example, connector instance A uses one certificate and connector instance B uses another certificate). This could be accomplished by extending the catalog connectors to include an additional property, but would result in more work for users because the certificate location would have to be entered manually for every catalog connector instance.
  • Even though the get_verify_parm method in elyra/util/url.py currently only utilizes a proprietary environment variable TRUSTED_CA_BUNDLE_PATH to obtain the local filesystem location for certificates, the intention is to perhaps extend this in the future to also take into account other inputs that might already be defined. A hypothetical example is a JupyterLab configuration setting that might serve a similar purpose.
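
For readers who want a concrete picture of the behavior described in the bullets above, here is a minimal sketch of what a helper like get_verify_parm could look like. It is not copied from the merged code in elyra/util/url.py: the `default` parameter and the whitespace handling are assumptions made for illustration. Only the environment variable name TRUSTED_CA_BUNDLE_PATH and the fall-back to default verification when it is not set come from this conversation.

```python
import os
from typing import Union


def get_verify_parm(default: bool = True) -> Union[bool, str]:
    """Return a value suitable for the `verify` argument of an HTTP client.

    Sketch only: the `default` parameter is an assumption, not the
    confirmed signature of the helper in elyra/util/url.py.
    """
    # TRUSTED_CA_BUNDLE_PATH is the environment variable introduced by this PR.
    # Treat an unset, empty, or whitespace-only value as "not configured".
    ca_bundle_path = os.environ.get("TRUSTED_CA_BUNDLE_PATH", "").strip()
    if ca_bundle_path:
        # Location of a PEM file containing the trusted CA certificates
        return ca_bundle_path
    # Fall back to the default behavior (verify against public CAs)
    return default
```

A connector would then pass the result through to its HTTP call, for example `requests.get(url, verify=get_verify_parm())`, since the `verify` argument of `requests` accepts either a boolean or the path to a CA bundle.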

Closes #2797

What changes were proposed in this pull request?

  • See above
  • Updated the connector documentation in the 'pipeline components' topic in the user guide.

How was this pull request tested?

  • Added tests for new utility method (pytest -v elyra/tests/util/test_url.py)
  • Reviewed the output of make docs

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

@ptitzler ptitzler added the kind:enhancement, status:Work in Progress, and component:catalog connectors labels Aug 30, 2022
@elyra-bot

elyra-bot bot commented Aug 30, 2022

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link: Binder

@ptitzler ptitzler added this to the 3.12.0 milestone Aug 30, 2022
@ptitzler
Member Author

@shalberd wdyt?

@shalberd
Contributor

I think this is looking very good, especially with regards to the assumptions.
"This PR does not support defining custom certificate locations for individual catalog connector instances (meaning, for example, connector instance A uses one certificate and connector instance B uses another certificate"

--> (based on trusting separate CAs). Absolutely OK. If organizations have a private public key infrastructure (PKI), even two different connectors could use instance A with a certain CA and instance B with another CA. What's more, I tested this with a .crt file in PEM format that contained not only one root CA followed by the intermediate CA, but also many more CAs and intermediate CAs before and after them, and the request worked fine. The only thing that mattered was that somewhere in the PEM file, a root CA followed by an (optional) intermediate CA was present. So even with trust based on multiple root CAs, the approach you propose here works fine and establishes the trust chain. I also like the fact that your helper function simply uses verify=True if the env variable is not present.
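
As an illustration of the bundle layout described above (several root and intermediate CA certificates concatenated in one PEM file), the following sketch counts the certificate blocks in a bundle. It is a hypothetical helper, not part of Elyra: the names `PEM_CERT_PATTERN` and `split_pem_bundle` are made up for this example, and the function only inspects the text layout without validating the certificates.

```python
import re
from typing import List

# Matches one PEM-encoded certificate block, including its delimiters.
PEM_CERT_PATTERN = re.compile(
    r"-----BEGIN CERTIFICATE-----\s.+?\s-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_bundle(bundle_text: str) -> List[str]:
    """Return the individual certificate blocks found in a PEM bundle.

    Text between blocks (comments, blank lines) is ignored.
    """
    return PEM_CERT_PATTERN.findall(bundle_text)
```

Reading the file referenced by TRUSTED_CA_BUNDLE_PATH and passing its text to this function is one quick way to check how many CA certificates a bundle contains before mounting it.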

Regarding the wording "in secured environments where authenticity can only be validated using private certificates", I would rather say
"in secured environments where SSL server authenticity can only be validated using certificates based on private public key infrastructure (PKI) with root and optionally intermediate certificate authorities (CAs) that are not publicly trusted."

"This PR intentionally does not impose how the environment variable is defined because the best approach depends on how JupyterLab/Elyra is deployed."

Agreed, the calling software that embeds Elyra defines how to bring the env-variable into the system.

I'd call the env variable TRUSTED_CA_BUNDLE_PATH, but maybe that is nitpicky. It just says that the issue being handled here is optional trust of certificate authorities.

@lresende
Member

Is this going to be used only for catalogs? If so, should we use an env var name that indicates that?

NOT A CONTRIBUTION

elyra/util/url.py Outdated
@shalberd
Contributor

shalberd commented Aug 30, 2022

@lresende Well, here it is used for catalog download URLs, but the env variable is not about catalogs in and of themselves. It is about explicitly specifying trusted CA bundles, as they are often used in air-gapped environments, where public trust of SSL CAs is not important and private PKIs are used instead for issuing SSL certificates.

Thus my suggestion

TRUSTED_CA_BUNDLE_PATH

original reason was

https://docs.openshift.com/container-platform/4.8/networking/configuring-a-custom-pki.html

centrally defined CAs and intermediate CAs injected into a configmap in a namespace

opendatahub-io-contrib/jupyterhub-odh#137 (comment)

that file in the configmap is then mapped into the location we define with this env.

@ptitzler
Member Author

[...] it is used for catalog download urls, but the env variable is not about catalogs in and of themselves, but instead about explicit mention of trusted CA bundles [...]

I agree. If I was 100% sure that locally stored trusted CA bundles will only be used in the context of catalog connectors in the future, I probably would have favored an implementation that adds an optional property to the connector configurations to avoid the need to introduce a proprietary environment variable.

@ptitzler ptitzler changed the title from "Support custom ssl certificates in component catalog connectors" to "Support not publicly trusted certificates in built-in component catalog connectors" Aug 31, 2022
@ptitzler ptitzler removed the status:Work in Progress label Aug 31, 2022
@ptitzler
Member Author

@shalberd can you please review the doc updates I've just added? They are the same for all three connectors, e.g.
https://github.com/elyra-ai/elyra/blob/378f6d5caa0d493919c042e01295b840e8335cac/docs/source/user_guide/pipeline-components.md#url-component-catalog

@shalberd
Contributor

Hi @ptitzler the section regarding CA certificates in the documentation is cool that way, yes. Both to the point and technically accurate.

@shalberd
Contributor

shalberd commented Sep 1, 2022

I am in discussions with the jupyterhub-odh (Open Data Hub) team to get an overlay to the Kustomize / KfDef manifests that allows defining the path and adding that env variable in spawned containers, as well as adding the auto-inject configmap containing the PEM file to the namespace. I have already opened a pull request in odh-manifests.

Member

@kiersten-stokes kiersten-stokes left a comment


LGTM!

@ptitzler ptitzler merged commit c77c2f7 into elyra-ai:main Sep 12, 2022
@ptitzler ptitzler deleted the support-custom-ssl-certificate-in-connectors branch September 12, 2022 05:29
Labels
component:catalog connectors Access to component catalogs kind:enhancement New feature or request
4 participants