[feature][fn] Introduce connectors catalogue #58
base: master
Conversation
On this:
Would it be possible to have the newer catalogue index downloaded on the reload command, such as `pulsar-admin sinks reload`? I believe the current implementation will scan for new connector NARs, which is equivalent to downloading a newer index.
Great work,
I have left a couple of suggestions (and we also need tests).
```diff
@@ -797,7 +803,7 @@ private void setupInput(ContextImpl contextImpl) throws Exception {
     pulsarSourceConfig.setSubscriptionType(contextImpl.getSubscriptionType());
-    pulsarSourceConfig.setTypeClassName(sourceSpec.getTypeClassName());
+    // pulsarSourceConfig.setTypeClassName(sourceSpec.getTypeClassName());
```
why?
```java
final ConnectorsManager connectorsManager = new ConnectorsManager(connectorsDir, narExtractionDirectory, connectorsCatalogueUrl);
connectors.forEach((name, connector) -> {
    try {
        connectorsManager.loadConnector(name, connector.getConnectorDefinition().getSinkClass() == null ? ComponentType.SOURCE : ComponentType.SINK);
```
do we really need to do it here? (in localrun mode, eagerly download everything?)
```java
final File localCache = new File(narExtractionDirectory, "downloads");
localCache.mkdirs();
localArchive = new File(localCache, archiveName);
```
we should properly set permissions (only the user should be able to read/write)
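One possible way to implement this suggestion, sketched with `java.nio` POSIX permissions (the class and method names below are hypothetical, not from the PR):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class LocalCachePermissions {

    // Create the downloads cache directory with owner-only access (rwx------).
    static Path createOwnerOnlyDir(Path parent) throws IOException {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwx------");
        Path dir = parent.resolve("downloads");
        if (!Files.exists(dir)) {
            Files.createDirectories(dir, PosixFilePermissions.asFileAttribute(perms));
        }
        // createDirectories is subject to the umask, so enforce the permissions explicitly.
        Files.setPosixFilePermissions(dir, perms);
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("cache-test");
        Path dir = createOwnerOnlyDir(tmp);
        System.out.println(Files.getPosixFilePermissions(dir));
    }
}
```

This only works on POSIX file systems; on non-POSIX platforms a fallback via `File.setReadable`/`setWritable` would be needed.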
```java
        throws IOException {
    try {
        File localArchive;
        if (connector.getDownloadPath().startsWith(Utils.HTTP)) {
```
we should log something here to make it clear that we are downloading the connector from the Internet, and also log the target directory
```diff
@@ -279,18 +279,22 @@ public static File createPkgTempFile() throws IOException {
 }

 public static File extractFileFromPkgURL(String destPkgUrl) throws IOException, URISyntaxException {
     final File pkgTempFile = createPkgTempFile();
     pkgTempFile.deleteOnExit();
```
this is not guaranteed to happen,
but it is already like this in legacy code, so we can keep it
```diff
@@ -437,3 +437,4 @@ zooKeeperSessionTimeoutMillis: -1
 # ZooKeeper operation timeout in seconds
 # Deprecated: use metadataStoreOperationTimeoutSeconds
 zooKeeperOperationTimeoutSeconds: -1
+connectorsCatalogueUrl: file:///Users/nicolo.boschi/catalogue.yaml
```
this catalogue will include URLs that will probably refer to third-party websites.
It may be better to also configure a list of allowed DNS names for the final connector URLs;
this way a forged catalogue cannot force the worker to download anything from the Internet.
In other words, we must have an explicit list of allowed sources for the .nar files. It can be a comma-separated list of DNS names or "base URLs":

`connectorsAllowedDownloadArchives: https://server1/xxx,https://server2/xxx,https://server2/yyyyy`

(the comma character is not common in URLs, so we can use it without problems)
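A minimal sketch of the suggested allow-list check (the class name and CSV parsing are assumptions; the config key follows the example above):

```java
import java.util.Arrays;

public class AllowedArchives {

    // Accept a download URL only if it starts with one of the configured base URLs.
    // An empty or missing allow-list rejects everything, failing closed.
    static boolean isAllowed(String url, String allowedCsv) {
        if (allowedCsv == null || allowedCsv.isEmpty()) {
            return false;
        }
        return Arrays.stream(allowedCsv.split(","))
                .map(String::trim)
                .anyMatch(url::startsWith);
    }

    public static void main(String[] args) {
        String allowed = "https://server1/xxx,https://server2/xxx";
        System.out.println(isAllowed("https://server1/xxx/connector.nar", allowed)); // true
        System.out.println(isAllowed("https://evil.example/connector.nar", allowed)); // false
    }
}
```

A prefix match like this is only safe if the configured entries end at a path or host boundary; a production check would also want to normalize the URL before comparing.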
```java
    futures.add(future);
}

FutureUtil.waitForAll(futures).join();
```
please set a timeout
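One way to add the requested timeout, assuming Java 9+ `CompletableFuture.orTimeout` (the helper name and the 30-second value are illustrative, not from the PR):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class WaitWithTimeout {

    // Wait for every future, but complete exceptionally (wrapping a
    // TimeoutException) after the deadline instead of blocking forever.
    static void waitForAllWithTimeout(List<CompletableFuture<?>> futures, long timeoutSeconds) {
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .orTimeout(timeoutSeconds, TimeUnit.SECONDS)
                .join();
    }

    public static void main(String[] args) {
        List<CompletableFuture<?>> futures = List.of(
                CompletableFuture.completedFuture("a"),
                CompletableFuture.completedFuture("b"));
        waitForAllWithTimeout(futures, 30);
        System.out.println("all futures completed");
    }
}
```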
* Make the catalogue connector not download the artifact, and configure the docker image
* Remove the need for a class loader for connectors from the catalogue
The PR had no activity for 30 days, marked with the Stale label.
Motivation
STILL DRAFT
To enhance the builtin connectors offering, the functions worker must currently contain all the NAR files in the docker image.
This is not a scalable solution, and it leads to heavyweight docker images.
At functions worker startup, all the NARs are looked up in the `connectors` directory and unpacked.
The idea is to download the actual NAR on demand, only when strictly needed.
Instead of bundling the NARs inside the docker image, the NAR files will be placed on a remote machine, which will be called the catalogue.
In the functions worker it will be possible to configure the URL for the catalogue index (only basic auth is supported for now).
The catalogue will typically be a static webserver which serves a YAML file, with this structure:
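The index itself is not shown in this draft; a minimal sketch of what such a YAML index could look like, reusing the `sinkClass`/`downloadPath` field names that appear in the PR's code (all connector names, class names, and URLs below are hypothetical):

```yaml
# Hypothetical catalogue index: one entry per builtin connector.
connectors:
  - name: kafka
    description: "Kafka source and sink"
    sourceClass: org.example.KafkaSource      # hypothetical class names
    sinkClass: org.example.KafkaSink
    downloadPath: https://catalogue.example.com/pulsar-io-kafka.nar
  - name: elastic-search
    description: "ElasticSearch sink"
    sinkClass: org.example.ElasticSearchSink
    downloadPath: https://catalogue.example.com/pulsar-io-elastic.nar
```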
All these entries are builtin connectors. When users ask for the builtin connectors, the metadata is retrieved from the index file.
Once the actual NAR is needed, it is downloaded on demand and cached in the container.
The NAR is downloaded in the functions worker when:
The catalogue may or may not also serve the connector artifacts. Basic auth is supported for remote artifacts as well.
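The PR's actual download code is not shown here; a hedged sketch of how a basic-auth download could work (the class name, method names, and credential handling are all assumptions):

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;

public class CatalogueDownloader {

    // Build the HTTP Basic auth header value from the configured credentials.
    static String basicAuthHeader(String user, String password) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    // Download a remote artifact into the local cache, sending basic auth when configured.
    static Path download(String url, String user, String password, Path target) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        if (user != null && password != null) {
            conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        }
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) {
        System.out.println(basicAuthHeader("user", "pass")); // prints "Basic dXNlcjpwYXNz"
    }
}
```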
In a Kubernetes environment, it would be possible to create a separate deployment as the connector catalogue, backed by nginx or similar services.
With the catalogue decoupled from the functions worker, upgrades of each connector are independent of the functions worker image. However, a functions worker restart is needed to download a newer catalogue index.
Modifications
connectorsCatalogueUrl