Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support WASM files being fetched from an http or https endpoint #2700

Closed
salaboy opened this issue Mar 23, 2023 · 26 comments · Fixed by #2817
Closed

Support WASM files being fetched from an http or https endpoint #2700

salaboy opened this issue Mar 23, 2023 · 26 comments · Fixed by #2817
Labels
kind/enhancement New feature or request stale

Comments

@salaboy
Copy link

salaboy commented Mar 23, 2023

Describe the feature

Today the WASM Middleware component can fetch warm files from a path: https://github.com/dapr/components-contrib/blob/master/middleware/http/wasm/httpwasm.go#L34

https://docs.dapr.io/reference/components-reference/supported-middleware/middleware-wasm/

But it will be great if we can use an OCI registry to fetch wasm files, so we can leverage package distribution, enabling developers to share their wasm files as container images.

I was looking at other projects, and something like the oras-go project can be used: https://github.com/oras-project/oras-go -> https://oras.land/client_libraries/0_go/

Creating an image with a WASM file should be as simple as creating a Dockefile like this:

FROM scratch
COPY main.wasm /main.wams

Ideas? Suggestions?

Release Note

RELEASE NOTE:

@salaboy salaboy added the kind/enhancement New feature or request label Mar 23, 2023
@codefromthecrypt
Copy link
Contributor

Thanks for opening this. I think my first thought was something simple as getting a single wasm layer, similar to trivy. Later, and perhaps more important for the outbound binding, it could use more sophisticated layering, but I think we can start with just getting the binary and only that.

e.g.
https://github.com/aquasecurity/trivy-module-wordpress
https://github.com/aquasecurity/trivy/blob/main/pkg/module/command.go#L15

@codefromthecrypt
Copy link
Contributor

maybe to match how other CNCF projects like istio we can use the same field name and description https://istio.io/latest/docs/reference/config/proxy_extensions/wasm-plugin/

Field Type Description Required
url string URL of a Wasm module or OCI container. If no scheme is present, defaults to oci://, referencing an OCI image. Other valid schemes are file:// for referencing .wasm module files present locally within the proxy container, and http[s]:// for .wasm module files hosted remotely. No

@daixiang0
Copy link
Member

maybe to match how other CNCF projects like istio we can use the same field name and description https://istio.io/latest/docs/reference/config/proxy_extensions/wasm-plugin/

Field Type Description Required
url string URL of a Wasm module or OCI container. If no scheme is present, defaults to oci://, referencing an OCI image. Other valid schemes are file:// for referencing .wasm module files present locally within the proxy container, and http[s]:// for .wasm module files hosted remotely. No

Agree.

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Apr 25, 2023

TL;DR; We shouldn't follow istio as it isn't relevant to real world usage. We should not anchor on how it does proxy-wasm, rather assess what is relevant to real world usage. We only need to allow https for the short term, as that's the only main way wasm is distributed apart from local files.


While this issue makes sense, it is based on a faulty (if easy to fall into) assessment that the design of istio came from user demand (it didn't).

In practice, OCI is rarely used in istio. The main spec isn't a standard rather just one company's document written and not really followed. In the extremely rare cases where OCI is used publicly for proxy-wasm, it is using normal container image with a single file layer of the wasm binary. In the vast majority of cases, wasm is distributed by local files (which we do now), and next normal http fetch (which we don't yet).

I think we only need to support container image consumption IFF an end user asks for it, not a vendor.

@daixiang0
Copy link
Member

daixiang0 commented Apr 25, 2023

Since Wasm hub is not as common used as docker hub, we can use HTTP to get single WASM file for now.

If OCI becomes widely used, we can support it, which is not late.

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Apr 25, 2023

agree. any OCI repo (wasm hub and otherwise) is rarely used and could be supported in a URI scheme later. Let's focus on http fetch as it also includes no deps, so is basically only config code.

@jcchavezs
Copy link
Contributor

My 2p: using OCI isn't anything beneficial nowadays. In istio I think the idea was to reuse the pulling mechanism of Kubernetes but it is not like that (e.g. try to build your image locally and pull as a wasm plugin). Even companies doing service mesh (i know a couple for sure) are willing to pull the wasm binary itself and avoiding getting into a burden with OCI pulling matters.

@codefromthecrypt
Copy link
Contributor

@jcchavezs thanks for the feedback, and I meant to ask as it came up in kubecon.. someone said there's a limit of size for k8s configmap files? I suspect the wasm is already out of hand when that occurs, but it matters practically to know. Do you have an idea of what that is (or maybe @salaboy if you do?)

@jcchavezs
Copy link
Contributor

jcchavezs commented Apr 25, 2023 via email

@codefromthecrypt
Copy link
Contributor

right now, a wasm larger than a meg is probably a bug not a feature, except for really huge things like python. In any case, http or other options should be used for exactly this reason, it sounds..

@daixiang0
Copy link
Member

I prefer lazy load rather than pull when init.

Another issue is that HTTP is hard to detect WASM changes, maybe we can add detect time interval to do it.

@codefromthecrypt
Copy link
Contributor

I prefer lazy load rather than pull when init.

Another issue is that HTTP is hard to detect WASM changes, maybe we can add detect time interval to do it.

well what I mean to say is that we can learn from Istio's implementation of http based fetch because even if not efficient I can tell it is in use. agree we won't get the SHA based stuff unless the other side supports ETag, but on the other hand building a solution that is more specific, but no-one uses, isn't helpful either.

@daixiang0
Copy link
Member

let's do it easy then iteration.

@salaboy
Copy link
Author

salaboy commented Apr 26, 2023

I am renaming this to http fetch then :)

@salaboy salaboy changed the title Support WASM files being fetched from an OCI registry using oras-go Support WASM files being fetched from an http or https endpoint Apr 26, 2023
@codefromthecrypt
Copy link
Contributor

so here's the design proposal for HTTP (http:// or https:// urls)

  • in simple case fetch the wasm binary, optionally compressed
  • It can also fetch a tar.gz which includes the wasm binary and any other files needed to mount as read-only
    • this would support python for example, which needs to mount other files
    • the wasm binary would have to be unambiguous (only one file ending in .wasm)
    • this is similar to OCI except there's no ENTRYPOINT to specify the specific wasm file, nor COMMAND for its args

The latter case is almost the same integration code as what OCI would need, just you don't need to use OCI to get the extra files. Note: istio also supports tar.gz, just it doesn't map any of the other files read-only to the guest.

@jcchavezs
Copy link
Contributor

Naive question: if we mount the ancillary files, what for will we be mounting? Will it be a virtual FS? Will wasm binary have access to itself?

@codefromthecrypt
Copy link
Contributor

@jcchavezs none of the questions were naive. In fact it took me days to figure out status quo on these topics, so glad you asked the questions I asked myself!

Naive question: if we mount the ancillary files, what for will we be mounting?

The only sensible mount is root imho as there's no configuration if you have only the tarball. This also cleanly maps to OCI which are mounting anything in layers are the root.

For example, a tar could include exactly the same as the python docker image like this, and it would be easy to say why.

$ ./car -tf ghcr.io/vmware-labs/python-wasm:3.11.3
python.wasm
usr/local/lib/python3.11/lib-dynload/.empty
usr/local/lib/python3.11/os.py
usr/local/lib/python311.zip

Will it be a virtual FS?

IMHO the first implementation should simply be in-memory virtual FS. It could be as simple as a fstest.MapFS in implementation considering the component is alpha. Later, we can have separate PRs for more sophisticated stuff.

Will wasm binary have access to itself?

I can't in good faith implement that, so when I make the PR I will definitely remove it. I think crun are going to fix the OCI misunderstanding the same way. IOTW there's no reason for wasm to read (or write) itself, we are in control of the code, so we shouldn't introduce this bug.

@codefromthecrypt
Copy link
Contributor

@yaron2 sorry that issue didn't fix this. It changed the config to "url" but didn't add http fetcher yet

@yaron2 yaron2 reopened this May 2, 2023
@yaron2
Copy link
Member

yaron2 commented May 2, 2023

@yaron2 sorry that issue didn't fix this. It changed the config to "url" but didn't add http fetcher yet

It got closed automatically when the PR was merged. Reopened now

@github-actions
Copy link

github-actions bot commented Jun 1, 2023

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 1, 2023
@jcchavezs
Copy link
Contributor

jcchavezs commented Jun 1, 2023

Any reason to not to follows @codefromthecrypt's proposal? I think with a explicit consensus, implementation can follow.

@github-actions github-actions bot removed the stale label Jun 1, 2023
@codefromthecrypt
Copy link
Contributor

@jcchavezs yep this is just awaiting implementation. I was waiting until after dapr 1.11 release to begin that.

@github-actions
Copy link

github-actions bot commented Jul 1, 2023

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 1, 2023
@codefromthecrypt
Copy link
Contributor

still TODO

@github-actions
Copy link

github-actions bot commented Aug 3, 2023

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 3, 2023
@github-actions
Copy link

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement New feature or request stale
Projects
None yet
5 participants