Invalid path: /var/local/lib/wasme-cache when deploying on Istio #158

Open
djannot opened this issue Aug 14, 2020 · 15 comments · Fixed by #159
Comments


djannot commented Aug 14, 2020

I've followed this guide:
https://docs.solo.io/web-assembly-hub/latest/tutorial_code/deploy_tutorials/deploying_with_istio/
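
For reference, the deploy step from that tutorial looks roughly like this (the filter image name below is a placeholder, not the exact one used):

```sh
# Deploy a filter from WebAssembly Hub to Istio workloads;
# <your-org>/<your-filter> is a placeholder image reference.
wasme deploy istio webassemblyhub.io/<your-org>/<your-filter>:v0.0.1
```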

I was able to deploy it on my cluster running Istio 1.6.7, but then I got this error from the istio-proxy container on all the Pods:

```
2020-08-14T08:23:31.404984Z	warning	envoy config	[external/envoy/source/common/config/grpc_subscription_impl.cc:101] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) virtualInbound: Invalid path: /var/local/lib/wasme-cache/a515a5d244b021c753f2e36c744e03a109cff6f5988e34714dbe725c904fa917

2020-08-14T08:23:32.807385Z	warn	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-14T08:23:34.770722Z	warn	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
```

The same filter works when I deploy it on Gloo.

@Sodman Sodman self-assigned this Aug 14, 2020

Sodman commented Aug 14, 2020

I think I know what the issue is here. We rewrote our CI/CD release pipeline for the 0.0.24 release, and it looks like the `VERSION` didn't get picked up by the build, so it defaulted to `dev`. As a result, the Istio Operator image is pulling in `dev` instead of 0.0.24. I'll cut a new release to fix the version issue.
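
To double-check which image is actually running, something like this should work (the `wasme` namespace is an assumption based on a default install):

```sh
# List pod names and image tags in the wasme namespace;
# an operator image tagged "dev" indicates this bug.
kubectl get pods -n wasme \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```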

@pantianying

Hello, 0.0.25 still has this problem.


Sodman commented Aug 17, 2020

Yes, this should be fixed by #159


Sodman commented Aug 17, 2020

Still seeing this issue after this change.

@Sodman Sodman reopened this Aug 17, 2020

Sodman commented Aug 17, 2020

False alarm, this is indeed fixed by the 0.0.26 release!

@Sodman Sodman closed this as completed Aug 17, 2020

GuangTianLi commented Aug 25, 2020

Hello, 0.0.26 still has this problem.


```
[Envoy (Epoch 0)] [2020-08-25 03:34:34.117][23][warning][config][external/envoy/source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) 172.22.3.210_8000: Failed to initialize WASM code from /var/local/lib/wasme-cache/3f319eec32afdfb1c053e1aea3a665504ff9d5f5ea4019146bcb455dfaea29d1
virtualInbound: Failed to initialize WASM code from /var/local/lib/wasme-cache/3f319eec32afdfb1c053e1aea3a665504ff9d5f5ea4019146bcb455dfaea29d1
```


Sodman commented Aug 25, 2020

Hi @GuangTianLi, it looks like your issue is different. The original issue here was about an invalid path, which was ultimately caused by the wrong version of the operator being loaded.

In your error message the WASM code is failing to initialize, but it's not complaining about an invalid path.

@pantianying

I'm sorry, my 0.0.26 version still has this problem. Can someone tell me why?

```
2020-08-31T02:55:52.212398Z info Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster details.istio-project --service-node sidecar~10.129.5.186~details-v1-5f8447ccd5-7ggl8.istio-project~istio-project.svc.cluster.local --max-obj-name-len 189 --local-address-ip-version v4 --log-format %Y-%m-%dT%T.%fZ %l envoy %n %v -l warning --component-log-level misc:error --concurrency 2]
2020-08-31T02:55:52.615490Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 14, no healthy upstream
2020-08-31T02:55:52.615558Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-08-31T02:55:52.642499Z warning envoy main [external/envoy/source/server/server.cc:475] there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
2020-08-31T02:55:52.748311Z info sds resource:default new connection
2020-08-31T02:55:52.748396Z info sds Skipping waiting for ingress gateway secret
2020-08-31T02:55:53.554728Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 14, no healthy upstream
2020-08-31T02:55:53.554766Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-08-31T02:55:53.563283Z info cache Root cert has changed, start rotating root cert for SDS clients
2020-08-31T02:55:53.563336Z info cache GenerateSecret default
2020-08-31T02:55:53.570992Z info sds resource:default pushed key/cert pair to proxy
2020-08-31T02:55:56.997075Z info sds resource:ROOTCA new connection
2020-08-31T02:55:56.997173Z info sds Skipping waiting for ingress gateway secret
2020-08-31T02:55:56.997203Z info cache Loaded root cert from certificate ROOTCA
2020-08-31T02:55:56.997280Z info sds resource:ROOTCA pushed root cert to proxy
2020-08-31T02:55:57.425220Z warning envoy config [external/envoy/source/common/config/grpc_subscription_impl.cc:101] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) virtualInbound: Invalid path: /var/local/lib/wasme-cache/a515a5d244b021c753f2e36c744e03a109cff6f5988e34714dbe725c904fa917

2020-08-31T02:55:59.287257Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:01.233766Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:03.187032Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:05.176523Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:07.194144Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:09.192120Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:11.176369Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:13.176285Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:15.176349Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:17.176047Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
```


@ilackarms

Hi @pantianying @djannot,

This issue was actually resolved in #95, but we have had some CI issues blocking it from getting merged, and at this point the PR needs to be updated. cc @yuval-k

A temporary workaround is to restart the target pods; Envoy should eventually pick up the WASM module file.
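
For example, for a Deployment workload (a minimal sketch; the deployment and namespace names are placeholders):

```sh
# Restart the workload so the injected sidecar retries loading the
# cached WASM module from /var/local/lib/wasme-cache.
kubectl rollout restart deployment/<your-app> -n <your-namespace>
```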

@ilackarms ilackarms reopened this Aug 31, 2020

yuval-k commented Aug 31, 2020

I believe the CI issues are related to an Envoy bug that was only recently fixed, i.e., the approach in the PR might only work for the next Istio release.

@pantianying

> Hi @pantianying @djannot,
>
> This issue was actually resolved in #95, but we have had some CI issues blocking it from getting merged, and at this point the PR needs to be updated. cc @yuval-k
>
> A temporary workaround is to restart the target pods; Envoy should eventually pick up the WASM module file.

Restart the target pods? When I use

`wasme deploy istio webassemblyhub.io/pantianying/add-header:v0.0.3`

the new Pod can't init successfully. Do you mean that restarting the pod that couldn't init successfully will solve this issue?


Sodman commented Oct 6, 2020

@pantianying Yes, for now restarting the pod should fix it. Unfortunately, as @yuval-k mentioned, we're waiting for Istio to pull in the upstream Envoy fix for the issue that ultimately causes this cache race condition.

@harpratap

@Sodman It's still failing for me:

`warning envoy config gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) virtualInbound: Invalid path: /var/local/lib/wasme-cache/314c75ded0da28314381281e74ab8b91196055360bd7b57f132de21c2116b9a3`

And then my pod crashes with `PostStartHookError: command 'pilot-agent wait' exited with 255: Error: timeout waiting for Envoy proxy to become ready. Last error: HTTP status code 503`.

Wasme version 0.0.32
Istio version 1.8.2
Kubernetes 1.18.6

I am able to make it work on kind locally, but it doesn't work on our on-prem cluster, so I'm not sure where to start debugging this.


Sodman commented Feb 3, 2021

Hi @harpratap, do the logs from the wasme pod (which manages the cache) give any more insight? If the cache didn't pull the image correctly (it could be an HTTP error), it's possible it never cached it, which would explain why it didn't get loaded into the proxy. If that's the case, you could try bouncing the cache pod to force a refresh.
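
For example (a sketch, assuming the default install where the cache runs as a `wasme-cache` DaemonSet in the `wasme` namespace):

```sh
# Check the cache logs for image-pull (e.g. HTTP) errors; the
# DaemonSet name and namespace are assumptions for a default install.
kubectl logs -n wasme daemonset/wasme-cache
```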

@tanjunchen

I guess it can be resolved by deleting the pods in the wasme namespace. :laughing:
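
A minimal sketch of that workaround (assuming the cache pods are the only pods in the `wasme` namespace):

```sh
# Delete the pods in the wasme namespace; the cache DaemonSet
# recreates them and re-pulls the filter image.
kubectl delete pods --all -n wasme
```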
