
Commit from 7/Jul/2022 breaks nomad #70

Closed
paularlott opened this issue Jul 14, 2022 · 16 comments · Fixed by #71

Comments

@paularlott

The commit from 7/Jul/2022, "Pods using the same volume share mount", appears to have broken the CSI driver on Nomad. If I build a version prior to that commit everything works as expected, but from that commit onwards the SeaweedFS mount always fails in the target container.

Error from the job mounting the volume:

Driver Failure | failed to create container: API error (400): invalid mount config for type "bind": bind source path does not exist: /opt/nomad/client/csi/monolith/seaweedfs/per-alloc/102f0f75-3dc2-7ed5-4ea3-0a2588fada96/code_server/rw-file-system-multi-node-multi-writer

From the CSI Job:

I0714 02:04:44     1 main.go:38] connect to filer 192.168.8.50:8888,192.168.8.51:8888,192.168.8.52:8888
I0714 02:04:44     1 driver.go:50] Driver: seaweedfs-csi-driver version: 1.0.0
I0714 02:04:44     1 driver.go:99] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0714 02:04:44     1 driver.go:99] Enabling volume access mode: SINGLE_NODE_WRITER
I0714 02:04:44     1 driver.go:99] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
I0714 02:04:44     1 driver.go:99] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I0714 02:04:44     1 driver.go:110] Enabling controller service capability: CREATE_DELETE_VOLUME
I0714 02:04:44     1 driver.go:110] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0714 02:04:44     1 driver.go:110] Enabling controller service capability: SINGLE_NODE_MULTI_WRITER
I0714 02:04:44     1 server.go:92] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0714 02:04:53     1 nodeserver.go:32] node stage volume code_server to /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:53     1 mounter_seaweedfs.go:38] mounting [192.168.8.50:8888 192.168.8.51:8888 192.168.8.52:8888] /testing to /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:53     1 mounter.go:39] Mounting fuse with command: weed and args: [-logtostderr=true mount -dirAutoCreate=true -umask=000 -dir=/local/csi/staging/code_server/rw-file-system-multi-node-multi-writer -collection=testing -filer=192.168.8.50:8888,192.168.8.51:8888,192.168.8.52:8888 -filer.path=/testing -cacheCapacityMB=256 -localSocket=/tmp/seaweedfs-mount-1677588823.sock -collectionQuotaMB=953 -replication=001 -concurrentWriters=32 -cacheDir=/alloc/cache_dir]
I0714 02:04:53     1 nodeserver.go:78] volume code_server successfully staged to /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:53     1 nodeserver.go:87] node publish volume code_server to /local/csi/per-alloc/102f0f75-3dc2-7ed5-4ea3-0a2588fada96/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:53     1 nodeserver.go:118] volume code_server successfully published to /local/csi/per-alloc/102f0f75-3dc2-7ed5-4ea3-0a2588fada96/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:58     1 nodeserver.go:125] node unpublish volume code_server from /local/csi/per-alloc/102f0f75-3dc2-7ed5-4ea3-0a2588fada96/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:58     1 nodeserver.go:192] node unstage volume code_server from /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:58     1 volume.go:117] unmounting volume code_server from /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
W0714 02:04:58     1 mounter.go:66] Unable to find PID of fuse mount /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer, it must have finished already

The CSI driver is mounting the SeaweedFS volume to the staging folder, and accessing it on the host lets me view the files from the cluster.

In the old driver the file system is mounted at:

per-alloc/e01bf906-f4e1-64e4-5360-d049dc05355c/code_server/rw-file-system-multi-node-multi-writer

However with the new driver it's mounted at:

/local/csi/staging/code_server/rw-file-system-multi-node-multi-writer

and the alloc just has a symbolic link to the mount:

per-alloc/a464a996-bb12-c2c6-4dec-4993ce31651b/code_server/rw-file-system-multi-node-multi-writer -> /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer

It appears that the target container either can't follow the symlink or can't get to /local/csi/staging/.

Maybe it should use a bind mount rather than a symlink?
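
For illustration, here's a minimal sketch in Go of the two publish strategies being discussed. The function names and paths are placeholders, not the driver's actual code:

package mountsketch

import (
	"os"

	"golang.org/x/sys/unix"
)

// publishWithSymlink mirrors the new behaviour: the per-alloc target becomes
// a symbolic link pointing at the shared staging mount. If the runtime (or
// the bind mount Nomad sets up for the task) can't resolve that link, the
// task sees a missing path.
func publishWithSymlink(stagingPath, targetPath string) error {
	return os.Symlink(stagingPath, targetPath)
}

// publishWithBindMount is the suggested alternative: make the target a real
// directory and bind-mount the staged filesystem onto it, so the per-alloc
// path is an ordinary mount point rather than a link.
func publishWithBindMount(stagingPath, targetPath string) error {
	if err := os.MkdirAll(targetPath, 0o750); err != nil {
		return err
	}
	return unix.Mount(stagingPath, targetPath, "", unix.MS_BIND, "")
}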

@chrislusf
Contributor

@garenchan can you help with this?

@garenchan
Contributor

garenchan commented Jul 14, 2022

I'm terribly sorry. It's my mistake.

In k8s, we mount the /var/lib/kubelet/plugins staging directory into the CSI driver pod with bidirectional propagation. Maybe we need to do something similar with /local/csi in Nomad?

- name: plugins-dir
  mountPath: /var/lib/kubelet/plugins
  mountPropagation: "Bidirectional"

@paularlott
Author

Unfortunately I'm not very up on the internal workings of Nomad and the CSI drivers. From a configuration point of view I don't think we can mount that in. Maybe it could be done in the job file, but it would be messy and different from the other CSI drivers, in that each job would need a specific mount entry in addition to the CSI entry.

One option / quick fix I can think of is providing an option to tell the driver either to mount in the old location, or to mount in the new location and symlink. That gets everything running, and we could toggle between the two options to experiment with getting Nomad to see the csi/staging folder.
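
To make that concrete, here's a hypothetical sketch of such a toggle; the -legacyMount flag and the wiring around it are assumptions for illustration, not existing driver options (the weed mount flags are taken from the log output above):

package main

import (
	"flag"
	"log"
	"os/exec"
)

// legacyMount is a hypothetical flag: when set, mount the volume directly at
// the per-alloc target (the old behaviour); when unset, stage it under
// /local/csi/staging and expose the staged mount (the new behaviour).
var legacyMount = flag.Bool("legacyMount", false,
	"mount the volume directly at the per-alloc path instead of staging it")

// weedMount shells out to `weed mount` for the given directory, using flags
// that appear in the driver log above.
func weedMount(filer, filerPath, dir string) error {
	cmd := exec.Command("weed", "mount",
		"-dirAutoCreate=true",
		"-dir="+dir,
		"-filer="+filer,
		"-filer.path="+filerPath)
	return cmd.Start()
}

func main() {
	flag.Parse()

	// Placeholder paths modelled on the log output above.
	stagingDir := "/local/csi/staging/code_server/rw-file-system-multi-node-multi-writer"
	targetDir := "/local/csi/per-alloc/<alloc-id>/code_server/rw-file-system-multi-node-multi-writer"

	dir := stagingDir
	if *legacyMount {
		dir = targetDir // old behaviour: FUSE mount straight into the per-alloc path
	}
	if err := weedMount("192.168.8.50:8888", "/testing", dir); err != nil {
		log.Fatal(err)
	}
}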

@paularlott
Author

I've had a go and I can't find a way of getting that extra mount point into Nomad; I think this might be one of those cases where Nomad isn't as full featured as k8s. It looks like Nomad bind mounts just the mount point from the per-alloc folder.

Happy to try any ideas.

@garenchan
Contributor

OK, I'll try to find a solution. I'm sorry to have caused you any trouble.

@paularlott
Author

Not a problem, I'm still testing. On paper it seems a good idea; I think it's just a missing feature in Nomad.

I think it might be good to version the plugins; that way we can roll back trivially, something like .

@danlsgiga
Contributor

I just hit this same issue and it was driving me insane (I thought AppArmor on Ubuntu was messing things up).

Thanks for reporting this here! I can help with any patch if needed as my homelab is now broken 😭 (by choice)

@danlsgiga
Contributor

I don't see the exact issue here, since the csi-plugin is successfully creating the stage mountpoint and then publishing it (according to the logs). So, theoretically, the per-alloc path that Nomad looks for should be there. However, I think the publish action just creates a symlink to the stage mountpoint, which I believe may be the culprit here?

@danlsgiga
Contributor

I think Nomad is following the CSI standards here, but a symlink to the mount point might not be allowed due to Docker's isolation and cgroup limitations?

@chrislusf
Contributor

A side note: can we have a GitHub Action that runs an integration test on each commit or PR?

I am not that familiar with CSI.

garenchan added a commit to garenchan/seaweedfs-csi-driver that referenced this issue Jul 15, 2022
@garenchan
Contributor

garenchan commented Jul 15, 2022

@paularlott @danlsgiga

I tried to fix this issue by using a bind mount instead of a symbolic link. I have not used Nomad and currently have no free resources to deploy it. Could you please help me verify whether this commit solves the problem? Thank you very much.

garenchan@1c4aa84
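
For anyone checking the behaviour locally, here's a rough sketch of a bind-mount publish path using k8s.io/mount-utils. It illustrates the technique only; it isn't the contents of the linked commit:

package mountsketch

import (
	"os"

	mount "k8s.io/mount-utils"
)

// bindPublish bind-mounts the staged volume onto the per-alloc target path,
// skipping the work if the target already looks like a mount point.
func bindPublish(stagingPath, targetPath string) error {
	mounter := mount.New("")

	notMnt, err := mounter.IsLikelyNotMountPoint(targetPath)
	if os.IsNotExist(err) {
		// Target doesn't exist yet: create it and treat it as not mounted.
		if err := os.MkdirAll(targetPath, 0o750); err != nil {
			return err
		}
		notMnt = true
	} else if err != nil {
		return err
	}

	if !notMnt {
		return nil // already published
	}
	return mounter.Mount(stagingPath, targetPath, "", []string{"bind"})
}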

@paularlott
Author

@garenchan well, there's good news and bad news. The good news is that with the new version the mounts work and the containers start; the bad news is you can't write to the mounts. Reads are OK, but writes lock everything up.

@paularlott
Author

@garenchan sorry, please ignore my previous comment about it not working. I've just rebuilt the SeaweedFS cluster and it's working like a dream now; I must have mangled the cluster when I was testing something else.

chrislusf added a commit that referenced this issue Jul 15, 2022
Fix #70: use bind mount rather than symbolic link
@danlsgiga
Contributor

@chrislusf when can we get a new release using the latest SeaweedFS 3.15 release?

@paularlott
Author

@danlsgiga you can build off the Dockerfile at cmd/seaweedfs-csi-driver/Dockerfile, which is what I did to test. It's also what I'll be doing going forwards, pushing to a private registry, so that I can version and revert very quickly if I need to.

@danlsgiga
Contributor

Works perfectly! Thanks all for the fix!
