Commit from 7/Jul/2022 breaks nomad #70
Comments
@garenchan can you help with this?
I'm terribly sorry. It's my mistake. In k8s, this is handled by the mount defined in seaweedfs-csi-driver/deploy/helm/seaweedfs-csi-driver/templates/daemonset.yml, lines 90 to 92 (commit 8987bd9).
Unfortunately I'm not very up on the internal workings of Nomad and the CSI drivers. From a configuration point of view I don't think we can mount that in. Maybe it could be done in the job file, but it would be messy and different from the other CSI drivers, in that it would require a specific mount entry in each job in addition to the CSI entry. One option / quick fix I can think of is providing an option to tell the driver to mount in the old location, or to mount in the new one and symlink. That gets everything running, and we could toggle between the two to experiment with getting Nomad to see the csi/staging folder.
I've had a go and I can't find a way of getting that extra mount point into Nomad; I think this might be one of those cases where Nomad isn't as full-featured as k8s. It looks like Nomad bind-mounts just the mount point from the per-alloc folder. Happy to try any ideas.
OK, I'll try to find a solution. I'm sorry to have caused you any trouble.
Not a problem, I'm still testing. On paper it seems a good idea; I think it's just a missing feature in Nomad. It might be good to version the plugins so that we can roll back trivially.
I just hit this same issue and it was driving me insane (I thought apparmor on Ubuntu was messing up). Thanks for reporting this here! I can help with any patch if needed, as my homelab is now broken 😭 (by choice)
I don't see the exact issue here, since the csi-plugin is successfully creating the …
I think Nomad is following the CSI standard here, but a symlink to the mount point might not be allowed under Docker's isolation and cgroup limitations?
A side note: could we have a GitHub Action that runs an integration test on each commit or PR? I'm not that familiar with CSI.
I tried to fix this issue by using a bind mount instead of a symbolic link. I have not used Nomad and currently have no free resources to deploy it. Could you please help me verify whether this commit solves the problem? Thank you very much.
@garenchan well, there's good news and bad news. Building the new version, the mounts work and the containers start; the bad news is you can't write to the mounts. Reads are OK, but writes lock everything up.
@garenchan sorry, please ignore my previous comment about it not working. I've just rebuilt the SeaweedFS cluster and it's working like a dream now. I must have mangled the cluster when I was testing something else.
Fix #70: use bind mount rather than symbolic link
@chrislusf when can we get a new release using the latest SeaweedFS 3.15 release?
@danlsgiga you can build off the Dockerfile at cmd/seaweedfs-csi-driver/Dockerfile, which is what I did to test. It's also what I'll be doing going forward, pushing to a private registry; that way I can version and revert very quickly if I need to.
Works perfectly! Thanks all for the fix!
The commit from 7/Jul/2022 ("Pods using the same volume share mount") appears to have broken the CSI driver on Nomad. If I build a version prior to that commit, everything works as expected; from that commit onwards, the SeaweedFS mount always fails in the target container.
Error from the job mounting the volume:
From the CSI Job:
The CSI driver is mounting the SeaweedFS volume to the staging folder, and accessing it on the host lets me view the files from the cluster.
In the old driver the file system is mounted at:
per-alloc/e01bf906-f4e1-64e4-5360-d049dc05355c/code_server/rw-file-system-multi-node-multi-writer
However, on the new one it is mounted at:
/local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
and the alloc just has a symbolic link to the mount:
per-alloc/a464a996-bb12-c2c6-4dec-4993ce31651b/code_server/rw-file-system-multi-node-multi-writer -> /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
It appears that the target container either can't follow the symlink or can't reach /local/csi/staging/.
Maybe it should use a bind mount rather than a symlink?