PV was re-formatted and data in volume was deleted #418
Comments
Hi @ysakashita, can you open a NetApp support case and provide the Trident logs? We need the logs because we have no way to reproduce the issue. |
@gnarl |
We dug deeper into the issue. The logs are shown below.
In this case, the cause why format of volume is
As you may know, the blkid command gets device information from Therefore, we checked the journal log.
At exactly the same time that the error occurs in Trident (blkid command), multipathd is also executing a retry process.
The retry process is implemented as follows: main.c (L1005-L1011) in multipathd.
So, I see the comments below...
domap() is retried when the flock on the device cannot be acquired for some reason.
IMO, as a fix plan for this problem, there is a correction for error( https://github.com/NetApp/trident/blob/v20.04.0/utils/osutils.go#L1968-L1984
I think this is a very serious bug for a storage product. |
We are able to reproduce the issue.
The key is to concentrate load on multipathd and the OS. (e.g.)
At the same time, the retry message appears in the journal log.
|
@ysakashita, thank you for the update. |
Hello @ysakashita commit 9bc9efb was pushed that addresses the issue. |
@ntap-arorar Thank you for the commit. Which version will include the bug fix (20.04.01 or 20.07.0)? |
@gnarl @ntap-arorar environment
make build for trident
Commit 1 [hash: 3478163, title: skip fstype if it is not used, date: May 13, 2020] e.g. git cherry-pick 3478163
test
I tried to reproduce using #418 (comment). As a result,
PVs (Vol: /dev/dm-0 to /dev/dm-8) don't have any issues. Below is the full Trident log (loglevel: debug). I also show part of the log.
I found |
Kindly confirm that your issue is resolved with commit ba749c5. This commit will be included in the Trident 20.07 release. |
@gnarl Commit 1 [hash: 3478163, title: skip fstype if it is not used, date: May 13, 2020] |
* adds period to allowed ontap storage prefix characters * updates changelog Fixes: #455
This is still present for us on Trident 20.10.1 (running on kube 1.19.2, Flatcar Container Linux 2605.11.0, 5.4.87-flatcar)
Is multipathd playing any role here? This is affecting crucial deployments for us so we could consider giving up on multipaths if that is to blame here. |
Furthermore, I believe that what is happening in our case is that
so it cannot be grabbed from the respective code: Line 2376 in ae27f6f
We've seen cases where
and presumably it could trigger that behaviour? |
We could put the dd call before the blkid call in case this is just strange behavior from blkid. We could call blkid only after dd indicated the volume contained data. However the other possibility (suggested above) is that blkid is behaving correctly and, at least at the moment when it was called, it really did read all zeros. This suggests something went wrong with the multipathing setup, and we will need to figure out how to reproduce that issue. |
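As a minimal sketch of the "read the device before trusting blkid" idea above, the Go snippet below reads the first bytes of a device and reports whether they are all zero. The names `isAllZeros` and `deviceHeadIsZero` are hypothetical and are not Trident's actual functions; how many bytes to read is left to the caller. The point it illustrates is that a failed or short read is returned as an error rather than being treated as "empty".

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// isAllZeros reports whether every byte in buf is zero.
func isAllZeros(buf []byte) bool {
	for _, b := range buf {
		if b != 0 {
			return false
		}
	}
	return true
}

// deviceHeadIsZero (hypothetical helper) reads the first n bytes of the
// device at path and reports whether they are all zero. Any open error or
// short read is returned as an error, so the caller never mistakes
// "could not read" for "device is empty".
func deviceHeadIsZero(path string, n int) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, fmt.Errorf("open %s: %w", path, err)
	}
	defer f.Close()

	buf := make([]byte, n)
	if _, err := io.ReadFull(f, buf); err != nil {
		return false, fmt.Errorf("read %d bytes from %s: %w", n, path, err)
	}
	return isAllZeros(buf), nil
}
```

A caller would treat a `true` result only as one piece of evidence that the volume is new. As noted above, if multipathing is misbehaving the read itself could return zeros, so this check does not rule out that failure mode on its own.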
This specific behavior happens when you run blkid without sudo. Non-root users aren't able to read block devices typically, so blkid can't work. Trident always runs the node-plugin as root so it would not face this particular issue. |
@bswartz yup, my bad on the |
Also, in general trident could probably have a much more defensive strategy and never decide on formatting a volume that contains data, since it relies on calling os commands that could fail for a number of reasons. @bswartz |
I would want to see a more complete log snippet that covers the whole time from iscsi login to formatting. We are already quite defensive in the existing code and it's hard to imagine a sequence of actions that could cause us to believe that a volume which had data was in fact empty and in need of formatting. The original fix for this issue addresses all of the possibilities that we're aware of, so there must be some other possibility we've never thought of or seen before. Personally, I'm suspicious about the unusual Linux distro. I've never worked with "Flatcar Container Linux" and I'm wondering if that distro does anything strange which could account for this behavior. We've certainly never tested on that distro. |
The challenge here is that, due to the CSI architecture, nodes can't tell the difference between a newly created volume, which always needs to be formatted, and an existing (previously formatted) volume that's getting re-used. Nodes are expected to figure out which situation they're in on their own, which is why we do the blkid check (many other CSI drivers work like this too). Nodes are stateless and not in a position to remember anything about volumes. The only other approach is to use the REST backchannel to talk to the controller plugin and query the state there. We haven't wanted to do that in the past because it will create new reliability and scaling issues that will harm users running very large clusters. Once we understand the root cause of the problem in your situation, that other approach might be indicated, but I don't want to jump to that solution right away because of the downsides it carries. Let's try to understand the root cause of your specific problem. |
I dumped the logs in a gist: https://gist.github.com/ffilippopoulos/40747b8982d174b99b88099d9a6770f4
That is basically a fork of coreos since they announced eol. |
Also, I do not know whether this helps or not, but I've built an image off v20.10.1 with just the following additional log line:
and I was able to reproduce this quickly and see that the blkid command output does indeed appear to be empty:
|
Hey @bswartz it is indeed a fork of CoreOS Container Linux (https://coreos.com/), which was EOLed after the CoreOS company was bought by Red Hat. The team went on to work on "Fedora CoreOS" https://getfedora.org/coreos - but that project deviated in design from the original. And so Flatcar Linux (https://www.flatcar-linux.org/) is the spiritual successor to the original CoreOS. The idea is a very "minimal" distribution; a big differentiator is that, unlike most Linux distributions, it doesn't have a package manager. We are not ruling out that the OS itself can be at fault here. We will do our best to debug and provide all the information.
Hope some of that information helps. |
and got some good first results. It looks like we still get a few
and from the pod perspective it looks like we haven't lost any data (obviously this is not thoroughly checked but it looks promising). Hope some of that helps :)) We can also try soaking changes on our environment and raise a PR if you guys want. |
Root of the issue does appear to be with the
Regardless, this should be properly handled. |
Based on the fact that the root cause is the different output format for blkid on this particular Linux variant, I think the correct fix is to cope with strange blkid variants better and not to add a heavyweight backchannel mechanism to statefully track volume formatting. Edit: Sorry, there was a misunderstanding. I thought that we confirmed that blkid merely had a different output format, but it turns out that the output is empty, and the version of blkid is the regular util-linux version which should have an output format we can parse. I would still like to figure out whether the problem is caused by a permission issue, or a device-not-found problem, or something else entirely. It's easy to fix the accidental-reformat problem that can occur in this case, but I'm worried that the fix could break existing things that currently work if blkid is behaving strangely and we don't know why. |
We are going to address the blkid/format issue separately from the multipath issue. The fix for the inappropriate format when blkid misbehaves will be to add more checks and verification so we never rely on blkid alone to determine that a volume is not formatted. This will prevent corruption in the case when a volume that already contains data is attached to a pod and blkid gives us bad information. While it seems clear that something about your multipath config is the underlying cause for the blkid problem, we're tracking that as a separate problem. I expect that the fix for this bug, combined with a solution to your multipath problems, will give you a completely working setup. Until the multipath problem is solved, at least the fix for this bug will avoid any possibility of data corruption. |
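To illustrate what "never rely on blkid alone" could mean in practice, here is a rough Go sketch of a fail-closed formatting decision. It is not the code from the actual Trident fix. The `deviceEvidence` type, its fields, `shouldFormat`, and `errUncertain` are all hypothetical names introduced for this example. It assumes the caller has already run blkid and a separate raw read of the start of the device.

```go
package main

import "errors"

// errUncertain is returned whenever the evidence does not clearly show
// a brand-new, empty device. (Hypothetical sentinel for this sketch.)
var errUncertain = errors.New("device state uncertain; refusing to format")

// deviceEvidence collects independent observations about a device.
// All field names are assumptions made for this example.
type deviceEvidence struct {
	FSType      string // filesystem type reported by blkid, "" if none
	BlkidErr    error  // non-nil if blkid failed unexpectedly
	HeadAllZero bool   // true if the start of the device read back as zeros
	HeadErr     error  // non-nil if the raw read failed
}

// shouldFormat returns true only when every check agrees the device is empty.
// Any failed check or conflicting signal yields an error instead of a format.
func shouldFormat(e deviceEvidence) (bool, error) {
	if e.FSType != "" {
		return false, nil // already formatted; nothing to do
	}
	if e.BlkidErr != nil || e.HeadErr != nil {
		return false, errUncertain // a probe failed; do not guess
	}
	if !e.HeadAllZero {
		return false, errUncertain // blkid saw nothing, but data is present
	}
	return true, nil
}
```

The design choice here is that an error makes the mount attempt fail and be retried, which is recoverable, whereas a mistaken format is not.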
Closing per this commit: |
Describe the bug
We are using a StatefulSet to create 100 Pods that mount 10G PVs.
Then we performed a rolling update of the StatefulSet and of the nodes.
As a result, only one PV was re-formatted and the data in the volume was deleted.
Environment
To Reproduce
Rolling update of StatefulSet and Node.
However, we do not yet know exactly how to reproduce it.
Additional context
The trident log (Debug) is below.
I checked source code.
https://github.com/NetApp/trident/blob/v20.04.0/utils/osutils.go#L1968-L1984
I think Trident should add handling for the error case (exit code 2) patterns.
Also
fsType := ""
may be dangerous, because when an error occurs, the default value
""
is returned from getFSType(). Therefore, Trident calls the format commands.
https://github.com/NetApp/trident/blob/v20.04.0/utils/osutils.go#L199-L202
In this case, the data in PV(Volume) is deleted.
We would like to avoid deleting data in the volume.
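A minimal sketch of the concern above, in Go: rather than letting an empty `fsType` string stand for both "unformatted" and "blkid failed", the result can be classified explicitly. `classifyBlkid`, `probeResult`, and the constants are hypothetical names and not Trident's API. The meaning of exit code 2 follows the blkid(8) man page from util-linux as I understand it (token not found, or no device could be identified), and the `TYPE="..."` parsing assumes blkid's default output format.

```go
package main

import (
	"fmt"
	"strings"
)

// probeResult distinguishes the three outcomes that a bare "" would conflate.
type probeResult int

const (
	fsKnown       probeResult = iota // blkid reported a filesystem type
	fsNotDetected                    // blkid exit code 2: nothing identified
	fsProbeFailed                    // any other failure or unparseable output
)

// blkidExitNotFound is the exit status blkid(8) documents for
// "token not found or no device could be identified".
const blkidExitNotFound = 2

// classifyBlkid (hypothetical) interprets blkid's exit code and output.
// It never maps an error to "no filesystem".
func classifyBlkid(exitCode int, output string) (probeResult, string, error) {
	switch exitCode {
	case 0:
		for _, field := range strings.Fields(output) {
			// Match the TYPE token only; SEC_TYPE and PTTYPE have other prefixes.
			if strings.HasPrefix(field, `TYPE="`) {
				return fsKnown, strings.Trim(strings.TrimPrefix(field, "TYPE="), `"`), nil
			}
		}
		return fsProbeFailed, "", fmt.Errorf("blkid exited 0 but reported no TYPE: %q", output)
	case blkidExitNotFound:
		return fsNotDetected, "", nil
	default:
		return fsProbeFailed, "", fmt.Errorf("blkid failed with exit code %d", exitCode)
	}
}
```

Even `fsNotDetected` is ambiguous, since exit code 2 can come from a genuinely blank device or from a device that could not be read at that moment. A caller would still want a second, independent check before formatting.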
(This may be better as a separate issue.) It would also be better if the log messages for the blkid error and the format retry were raised from Debug to Info level.