-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: warning about empty admissison headers #112680
Comments
What test were you running? We'll need to hack up some extra instrumentation to figure out where the call originated. |
Just saw this in the wild, in the logs for #112373. Will take a closer look in a bit.
|
Should we mark this as a GA blocker? |
I don't think it is one, unless we see some performance consequence, the internal scale testing would surface these. |
Added some additional logging and found one instance of where we missed the admission header:
The Going to additional logging and fix these all in a single batch. |
Yeah, probably isn't a functional problem, but operators are jumpy about backtraces in logs. |
Hi @aadityasondhi, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Ack. Added the labels, but I am actively working on cleaning this up. |
We just observed this in the CCT test cluster as well, shortly after upgrading to 23.2 beta 1 and dropping a job holding the protected timestamp back. |
FYI: I have a draft WIP PR to track these down, #114077. Based on an internal conversation[1], I will remove this warning message and the included stack trace for 23.2 since it does not point to the source anyways since this is happening in an async task. Will continue to track these down in master. [1] https://cockroachlabs.slack.com/archives/C067NG669KJ/p1701876295435589 |
This error message, while useful for debugging, spams the logs with a stack trace which can be distracting when reading the logs. Since AC defaults to skip when there is an empty header, this is not a concern, unless we see real-world performance impact (which we have not). This patch removes it from release builds while we figure out all the sources for missing headers. Informs cockroachdb#112680 Release note: None
This error message, while useful for debugging, spams the logs with a stack trace which can be distracting when reading the logs. Since AC defaults to skip when there is an empty header, this is not a concern, unless we see real-world performance impact (which we have not). This patch removes it from release builds while we figure out all the sources for missing headers. Informs cockroachdb#112680 Release note: None
114616: logstore: sync sideloaded storage directories r=erikgrinaker a=pavelkalinnikov This PR ensures that the hierarchy of directories/files created by the sideloaded log storage is properly synced. Previously, only the "leaf" files in this hierarchy were fsync-ed. Even though this guarantees the files content and metadata is synced, this still does not guarantee that the references to these files are durable. For example, Linux man page for `fsync` [^1] says: ``` Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed. ``` It means that these files can be lost after a system crash of power off. This leads to issues because: 1. Pebble WAL syncs are not atomic with the sideloaded files syncs. It is thus possible that raft log metadata "references" a sideloaded file and gets synced, but the file is not yet. A power off/on at this point leads to an internal inconsistency, and can result in a crash loop when raft will try to load these entries to apply and/or send to other replicas. 2. The durability of entry files is used as a pre-condition to sending raft messages that trigger committing these entries. A coordinated power off/on on a majority of replicas can thus lead to losing committed entries and unrecoverable loss-of-quorum. This PR fixes the above issues, by syncing parents of all the directories and files that the sideloaded storage creates. [^1]: https://man7.org/linux/man-pages/man2/fsync.2.html Part of #114411 Epic: none Release note (bug fix): this commit fixes a durability bug in raft log storage, caused by incorrect syncing of filesystem metadata. It was possible to lose writes of a particular kind (`AddSSTable`) that is e.g. used by `RESTORE`. This loss was possible only under power-off or OS crash conditions. As a result, CRDB could enter a crash loop on restart. In the worst case of a correlated power-off/crash across multiple nodes this could lead to loss of quorum or data loss. 115689: ui: add warning to network page when unavailable r=maryliag a=stevendanna The network page doesn't work inside a virtual cluster yet. Rather than just presenting a spinner, here we add a warning to the page. Additionally, it simplifies the text of the warning presented on the Advanced Debug page. Informs #115022 <img width="1239" alt="Screenshot 2023-12-06 at 16 09 16" src="https://github.com/cockroachdb/cockroach/assets/852371/43778020-c892-4e96-b1c4-ec58b20309ae"> <img width="1236" alt="Screenshot 2023-12-06 at 16 09 33" src="https://github.com/cockroachdb/cockroach/assets/852371/30643fbb-ec68-4973-b35f-60a9a874e6a5"> Release note: None 115705: kv,admission: only log empty admission warning for non-release builds r=aadityasondhi a=aadityasondhi This error message, while useful for debugging, spams the logs with a stack trace which can be distracting when reading the logs. Since AC defaults to skip when there is an empty header, this is not a concern, unless we see real-world performance impact (which we have not). This patch removes it from release builds while we figure out all the sources for missing headers. Informs #112680 Release note: None Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com> Co-authored-by: Steven Danna <danna@cockroachlabs.com> Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com>
I too still see this on master on 5f76d93:
I'll defer to @aadityasondhi and @nicktrav to whether this should block RC. |
I observed this in artifacts from #116761 (which is multi-tenant logic test run), similar to Renato's observation above - so perhaps the empty admission header problem is now limited only to multi-tenant deployments. |
Thanks for pointing this out! This should NOT block RC. It is benign warning that only shows in tests: #115705. Thanks for re-opening the issue though, I will investigate. The default behaviour for AC with missing header is to skip AC altogether so no cause for alarm here. |
This warning only shows in non-release builds so hopefully the new wording makes it a little less alarming. Informs cockroachdb#112680 Release note: None
116964: admission: reword empty admission header warning r=nicktrav a=aadityasondhi This warning only shows in non-release builds so hopefully the new wording makes it a little less alarming. Informs #112680 Release note: None 116974: cloud/amazon: disable network access for unit test r=msbutler a=rickystewart No longer necessary since #116251 Epic: CRDB-8308 Release note: None Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
This warning only shows in non-release builds so hopefully the new wording makes it a little less alarming. Informs #112680 Release note: None
Seen in the artifacts of a
|
Saw this during a run of the
|
What qualifies as test-only? We're seeing this on the DRT cluster, which was just upgraded to master
|
Any non-release build. So current master will show this message. |
Just saw this log line in the output for this failure #120095
|
Seen in #121176:
|
Seen in 121568:
|
Is this issue something we're still tracking? We see these messages very frequently on the
262 occurrences in a single log file:
|
It has fallen through the cracks. Mostly because we don't have a great way of finding the root of these missing headers, they come from separate goroutines. I found some here through code reading: #116286. I can spend some time next week to dig deeper and see if I can find some more call sites that are missing these headers. |
Thanks @aadityasondhi. For reference, we hit dozens of these every day on the DRT clusters. I'd suggest that you grep our logs if you're looking for more call sites. |
Are we still collecting? Here's one from #129980 Details
|
Details
|
As far as I can tell, nobody's proactively fixing them, so we should not prompt engineers to report additional duplicates. Touches cockroachdb#112680. Epic: none Release note: None
134193: intentresolver: disable warning about empty admission header r=tbg a=tbg As far as I can tell, nobody's proactively fixing them, so we should not prompt engineers to report additional duplicates. Touches #112680. Epic: none Release note: None Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
As far as I can tell, nobody's proactively fixing them, so we should not prompt engineers to report additional duplicates. Touches cockroachdb#112680. Epic: none Release note: None
Describe the problem
When reading through logs I've been seeing this:
Is this something unexpected that we should be fixing? I imagine the goal of this log was to identify places that might benefit from admission headers.
Jira issue: CRDB-32550
The text was updated successfully, but these errors were encountered: