kyaml: fatal error: concurrent map read and map write #3659

Closed
pst opened this issue Mar 2, 2021 · 15 comments
Labels
area/kyaml (issues for kyaml), kind/bug (Categorizes issue or PR as related to a bug), lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed), triage/under-consideration

Comments

@pst
Contributor

pst commented Mar 2, 2021

I'm using kustomize/api to provide a Terraform provider for Kustomize. Presumably as a result of Terraform/gRPC concurrency, there is a fatal error caused by a concurrent map read and map write inside the kyaml code base, in the code that determines whether a resource is namespace scoped.

The downstream issue has the full Terraform debug output logs, but the relevant parts seem to be:

fatal error: concurrent map read and map write

goroutine 58 [running]:
runtime.throw(0x2cd0a08, 0x21)
	runtime/panic.go:1116 +0x72 fp=0xc0124988f0 sp=0xc0124988c0 pc=0x1035312
runtime.mapaccess2(0x29d7b00, 0xc0082ab770, 0xc0124989a0, 0x2, 0xc01139ea10)
	runtime/map.go:469 +0x25b fp=0xc012498930 sp=0xc0124988f0 pc=0x100f5bb
sigs.k8s.io/kustomize/kyaml/openapi.IsNamespaceScoped(...)
	sigs.k8s.io/kustomize/kyaml@v0.10.7/openapi/openapi.go:270
sigs.k8s.io/kustomize/api/resid.Gvk.IsNamespaceableKind(0x0, 0x0, 0xc01139e9f8, 0x2, 0xc01139ea10, 0xe, 0x3e6da60)
	sigs.k8s.io/kustomize/api@v0.6.9/resid/gvk.go:219 +0xf9 fp=0xc0124989d0 sp=0xc012498930 pc=0x1b37339
sigs.k8s.io/kustomize/api/resid.ResId.EffectiveNamespace(0x0, 0x0, 0xc01139e9f8, 0x2, 0xc01139ea10, 0xe, 0xc0113a6a40, 0x1c, 0xc0113a6a60, 0x11, ...)
	sigs.k8s.io/kustomize/api@v0.6.9/resid/resid.go:120 +0x5a fp=0xc012498a68 sp=0xc0124989d0 pc=0x1b37eda
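The call path in the trace is resid.Gvk.IsNamespaceableKind → openapi.IsNamespaceScoped. For illustration, here is a minimal sketch of the kind of concurrent access that seems to trigger it (a hypothetical reproducer, not taken from the provider code):

package main

import (
	"sync"

	"sigs.k8s.io/kustomize/api/resid"
)

func main() {
	// Many goroutines hit the lazily-populated kyaml OpenAPI schema map
	// through Gvk.IsNamespaceableKind, the same call site as in the trace.
	gvk := resid.Gvk{Group: "apps", Version: "v1", Kind: "Deployment"}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = gvk.IsNamespaceableKind()
		}()
	}
	wg.Wait()
}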

Kustomize version

Relevant lines from go.mod:

sigs.k8s.io/kustomize/api v0.6.9
sigs.k8s.io/kustomize/kyaml v0.10.7

I'm blocked from updating to a higher api version due to #3614

@liggitt
Contributor

liggitt commented Mar 2, 2021

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 2, 2021
@Shell32-Natsu
Contributor

@natasha41575 Looks like this is related to OpenAPI in kyaml.

@alapidas

Seeing this same issue via the Kustomize provider in Terraform.

@alapidas

@pst Can you add a mutex around the krusty code as a workaround? Not sure what kind of performance hit serializing these operations may have.
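Roughly something like the sketch below, assuming the api v0.6.9 krusty signatures; runKustomizeBuild is a hypothetical wrapper, not code from the provider:

package main

import (
	"sync"

	"sigs.k8s.io/kustomize/api/filesys"
	"sigs.k8s.io/kustomize/api/krusty"
	"sigs.k8s.io/kustomize/api/resmap"
)

// kustomizeMutex serializes every Kustomizer run in the process so the
// unsynchronized kyaml OpenAPI map is never read and written concurrently.
var kustomizeMutex sync.Mutex

// runKustomizeBuild holds the lock for the whole build; parallel callers are
// serialized, which is the performance trade-off mentioned above.
func runKustomizeBuild(fSys filesys.FileSystem, path string) (resmap.ResMap, error) {
	kustomizeMutex.Lock()
	defer kustomizeMutex.Unlock()

	k := krusty.MakeKustomizer(fSys, krusty.MakeDefaultOptions())
	return k.Run(path)
}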

pst added a commit to kbst/terraform-provider-kustomization that referenced this issue Mar 17, 2021
This mutex prevents multiple Kustomizer runs from executing in parallel,
to avoid the `concurrent map read and map write` bug from upstream.

kubernetes-sigs/kustomize#3659
stefanprodan added a commit to fluxcd/kustomize-controller that referenced this issue May 11, 2021
Serialize kustomize build runs to avoid kyaml OpenAPI concurrent map read/write panic
kubernetes-sigs/kustomize#3659

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
stefanprodan added a commit to fluxcd/kustomize-controller that referenced this issue May 11, 2021
Serialize kustomize build runs to avoid kyaml OpenAPI concurrent map read/write panic
kubernetes-sigs/kustomize#3659

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
stefanprodan added a commit to fluxcd/kustomize-controller that referenced this issue Jun 3, 2021
Serialize kustomize build runs to avoid kyaml OpenAPI concurrent map read/write panic
kubernetes-sigs/kustomize#3659

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
stefanprodan added a commit to fluxcd/kustomize-controller that referenced this issue Jun 8, 2021
Serialize kustomize build runs to avoid kyaml OpenAPI concurrent map read/write panic
kubernetes-sigs/kustomize#3659

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 14, 2021
@pst
Contributor Author

pst commented Jun 14, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 14, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 12, 2021
@pst
Contributor Author

pst commented Sep 13, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 13, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 12, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 11, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ljakimczuk

@alapidas FYI, regarding the performance concerns you had: we have been hit by this in one of our environments due to its network conditions.

We use Flux with the build serialization workaround implemented, and the repository it reconciles relies on the kustomize Helm chart plugin. Because of the sometimes poor network conditions in this particular environment, the build struggles to pull the Helm chart: sometimes it fails outright, sometimes it just takes longer because many segments are lost and must be re-transmitted. As a result, the Flux Kustomization that struggles to build due to the pulling problem blocks the other Kustomizations, sometimes for a very long time. Below is a sample of the reconciliation times we observed for failed cases.

...
Reconciliation failed after 11m24.043533326s, next try in 30s | name=flux
Reconciliation finished in 11m43.475090566s, next run in 10s | name=collection
...
Reconciliation failed after 11m53.681076341s, next try in 30s | name=flux
Reconciliation finished in 11m53.989449191s, next run in 10s | name=collection
...
Reconciliation failed after 11m33.807809068s, next try in 30s | name=flux
Reconciliation finished in 11m52.965351964s, next run in 10s | name=collection
...
Reconciliation failed after 30m28.306259106s, next try in 30s | name=flux
Reconciliation finished in 30m46.210455889s, next run in 10s | name=collection
...
Reconciliation failed after 13m16.751290663s, next try in 30s | name=flux
Reconciliation finished in 13m24.166936029s, next run in 10s | name=collection
...
Reconciliation failed after 7m51.220454452s, next try in 30s | name=flux
Reconciliation finished in 7m51.714838107s, next run in 10s | name=collection
...
Reconciliation failed after 12m6.895618115s, next try in 30s | name=flux
Reconciliation finished in 12m7.952321057s, next run in 10s | name=collection
...
Reconciliation failed after 32m48.289605576s, next try in 30s | name=flux
Reconciliation finished in 32m48.477273626s, next run in 10s | name=collection
...

@HirazawaUi

/reopen
This problem still exists; is there anyone who can help solve it? :)

@k8s-ci-robot
Contributor

@HirazawaUi: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen
This problem still exists; is there anyone who can help solve it? :)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
