Ongoing GOLANGCI-LINT timeouts failing PRs and breaking nightly release #241

Closed
bobcatfish opened this issue Feb 21, 2020 · 25 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@bobcatfish
Contributor

bobcatfish commented Feb 21, 2020

Expected Behavior

  • GOLANGCI-LINT should only fail when there is an actual problem
  • We shouldn't need to indefinitely bump the timeout on GOLANGCI-LINT
  • We should only run linting when it's needed (when a code change is being made)

Actual Behavior

GOLANGCI-LINT has been failing recently, on PRs with only doc changes and also on nightly releases (which change no code).

Steps to Reproduce the Problem

Not sure yet!

Additional Info

{"level":"info","ts":1582250421.7177243,"logger":"fallback-logger","caller":"logging/config.go:69","msg":"Fetch GitHub commit ID from kodata failed: \"KO_DATA_PATH\" does not exist or is empty"}
level=info msg="[config_reader] Config search paths: [./ /workspace/src/github.com/tektoncd/pipeline /workspace/src/github.com/tektoncd /workspace/src/github.com /workspace/src /workspace /]"
level=info msg="[config_reader] Used config file .golangci.yml"
level=info msg="[lintersdb] Active 15 linters: [deadcode errcheck gocritic gofmt goimports golint gosec gosimple govet ineffassign staticcheck structcheck typecheck unused varcheck]"
level=info msg="[loader] Go packages loading at mode 575 (exports_file|name|types_sizes|compiled_files|deps|files|imports) took 3m25.926421841s"
level=info msg="[runner/filename_unadjuster] Pre-built 0 adjustments in 301.544974ms"
level=info msg="Memory: 2556 samples, avg is 158.0MB, max is 945.8MB"
level=info msg="Execution took 5m0.006786027s"
level=info msg="[runner/goanalysis_metalinter/goanalysis] analyzers took 8m5.409780412s with top 10 stages: buildssa: 6m58.8331346s, inspect: 11.947437511s, goimports: 8.35357768s, fact_purity: 6.806485912s, ctrlflow: 5.808022239s, gofmt: 5.273010346s, printf: 5.068830764s, fact_deprecated: 4.800368051s, gosec: 3.274820755s, gocritic: 2.030760171s"
level=info msg="[runner/unused/goanalysis] analyzers took 39.713235538s with top 10 stages: buildssa: 37.774446394s, U1000: 1.938789144s"
level=info msg="[runner/skip dirs] Skipped 4 issues from dir pkg/client/clientset/versioned/scheme by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 14 issues from dir pkg/client/clientset/versioned/typed/pipeline/v1alpha1 by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 4 issues from dir pkg/client/resource/clientset/versioned/typed/resource/v1alpha1/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 6 issues from dir pkg/client/resource/clientset/versioned/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/pipelinerun/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/client/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/taskrun/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/taskrun/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/clustertask/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/injection/client/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/informers/externalversions by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/clustertask/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 4 issues from dir pkg/client/resource/clientset/versioned/scheme by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/injection/informers/resource/v1alpha1/pipelineresource/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/task/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 2 issues from dir pkg/client/resource/clientset/versioned by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 2 issues from dir pkg/client/clientset/versioned by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/task/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/pipeline/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/injection/informers/factory/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 12 issues from dir pkg/client/clientset/versioned/typed/pipeline/v1alpha2 by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 14 issues from dir pkg/client/clientset/versioned/typed/pipeline/v1alpha1/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 6 issues from dir pkg/client/clientset/versioned/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/factory/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/pipeline/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/condition/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/pipelinerun/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 4 issues from dir pkg/client/resource/clientset/versioned/typed/resource/v1alpha1 by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 12 issues from dir pkg/client/clientset/versioned/typed/pipeline/v1alpha2/fake by pattern pkg/client"
level=info msg="[runner] Issues before processing: 358, after processing: 0"
level=info msg="[runner] Processors filtering stat (out/in): skip_files: 358/358, exclude-rules: 16/16, nolint: 0/16, filename_unadjuster: 358/358, exclude: 16/257, cgo: 358/358, skip_dirs: 257/358, identifier_marker: 257/257, path_prettifier: 358/358, autogenerated_exclude: 257/257"
level=info msg="[runner] processing took 162.917616ms with stages: exclude: 90.086916ms, identifier_marker: 49.20231ms, path_prettifier: 8.568372ms, nolint: 8.100646ms, skip_dirs: 3.415867ms, autogenerated_exclude: 2.838244ms, cgo: 410.942µs, filename_unadjuster: 287.285µs, max_same_issues: 2.045µs, diff: 968ns, max_from_linter: 781ns, source_code: 670ns, uniq_by_line: 662ns, exclude-rules: 557ns, path_shortener: 531ns, skip_files: 483ns, max_per_file_from_linter: 337ns"
level=info msg="[runner] linters took 3m1.13275983s with stages: goanalysis_metalinter: 2m27.989538707s, unused: 32.979990358s"
level=info msg="File cache stats: 0 entries of total size 0B"
level=error msg="Timeout exceeded: try increase it by passing --timeout option"
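
One way to tell an infrastructure timeout apart from a real lint failure in a log like the one above is to look for the `Timeout exceeded` error and the `after processing` issue count. Here is a minimal Go sketch, assuming the log format stays as shown; `lintOutcome` and `classify` are hypothetical names for illustration, not part of golangci-lint:

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// lintOutcome is a hypothetical summary of one golangci-lint run.
type lintOutcome struct {
	TimedOut bool // the log contains the "Timeout exceeded" error
	Issues   int  // issues left after processing, -1 if never reported
}

// afterProcessing matches the runner summary line seen in the log above.
var afterProcessing = regexp.MustCompile(`Issues before processing: \d+, after processing: (\d+)`)

// classify scans verbose golangci-lint output line by line.
func classify(log string) lintOutcome {
	out := lintOutcome{Issues: -1}
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, "Timeout exceeded") {
			out.TimedOut = true
		}
		if m := afterProcessing.FindStringSubmatch(line); m != nil {
			if n, err := strconv.Atoi(m[1]); err == nil {
				out.Issues = n
			}
		}
	}
	return out
}

func main() {
	// Two lines copied from the failing run above.
	log := `level=info msg="[runner] Issues before processing: 358, after processing: 0"
level=error msg="Timeout exceeded: try increase it by passing --timeout option"`
	o := classify(log)
	fmt.Printf("timed out: %v, issues after processing: %d\n", o.TimedOut, o.Issues)
}
```

For the run above this reports a timeout with zero remaining issues, which suggests the failure came from the time limit rather than from the code being linted.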

bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Feb 21, 2020
This might be a controversial choice but I think our release Pipelines
should only include Tasks that we expect to give us useful feedback
about the release.

Unit tests and linting should fail on pull requests; if they fail on a
release this means one of two things:

1. The linting/test didn't actually run against the merged version of the
  pull request
2. There is something flakey in the linting/test

In the case of (1) this means the pull request shouldn't have actually
been merged, and this failing at release time means anyone looking at
the releases will now be responsible for fixing this pull request's
problems, therefore we should be sure to run this on the pull request
before actually merging (this is I think what Tide is responsible for).

In the case of (2), why should this block a release?

So I think we should remove these steps entirely, especially in light
of tektoncd/plumbing#241 making these flake.

(I think we could remove "build" from this as well since it's not
actually used to build the resulting artifact, and the step that
actually does build the artifact would fail anyway.)
@bobcatfish
Contributor Author

level=info msg="[config_reader] Config search paths: [./ /workspace/src/github.com/tektoncd/pipeline /workspace/src/github.com/tektoncd /workspace/src/github.com /workspace/src /workspace /]"
level=info msg="[config_reader] Used config file .golangci.yml"
level=info msg="[lintersdb] Active 15 linters: [deadcode errcheck gocritic gofmt goimports golint gosec gosimple govet ineffassign staticcheck structcheck typecheck unused varcheck]"
level=info msg="[loader] Go packages loading at mode 575 (exports_file|name|types_sizes|compiled_files|deps|files|imports) took 3m25.926421841s"
level=info msg="[runner/filename_unadjuster] Pre-built 0 adjustments in 301.544974ms"
level=info msg="Memory: 2556 samples, avg is 158.0MB, max is 945.8MB"
level=info msg="Execution took 5m0.006786027s"

Thoughts:

  • 5 minutes????
  • Are we linting the vendor directory? I can't imagine that we are, because surely we'd see a lot more failures, but I don't see anything excluding it, although maybe golangci-lint does that itself?

@bobcatfish
Contributor Author

Running it myself is still surprisingly slow but not 5 min slow

docker run -it -v `pwd`:/go/src/github.com/tektoncd/pipeline -w /go/src/github.com/tektoncd/pipeline golangci/golangci-lint:v1.21 golangci-lint run -v
(⎈ |euca:default)➜  pipeline git:(remove_lint_release) ✗ sudo docker run -it -v `pwd`:/go/src/github.com/tektoncd/pipeline -w /go/src/github.com/tektoncd/pipeline golangci/golangci-lint:v1.21 golangci-lint run -v                                                            
INFO [config_reader] Config search paths: [./ /go/src/github.com/tektoncd/pipeline /go/src/github.com/tektoncd /go/src/github.com /go/src /go /] 
INFO [config_reader] Used config file .golangci.yml 
INFO [lintersdb] Active 15 linters: [deadcode errcheck gocritic gofmt goimports golint gosec gosimple govet ineffassign staticcheck structcheck typecheck unused varcheck] 
INFO [loader] Go packages loading at mode 575 (exports_file|files|types_sizes|compiled_files|imports|name|deps) took 29.08651601s 
INFO [runner/filename_unadjuster] Pre-built 0 adjustments in 25.180484ms 
INFO [runner/goanalysis_metalinter/goanalysis] analyzers took 2m2.847448372s with top 10 stages: buildssa: 1m24.991040199s, goimports: 2.635509554s, printf: 2.574138871s, inspect: 2.332713315s, fact_deprecated: 2.092404725s, gofmt: 2.042424264s, ctrlflow: 1.996057545s, gosec: 1.360364537s, fact_purity: 1.284188723s, SA1005: 873.14257ms 
INFO [runner/unused/goanalysis] analyzers took 11.546499746s with top 10 stages: buildssa: 10.819748788s, U1000: 726.750958ms 
INFO [runner/skip dirs] Skipped 6 issues from dir pkg/client/clientset/versioned/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/taskrun/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 2 issues from dir pkg/client/resource/clientset/versioned by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/condition/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/task/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/pipelinerun/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/clustertask/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/injection/informers/resource/v1alpha1/pipelineresource/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 12 issues from dir pkg/client/clientset/versioned/typed/pipeline/v1alpha2 by pattern pkg/client 
INFO [runner/skip dirs] Skipped 12 issues from dir pkg/client/clientset/versioned/typed/pipeline/v1alpha2/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 14 issues from dir pkg/client/clientset/versioned/typed/pipeline/v1alpha1/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 6 issues from dir pkg/client/resource/clientset/versioned/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 2 issues from dir pkg/client/clientset/versioned by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/client/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/pipeline/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/task/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/pipeline/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 4 issues from dir pkg/client/resource/clientset/versioned/typed/resource/v1alpha1 by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/taskrun/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 4 issues from dir pkg/client/resource/clientset/versioned/scheme by pattern pkg/client 
INFO [runner/skip dirs] Skipped 4 issues from dir pkg/client/clientset/versioned/scheme by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/factory/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/pipelinerun/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/clustertask/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 4 issues from dir pkg/client/resource/clientset/versioned/typed/resource/v1alpha1/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/injection/client/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/injection/informers/factory/fake by pattern pkg/client 
INFO [runner/skip dirs] Skipped 14 issues from dir pkg/client/clientset/versioned/typed/pipeline/v1alpha1 by pattern pkg/client 
INFO [runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/informers/externalversions by pattern pkg/client 
INFO [runner] Issues before processing: 358, after processing: 0 
INFO [runner] Processors filtering stat (out/in): cgo: 358/358, path_prettifier: 358/358, identifier_marker: 257/257, exclude-rules: 16/16, filename_unadjuster: 358/358, skip_files: 358/358, exclude: 16/257, nolint: 0/16, skip_dirs: 257/358, autogenerated_exclude: 257/257 
INFO [runner] processing took 29.289398ms with stages: path_prettifier: 11.357087ms, exclude: 8.300751ms, identifier_marker: 4.266512ms, nolint: 2.665138ms, autogenerated_exclude: 1.735067ms, skip_dirs: 834.666µs, cgo: 66.63µs, filename_unadjuster: 59.078µs, max_same_issues: 1.689µs, path_shortener: 654ns, diff: 423ns, source_code: 334ns, uniq_by_line: 326ns, max_from_linter: 324ns, exclude-rules: 254ns, skip_files: 251ns, max_per_file_from_linter: 214ns 
INFO [runner] linters took 15.478932642s with stages: goanalysis_metalinter: 12.839552681s, unused: 2.610000064s 
INFO File cache stats: 0 entries of total size 0B 
INFO Memory: 440 samples, avg is 433.7MB, max is 2300.3MB 
INFO Execution took 44.68862113s 

@dibyom
Member

dibyom commented Feb 21, 2020

probably the same as #211

I think I remember hearing that this happens when the cluster is under load

@bobcatfish
Contributor Author

bobcatfish commented Feb 21, 2020

> I think I remember hearing that this happens when the cluster is under load

hmm why would the cluster be under load?

Why would the dogfood cluster be under load? I could see the prow cluster being under load.

@bobcatfish
Contributor Author

Looking at a recent successful linting run from a recent nightly release:

{"level":"info","ts":1582077619.438988,"logger":"fallback-logger","caller":"logging/config.go:69","msg":"Fetch GitHub commit ID from kodata failed: \"KO_DATA_PATH\" does not exist or is empty"}
level=info msg="[config_reader] Config search paths: [./ /workspace/src/github.com/tektoncd/pipeline /workspace/src/github.com/tektoncd /workspace/src/github.com /workspace/src /workspace /]"
level=info msg="[config_reader] Used config file .golangci.yml"
level=info msg="[lintersdb] Active 15 linters: [deadcode errcheck gocritic gofmt goimports golint gosec gosimple govet ineffassign staticcheck structcheck typecheck unused varcheck]"
level=info msg="[loader] Go packages loading at mode 575 (types_sizes|compiled_files|deps|exports_file|files|imports|name) took 1m46.491337366s"
level=info msg="[runner/filename_unadjuster] Pre-built 0 adjustments in 186.876301ms"
level=info msg="[runner/unused/goanalysis] analyzers took 1m10.119177516s with top 10 stages: buildssa: 1m6.955499281s, U1000: 3.163678235s"
level=info msg="[runner/goanalysis_metalinter/goanalysis] analyzers took 5m0.276415115s with top 10 stages: buildssa: 4m8.006192488s, goimports: 7.313849641s, gofmt: 4.702407967s, inspect: 4.694633557s, fact_purity: 3.743714741s, ctrlflow: 3.47881283s, gosec: 3.00406477s, printf: 2.976696756s, golint: 2.917256949s, fact_deprecated: 2.339282949s"
level=warning msg="[runner] Can't run linter goanalysis_metalinter: fact_deprecated: failed prerequisites: fact_deprecated@github.com/tektoncd/pipeline/pkg/artifacts [github.com/tektoncd/pipeline/pkg/apis/pipeline/v1alpha1.test], fact_deprecated@github.com/tektoncd/pipeline/pkg/workspace [github.com/tektoncd/pipeline/pkg/apis/pipeline/v1alpha1.test]"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/clientset/versioned/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/clientset/versioned/fake by pattern pkg/client"
level=info msg="[runner] Issues before processing: 2, after processing: 0"
level=info msg="[runner] Processors filtering stat (out/in): cgo: 2/2, filename_unadjuster: 2/2, skip_files: 2/2, skip_dirs: 0/2, path_prettifier: 2/2"
level=info msg="[runner] processing took 316.659µs with stages: path_prettifier: 176.118µs, skip_dirs: 125.746µs, cgo: 4.819µs, filename_unadjuster: 2.578µs, max_same_issues: 1.286µs, nolint: 811ns, identifier_marker: 787ns, autogenerated_exclude: 775ns, exclude: 567ns, max_from_linter: 485ns, source_code: 484ns, diff: 452ns, path_shortener: 417ns, exclude-rules: 362ns, skip_files: 354ns, max_per_file_from_linter: 349ns, uniq_by_line: 269ns"
level=info msg="[runner] linters took 1m50.920191194s with stages: goanalysis_metalinter: 1m24.442621714s, unused: 26.477124711s"
level=info msg="File cache stats: 0 entries of total size 0B"
level=info msg="Memory: 1641 samples, avg is 823.9MB, max is 2770.0MB"
level=info msg="Execution took 3m38.515697034s"

Still > 3 min just to lint

@bobcatfish
Contributor Author

Several days before that, 2m 44 seconds:

{"level":"info","ts":1581645615.8491151,"logger":"fallback-logger","caller":"logging/config.go:69","msg":"Fetch GitHub commit ID from kodata failed: \"KO_DATA_PATH\" does not exist or is empty"}
level=info msg="[config_reader] Config search paths: [./ /workspace/src/github.com/tektoncd/pipeline /workspace/src/github.com/tektoncd /workspace/src/github.com /workspace/src /workspace /]"
level=info msg="[config_reader] Used config file .golangci.yml"
level=info msg="[lintersdb] Active 15 linters: [deadcode errcheck gocritic gofmt goimports golint gosec gosimple govet ineffassign staticcheck structcheck typecheck unused varcheck]"
level=info msg="[loader] Go packages loading at mode 575 (exports_file|imports|types_sizes|compiled_files|deps|files|name) took 1m42.920251191s"
level=info msg="[runner/filename_unadjuster] Pre-built 0 adjustments in 142.36707ms"
level=info msg="[runner/goanalysis_metalinter/goanalysis] analyzers took 2m57.368805301s with top 10 stages: buildssa: 2m23.92774982s, goimports: 5.932456476s, gofmt: 3.355882839s, inspect: 3.271885838s, gosec: 2.361230914s, fact_deprecated: 2.164040686s, ctrlflow: 1.939821999s, fact_purity: 1.754041086s, printf: 1.60708617s, golint: 1.309482418s"
level=info msg="[runner/unused/goanalysis] analyzers took 30.274860022s with top 10 stages: buildssa: 28.718434986s, U1000: 1.556425036s"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/informers/externalversions by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 4 issues from dir pkg/client/resource/clientset/versioned/typed/resource/v1alpha1 by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 14 issues from dir pkg/client/clientset/versioned/typed/pipeline/v1alpha1/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 2 issues from dir pkg/client/clientset/versioned by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 6 issues from dir pkg/client/resource/clientset/versioned/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/injection/client/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 4 issues from dir pkg/client/clientset/versioned/scheme by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/pipelinerun/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/pipeline/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/pipelinerun/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/pipeline/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 4 issues from dir pkg/client/resource/clientset/versioned/scheme by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/taskrun/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 4 issues from dir pkg/client/resource/clientset/versioned/typed/resource/v1alpha1/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 12 issues from dir pkg/client/clientset/versioned/typed/pipeline/v1alpha2/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/client/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/task/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/taskrun/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/task/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 2 issues from dir pkg/client/resource/clientset/versioned by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/clustertask/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha2/clustertask/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 6 issues from dir pkg/client/clientset/versioned/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 14 issues from dir pkg/client/clientset/versioned/typed/pipeline/v1alpha1 by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/injection/informers/factory/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 12 issues from dir pkg/client/clientset/versioned/typed/pipeline/v1alpha2 by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/factory/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/resource/injection/informers/resource/v1alpha1/pipelineresource/fake by pattern pkg/client"
level=info msg="[runner/skip dirs] Skipped 1 issues from dir pkg/client/injection/informers/pipeline/v1alpha1/condition/fake by pattern pkg/client"
level=info msg="[runner] Issues before processing: 358, after processing: 0"
level=info msg="[runner] Processors filtering stat (out/in): skip_files: 358/358, exclude-rules: 16/16, nolint: 0/16, filename_unadjuster: 358/358, identifier_marker: 257/257, skip_dirs: 257/358, autogenerated_exclude: 257/257, cgo: 358/358, path_prettifier: 358/358, exclude: 16/257"
level=info msg="[runner] processing took 105.903218ms with stages: exclude: 44.486491ms, identifier_marker: 40.405434ms, path_prettifier: 8.557388ms, nolint: 6.771643ms, skip_dirs: 2.476795ms, autogenerated_exclude: 2.435746ms, cgo: 464.342µs, filename_unadjuster: 297.433µs, max_same_issues: 2.56µs, diff: 877ns, max_from_linter: 861ns, source_code: 765ns, path_shortener: 724ns, uniq_by_line: 622ns, exclude-rules: 613ns, skip_files: 567ns, max_per_file_from_linter: 357ns"
level=info msg="[runner] linters took 59.824863578s with stages: goanalysis_metalinter: 49.391022371s, unused: 10.327598326s"
level=info msg="File cache stats: 0 entries of total size 0B"
level=info msg="Memory: 1433 samples, avg is 254.1MB, max is 2431.7MB"
level=info msg="Execution took 2m44.046063331s"

@bobcatfish
Contributor Author

Ah okay, looks like the vendor directory isn't included by default: https://github.com/golangci/golangci-lint#command-line-options

      --skip-dirs-use-default          Use or not use default excluded directories:
                                         - (^|/)vendor($|/)
                                         - (^|/)third_party($|/)
                                         - (^|/)testdata($|/)
                                         - (^|/)examples($|/)
                                         - (^|/)Godeps($|/)
                                         - (^|/)builtin($|/)
                                        (default true)
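
To see what those default patterns would and would not match, here is a minimal Go sketch that compiles the six expressions from the help text with the standard `regexp` package. This only illustrates the regular expressions themselves; how golangci-lint normalizes paths before matching is not shown here, and `skippedByDefault` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// defaultSkipDirs holds the patterns listed in the --skip-dirs-use-default help text.
var defaultSkipDirs = []*regexp.Regexp{
	regexp.MustCompile(`(^|/)vendor($|/)`),
	regexp.MustCompile(`(^|/)third_party($|/)`),
	regexp.MustCompile(`(^|/)testdata($|/)`),
	regexp.MustCompile(`(^|/)examples($|/)`),
	regexp.MustCompile(`(^|/)Godeps($|/)`),
	regexp.MustCompile(`(^|/)builtin($|/)`),
}

// skippedByDefault reports whether dir matches any of the default patterns.
func skippedByDefault(dir string) bool {
	for _, re := range defaultSkipDirs {
		if re.MatchString(dir) {
			return true
		}
	}
	return false
}

func main() {
	dirs := []string{
		"vendor/github.com/foo/bar",      // a whole path segment named "vendor"
		"third_party",                    // exact directory name
		"pkg/vendorutil",                 // "vendor" is only a prefix of the segment
		"pkg/client/clientset/versioned", // skipped in the logs by the repo's own pkg/client pattern
	}
	for _, dir := range dirs {
		fmt.Printf("%-32s matches a default pattern: %v\n", dir, skippedByDefault(dir))
	}
}
```

The `(^|/)` and `($|/)` groups anchor each name to a whole path segment, which is why `pkg/vendorutil` does not match while `vendor/github.com/foo/bar` does.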

@bobcatfish
Contributor Author

I think @dibyom is right!

Check out our CPU usage in the dogfood cluster in the last 6 hours:

[image: CPU usage in the dogfood cluster over the last 6 hours]

(I think 100% here actually means 400%; I don't know why it's being normalized like that. In other views I can see 400%, and the nodes we are using have 4 CPUs.)

When I run linting locally it uses ~10 CPUs, so it makes sense that only having 4 CPUs would slow things down, and several of these running at once would make it more likely to be even slower.

So I think a couple options are:

  1. Increasing the timeout isn't crazy, but when this is applied to PRs it isn't great
  2. Update the dogfood + prow clusters to use instances with more CPUs <-- I think this one is worth trying!

@bobcatfish
Contributor Author

bobcatfish commented Feb 21, 2020

Asked prow folks what they are using for kubernetes testing and they think n1-standard-8 or n1-highmem-8 (we are currently using n1-standard-4)

@bobcatfish
Contributor Author

Looks like we need to create a new nodepool or something to do this: https://cloud.google.com/kubernetes-engine/docs/tutorials/migrating-node-pool

@chmouel
Member

chmouel commented Feb 22, 2020

Perhaps splitting the lint jobs would help?

The main job could run the minimal/essential linters, which would be less resource intensive, and a second job could run the remaining linters.

Or perhaps the second linter job could run as a periodic job, if that's something available already?

@bobcatfish
Contributor Author

Note: double check we are using a version that includes the fixes in golangci/golangci-lint#337

bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Mar 23, 2020
This might be a controversial choice but I think our release Pipelines
should only include Tasks that we expect to give us useful feedback
about the release.

Linting should never fail on a release Pipeline since any issues should
be caught in the PR and
tektoncd/plumbing#241 is making linting
flakey.

I also made it so build and unit test can run in parallel since neither depends
on the other.

I tested this by manually applying the pipelines and manually triggering
the nightly cron:

```
 k --context dogfood create job --from cronjob/nightly-cron-trigger-pipeline-nightly-release nightly-cron-trigger-pipeline-nightly-release-manual-03232020
```
@bobcatfish
Contributor Author

> Note: double check we are using a version that includes the fixes in golangci/golangci-lint#337

Looks like those were merged around Oct 2019. In b16da69 @vdemeester updates us from 1.23.3, which was released in Feb 2020, so it looks like we've got those optimizations already.

tekton-robot pushed a commit to tektoncd/pipeline that referenced this issue Mar 24, 2020
@vdemeester
Member

/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 17, 2020
@bobcatfish
Contributor Author

@StevenACoffman

Hi! So you can either disable the "unused" linter, or upgrade to the recently released 1.26.0, or better yet do both! When the next release (1.26.1 or 1.27.0) drops, it will contain fixes that reduced memory usage on flexkube from 10 GB to 1 GB and improve speed substantially.
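
As a minimal sketch of the first option, the "unused" linter can be switched off for a single run from the command line. `LINT_BIN`, `LINT_TIMEOUT`, and `run_lint` are hypothetical names introduced for this example, and the 10m timeout is an assumed value, not one taken from this repo's configuration:

```shell
# Sketch only: LINT_BIN and LINT_TIMEOUT are hypothetical knobs for this example.
LINT_BIN="${LINT_BIN:-golangci-lint}"
LINT_TIMEOUT="${LINT_TIMEOUT:-10m}"

run_lint() {
  # --disable turns off one linter; --timeout bounds the whole run.
  "$LINT_BIN" run --disable unused --timeout "$LINT_TIMEOUT" "$@"
}
```

Calling `run_lint ./...` would lint all packages. For a permanent change, the linter would more likely be disabled in `.golangci.yml` instead of on the command line.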

@vdemeester
Member

@StevenACoffman yeah that's what I gathered 😝

@bobcatfish
Contributor Author

Looking back at #241 (comment) I've confused myself again: that comment says the dogfood cluster is maxing out CPU, but these are running in the prow cluster.

I think it might have been a typo, because it looks like the prow cluster gets to 100% utilization pretty regularly (link to data in metrics explorer for folks who have access)

image

(Side note: it looks like this happens to the dogfooding cluster as well, once around 10pm and again around noon; not sure what timezone, but it doesn't look as dramatic.)

@bobcatfish
Contributor Author

Okay, I've migrated everything in the prow cluster to a new pool, highmem-pool, using n1-standard-8. If this improves things but we want more, we can go up to 16... or 32! OR 96! Maybe not.

Anyway I followed along with https://cloud.google.com/kubernetes-engine/docs/tutorials/migrating-node-pool more or less, though I created the nodepool through the web interface. It was surprisingly easy but also terrifying.

Some commands I ran:

```
k --context prow get nodes -l cloud.google.com/gke-nodepool=new-pool

for node in $(kubectl --context prow get nodes -l cloud.google.com/gke-nodepool=new-pool -o=name); do
  kubectl --context prow cordon "$node";
done

kubectl --context prow get pods -o=wide --field-selector=status.phase==Running
kubectl --context prow get pods -o=wide --field-selector=status.phase==Running --all-namespaces

for node in $(kubectl --context prow get nodes -l cloud.google.com/gke-nodepool=new-pool -o=name); do
  kubectl --context prow drain --force --ignore-daemonsets --delete-local-data --grace-period=10 "$node";
done
```
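
After a drain like the one above, it can be useful to confirm what is still scheduled on a pool's nodes. The following is a rough sketch, not something that was run as part of this migration. `KUBECTL`, `KUBE_CONTEXT`, and `pods_on_pool` are hypothetical names for this example, and the pool name passed in is whatever pool you want to inspect:

```shell
# Sketch only: KUBECTL and KUBE_CONTEXT are hypothetical overrides for this example.
KUBECTL="${KUBECTL:-kubectl}"
KUBE_CONTEXT="${KUBE_CONTEXT:-prow}"

# List the pods still scheduled on every node of the given node pool.
pods_on_pool() {
  pool="$1"
  for node in $("$KUBECTL" --context "$KUBE_CONTEXT" get nodes \
      -l "cloud.google.com/gke-nodepool=$pool" -o=name); do
    # -o=name prints "node/NAME"; the field selector needs the bare node name.
    "$KUBECTL" --context "$KUBE_CONTEXT" get pods --all-namespaces -o=wide \
      --field-selector="spec.nodeName=${node#node/}"
  done
}
```

For example, `pods_on_pool default-pool` (an assumed pool name) should show only DaemonSet pods once a drain with `--ignore-daemonsets` has finished.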

Anyway let's see how the CPU usage looks over the next couple days and see how many linting timeouts we run into!

@bobcatfish
Contributor Author

CPU usage is still sometimes going surprisingly high:

image

But it seems significantly less extreme than before!

@bobcatfish
Contributor Author

I think we've stopped seeing this issue; we can reopen if it starts happening again.

As we start migrating to Tekton-based checks (I think @vdemeester created a linting check that runs in dogfooding), we will probably need to increase the instance size of the dogfooding pool as well.

@StevenACoffman

For later reference, where is the version of golangci-lint pinned?

@bobcatfish
Contributor Author

@StevenACoffman I'm pretty sure it's this:

```
# Install GolangCI linter: https://github.com/golangci/golangci-lint/
ARG GOLANGCI_VERSION=1.26.0
RUN curl -sL https://github.com/golangci/golangci-lint/releases/download/v${GOLANGCI_VERSION}/golangci-lint-${GOLANGCI_VERSION}-linux-amd64.tar.gz | tar -C /usr/local/bin -xvzf - --strip-components=1 --wildcards "*/golangci-lint"
```
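
Since the download URL is derived entirely from `GOLANGCI_VERSION`, a version bump only changes that one value. A small sketch of the same URL pattern as a shell function, where `golangci_url` is a hypothetical helper name for this example:

```shell
# Hypothetical helper: build the release tarball URL for a given version,
# following the same pattern as the Dockerfile line above.
golangci_url() {
  version="$1"
  echo "https://github.com/golangci/golangci-lint/releases/download/v${version}/golangci-lint-${version}-linux-amd64.tar.gz"
}
```

Something like `curl -sIL "$(golangci_url 1.27.0)"` could then be used to check that the tarball for a candidate version resolves before changing the pinned value.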

@StevenACoffman

Thanks! Consider upgrading to 1.27.0 for more memory and CPU usage improvements.

bobcatfish added a commit to bobcatfish/plumbing that referenced this issue Jun 16, 2020
@StevenACoffman pointed out in tektoncd#241
that the 1.27 release has CPU and memory usage improvements - since
we've struggled with CPU limitations in the past, and had linting
timeout as a result, seems worth an upgrade!
@bobcatfish bobcatfish mentioned this issue Jun 16, 2020
1 task
@bobcatfish
Contributor Author

oooo sg! Thanks @StevenACoffman - opened #430

tekton-robot pushed a commit that referenced this issue Jun 17, 2020
@StevenACoffman pointed out in #241
that the 1.27 release has CPU and memory usage improvements - since
we've struggled with CPU limitations in the past, and had linting
timeout as a result, seems worth an upgrade!
opendroid added a commit to opendroid/the-gpl that referenced this issue Oct 14, 2020