Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ingester/fix] Apply sanitizers to avoid panic on span.process=nil #3819

Merged
merged 13 commits into from
Jul 26, 2022

Conversation

locmai
Copy link
Contributor

@locmai locmai commented Jul 20, 2022

Which problem is this PR solving?

Short description of the changes

  • Using the process.GetTags() and process.GetServiceName() instead of directly reference to those fields to avoid panic due to nil pointer dereference.

@locmai locmai requested a review from a team as a code owner July 20, 2022 10:17
@locmai locmai requested a review from vprithvi July 20, 2022 10:17
assert.Equal(t, 0, len(dbSpanWithNilTags.Process.Tags))
assert.Equal(t, 0, len(dbSpanWithNilProcess.Process.Tags))

tagsMap := map[string]interface{}(nil)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this was expected but this is what I got from the convertKeyValuesString func if the Tags was nil.

@locmai locmai force-pushed the fix/check-nil-process branch 2 times, most recently from fb2a295 to d21e22d Compare July 20, 2022 10:26
return Process{
ServiceName: process.ServiceName,
ServiceName: process.GetServiceName(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is exactly the type of change that I did not want to make. It avoids a panic, but does it silently, creating invalid data in the process (service name is not allowed to be empty, it could break many other places, including the UI).

I've added a sanitizer in #3631. We can try invoking it early in the Kafka consumer path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look at this and making things clearer. Let me take a look at the sanitizer and try what you suggested.

It make sense that the service name must not be empty, so I might revert the change to process.GetServiceName() here. For the process.GetTags, I think it still nicely handles the nil cases those led to the panics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it still nicely handles the nil cases those led to the panics.

Yes, but again, it does that accidentally, it so happens that in this context the behavior is appropriate. But fundamentally you're trying to work around malformed data that should've never gotten to this point in the pipeline. I don't mind this specific change, but at least with the current code we knew that there is a data issue (even though panicing on it is not ideal reaction).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yurishkuro , thank you so much for your review. As you mentioned, I understand it's important to detect data issue in our system, so should we return error here and let caller know that we do not accept the nil process for example?

I think panic should be considered if we do want application immediately crash. Otherwise, we can consider error.

Thank you for your explanation and effort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we apply the sanitizer earlier in the pipeline then this code does need to change as it will never have to deal with nil process.

assert.Equal(t, 0, len(dbSpanWithNilTags.Process.Tags))
assert.Equal(t, 0, len(dbSpanWithNilProcess.Process.Tags))

tagsMap := map[string]interface{}(nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tagsMap := map[string]interface{}(nil)
nilMap := map[string]interface{}(nil)

dbSpanWithNilTags := converter.FromDomainEmbedProcess(&spanWithNilProcessTags)
dbSpanWithNilProcess := converter.FromDomainEmbedProcess(&spanWithNilProcess)

assert.Equal(t, 3, len(dbSpanWithNilTags.Tags))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert.Equal(t, 3, len(dbSpanWithNilTags.Tags))
assert.Len(t, 3, dbSpanWithNilTags.Tags)

@locmai locmai force-pushed the fix/check-nil-process branch from dd9064b to 7b08a60 Compare July 25, 2022 08:32
albertteoh and others added 9 commits July 25, 2022 15:35
Add `all-in-one` component to the list of components to build docker images for.

Signed-off-by: Albert Teoh <see.kwang.teoh@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
…g#3814)

Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.36.0 to 0.37.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](prometheus/common@v0.36.0...v0.37.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
* Tenancy for queries

Signed-off-by: Ed Snible <snible@us.ibm.com>

* New parameter for RegisterGRPCGateway() test function

Signed-off-by: Ed Snible <snible@us.ibm.com>

* More tests that are local to package itself

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Additional test cases to raise test coverage

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Fix test

Signed-off-by: Ed Snible <snible@us.ibm.com>

* spelling

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Rename file

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Refactor tenancy packages

Signed-off-by: Ed Snible <snible@us.ibm.com>

* restore empty_test.go as part of refactoring tenancy out of storage

Signed-off-by: Ed Snible <snible@us.ibm.com>

* lint/gofumpt

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Enforce tenancy on non-streaming gRPC and add additional tests

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Test for tenant flow to storage for both streaming and unary RPC

Signed-off-by: Ed Snible <snible@us.ibm.com>

* HTTP tenancy test

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Unit test for unary tenancy handler

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Factor out rendundent test function

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Address review comments

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Error name

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Refactor TenancyConfig to TenancyManager

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Address review comments

Signed-off-by: Ed Snible <snible@us.ibm.com>

* Refactor so that NewTenancyManager() only called from main and tests

Signed-off-by: Ed Snible <snible@us.ibm.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
@locmai locmai force-pushed the fix/check-nil-process branch from 7b08a60 to 26a4398 Compare July 25, 2022 08:36
model.Int64("b.b", 1),
}

spanWithNilTags := model.Span{Tags: tags, Process: &model.Process{Tags: nil}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test would pass without any code changes in the PR. It's the nil Process that's causing panic in production, not the nil tags (nil tags are still iterable).

Signed-off-by: Yuri Shkuro <github@ysh.us>
@codecov
Copy link

codecov bot commented Jul 26, 2022

Codecov Report

Merging #3819 (9d12426) into main (7b33160) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #3819      +/-   ##
==========================================
+ Coverage   97.58%   97.59%   +0.01%     
==========================================
  Files         291      291              
  Lines       16938    16939       +1     
==========================================
+ Hits        16529    16532       +3     
+ Misses        323      321       -2     
  Partials       86       86              
Impacted Files Coverage Δ
cmd/ingester/app/processor/span_processor.go 100.00% <100.00%> (ø)
pkg/config/tlscfg/cert_watcher.go 94.68% <0.00%> (+2.12%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b33160...9d12426. Read the comment docs.

@yurishkuro yurishkuro changed the title fix: check nil process [ingester/fix] Apply sanitizers to avoid panic on span.process=nil Jul 26, 2022
@yurishkuro yurishkuro merged commit a5974e1 into jaegertracing:main Jul 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Jaeger ingester panics on convertProcess function
5 participants