[receiver/filelog] truncate large log entry #17358

Merged 2 commits on Jan 3, 2023

@haoqixu haoqixu commented Jan 3, 2023

Description:

The PositionalScanner returns the bufio.ErrTooLong error if a log entry is larger than max(max_log_size, 16384).
With this fix, if splitFunc needs more data to produce a valid token but the scanning buffer is already full, the log entry is truncated instead.

This fix uses a different strategy for splitting log entries than #17309.

When scanning this log content with the line start splitter:

|<------------- max_log_size --------->|
<log_start_line>...log_content...<log_start_line>...log_content...<log_start_line>

#17309 will emit:

|<------------- max_log_size --------->|
<log_start_line>...log_content...<log_st
art_line>...log_content...

This PR uses a double-size buffer (max_log_size * 2) to avoid breaking the start line mark, and emits:

|<------------- max_log_size --------->|
<log_start_line>...log_content...
<log_start_line>...log_content...
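
The double-buffer strategy can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: lineStartSplit, truncatingSplit, and scanEntries are hypothetical names, the literal START stands in for a configured line start pattern, and the splitter is a simplified stand-in for the real one. The wrapper truncates only when the inner split function finds no token in a full 2x buffer, or when a token exceeds maxLogSize.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// lineStartSplit is a simplified, hypothetical line-start splitter: each
// token runs from one match of re up to, but not including, the next match.
func lineStartSplit(re *regexp.Regexp) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (int, []byte, error) {
		first := re.FindIndex(data)
		switch {
		case first == nil:
			if atEOF && len(data) > 0 {
				return len(data), data, nil // flush the unmatched tail
			}
			return 0, nil, nil // ask for more data
		case first[0] > 0:
			// Emit bytes that precede the first start mark as their own token.
			return first[0], data[:first[0]], nil
		}
		next := re.FindIndex(data[first[1]:])
		if next == nil {
			if atEOF {
				return len(data), data, nil // last entry in the input
			}
			return 0, nil, nil
		}
		end := first[1] + next[0]
		return end, data[:end], nil
	}
}

// truncatingSplit wraps split so that no token exceeds maxLogSize.
// It assumes every token returned by split starts at data[0].
func truncatingSplit(split bufio.SplitFunc, maxLogSize int) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (int, []byte, error) {
		advance, token, err := split(data, atEOF)
		if err != nil {
			return advance, token, err
		}
		if advance == 0 && token == nil && len(data) >= 2*maxLogSize {
			// The 2x buffer is full and still holds no complete entry.
			return maxLogSize, data[:maxLogSize], nil
		}
		if len(token) > maxLogSize {
			return maxLogSize, token[:maxLogSize], nil
		}
		return advance, token, nil
	}
}

// scanEntries scans input with a buffer of 2*maxLogSize bytes.
func scanEntries(input string, maxLogSize int) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Buffer(make([]byte, 0, 2*maxLogSize), 2*maxLogSize)
	sc.Split(truncatingSplit(lineStartSplit(regexp.MustCompile("START")), maxLogSize))
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out, sc.Err()
}

func main() {
	// Each entry is 18 bytes, so a 20-byte window would cut the next START mark.
	entries, err := scanEntries("START-aaaaaaaaaaaaSTART-bbbbbbbbbbbbSTART-cc", 20)
	fmt.Printf("%q %v\n", entries, err)
}
```

Because the buffer holds a full entry plus the start mark that follows it, the entries in main come out whole. An entry longer than maxLogSize is still cut into pieces of at most maxLogSize bytes, and no bytes are dropped.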

Link to tracking Issue: #16487

Testing:

  • TestTokenizationTooLong()
  • TestTokenizationTooLongWithLineStartPattern()

Documentation:
Update the description of max_log_size in filelogreceiver's README.md


@djaglowski djaglowski left a comment

Thanks for iterating on this @haoqixu. This is a really nice solution, very concise and accommodates the "line start pattern" issue well.

The only thing I'm wondering about is if using 2x the max log size is more than necessary. In theory, this could cause some unnecessary performance degradation, but in practice I think it is minimal because it only applies when log lines are consistently too long, in which case the user should adjust the max_log_size setting.

In any case, I don't see an obvious better way to handle this right now. Ideally, we could do something like:

  // Pseudocode: splitOnLineStart and lineStartToken are placeholder names.
  bufLen := maxLogSize
  if splitOnLineStart {
    bufLen += len(lineStartToken)
  }

However, we don't have access to that information at this point in the code. Perhaps a future optimization is possible though. I think for now this is a great improvement.

maokitty commented Jan 4, 2023

> The only thing I'm wondering about is if using 2x the max log size is more than necessary. In theory, this could cause some unnecessary performance degradation, but in practice I think it is minimal because it only applies when log lines are consistently too long, in which case the user should adjust the max_log_size setting.

I found a case where, in practice, the 2x max log size limit is reached more often than expected.

Case

An OTel user may apply a single line_start_pattern config across a whole company. When the pattern does not match the log content, the buffer fills to 2x the max log size every time.

Solution

  1. Change the buffer size from 2*max_log_size to max_log_size.

    • The benefit of 2*max_log_size is that a log entry is not split even when its start pattern falls on the border. I think a start pattern landing exactly on the border is rare compared to a pattern that does not match at all.
  2. Add a log message to inform the user that truncation happened.

    • Users need some place to find out that they should adjust their config to get a better experience in their backend.
    • A metric or a truncation flag may be too heavy here?
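
A minimal sketch of the second suggestion, under stated assumptions: reportingSplit, neverMatches, and scanCounting are hypothetical names and are not part of the receiver. neverMatches stands in for a line start pattern that never matches the file, and the callback is where a log message could be emitted.

```go
package main

import (
	"bufio"
	"log"
	"strings"
)

// neverMatches stands in for a line-start split function whose pattern
// does not match the content: it only yields a token at EOF.
func neverMatches(data []byte, atEOF bool) (int, []byte, error) {
	if atEOF && len(data) > 0 {
		return len(data), data, nil
	}
	return 0, nil, nil
}

// reportingSplit truncates tokens to maxLogSize and calls onTruncate each
// time it does so. It assumes tokens returned by split start at data[0].
func reportingSplit(split bufio.SplitFunc, maxLogSize int, onTruncate func()) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (int, []byte, error) {
		advance, token, err := split(data, atEOF)
		if err != nil {
			return advance, token, err
		}
		if advance == 0 && token == nil && len(data) >= 2*maxLogSize {
			token = data // the 2x buffer is full and no token was found
		}
		if len(token) > maxLogSize {
			onTruncate()
			return maxLogSize, token[:maxLogSize], nil
		}
		return advance, token, nil
	}
}

// scanCounting scans input and returns the tokens plus the number of
// truncations that were reported.
func scanCounting(input string, maxLogSize int) ([]string, int, error) {
	truncations := 0
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Buffer(make([]byte, 0, 2*maxLogSize), 2*maxLogSize)
	sc.Split(reportingSplit(neverMatches, maxLogSize, func() {
		truncations++
		log.Printf("log entry truncated to %d bytes; check max_log_size and the line start pattern", maxLogSize)
	}))
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out, truncations, sc.Err()
}

func main() {
	tokens, n, err := scanCounting("abcdefghijklmnopqrstuvwxyz", 4)
	log.Printf("tokens=%q truncations=%d err=%v", tokens, n, err)
}
```

With a pattern that never matches, nearly every token comes from a truncation, so a real implementation would probably want to rate-limit the message.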

3 participants