-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrationsv2: limit batch sizes to migrations.batchSizeBytes (= 100mb by default) #109540
Conversation
Pinging @elastic/kibana-core (Team:Core) |
root: { | ||
appenders: ['default', 'file'], | ||
}, | ||
loggers: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what's the difference, but even though the previous config passed validation, it didn't produce a log file?
ack: going to review on Friday 27.08 |
src/core/server/saved_objects/migrationsv2/migrations_state_action_machine.ts
Show resolved
Hide resolved
src/core/server/saved_objects/migrationsv2/initial_state.test.ts
Outdated
Show resolved
Hide resolved
src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes.test.ts
Outdated
Show resolved
Hide resolved
src/core/server/saved_objects/migrationsv2/integration_tests/batch_size_bytes.test.ts
Outdated
Show resolved
Hide resolved
src/core/server/saved_objects/migrationsv2/integration_tests/migration_from_v1.test.ts
Outdated
Show resolved
Hide resolved
src/core/server/saved_objects/migrationsv2/model/create_batches.test.ts
Outdated
Show resolved
Hide resolved
src/core/server/saved_objects/migrationsv2/model/create_batches.ts
Outdated
Show resolved
Hide resolved
*/ | ||
const NDJSON_NEW_LINE_BYTES = 1; | ||
|
||
const batches = [[]] as [SavedObjectsRawDoc[]]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a batch size doesn't affect siblings 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docker change LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for doing this 🧡 !
beforeAll(async () => { | ||
await removeLogFile(); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Should we also delete the logs afterAll
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I wrote this logic for the first time, I didn't add this logic to be able to investigate test failures with the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we actually added the logs for debugging failed tests (and later used them for further integration testing) since these logs won't interfere with other test runs I think it's worth having them lie around so that you can see what had happened
💔 Backport failed
To backport manually run: |
Friendly reminder: Looks like this PR hasn’t been backported yet. |
… by default) (elastic#109540) * Fix logging for existing integration test * First stab at limiting batches to batchSizeBytes * Fix tests * Fix batch size calculation, NDJSON needs to be terminated by an empty line * Integration tests * Fix type failures * rename migration integration tests and log files to be consistent & more descriptive * Review feedback * Remove duplication of fatal error reasons * migrations.maxBatchSizeBytes to docker environment vars * docs for migrations.maxBatchSizeBytes # Conflicts: # src/core/server/saved_objects/migrationsv2/integration_tests/7_13_0_unknown_types.test.ts # src/core/server/saved_objects/migrationsv2/integration_tests/migration_from_v1.test.ts
… by default) (elastic#109540) * Fix logging for existing integration test * First stab at limiting batches to batchSizeBytes * Fix tests * Fix batch size calculation, NDJSON needs to be terminated by an empty line * Integration tests * Fix type failures * rename migration integration tests and log files to be consistent & more descriptive * Review feedback * Remove duplication of fatal error reasons * migrations.maxBatchSizeBytes to docker environment vars * docs for migrations.maxBatchSizeBytes # Conflicts: # src/core/server/saved_objects/migrationsv2/integration_tests/7_13_0_unknown_types.test.ts # src/core/server/saved_objects/migrationsv2/integration_tests/migration_from_v1.test.ts
Looks like this PR has backport PRs but they still haven't been merged. Please merge them ASAP to keep the branches relatively in sync. |
… 100mb by default) (#109540) (#110967) * Migrationsv2: limit batch sizes to migrations.batchSizeBytes (= 100mb by default) (#109540) * Fix logging for existing integration test * First stab at limiting batches to batchSizeBytes * Fix tests * Fix batch size calculation, NDJSON needs to be terminated by an empty line * Integration tests * Fix type failures * rename migration integration tests and log files to be consistent & more descriptive * Review feedback * Remove duplication of fatal error reasons * migrations.maxBatchSizeBytes to docker environment vars * docs for migrations.maxBatchSizeBytes # Conflicts: # src/core/server/saved_objects/migrationsv2/integration_tests/7_13_0_unknown_types.test.ts # src/core/server/saved_objects/migrationsv2/integration_tests/migration_from_v1.test.ts * Fix tests on 7.x being off by one byte Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Looks like this PR has backport PRs but they still haven't been merged. Please merge them ASAP to keep the branches relatively in sync. |
…= 100mb by default) (#109540) (#110968) * Migrationsv2: limit batch sizes to migrations.batchSizeBytes (= 100mb by default) (#109540) * Fix logging for existing integration test * First stab at limiting batches to batchSizeBytes * Fix tests * Fix batch size calculation, NDJSON needs to be terminated by an empty line * Integration tests * Fix type failures * rename migration integration tests and log files to be consistent & more descriptive * Review feedback * Remove duplication of fatal error reasons * migrations.maxBatchSizeBytes to docker environment vars * docs for migrations.maxBatchSizeBytes # Conflicts: # src/core/server/saved_objects/migrationsv2/integration_tests/7_13_0_unknown_types.test.ts # src/core/server/saved_objects/migrationsv2/integration_tests/migration_from_v1.test.ts * Fix tests on 7.x being off by one byte Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@rudolf how do we test this PR? Can we get access to data you used for testing it? Thanks! |
@bhavyarm I've added a "QA testing" section, but these steps are already exercised by the integration tests, so I shared some real world saved objects over slack. |
💔 Build Failed
Failed CI StepsTest FailuresKibana Pipeline / general / X-Pack API Integration Tests.x-pack/test/api_integration/apis/uptime/rest/telemetry_collectors_fleet·ts.apis uptime uptime REST endpoints with generated data telemetry collectors fleet should receive expected results for fleet managed monitors after calling monitor loggingStandard Out
Stack Trace
Kibana Pipeline / general / X-Pack API Integration Tests.x-pack/test/api_integration/apis/uptime/rest/telemetry_collectors_fleet·ts.apis uptime uptime REST endpoints with generated data telemetry collectors fleet should receive expected results for fleet managed monitors after calling monitor loggingStandard Out
Stack Trace
Metrics [docs]
History
To update your PR or re-run it, just comment with: |
Summary
Fixes #107641
Migrationsv2
_id
to reduce memory consumption and avoid logging document contents when there's an exceptionRelease notes
Fixes a bug that would cause Kibana upgrade migrations to fail after receiving a 413 Request Entity Too Large response from Elasticsearch if the
.kibana*
indices contained many large documents.QA Testing
(These scenarios are already tested in integration tests)
Regression testing on cloud
I tested this on cloud by loading
src/core/server/saved_objects/migrationsv2/integration_tests/archives/7.7.2_xpack_100k_obj.zip
into a deployment and then comparing the migrations duration on 7.14.1 vs 7.15.0 (BC). Because 100k saved objects means we're doing 100 requests in serial, there's quite a lot of variance between runs (30 seconds). But migrations took a similar amount of time with the best case results for both versions being only a few seconds apart.Looking at monitoring data the event-loop delay only had one small spike up to 8ms so it's unlikely we're CPU bound.
Because of the high memory consumption I had to test this with 2GB instances which have slightly more CPU power than our 1GB instances. So I will repeat this once #111911 is merged.
Checklist
Delete any items that are not applicable to this PR.
Risk Matrix
Delete this section if it is not applicable to this PR.
Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.
When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:
For maintainers