schema-reader: Shutdown service if corrupt entries in `_schemas` topic #936

nosahama · 2024-08-21T13:14:41Z

About this change - What it does

These breaking changes are guarded by environmental variables

These can be toggled by the various users, the default values are shown below:

KARAPACE_KAFKA_SCHEMA_READER_STRICT_MODE: false
KARAPACE_KAFKA_RETRIABLE_ERRORS_SILENCED: true

The logic below only applies when `KARAPACE_KAFKA_SCHEMA_READER_STRICT_MODE` is set to `true`, everything else remains the same, thus no test changes.

Previously, when we encounter errors within the _schemas topic, we would continue the message loading and skip the problematic schema. This is not ideal as it might leave the application with corrupt schema data and the side-effects could be grave.

What we do now is to kill the service, log the errors and allow a graceful shutdown. We will follow this work by adding metrics for such cases.

This will also stop the service post backup-v1 restore if there are any corrupt schemas present in the backup log file

We can see below that the shutdown is graceful:

Adding restart: always to docker compose shows the behaviour, the service never proceeds past that stage, i think based on the restart behaviour for systemd, we might need to rely on the alerts and then intervene otherwise it'd leave the service in a crash loop, we need to verify if there are SLOs or metrics setup somewhere to track at least service uptime, which will be affected by this.

github-actions · 2024-08-22T15:09:40Z

Coverage report

Click to see where and how coverage changed

File	Statements	Missing	Coverage	Coverage (new stmts)	Lines missing
karapace
config.py
errors.py
in_memory_database.py
karapace_all.py
messaging.py
rapu.py
schema_reader.py					193, 203-207, 370, 519-521, 535-537, 551-556, 566-567, 603
schema_registry.py
schema_registry_apis.py
statsd.py
utils.py
karapace/compatibility
__init__.py
Project Total

_{This report was generated by python-coverage-comment-action}

karapace/schema_reader.py

Previously, when we encounter errors within the `_schemas` topic, we would continue the message loading and skip the problematic schema. This is not ideal as it might leave the application with corrupt schema data and the side-effects could be grave. What we do now is to kill the service, log the errors and allow a graceful shutdown. We will follow this work by adding metrics for such cases.

karapace/schema_reader.py

container/compose.yml

karapace/config.py

We'd like to allow the shutdown logic on corrutp schema be guarded by a feature flag so we do not surprise customers, this is disabled by default.

jjaakola-aiven

LGTM

nosahama requested review from a team as code owners August 21, 2024 13:14

nosahama force-pushed the nosahama/abort-service-on-corrupt-schema branch 4 times, most recently from d4cfcbb to 5040bf1 Compare August 22, 2024 14:02

nosahama force-pushed the nosahama/abort-service-on-corrupt-schema branch 3 times, most recently from 92d127e to 9086eeb Compare August 27, 2024 08:20

jjaakola-aiven reviewed Aug 27, 2024

View reviewed changes

karapace/schema_reader.py Outdated Show resolved Hide resolved

karapace/schema_reader.py Outdated Show resolved Hide resolved

nosahama force-pushed the nosahama/abort-service-on-corrupt-schema branch from 9086eeb to 1cdd0f1 Compare August 27, 2024 13:20

jjaakola-aiven reviewed Aug 29, 2024

View reviewed changes

karapace/schema_reader.py Show resolved Hide resolved

This was referenced Aug 29, 2024

tests,schema_reader: kafka message handling error tests #941

Merged

chore, schema-coordinator: handle aiokafka retriable errors #934

Closed

nosahama force-pushed the nosahama/abort-service-on-corrupt-schema branch 2 times, most recently from d2f22c9 to 2e716a4 Compare August 30, 2024 14:44

nosahama commented Aug 30, 2024

View reviewed changes

container/compose.yml Show resolved Hide resolved

MichaelTansiniAiven reviewed Sep 2, 2024

View reviewed changes

container/compose.yml Show resolved Hide resolved

nosahama force-pushed the nosahama/abort-service-on-corrupt-schema branch 5 times, most recently from c6588b7 to 751c662 Compare September 3, 2024 16:22

jjaakola-aiven reviewed Sep 4, 2024

View reviewed changes

karapace/config.py Show resolved Hide resolved

schema-reader: Wrap shutdown feature around flag

2d9f1e8

We'd like to allow the shutdown logic on corrutp schema be guarded by a feature flag so we do not surprise customers, this is disabled by default.

nosahama force-pushed the nosahama/abort-service-on-corrupt-schema branch from 751c662 to 2d9f1e8 Compare September 4, 2024 12:45

jjaakola-aiven approved these changes Sep 4, 2024

View reviewed changes

jjaakola-aiven enabled auto-merge September 4, 2024 16:06

jjaakola-aiven merged commit 800f3cb into main Sep 5, 2024
10 checks passed

jjaakola-aiven deleted the nosahama/abort-service-on-corrupt-schema branch September 5, 2024 06:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

schema-reader: Shutdown service if corrupt entries in `_schemas` topic #936

schema-reader: Shutdown service if corrupt entries in `_schemas` topic #936

nosahama commented Aug 21, 2024 •

edited

Loading

github-actions bot commented Aug 22, 2024 •

edited

Loading

jjaakola-aiven left a comment

schema-reader: Shutdown service if corrupt entries in _schemas topic #936

schema-reader: Shutdown service if corrupt entries in _schemas topic #936

Conversation

nosahama commented Aug 21, 2024 • edited Loading

About this change - What it does

These breaking changes are guarded by environmental variables

The logic below only applies when KARAPACE_KAFKA_SCHEMA_READER_STRICT_MODE is set to true, everything else remains the same, thus no test changes.

github-actions bot commented Aug 22, 2024 • edited Loading

Coverage report

jjaakola-aiven left a comment

Choose a reason for hiding this comment

schema-reader: Shutdown service if corrupt entries in `_schemas` topic #936

schema-reader: Shutdown service if corrupt entries in `_schemas` topic #936

nosahama commented Aug 21, 2024 •

edited

Loading

The logic below only applies when `KARAPACE_KAFKA_SCHEMA_READER_STRICT_MODE` is set to `true`, everything else remains the same, thus no test changes.

github-actions bot commented Aug 22, 2024 •

edited

Loading