Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tx/rm_stm: remove mem_state #18076

Merged
merged 3 commits into from
Apr 29, 2024
Merged

Conversation

bharathv
Copy link
Contributor

@bharathv bharathv commented Apr 25, 2024

Another step towards consolidated state management.

This PR removes mem_state struct from rm_stm entirely. Main change in this PR is to get rid of mem_state::estimated map that is supposed to hold the estimated offset of the first data batch in a transaction while its replication is in progress. This estimate is in place so that any racy LSO computation on the partition does not exceed the (future) replicated offset. We remove that by redefining the transaction boundary to include the begin/fence batch in the transaction. first data batch is guaranteed to happen after the fence batch, so clamping the LSO at the fence batch by redefining the transaction boundaries ensures that in replication batches are not accounted for LSO.

The advantage of removing mem state is that the entire transactional state left in the stm is replayed from the log, so it is easy it to port to the new consolidated producer state definitions.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

the method factors out logic to update transactional state, no logic
changes.
@bharathv
Copy link
Contributor Author

/dt

@bharathv
Copy link
Contributor Author

/dt

@bharathv
Copy link
Contributor Author

/dt

@bharathv
Copy link
Contributor Author

/ci-repeat 3

'estimated' was used as an estimate for the first data batch offset (so
LSO cannot exceed it) while replication is in progress. This is the last
in memory state that is pending for cleanup.

This commit redefines the transacation boundary to begin from fence batch
rather the first data batch which allows us to get rid of this
additional state which lays the foundation for producer state move.
@bharathv bharathv marked this pull request as ready for review April 26, 2024 05:32
@bharathv bharathv requested review from mmaslankaprv and ztlpn April 26, 2024 05:32
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Apr 26, 2024

@@ -561,6 +541,8 @@ class rm_stm final : public raft::persisted_stm<> {
ss::gate _gate;
// Highest producer ID applied to this stm.
model::producer_id _highest_producer_id;
// for monotonicity of computed LSO.
model::offset _last_known_lso{-1};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do not need to be initialized, it will be default initialized

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made a small change in sync() related to this.. retained this to be -1 to be compatible with previous code.

@bharathv bharathv merged commit 6180623 into redpanda-data:dev Apr 29, 2024
17 checks passed
@bharathv bharathv deleted the rem_mem_state branch April 29, 2024 15:28
bharathv added a commit to bharathv/redpanda that referenced this pull request Apr 30, 2024
Chaos logs showed that there can be duplicate (back to back) begin_tx
requests due to some certain client quirks beyond our control. We had
some checks in place to ignore them but they were invalidated by
redpanda-data#18076 which redefined the
transaction boundries.

After redpanda-data#18076 an entry in the ongoing_map does not mean the data has been
replicated as a part of the transaction. This PR adjusts the checks
accordingly.

Note: This code is going away soon, so we will have better API in the
producer_state to make the code more easily understandable. This PR is
to make chaos happy until then.
bharathv added a commit to bharathv/redpanda that referenced this pull request May 3, 2024
In earlier versions we were not considering transactions without data
batches as candidates for expiry, thats not the case any more with
redpanda-data#18076, adding a
regression test to future proof.
bharathv added a commit to bharathv/redpanda that referenced this pull request May 8, 2024
In earlier versions we were not considering transactions without data
batches as candidates for expiry, thats not the case any more with
redpanda-data#18076, adding a
regression test to future proof.
Lazin pushed a commit to Lazin/redpanda that referenced this pull request Jun 1, 2024
In earlier versions we were not considering transactions without data
batches as candidates for expiry, thats not the case any more with
redpanda-data#18076, adding a
regression test to future proof.
Lazin pushed a commit to Lazin/redpanda that referenced this pull request Jun 12, 2024
In earlier versions we were not considering transactions without data
batches as candidates for expiry, thats not the case any more with
redpanda-data#18076, adding a
regression test to future proof.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants