Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpk: add support for tombstone records in produce and consume #23264

Merged
merged 5 commits into from
Sep 17, 2024

Conversation

WillemKauf
Copy link
Contributor

@WillemKauf WillemKauf commented Sep 10, 2024

For rpk topic produce:

To differentiate tombstones record with a null value (null, as produced by rpk topic produce -Z) from records with empty string-values ("") in rpk topic produce, assign the empty-string value case to an empty-byte slice.

For rpk topic consume:

To differentiate tombstone records with null values (null, as produced by rpk topic produce -Z) from records with empty string values ("") in rpk topic consume, use a *string on the rpk side for JSON marshalling.

The output from rpk topic consume will show "" for records with empty-string values, while tombstone records will have their values omitted.

An example of the output from rpk topic consume with tombstone and non-tombstone records now looks like:

{
  "topic": "topic",
  "key": "this_is_a_tombstone_record",
  "timestamp": 1725999704063,
  "partition": 0,
  "offset": 0
}
{
  "topic": "topic",
  "key": "this_is_a_record_with_empty_string",
  "value": "",
  "timestamp": 1725999709283,
  "partition": 0,
  "offset": 1
}
{
  "topic": "topic",
  "key": "this_is_a_record_with_null_string",
  "value": "null",
  "timestamp": 1725999721460,
  "partition": 0,
  "offset": 2
}

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Improvements

  • Add support for differentiating tombstone records from empty-string value records in rpk produce and rpk consume.

@@ -221,6 +221,9 @@ func (c *consumer) consume(ctx context.Context) {
r.Key = decKey
}
}
if r.Value == nil {
r.Value = []byte("null")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misread the file originally. I don't think we want this, this hijacks the output always for any null value.
I don't have a solid idea yet, but one idea is to add a formatting directive to print null or true if the value is null. I'm not sure yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can handle this case better if we change our Value to a *string type: https://gist.github.com/WillemKauf/459b2f7ece149a54a6b61f35d5706ec1

Now, our output is

{
  "topic": "topic",
  "key": "this_is_a_record_with_null_string",
  "value": "null",
  "timestamp": 1726000933223,
  "partition": 0,
  "offset": 3
}
{
  "topic": "topic",
  "key": "this_is_not_a_tombstone_record",
  "value": "",
  "timestamp": 1726000937808,
  "partition": 0,
  "offset": 4
}
{
  "topic": "topic",
  "key": "this_is_a_tombstone_record",
  "value": null,
  "timestamp": 1726000941565,
  "partition": 0,
  "offset": 5
}

Copy link
Contributor Author

@WillemKauf WillemKauf Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could either:

  1. Remove omitempty and have the above output
  2. Leave it, and have the output look like:
{
  "topic": "topic",
  "key": "this_is_a_record_with_null_string",
  "value": "null",
  "timestamp": 1726000933223,
  "partition": 0,
  "offset": 3
}
{
  "topic": "topic",
  "key": "this_is_not_a_tombstone_record",
  "value": "",
  "timestamp": 1726000937808,
  "partition": 0,
  "offset": 4
}
{
  "topic": "topic",
  "key": "this_is_a_tombstone_record",
  "timestamp": 1726000941565,
  "partition": 0,
  "offset": 5
}

Now, the tombstone record has its value omitted, but the record with an empty-string value is still included.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I'm ok with this change. I can make a corresponding change to franz-go to support querying null-ness vs. emptiness -- %v{is_null} or something -- but for the json output here in rpk, this is a fine fix.

Copy link
Contributor Author

@WillemKauf WillemKauf Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! This change has been made in the last force push to this PR (omitempty is kept, tombstones have no value printed, whereas empty string value records have their values print as "")

@WillemKauf
Copy link
Contributor Author

Force push to:

  • Update behavior for rpk topic consume. Please refer to cover letter for details.

To differentiate tombstone records with null values (`null`) from records
with empty string values (`""`), use a `*string` on the `rpk` side.

The output from `rpk topic consume` will show an empty string for records,
while tombstone records will have their values omitted.
To differentiate a record with an empty string-value (`""`) in
`rpk topic produce` from a record with a null value (`null`, as produced
by `rpk topic produce -Z`), assign the empty-string to an empty-byte slice.
@WillemKauf
Copy link
Contributor Author

Force push to rebase to upstream dev

@Deflaimun
Copy link
Contributor

Recommend adding a new paragraph in rpk topic produce and rpk topic consume to explain the new change. Your original description in the PR is already pretty good. The examples on the other hand could be added to docs website.

Something like (adjust when necessary) :
rpk topic produce handles null values and empty string values distinctly. Null values, when using rpk topic produce -Z, are represented as a special case for tombstone records, differentiating them from records with empty string values.

rpk topic consume handles null values and empty string values distinctly. Null values, when using rpk topic produce -Z, are represented as a special case for tombstone records, differentiating them from records with empty string values. The output from rpk topic consume will show "" for records with empty-string values, while tombstone records will have their values omitted.

To give better guidance around tombstone records.
@WillemKauf
Copy link
Contributor Author

Push to:

  • Added paragraphs to rpk topic produce and rpk topic consume that provides documentation around tombstone record behaviour.

@WillemKauf WillemKauf merged commit b47fc70 into redpanda-data:dev Sep 17, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants