fix(file source): fix message offset of opendal source #19721
 batch.push(SourceMessage {
     key: None,
     payload: Some(std::mem::take(&mut line_buf).into_bytes()),
-    offset: offset.to_string(),
+    offset: msg_offset,
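The diff swaps offset.to_string() for msg_offset. As a minimal sketch of the convention msg_offset is meant to restore, assuming offsets are byte positions in the file that count each line's trailing newline, every line can be paired with its ENDING position (which is also the start of the next line). The helper name lines_with_end_offsets is hypothetical and is not RisingWave code.

```rust
/// Hypothetical helper, not the actual RisingWave implementation.
/// `data` is a chunk of the file that begins at byte position `base`.
/// Each line is paired with the byte position just past it (including its
/// trailing '\n'), i.e. the STARTING position of the next line.
fn lines_with_end_offsets(base: usize, data: &str) -> Vec<(String, usize)> {
    let mut out = Vec::new();
    let mut pos = base;
    // `split_inclusive` keeps the '\n' on each piece, so `line.len()`
    // counts the newline byte as part of the line. "\r\n" is not special-cased.
    for line in data.split_inclusive('\n') {
        pos += line.len();
        // The payload excludes the newline; the offset already includes it.
        out.push((line.trim_end_matches('\n').to_string(), pos));
    }
    out
}
```

For example, lines_with_end_offsets(0, "ab\ncde\nf") pairs "ab", "cde", and "f" with offsets 3, 7, and 8.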
From my PoV, the meaning of "offset" for the file source is too hard to understand and evaluate. (And my brain refuses to think about it.) Is it possible to change it to something like (start, end)?
> Is it possible to change to sth like (start,end)?

Agreed, will consider doing this later.
Can we add some unit tests or e2e tests to demonstrate the behavior?
Signed-off-by: Richard Chien <stdrc@outlook.com>
f7b0682 to 099c8da
Reproducing the bug would need more than 1k files. I think the S3 test in the main-cron workflow is enough.
rubber stamp
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
#19654 changed OpendalReader::stream_read_object to yield lines instead of byte buffers, but it accidentally changed the semantics of SourceMessage::offset for the file source, which caused main-cron to fail.

The change is as follows. Previously, when OpendalReader::stream_read_object yielded SourceMessages of byte buffers, the offset was the STARTING position of the BYTES; after split_stream split these bytes by newline, the new offset was set to the ENDING position of each LINE, which is also the STARTING position of the NEXT LINE. #19654 changed the reader so that the offset always represents the STARTING position of each LINE, hence the error.

Checklist
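To see why the two conventions are not interchangeable, here is a small sketch that contrasts them. OffsetConvention, offsets, and lines_after are hypothetical names, and lines_after assumes a consumer that treats a stored offset as "every byte before this position has been consumed". The PR does not spell out exactly how main-cron failed, so this only illustrates one way a start-vs-end mismatch can surface.

```rust
/// Hypothetical enum naming the two offset conventions described above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OffsetConvention {
    /// Offset = STARTING position of each line (the behaviour after #19654).
    LineStart,
    /// Offset = ENDING position of each line, i.e. start of the next line
    /// (the behaviour before #19654, after split_stream).
    LineEnd,
}

/// Offset recorded for each line of `data`, which starts at file position 0.
fn offsets(data: &str, conv: OffsetConvention) -> Vec<usize> {
    let mut out = Vec::new();
    let mut start = 0;
    for line in data.split_inclusive('\n') {
        let end = start + line.len(); // includes the trailing '\n', if any
        out.push(match conv {
            OffsetConvention::LineStart => start,
            OffsetConvention::LineEnd => end,
        });
        start = end;
    }
    out
}

/// Lines a consumer would read if it resumed at `offset`, under the
/// assumption that everything before `offset` was already consumed.
/// `offset` must lie on a line boundary produced by `offsets`.
fn lines_after(data: &str, offset: usize) -> Vec<&str> {
    data[offset..]
        .split_inclusive('\n')
        .map(|l| l.trim_end_matches('\n'))
        .collect()
}
```

With data "ab\ncde\nf", after the first line is consumed the LineEnd convention stores 3, and resuming yields "cde" and "f"; the LineStart convention stores 0, and resuming yields "ab" again.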
./risedev check (or alias, ./risedev c)

Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.