Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the incorrect size calculation logic of FileSink. #9429

Closed
wants to merge 1 commit into from

Conversation

wypb
Copy link
Contributor

@wypb wypb commented Apr 10, 2024

I was writing unit tests for the Textfile writer internally and found
that data written to MemorySink may be overwritten.

I investigated the reason and found that FileSink::writeImpl can accept multiple buffers.
https://github.com/facebookincubator/velox/blob/main/velox/dwio/common/FileSink.cpp#L65-L80
Every time the data of a buffer is written, we should update FileSink#size_
instead of updating after writing all buffers. Because MemorySink::write
relies on FileSink#size_ to write data to MemorySink#data_.
https://github.com/facebookincubator/velox/blob/main/velox/dwio/common/FileSink.cpp#L184-L189

CC: @mbasmanova @xiaoxmeng

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 10, 2024
Copy link

netlify bot commented Apr 10, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 89113c9
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66177612449100000824bdbd

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wypb Thank you for the fix.

@xiaoxmeng Meng, would you help review this fix?

CC: @duanmeng @bikramSingh91

Copy link
Collaborator

@duanmeng duanmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wypb Looks great % some nits.

velox/dwio/common/FileSink.cpp Outdated Show resolved Hide resolved
velox/dwio/common/tests/MemorySinkTest.cpp Outdated Show resolved Hide resolved
@wypb wypb force-pushed the fix_memory branch 2 times, most recently from 07944c7 to cb25213 Compare April 10, 2024 10:17
@xiaoxmeng
Copy link
Contributor

@wypb Thank you for the fix.

@xiaoxmeng Meng, would you help review this fix?

CC: @duanmeng @bikramSingh91

@mbasmanova sure, will do

Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wypb good catch. Thanks for the fix % minors!

velox/dwio/common/FileSink.cpp Outdated Show resolved Hide resolved
velox/dwio/common/tests/MemorySinkTest.cpp Show resolved Hide resolved
buffers.back().append(chars[i]);
}

EXPECT_EQ(buffers.size(), 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: ASSERT__EQ put expected value second while EXPECT_EQ put the expected value first. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified it to ASSERT__EQ.

Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wypb thanks for the update % one more nit for test!

velox/dwio/common/tests/MemorySinkTest.cpp Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng merged this pull request in d4f8d85.

Copy link

Conbench analyzed the 1 benchmark run on commit d4f8d855.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@wypb wypb deleted the fix_memory branch April 11, 2024 09:25
yanngyoung pushed a commit to yanngyoung/velox that referenced this pull request Apr 12, 2024
…tor#9429)

Summary:
I was writing unit tests for the Textfile writer internally and found
that data written to `MemorySink` may be overwritten.

I investigated the reason and found that `FileSink::writeImpl` can accept multiple buffers.
 https://github.com/facebookincubator/velox/blob/main/velox/dwio/common/FileSink.cpp#L65-L80
Every time the data of a buffer is written, we should update `FileSink#size_`
 instead of updating after writing all buffers. Because `MemorySink::write`
relies on `FileSink#size_` to write data to `MemorySink#data_`.
https://github.com/facebookincubator/velox/blob/main/velox/dwio/common/FileSink.cpp#L184-L189

CC: mbasmanova xiaoxmeng

Pull Request resolved: facebookincubator#9429

Reviewed By: amitkdutta, kewang1024

Differential Revision: D56004384

Pulled By: xiaoxmeng

fbshipit-source-id: c824c5e481866abef12f99d570d41a852bb8b2e0
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
…tor#9429)

Summary:
I was writing unit tests for the Textfile writer internally and found
that data written to `MemorySink` may be overwritten.

I investigated the reason and found that `FileSink::writeImpl` can accept multiple buffers.
 https://github.com/facebookincubator/velox/blob/main/velox/dwio/common/FileSink.cpp#L65-L80
Every time the data of a buffer is written, we should update `FileSink#size_`
 instead of updating after writing all buffers. Because `MemorySink::write`
relies on `FileSink#size_` to write data to `MemorySink#data_`.
https://github.com/facebookincubator/velox/blob/main/velox/dwio/common/FileSink.cpp#L184-L189

CC: mbasmanova xiaoxmeng

Pull Request resolved: facebookincubator#9429

Reviewed By: amitkdutta, kewang1024

Differential Revision: D56004384

Pulled By: xiaoxmeng

fbshipit-source-id: c824c5e481866abef12f99d570d41a852bb8b2e0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants