[BUG] Unnecessary to cache the batches that will be sent to Python in FlatMapGroupInPandas. #2238

Closed
firestarman opened this issue Apr 23, 2021 · 1 comment · Fixed by #2239

firestarman commented Apr 23, 2021

Actually, this is more of an improvement than a bug.
The code snippet is below:

        .map { groupBatch =>
          // Cache the input batches for release after writing done.
          queue.add(groupBatch, spillCallback)
          groupBatch
        }

The Python runner closes the batches after writing them to Python, so there is no need to cache them in the queue for release.
This needs to be cleaned up.
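
To illustrate the ownership argument above, here is a minimal, self-contained sketch. It does not use the plugin's real classes: `FakeBatch`, `FakeWriter`, and `OwnershipSketch` are hypothetical stand-ins, and the only assumption carried over from the issue is that the writer closes each batch once it has been written. Under that assumption, the batches can be handed straight to the writer without a separate queue that tracks them for release.

```scala
// Hypothetical stand-in for a columnar batch that owns a resource.
// It counts close() calls so the ownership behavior can be observed.
final class FakeBatch(val id: Int) extends AutoCloseable {
  private var closeCount = 0
  override def close(): Unit = closeCount += 1
  def timesClosed: Int = closeCount
}

// Hypothetical stand-in for the Python runner's writer.
// Assumption from the issue: the writer closes each batch after writing it.
object FakeWriter {
  def writeAll(batches: Iterator[FakeBatch]): Int = {
    var written = 0
    batches.foreach { batch =>
      try {
        // Pretend the batch is serialized to the Python worker here.
        written += 1
      } finally {
        // The writer owns the batch at this point and releases it.
        batch.close()
      }
    }
    written
  }
}

object OwnershipSketch {
  // Hands the batches directly to the writer, with no caching step in between.
  def run(): Seq[FakeBatch] = {
    val input = (1 to 3).map(new FakeBatch(_))
    FakeWriter.writeAll(input.iterator)
    input
  }

  def main(args: Array[String]): Unit = {
    run().foreach { b =>
      println(s"batch ${b.id} closed ${b.timesClosed} time(s)")
    }
  }
}
```

In this sketch every batch ends up closed exactly once by the writer, which is why an extra structure that holds the same batches only to release them later adds nothing.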

firestarman added the "bug" (Something isn't working) and "? - Needs Triage" (Need team to review and classify) labels on Apr 23, 2021
firestarman self-assigned this on Apr 23, 2021

firestarman commented Apr 23, 2021

MapInPandas has a similar issue. Code snippet
