Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming Datagrid #498

Merged
merged 22 commits into from
Dec 16, 2024
Merged

Streaming Datagrid #498

merged 22 commits into from
Dec 16, 2024

Conversation

martinRenou
Copy link
Member

@martinRenou martinRenou commented Apr 8, 2024

Provide a new widget StreamingDatagrid which lazily requests data to the back-end upon scrolling/resizing the grid. It only requests data needed to render the current viewport.

This is a very early draft for discussion, it requires more testing.

Screencast.from.2024-04-08.11-34-30.webm

Wish-list for this PR to be merge-able:

  • Filtering/sorting should be done back-end side by default for StreamingDatagrid, simply because the front-end does not have the whole dataset. We may want to provide the ability to do back-end filtering for the main Datagrid widget too, but front-end filtering should stay the default to not break current applications.
  • Unit testing + Galata testing
  • Fix row headers
  • Provide a tick method to the Python API to allow the back-end to notify the front-end the data has changed
  • Fix nested hierarchy support

@ianthomas23
Copy link
Contributor

I've tried this out on my dev machine and it works really well.

I can help with the backend filtering/sorting if you wish, in this or a separate PR.

@martinRenou
Copy link
Member Author

martinRenou commented Apr 9, 2024

Thanks a lot for the review!

I can help with the backend filtering/sorting if you wish, in this or a separate PR.

💯 Feel free to push on my branch, I'm adding you as a collaborator on my fork

ipydatagrid/datagrid.py Outdated Show resolved Hide resolved
@ianthomas23
Copy link
Contributor

ianthomas23 commented Sep 6, 2024

I have added filtering and sorting on the backend when streaming. Here is a screencast (although the mouse pointer isn't shown which is unfortunate):

Screencast.from.2024-09-06.12-07-26.webm

Summary of changes:

  1. There are two new messages from the frontend to the backend, frontend-transforms when the transforms are changed, which transforms the data, syncs the new _row_count to the frontend and requests a tick to update the screen, and unique-values-request which sends the unique values of single column back to the frontend.
  2. The backend stores the transformed data so that it is not recalculated on every scroll in the UI, and each of the sort and filter transforms are applied using pandas functions.
  3. Getting unique column values it the frontend is now async because of the round-trip messages.
  4. TypeScript DataGridModel has a new _transform_on_backend boolean although it is only currently True for a StreamingDataGridModel eventually it might be an option for non-streaming datagrid.

Things to discuss and/or do:

  1. No new tests yet.
  2. The screen updates twice on a filter change as shown in the screencast, the first time because we have changed the row count but still have the old data, this triggers a request for the new data which is the second update. I'd like to do this better, but it may be good enough for an initial version.
  3. To send a unique-values-request message to the backend a StreamingViewBasedJSONModel calls DataGridModel.send, and to do this I am passing the _dataModel into the StreamingViewBasedJSONModel. It isn't good separation of concerns to have the StreamingViewBasedJSONModel know all about the DataGridModel just to use its message passing. I presume there is an alternative message-passing implementation that can be used here.
  4. Unique values are sent from backend to frontend as a serialised numpy array. This should be binary serialised instead. Fixed by d2150f4.
  5. Need to investigate programmatic changes on the backend (e.g. change of filters or underlying data) to see if they work with the new functionality.
  6. Initial click in the Filter by Value dialog does nothing. Fixed by 18e095d.

@martinRenou
Copy link
Member Author

Thanks a lot @ianthomas23 !! That looks exciting :)

I'll look into your changes in the coming days

Copy link
Member Author

@martinRenou martinRenou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ian! I have a small suggestion concerning the communication between the front-end / back-end

ipydatagrid/datagrid.py Outdated Show resolved Hide resolved
martinRenou and others added 14 commits October 29, 2024 10:05
Signed-off-by: martinRenou <martin.renou@gmail.com>
Signed-off-by: martinRenou <martin.renou@gmail.com>
Signed-off-by: martinRenou <martin.renou@gmail.com>
Signed-off-by: martinRenou <martin.renou@gmail.com>
Signed-off-by: martinRenou <martin.renou@gmail.com>
Signed-off-by: martinRenou <martin.renou@gmail.com>
Signed-off-by: martinRenou <martin.renou@gmail.com>
Signed-off-by: martinRenou <martin.renou@gmail.com>
+ Fix cases for "raw" buffers

Signed-off-by: martinRenou <martin.renou@gmail.com>
Signed-off-by: martinRenou <martin.renou@gmail.com>
Signed-off-by: Ian Thomas <ianthomas23@gmail.com>
@ianthomas23
Copy link
Contributor

This now has the transfer of unique values (for the filter by value dialog) from backend to frontend in binary format. This reuses the existing serialize/deserialize code even though it is simpler (single column only) rather than have multiple implementations. On the TypeScript side I've moved the deserialize code into a new file to avoid circular dependencies.

Sending of the selected values to filter on from the frontend to backend in the _transforms does not use binary serialization.

@ianthomas23
Copy link
Contributor

I've noticed that selection does not work for StreamingViewBasedJSONModel. data is an empty object here:
https://github.com/martinRenou/ipydatagrid/blob/d2150f40c2b0d7852c5117b932a3f9907e3d5499/js/core/viewbasedjsonmodel.ts#L517

@ianthomas23
Copy link
Contributor

CI is passing after commit 1248890 which comprises:

  • Removed raise RuntimeError in StreamingDataGrid.transform as this is now implemented (but not tested).
  • Changed arguments passed to ViewBasedJSONModel constructor in test code.
  • ViewBasedJSONModel is now async as StreamingViewBasedJSONModel needs to send a message to the backend and wait for the response.

We need to add new tests for the new functionality.

@ianthomas23
Copy link
Contributor

Here's a screencast showing select not working:

ipydatagrid.mp4

It shows the non-streaming case working as intended, you can select a column, row, individual cell or rectangle of cells. Then the streaming case in which you cannot select anything. There's an error shown in the console about an expected data structure being empty.

@martinRenou martinRenou changed the title DRAFT: Streaming Datagrid Streaming Datagrid Dec 16, 2024
@martinRenou
Copy link
Member Author

martinRenou commented Dec 16, 2024

It turns out selections do work, you only need to pass the proper argument to enable selections:

Uploading Screencast From 2024-12-16 11-51-47.mp4…

They do raise an exception JS side though. I suggest we open a follow-up issue on this and merge

@martinRenou martinRenou marked this pull request as ready for review December 16, 2024 11:26
@martinRenou martinRenou merged commit a92d285 into jupyter-widgets:main Dec 16, 2024
13 checks passed
@martinRenou martinRenou deleted the streaming branch December 16, 2024 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants