Fix memory explosion when auto-calculating canvas range extents with dask #717
This PR improves the Canvas autorange logic to avoid bringing each full x/y array into memory in order to compute the min/max values.
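The general idea is sketched below (this is an illustration of the approach, not the exact code in this PR; the dataframe and column names are placeholders). The extents are expressed as lazy dask reductions and computed together, so the workers only ship four scalars back to the client instead of whole columns:

```python
# Minimal sketch: compute x/y extents lazily so dask reduces each partition
# to scalars on the workers rather than pulling the full columns to the client.
import dask
import dask.dataframe as dd
import numpy as np
import pandas as pd

# Placeholder dataframe standing in for the real dataset.
df = dd.from_pandas(
    pd.DataFrame({"x": np.random.rand(1_000_000),
                  "y": np.random.rand(1_000_000)}),
    npartitions=8,
)

# Build one lazy graph for all four bounds and evaluate it in a single pass.
(x_range, y_range) = dask.compute(
    (df["x"].min(), df["x"].max()),
    (df["y"].min(), df["y"].max()),
)
print(x_range, y_range)
```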
This fixes #668 for me. Here is the test case I've been using to diagnose and test this PR. I started a Dask distributed instance on a large workstation and persisted the ~3-billion point OSM dataset into memory. After this persist, the Dask dashboard reports ~28GB of memory used. Then I set up a VM with 8GB of RAM and connected it as a client of the distributed scheduler. I then performed a `cvs.points` aggregation without specifying `x_range`/`y_range`, causing the autorange logic to be invoked.

Before these changes, the memory usage of the client would climb steadily until the kernel died. With these changes, the aggregation completes successfully with no noticeable increase in memory usage on the client. A rough reproduction of the test is sketched below.
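The sketch assumes a hypothetical scheduler address, dataset path, and column names; it only illustrates the shape of the test, not the exact script used:

```python
# Hypothetical reproduction of the test above; the scheduler address,
# parquet path, and column names are placeholders.
import datashader as ds
import dask.dataframe as dd
from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")  # run from the low-memory client VM

df = dd.read_parquet("osm.parq")  # ~3-billion point OSM dataset
df = df.persist()                 # data lives in the workers' memory, not the client's

cvs = ds.Canvas(plot_width=900, plot_height=900)
# No x_range/y_range given, so the Canvas autorange logic is exercised.
agg = cvs.points(df, "x", "y")
```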
cc @jacobtomlinson