
dealing with very large dask graphs #266

Closed · rabernat opened this issue May 19, 2018 · 8 comments

@rabernat (Member)

Following up on today's earlier discussion with @mrocklin, here is an example of a calculation that is currently too "big" for our pangeo.pydata.org cluster to handle.

It's a pretty simple case:

import xarray as xr
import gcsfs

# enable xarray's new cftime index for non-standard dates
xr.set_options(enable_cftimeindex=True)

# open the dataset (11.3 TB in 292000 chunks)
gcsmap = gcsfs.mapping.GCSMap('pangeo-data/cm2.6/control/temp_salt_u_v-5day_avg/')
ds = xr.open_zarr(gcsmap)

# calculate the squares of all the data variables
ds_squared = ds**2
# and put them back into the original dataset (doubling the size and number of tasks)
for var in ds_squared.data_vars:
    ds[var + '_squared'] = ds_squared[var]

# calculate the monthly-mean climatology and persist for further analysis / visualization
ds_mm_clim = ds.groupby('time.month').mean(dim='time')
# 186 GB in 4800 chunks; persist returns a new object, so keep the reference
ds_mm_clim = ds_mm_clim.persist()

I ran this on a cluster with 100 workers (~600 GB of memory). It got to the scheduler and showed up on the dashboard after ~10 minutes. There were over a million tasks. Workers started to crash and then the notebook crashed.

Some thoughts:

  • The dask graph itself must be huge. The workers only have 6 GB of memory each, and the notebook 14 GB. Everyone is running out of memory. Do we just need way more memory? That could be accomplished by changing the image types; we could get orders of magnitude more memory if that's what it takes. (A rough way to measure the graph size is sketched after this list.)
  • The graph must be very repetitive. Is there an opportunity for some sort of optimization / compression to reduce the size of the graph?
  • Do we really need to send the whole calculation through at once? Or is it possible to "stream" this calculation somehow?
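
As a rough sketch of how one might quantify the first point (this is my addition, not something run in the issue): before calling persist(), pull out the low-level task graph of ds_mm_clim, count the tasks, and sample how large they are once pickled. That gives an estimate of what the notebook and scheduler have to hold in memory just for graph metadata.

import cloudpickle

# Sketch only: inspect the low-level graph that persist() would ship to the
# scheduler.  Assumes the ds_mm_clim object built in the snippet above.
dsk = dict(ds_mm_clim.__dask_graph__())   # mapping of key -> task
print("number of tasks:", len(dsk))

# Sample some tasks and measure their pickled size; the scheduler (and the
# notebook, while building the graph) must hold roughly this much metadata.
sample = list(dsk.values())[:100]
mean_size = sum(len(cloudpickle.dumps(t)) for t in sample) / len(sample)
print("mean task size: %.0f bytes" % mean_size)
print("estimated graph size: %.1f GB" % (mean_size * len(dsk) / 1e9))
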
@rabernat (Member, Author)

Possibly relevant observation: watching the dashboard, it looks like all the tasks are first sent to a single worker, which then redistributes them to the other workers. This seems like a possible failure point.

@mrocklin (Member)

I started writing this yesterday but was interrupted, here are some thoughts on the general topic: dask/dask#3514

@mrocklin (Member)

@rabernat, along the lines of the first suggestion in the issue linked above (reduce per-task overhead), let's look at one of the errors from a previous issue:

distributed.protocol.pickle - INFO - Failed to deserialize b'\x80\x04\x95}\n\x00\x00\x00\x00\x00\x00\x8c\x14xarray.core.indexing\x94\x8c!ImplicitToExplicitIndexingAdapter\x94\x93\x94)\x81\x94}\x94(\x8c\x05array\x94h\x00\x8c\x17LazilyOuterIndexedArray\x94\x93\x94)\x81\x94}\x94(h\x05\x8c\x17xarray.coding.variables\x94\x8c\x19_ElementwiseFunctionArray\x94\x93\x94)\x81\x94}\x94(h\x05h\x07)\x81\x94}\x94(h\x05\x8c\x14xarray.backends.zarr\x94\x8c\x10ZarrArrayWrapper\x94\x93\x94)\x81\x94}\x94(\x8c\tdatastore\x94h\x11\x8c\tZarrStore\x94\x93\x94)\x81\x94}\x94(\x8c\x02ds\x94\x8c\x0ezarr.hierarchy\x94\x8c\x05Group\x94\x93\x94)\x81\x94(\x8c\rgcsfs.mapping\x94\x8c\x06GCSMap\x94\x93\x94)\x81\x94\x8c\ngcsfs.core\x94\x8c\rGCSFileSystem\x94\x93\x94)\x81\x94}\x94(\x8c\x07project\x94\x8c\rpangeo-181919\x94\x8c\x06access\x94\x8c\x0cfull_control\x94\x8c\x05scope\x94\x8c7https://www.googleapis.com/auth/devstorage.full_control\x94\x8c\x0bconsistency\x94\x8c\x04none\x94\x8c\x05token\x94N\x8c\x07session\x94\x8c\x1egoogle.auth.transport.requests\x94\x8c\x11AuthorizedSession\x94\x93\x94)\x81\x94}\x94(\x8c\x07headers\x94\x8c\x13requests.structures\x94\x8c\x13CaseInsensitiveDict\x94\x93\x94)\x81\x94}\x94\x8c\x06_store\x94\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94(\x8c\nuser-agent\x94\x8c\nUser-Agent\x94\x8c\x16python-requests/2.18.4\x94\x86\x94\x8c\x0faccept-encoding\x94\x8c\x0fAccept-Encoding\x94\x8c\rgzip, deflate\x94\x86\x94\x8c\x06accept\x94\x8c\x06Accept\x94\x8c\x03*/*\x94\x86\x94\x8c\nconnection\x94\x8c\nConnection\x94\x8c\nkeep-alive\x94\x86\x94usb\x8c\x07cookies\x94\x8c\x10requests.cookies\x94\x8c\x11RequestsCookieJar\x94\x93\x94)\x81\x94}\x94(\x8c\x07_policy\x94\x8c\x0ehttp.cookiejar\x94\x8c\x13DefaultCookiePolicy\x94\x93\x94)\x81\x94}\x94(\x8c\x08netscape\x94\x88\x8c\x07rfc2965\x94\x89\x8c\x13rfc2109_as_netscape\x94N\x8c\x0chide_cookie2\x94\x89\x8c\rstrict_domain\x94\x89\x8c\x1bstrict_rfc2965_unverifiable\x94\x88\x8c\x16strict_ns_unverifiable\x94\x89\x8c\x10strict_ns_domain\x94K\x00\x8c\x1cstrict_ns_set_initial_dollar\x94\x89\x8c\x12strict_ns_set_path\x94\x89\x8c\x10_blocked_domains\x94)\x8c\x10_allowed_domains\x94N\x8c\x04_now\x94J\x93\x81\xffZub\x8c\x08_cookies\x94}\x94hkJ\x93\x81\xffZub\x8c\x04auth\x94N\x8c\x07proxies\x94}\x94\x8c\x05hooks\x94}\x94\x8c\x08response\x94]\x94s\x8c\x06params\x94}\x94\x8c\x06verify\x94\x88\x8c\x04cert\x94N\x8c\x08prefetch\x94N\x8c\x08adapters\x94hA)R\x94(\x8c\x08https://\x94\x8c\x11requests.adapters\x94\x8c\x0bHTTPAdapter\x94\x93\x94)\x81\x94}\x94(\x8c\x0bmax_retries\x94\x8c\x12urllib3.util.retry\x94\x8c\x05Retry\x94\x93\x94)\x81\x94}\x94(\x8c\x05total\x94K\x00\x8c\x07connect\x94N\x8c\x04read\x94\x89\x8c\x06status\x94N\x8c\x08redirect\x94N\x8c\x10status_forcelist\x94\x8f\x94\x8c\x10method_whitelist\x94(\x8c\x03GET\x94\x8c\x06DELETE\x94\x8c\x05TRACE\x94\x8c\x07OPTIONS\x94\x8c\x04HEAD\x94\x8c\x03PUT\x94\x91\x94\x8c\x0ebackoff_factor\x94K\x00\x8c\x11raise_on_redirect\x94\x88\x8c\x0fraise_on_status\x94\x88\x8c\x07history\x94)\x8c\x1arespect_retry_after_header\x94\x88ub\x8c\x06config\x94}\x94\x8c\x11_pool_connections\x94K\n\x8c\r_pool_maxsize\x94K\n\x8c\x0b_pool_block\x94\x89ub\x8c\x07http://\x94h\x7f)\x81\x94}\x94(h\x82h\x85)\x81\x94}\x94(h\x88K\x00h\x89Nh\x8a\x89h\x8bNh\x8cNh\x8d\x8f\x94h\x8fh\x96h\x97K\x00h\x98\x88h\x99\x88h\x9a)h\x9b\x88ubh\x9c}\x94h\x9eK\nh\x9fK\nh\xa0\x89ubu\x8c\x06stream\x94\x89\x8c\ttrust_env\x94\x88\x8c\rmax_redirects\x94K\x1eub\x8c\x06method\x94\x8c\x0egoogle_default\x94\x8c\rcache_timeout\x94N\x8c\x0e_listing_cache\x94}\x94ub\x8c0pangeo-d
ata/cm2.6/control/temp_salt_u_v-5day_avg\x94\x86\x94b\x8c\x00\x94\x88N\x88Nt\x94b\x8c\n_read_only\x94\x88\x8c\r_synchronizer\x94N\x8c\x06_group\x94h\xb2\x8c\x06writer\x94\x8c\x16xarray.backends.common\x94\x8c\x0bArrayWriter\x94\x93\x94)\x81\x94}\x94(\x8c\x07sources\x94]\x94\x8c\x07targets\x94]\x94\x8c\x04lock\x94\x89ub\x8c\rdelayed_store\x94Nub\x8c\rvariable_name\x94\x8c\x04salt\x94\x8c\x05shape\x94(M\xb4\x05K2M\x8c\nM\x10\x0et\x94\x8c\x05dtype\x94\x8c\x05numpy\x94\x8c\x05dtype\x94\x93\x94\x8c\x02f4\x94K\x00K\x01\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94bub\x8c\x03key\x94h\x00\x8c\x0cBasicIndexer\x94\x93\x94)\x81\x94}\x94\x8c\x04_key\x94(\x8c\x08builtins\x94\x8c\x05slice\x94\x93\x94NNN\x87\x94R\x94h\xd8NNN\x87\x94R\x94h\xd8NNN\x87\x94R\x94h\xd8NNN\x87\x94R\x94t\x94sbub\x8c\x04func\x94\x8c\tfunctools\x94\x8c\x07partial\x94\x93\x94h\n\x8c\x0b_apply_mask\x94\x93\x94\x85\x94R\x94(h\xe7)}\x94(\x8c\x13encoded_fill_values\x94\x8f\x94(\x8c\x15numpy.core.multiarray\x94\x8c\x06scalar\x94\x93\x94h\xcdC\x04\xecx\xad\xe0\x94\x86\x94R\x94\x90\x8c\x12decoded_fill_value\x94G\x7f\xf8\x00\x00\x00\x00\x00\x00h\xc7h\xcduNt\x94b\x8c\x06_dtype\x94h\xcdubh\xd0h\xd2)\x81\x94}\x94h\xd5(h\xd8NNN\x87\x94R\x94h\xd8NNN\x87\x94R\x94h\xd8NNN\x87\x94R\x94h\xd8NNN\x87\x94R\x94t\x94sbub\x8c\x0bindexer_cls\x94h\x00\x8c\x0cOuterIndexer\x94\x93\x94ub.'

So that single task has 2.7 KB of data. Multiplied by 1,000,000 tasks, this alone is at least 2.7 GB of memory. I suspect that identifying and resolving these sorts of inefficiencies throughout the stack is the lowest-hanging fruit.
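
One way to see this locally (a sketch of my own, reusing the ds object from the snippet at the top of the issue, not something posted in the thread): pickle a single task from the graph of one variable and check whether the heavyweight objects named in the traceback above end up inside the blob.

import cloudpickle

# Sketch: pull one task out of the graph for the 'salt' variable (the
# variable named in the traceback), pickle it, and look for objects that
# should not need to travel with every task.
dsk_salt = dict(ds['salt'].data.__dask_graph__())
task = next(iter(dsk_salt.values()))
blob = cloudpickle.dumps(task)
print(len(blob), "bytes for one task")
for name in [b"GCSFileSystem", b"AuthorizedSession", b"ZarrStore",
             b"DefaultCookiePolicy"]:
    print(name.decode(), name in blob)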

@jacobtomlinson (Member)

cc @AlexHilson

@rabernat (Member, Author)

> So that single task has 2.7 KB of data. Multiplied by 1,000,000 tasks, this alone is at least 2.7 GB of memory. I suspect that identifying and resolving these sorts of inefficiencies throughout the stack is the lowest-hanging fruit.

This seems like an xarray issue. How could xarray make its tasks smaller?

@mrocklin (Member)

There is no single thing. Generally you need to construct simple examples, look at the tasks they generate, serialize those tasks using pickle or cloudpickle, and then look at the results. I'm seeing things about Zarr, cookie policies, tokens, etc. That gives new directions to investigate.
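
A self-contained version of that recipe might look like the sketch below (my addition, not from the thread): it uses a tiny in-memory zarr store in place of the GCSMap, opens it through xarray, and pickles a single task to see what gets dragged along.

import cloudpickle
import numpy as np
import xarray as xr

# Build a tiny zarr dataset in an in-memory dict, standing in for the GCSMap.
store = {}
xr.Dataset({"temp": (("time", "x"), np.zeros((10, 10)))}).to_zarr(store)
ds_small = xr.open_zarr(store, chunks={"time": 1})

# Generate tasks the same way the real calculation does, then serialize one.
dsk = dict((ds_small["temp"] ** 2).data.__dask_graph__())
key, task = next(iter(dsk.items()))
blob = cloudpickle.dumps(task)
print(key, len(blob), "bytes")
# Inspecting the blob (e.g. with pickletools.dis) shows which xarray / zarr /
# storage objects are being serialized into every task.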

stale bot commented Jul 20, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the stale label on Jul 20, 2018.

stale bot commented Jul 27, 2018

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.
