
Dataset insertion benchmark #7223

Merged Oct 27, 2022 · 17 commits
asv_bench/benchmarks/merge.py (+18, −0)
import xarray as xr


class DatasetAddVariable:
Contributor Author:

TBH, I was going to expand this to other in-memory operations. Replace a variable? Drop a variable?

Contributor:

I thought the first version used was relevant as well:

ds = xr.Dataset(coords=coords)
for v in names:
    ds[v] = ("time", value)

And this fast one from #7224 (comment) would also be nice to benchmark:

data_vars = {v: ("time", value) for v in names}
ds = xr.Dataset(data_vars=data_vars, coords=coords)
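
As a rough illustration (not part of the PR), the two construction patterns can be compared directly with timeit; `names`, `value`, and `coords` below are stand-in values matching the snippets above, not from the benchmark itself:

```python
import timeit

import numpy as np
import xarray as xr

names = [f"var{i}" for i in range(100)]
value = np.zeros(10)
coords = {"time": np.arange(10)}


def insert_one_by_one():
    # First version: grow the Dataset one variable at a time
    ds = xr.Dataset(coords=coords)
    for v in names:
        ds[v] = ("time", value)
    return ds


def build_all_at_once():
    # Faster version from #7224: hand the full dict to the constructor
    data_vars = {v: ("time", value) for v in names}
    return xr.Dataset(data_vars=data_vars, coords=coords)


# Both strategies should produce identical Datasets
assert insert_one_by_one().identical(build_all_at_once())

for fn in (insert_one_by_one, build_all_at_once):
    ms = timeit.timeit(fn, number=10) / 10 * 1e3
    print(f"{fn.__name__}: {ms:.2f} ms per build")
```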

Contributor Author:

The first case is captured, because I am suggesting we time insertion for 0, 10, 100, and 1000 existing variables. One can interpolate the behavior.

I do believe that benchmarking all-at-once creation (the second case) is also interesting.

Contributor:

I renamed it thinking that we were really timing the merge operations. This is also true for replacing, but not true for dropping.

Maybe we add dropping to a new dataset.py. This is all quite minor though! Shall we merge?
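
If dropping does end up in a new dataset.py, a minimal asv class could look like the sketch below; the class and method names are hypothetical suggestions, not from this PR:

```python
import xarray as xr


class DatasetDropVariable:
    # Hypothetical companion to DatasetAddVariable; mirrors its setup
    param_names = ["existing_elements"]
    params = [[10, 100, 1000]]

    def setup(self, existing_elements):
        d = {f"var{i}": i for i in range(existing_elements)}
        self.dataset = xr.merge([d])

    def time_variable_drop(self, existing_elements):
        # drop_vars returns a new Dataset, so the setup state is untouched
        self.dataset.drop_vars("var0")
```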

Contributor Author:

> thinking that we were really timing the merge operations

It depends whether you want to think about it from the point of view of an "xarray developer" or an "xarray user".

As an "xarray user", I'm trying to "insert a totally new variable, no dims, no coords". I don't really care what is happening under the hood. This is what I originally had in mind.

My goal with the original name was to test operations on in-memory xarrays (as opposed to those backed by a file, which seem to be covered in dataset_io). I don't know if I have time to add other benchmarks at the moment. This seems good.

The name is the only thing that will survive, so I'll keep it if you are OK with it.

I ran the benchmarks against main, #7221, and #7222. The benchmark seems to show the effect locally even with a single insertion. I've thus simplified the benchmark to do a single insertion, and not 5.

main

[ 50.00%] ··· merge.DatasetAddVariable.time_variable_insertion   ok
[ 50.00%] ··· ======== ==========
               param1
              -------- ----------
                 0      636±0μs
                 10     511±0μs
                100     5.92±0ms
                1000    41.0±0ms
              ======== ==========

#7221

[ 50.00%] ··· merge.DatasetAddVariable.time_variable_insertion   ok
[ 50.00%] ··· ======== ==========
               param1
              -------- ----------
                 0      609±0μs
                 10     1.32±0ms
                100     4.01±0ms
                1000    8.65±0ms
              ======== ==========

#7222

[ 50.00%] ··· merge.DatasetAddVariable.time_variable_insertion   ok
[ 50.00%] ··· ======== ==========
               param1
              -------- ----------
                 0      578±0μs
                 10     1.03±0ms
                100     2.74±0ms
                1000    5.37±0ms
              ======== ==========

Contributor Author (@hmaarrfk, Oct 27, 2022):

Is there a way to rename param1? OK, I renamed it.

Contributor Author:

$ asv run -E existing --quick --bench merge.DatasetAddVariable.time_variable_insertion
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Benchmarking existing-py_home_mark_mambaforge_envs_mcam_dev_bin_python
[ 50.00%] ··· ...e.DatasetAddVariable.time_variable_insertion                 ok
[ 50.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           668±0μs
                       10          1.17±0ms
                      100          2.22±0ms
                      1000         5.38±0ms
              =================== ==========

    param_names = ["existing_elements"]
    params = [[0, 10, 100, 1000]]

    def setup(self, existing_elements):
        # Dictionary insertion is fast(er) than xarray.Dataset insertion
        d = {}
        for i in range(existing_elements):
            d[f"var{i}"] = i
        self.dataset = xr.merge([d])

    def time_variable_insertion(self, existing_elements):
        dataset = self.dataset
        dataset["new_var"] = 0
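
For readers who want to sanity-check the benchmark without installing asv, the class can be exercised directly; this is a self-contained copy of the class above driven by timeit, not part of the PR:

```python
import timeit

import xarray as xr


class DatasetAddVariable:
    param_names = ["existing_elements"]
    params = [[0, 10, 100, 1000]]

    def setup(self, existing_elements):
        # Build a Dataset with N pre-existing scalar variables
        d = {f"var{i}": i for i in range(existing_elements)}
        self.dataset = xr.merge([d])

    def time_variable_insertion(self, existing_elements):
        dataset = self.dataset
        dataset["new_var"] = 0


bench = DatasetAddVariable()
for n in bench.params[0]:
    bench.setup(n)  # asv re-runs setup before each timing, so we do too
    ms = timeit.timeit(lambda: bench.time_variable_insertion(n), number=5) / 5 * 1e3
    print(f"{n:>5} existing vars: {ms:.3f} ms per insertion")
```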