Rechunker using Mailbox #710

JoranAngevaare · 2023-02-08T14:53:09Z

What is the problem / what does the code in this PR do
Rechunking of our data is much sought after by computing, as our framework creates many small files that make the data cumbersome to handle. Therefore we added a tool which does this in #686. We are now expanding its use cases by implementing it in the DAQ workflow in XENONnT/straxen#1074.

While working on XENONnT/straxen#1074 I noticed rather poor performance when all files are rechunked serially. Therefore, I looked into parallelizing the load->saver + rechunk approach. I re-used the mailbox paradigm for this, as it was easier than writing some home-grown algorithm. Parallelizing makes sense here because the task is about as I/O-bound as it is CPU-intensive, but not at the same time: you might be waiting on I/O while the CPU is idle, and vice versa.

Additionally, I incorporated #709's idea of having a progress bar. I think it's quite neat.

Can you briefly describe how it works?
Use the mailbox system to load and save chunks of data. This makes it easy to run the rechunker in a multi-threaded or multi-core mode.
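The idea of decoupling loading from saving can be sketched with a bounded queue between two threads. This is a minimal illustration only and does not use the strax Mailbox API: `load_chunks`, `rechunk_and_save`, `rechunk` and all parameters are hypothetical names, and in-memory byte strings stand in for chunks on disk.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the chunk stream


def load_chunks(n_chunks, chunk_size, out_q):
    """Producer: stand-in for reading many small chunks from disk."""
    for i in range(n_chunks):
        out_q.put(bytes([i % 256]) * chunk_size)  # blocks while the queue is full
    out_q.put(SENTINEL)


def rechunk_and_save(in_q, target_size, saved):
    """Consumer: merge small chunks until target_size is reached, then 'save'."""
    buffer = bytearray()
    while True:
        chunk = in_q.get()
        if chunk is SENTINEL:
            break
        buffer.extend(chunk)
        if len(buffer) >= target_size:
            saved.append(bytes(buffer))  # stand-in for writing one large file
            buffer.clear()
    if buffer:
        saved.append(bytes(buffer))  # flush the remainder as a last, smaller file


def rechunk(n_chunks=20, chunk_size=100, target_size=500, max_buffered=4):
    """Run loader and saver concurrently; the bounded queue limits memory use."""
    q = queue.Queue(maxsize=max_buffered)
    saved = []
    loader = threading.Thread(target=load_chunks, args=(n_chunks, chunk_size, q))
    saver = threading.Thread(target=rechunk_and_save, args=(q, target_size, saved))
    loader.start()
    saver.start()
    loader.join()
    saver.join()
    return saved


if __name__ == "__main__":
    files = rechunk()
    print(len(files), [len(f) for f in files])
```

While the saver is busy compressing or writing, the loader can already fetch the next chunks, which is where the gain over a serial loop comes from in this sketch. The bounded queue trades some of that overlap for a cap on how much data is held in memory.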

Can you give a minimal working example (or illustrate with a figure)?
On one of the event builder servers of the DAQ, the speedup with a modest number of threads is ~5-10x.

On my own machine (I/O limited) the speedup is not as large:

for PAPA in process True False; 
   do echo ==== $PAPA ====; 
   rechunker --source 050347-raw_records-rfzvpzj4mf/ --dest test-rr --profile_memory --parallel $PAPA --max_workers 6 --rechunk True --target_size_mb 500 --compressor blosc; 
   echo "n files:";
   ls test-rr/050347-raw_records-rfzvpzj4mf/ | wc -l; 
   echo ======= ; 
done
==== process ====
Will write to test-rr and make sub-folder 050347-raw_records-rfzvpzj4mf
  0%|                                                              | 0/1007 [00:00<?, ?it/s]Rechunking 050347-raw_records-rfzvpzj4mf/ to test-rr/050347-raw_records-rfzvpzj4mf
Removing data in test-rr/050347-raw_records-rfzvpzj4mf to overwrite
100%|███████████████████████████████████████| 1007/1007 [03:07<00:00,  5.38it/s, 139.7 MB/s]
Re-compressed 050347-raw_records-rfzvpzj4mf/
...
Memory profiler says peak RAM usage was: 1308.0 MB
Took 190.2 s = 0.05 h
Bye, bye
n files:
52
=======
==== True ====
Will write to test-rr and make sub-folder 050347-raw_records-rfzvpzj4mf
  0%|                                                              | 0/1007 [00:00<?, ?it/s]Rechunking 050347-raw_records-rfzvpzj4mf/ to test-rr/050347-raw_records-rfzvpzj4mf
Removing data in test-rr/050347-raw_records-rfzvpzj4mf to overwrite
100%|███████████████████████████████████████| 1007/1007 [02:28<00:00,  6.78it/s, 175.8 MB/s]
Re-compressed 050347-raw_records-rfzvpzj4mf/
...
Memory profiler says peak RAM usage was: 1993.3 MB
Took 150.6 s = 0.04 h
Bye, bye
n files:
52
=======
==== False ====
Will write to test-rr and make sub-folder 050347-raw_records-rfzvpzj4mf
  0%|                                                              | 0/1007 [00:00<?, ?it/s]Rechunking 050347-raw_records-rfzvpzj4mf/ to test-rr/050347-raw_records-rfzvpzj4mf
Removing data in test-rr/050347-raw_records-rfzvpzj4mf to overwrite
100%|███████████████████████████████████████| 1007/1007 [03:45<00:00,  4.47it/s, 115.9 MB/s]
Re-compressed 050347-raw_records-rfzvpzj4mf/
..
Memory profiler says peak RAM usage was: 1341.5 MB
Took 226.3 s = 0.06 h
Bye, bye
n files:
52
=======
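For reference, the relative speedups on this machine follow directly from the "Took ... s" lines above, taking the `--parallel False` run as the baseline. The snippet below only redoes that arithmetic; the dictionary keys are the values passed to `--parallel`.

```python
# Wall-clock times in seconds, copied from the "Took ... s" lines above.
timings_s = {"process": 190.2, "True": 150.6, "False": 226.3}

baseline = timings_s["False"]  # serial run
speedup = {mode: baseline / t for mode, t in timings_s.items()}

for mode, factor in speedup.items():
    print(f"--parallel {mode}: {factor:.2f}x")
```

This gives roughly 1.19x for `process` and 1.50x for `True`, with all three runs producing the same 52 output files.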

coveralls · 2023-02-08T15:30:49Z

Coverage: 92.164% (+0.4%) from 91.718% when pulling 06a3426 on mailbox_rechunker into 28553a7 on master.

jmosbacher

Sorry, I don't have time to really test this, but I read all the changes and it's well written and organized. Given that this is a relatively isolated tool, I think it's safe to merge.

JoranAngevaare · 2023-02-13T13:52:18Z

Thanks Yossi, I agree. Apart from the unit tests, I tested it extensively on the DAQ.

The main use case, XENONnT/straxen#1074, comes with extra safeguards that should catch any failures before integrating any data into the pipeline. So far, with the current data scheme (run 050411), everything is running very stably.

Joran Angevaare added 6 commits February 6, 2023 11:05

add pbar for rechunker

8c33799

use mailbox system to allow for parallel rechunking

5db61de

also test

231b65b

fix codefactor

966ce8b

fix it more

b0e208b

add timeout for tests

ad05af0

Joran Angevaare added 2 commits February 8, 2023 17:17

fix hang

5551961

add missing line

c916a22

JoranAngevaare requested a review from jmosbacher February 8, 2023 16:21

Joran Angevaare added 2 commits February 8, 2023 17:23

I like verbose pbars

4c3035d

fix test

bba1679

JoranAngevaare mentioned this pull request Feb 8, 2023

Restrax XENONnT/straxen#1074

Merged

6 tasks

jmosbacher approved these changes Feb 8, 2023

fix timeout issues

06a3426

JoranAngevaare merged commit 67f4df9 into master Feb 13, 2023

JoranAngevaare deleted the mailbox_rechunker branch February 13, 2023 13:52

JoranAngevaare mentioned this pull request Feb 17, 2023

Patch md access in the rechunker #711

Merged
