Restrax #1074

JoranAngevaare · 2022-08-21T11:59:56Z

Add restrax

In this PR, we change the bootstrax workflow. Instead of writing to disk directly, we will write the data twice, once using fast compressors, and once using arbitrary compressors.

We would change the workflow as follows:

Can you briefly describe how it works?

I've given a presentation here. Also see the dedicated documentation.

Current

Bootstrax rechunks certain datatypes live (i.e. in memory) and applies different levels of compression. Admix makes a 1:1 copy for rucio. In this scheme, we don’t rechunk the lowest-level datatypes, so those result in many files (one per chunk of live data). Therefore, we set the chunk size to 21 s for background data.

Advantage:

Efficient, only write data once
Works under many different data rates

Disadvantage:

Many “small” files, since you don’t want to keep too many datatypes in memory
21-second chunks (dictated by not rechunking raw records) are (too) slow for SNEWS; shorter chunks would make the online monitor data more live
The desired compressor has to be decided right at the start of processing
Many configs are needed for different sources, which we use to “guess” what chunk size we should use.
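To make the file-count trade-off concrete, here is a rough back-of-the-envelope sketch. Without rechunking, every chunk of live data becomes one file per datatype, so the chunk duration directly sets the number of files. The one-hour run length and the helper name files_per_datatype are assumptions for illustration only; they are not taken from bootstrax.

```python
import math


def files_per_datatype(run_seconds: float, chunk_seconds: float) -> int:
    """Number of files for one datatype when each chunk becomes one file."""
    return math.ceil(run_seconds / chunk_seconds)


run_seconds = 3600  # assumed one-hour run, for illustration only

for chunk_seconds in (21, 5):
    n_files = files_per_datatype(run_seconds, chunk_seconds)
    print(f"{chunk_seconds:>2} s chunks -> {n_files} files per datatype")
```

Under these assumptions, going from 21 s to 5 s chunks roughly quadruples the number of files unless the data is rechunked afterwards.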

Proposed

Bootstrax first writes all datatypes to disk -> an extra process, restrax, then compresses and rechunks whatever data to whatever size. In this case we can set the chunk size to any duration (e.g. 3 seconds).
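The rechunking idea can be sketched as follows: consecutive small chunks are grouped until a target file size is reached. This is a minimal illustration only, not the restrax implementation (which relies on the strax rechunker); the function name rechunk, the chunk sizes, and the 200 MB target are hypothetical.

```python
from typing import Iterable, Iterator, List


def rechunk(chunk_sizes_mb: Iterable[float],
            target_size_mb: float) -> Iterator[List[float]]:
    """Group consecutive small chunks until each group reaches target_size_mb."""
    group: List[float] = []
    group_size = 0.0
    for size in chunk_sizes_mb:
        group.append(size)
        group_size += size
        if group_size >= target_size_mb:
            yield group
            group, group_size = [], 0.0
    if group:
        # Leftover data at the end of the run becomes one final, smaller file
        yield group


small_chunks = [40.0] * 12  # hypothetical: twelve short chunks of 40 MB each
merged = list(rechunk(small_chunks, target_size_mb=200.0))
print([sum(g) for g in merged])
```

Because the grouping happens after bootstrax has written the data, the live chunk duration and the final file size can be chosen independently.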

Advantage:

Arbitrary filesize of each datatype
Arbitrary compressor of each datatype
5-second chunks possible for all data rates (different calibration sources)
In case of high rates, restrax could software-veto by cutting out raw-records based on posrec/peak size. This would require much more development.
If restrax writes to datamanager, we are getting a free lunch.

Disadvantage:

Roughly twice the I/O at the eventbuilders (unless restrax writes to datamanager). Some extra CPU for (de)compressing; the extra memory is a few times the set target_size_mb, so there are no surprises here and memory usage is very predictable.
More complex workflow
Not tested, needs (some) development. Restrax would be ~100 lines of extra code, mostly for bookkeeping the data field in the rundoc.

Additional changes:

Update the bootstrax fields used to communicate with admix to allow this intermediate step
Move some duplicate functions from bootstrax into daq_core.py to prevent duplication, thereby paving the way for Refactor bootstrax #479 to be implemented
Rename args.execute in ajax to args.production for consistency.

Further requirements:

Test this framework!
More log statements in restrax
We might want to make Restrax.get_compressor_and_size fancier after discussing with computing. Its current implementation should probably already improve their lives a lot
Don't infer the compressor for raw-records anymore. Only use blosc (fastest) or lz4 in case of >2 GB chunks.
Merge Rechunker using Mailbox AxFoundation/strax#710, this feature makes restrax 5-10x faster
Add documentation to the cited page
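The raw-records compressor rule from the list above could look roughly like this. It is a sketch under stated assumptions, not the actual Restrax.get_compressor_and_size: the function name raw_records_compressor is hypothetical, and "2 GB" is interpreted here as 2 * 10**9 bytes.

```python
def raw_records_compressor(chunk_size_bytes: int,
                           limit_bytes: int = 2 * 10**9) -> str:
    """Pick a compressor from the chunk size alone.

    blosc (fastest) is the default; lz4 is used only for chunks
    larger than the assumed 2 GB limit.
    """
    return "lz4" if chunk_size_bytes > limit_bytes else "blosc"


print(raw_records_compressor(500_000_000))
print(raw_records_compressor(3_000_000_000))
```

Choosing on size alone means no inference step is needed for raw-records, which is the simplification the item asks for.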

coveralls · 2022-08-21T12:22:45Z

Coverage: 93.527% (-0.01%) from 93.538% when pulling f2cca33 on restraxer into 234b787 on master.

Co-authored-by: Joran Angevaare <j.angevaare@nikhef.nl>

JoranAngevaare · 2023-02-16T16:02:10Z

Thanks @mflierm and @cfuselli for the great reviews and excellent ideas. I've implemented all of them (unless explicitly stated otherwise).

You can update the configs via (assuming you are on one of the ebs):

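from straxen import daq_core

# Set up the DAQ database handles (assumes you are on one of the ebs)
db = daq_core.DataBases()
# Update fields of the restrax config document; here only the bookkeeping fields are set
db.daq_db['restrax_config'].update_one({'name': 'restrax_config'},
                                       {'$set': {'user': 'angevaare', 'last_modified': daq_core.now()}})
# Read the document back to check the result
db.daq_db['restrax_config'].find_one()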

We could consider making a page on the website for this

JoranAngevaare · 2023-03-03T13:32:58Z

For bookkeeping:
two additional commits were added to solve an issue in compression:

Joran Angevaare added 4 commits August 20, 2022 22:32

example restraxer

2020a86

fix some duplication

e8b3b1a

remove more duplication

35b9928

add to bin

24ed478

Joran Angevaare and others added 19 commits August 21, 2022 09:24

fix typo

80a9ea9

Merge branch 'master' into restraxer

06fd65f

refactor

472458c

add fixes

0985b77

fix the arguments

8ab8330

start with last run

02dbccc

update logging

085c198

add more checks

a5cc3fb

clean log messages

0ebe586

add some logic

6796660

fix ajax

ecc7b4f

add docs

7336e76

Merge branch 'restraxer' of github.com:XENONnT/straxen into restraxer

75debc8

chunkier chunks

f85eeac

updates

310a786

change the compression at bootstrax

29551e1

fix typing

1166b4d

minor fixes

6c90e69

start with the largest first

372a5e8

JoranAngevaare mentioned this pull request Feb 8, 2023

Rechunker using Mailbox AxFoundation/strax#710

Merged

JoranAngevaare added 2 commits February 8, 2023 16:20

use strax#710

1c6817b

push a few tweaks

3560990

JoranAngevaare mentioned this pull request Feb 9, 2023

Restrax updates #1137

Merged

JoranAngevaare and others added 2 commits February 9, 2023 11:16

extra updates - resolve conflicts (#1137)

4361e51

Co-authored-by: Joran Angevaare <j.angevaare@nikhef.nl>

leaner compare processess

a4045bb

Joran Angevaare added 6 commits February 16, 2023 13:42

update comment

321d888

bypass mode

04020d0

update and clean

ede5905

update and clean

1f608ab

add config to daq db

cceccae

make configurable

15ac52b

JoranAngevaare marked this pull request as draft February 16, 2023 14:44

JoranAngevaare and others added 4 commits February 16, 2023 17:01

debug

622659c

extra checks

d7bc4dc

cleanup

0b275bd

make a few methods safe

e20b3fc

JoranAngevaare marked this pull request as ready for review February 17, 2023 08:48

JoranAngevaare and others added 13 commits February 17, 2023 11:05

only run once!

5b6ee5c

more documentation

699721c

update the docs

84de02b

final tweaks

604200b

no data tweak

21b189b

sneaky bugfix bootstrax

a954e6b

another sneaky fix

73ab3a8

use total seconds

e5bbc7d

Fix stray commit

cccf28d

up versions and comments

d79b395

update type hinting

83468cc

return direct hints

48106a1

update defaults

f2cca33

JoranAngevaare merged commit dfa021d into master Feb 24, 2023

JoranAngevaare deleted the restraxer branch March 3, 2023 13:32

JoranAngevaare mentioned this pull request Mar 8, 2023

Small patches to restrax module #1143

Merged
