
Progress callback is called too soon? #741

Closed
eblis opened this issue Jul 29, 2022 · 11 comments
eblis commented Jul 29, 2022

Python version

('python=3.9.12 (tags/v3.9.12:b28265d, Mar 23 2022, 23:52:46) [MSC v.1929 64 '
 'bit (AMD64)]')
'os=Windows-10-10.0.19044-SP0'
'numpy=1.23.1'
ldf is not supported
xls is not supported
xlsx is not supported
'asammdf=7.1.0'

Code

import functools
import logging

import tqdm
from asammdf import MDF

logger = logging.getLogger(__name__)

def load_progress(progress_bar: tqdm.tqdm, current, total):
    if progress_bar.total != total:
        progress_bar.clear()
        progress_bar.reset(total=total)
    progress_bar.update(1)
    progress_bar.refresh()

# `file` is a pathlib.Path set elsewhere in the surrounding code
loading = tqdm.tqdm(total=100, leave=False, desc=f"Loading {file.name}")
progress_function = functools.partial(load_progress, loading)
try:
    with MDF(file, callback=progress_function) as mdf_file:
        loading.close()  # done with load progress
        logger.info(f"Reading finished for {file}")
finally:
    loading.close()  # make sure the bar is closed even if loading fails

Description

The progress bar for loading the file finishes much faster than the file is actually loaded.
For my file (though the file itself doesn't seem to matter), total is set to 13903 or similar and the progress reaches 100% within a few seconds, but the progress bar is only closed, and the log message saying Reading finished for {file} only written, minutes later.

@danielhrisca (Owner)

current and total are not confined to the interval 0..100.
Could this be a problem for your progress bar?
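Since current and total are raw counts rather than percentages, a callback can normalize them itself. A minimal sketch (illustrative names, not asammdf API):

```python
def make_percent_callback(report):
    """Wrap a percent-based reporter so it accepts raw (current, total) counts."""
    last = {"pct": -1}

    def callback(current, total):
        pct = int(current * 100 / total) if total else 0
        if pct != last["pct"]:  # only report when the percentage changes
            last["pct"] = pct
            report(pct)

    return callback

seen = []
cb = make_percent_callback(seen.append)
for i in range(1, 14):
    cb(i, 13)  # simulate 13 progress calls with total=13
# `seen` now holds 13 distinct percentages ending at 100
```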


eblis commented Jul 29, 2022

If the total value reported by asammdf differs from the current one (default 100), the progress bar is reset to use the new total as its maximum, and every callback increments the progress by 1.
If you add a progress_bar.write(f"{current}: {total}") to the callback function you can see that current goes from 0 to whatever total holds, but the file hasn't finished loading at that point; it still takes a lot longer after the callback has been called for the last time.


eblis commented Aug 1, 2022

So it turns out that the problem is the sort call after reading the data.

I added some quick print statements to the _read() method (for mdf_v4) and measured three phases: from the start of the method to the self._sort() call (which seems to load the data), the self._sort() call itself, and the time after sorting until the end (which seems to handle some kind of events):

Reading took 9.03205919265747 seconds
Sorting took 1453.9986987113953 seconds
Events took 0.37953495979309 seconds

The bulk of the processing time is spent on DTBLOCKs.
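The per-phase timings above can be collected with a simple perf_counter pattern (a generic sketch, not the actual _read() code; the phase functions are stand-ins):

```python
import time

def timed_phases(phases):
    """Run (name, fn) pairs in order and return how long each took, in seconds."""
    timings = {}
    for name, fn in phases:
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    return timings

# stand-in phases; in the real case these would be reading, self._sort(), events
timings = timed_phases([
    ("reading", lambda: time.sleep(0.01)),
    ("sorting", lambda: time.sleep(0.02)),
    ("events", lambda: None),
])
for name, seconds in timings.items():
    print(f"{name.capitalize()} took {seconds} seconds")
```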


eblis commented Aug 1, 2022

It takes around 4s for one iteration of the following for loop (I added the progress calls):

            for info in group.get_data_blocks():
                if self._callback:
                    progress += 1
                    self._callback(progress, size)

                dtblock_address, dtblock_raw_size, dtblock_size, block_type, param = (
                    info.address,
                    info.original_size,
                    info.compressed_size,
                    info.block_type,
                    info.param,
                )
...

I have 575 groups in total, so it will take about 30-35 minutes.

@danielhrisca (Owner)

> It takes around 4s for one iteration of the following for loop [...] I have 575 groups in total so it will take about 30-35 minutes.

Damn, how big is the file?


eblis commented Aug 1, 2022

About 574 MB

@danielhrisca (Owner)

Please try the development branch code.


eblis commented Aug 5, 2022

I had a look at the changes.
From what I can tell, the new code adds 2 new steps to the progress, one of which covers the whole sort() call. But in my case sorting took the longest time by far, so with these changes the progress bar will appear stuck on the second-to-last step for a very long time.

I tried to think of a way to update the progress while sorting is taking place.
Would something like this be possible?

1. Add 100 steps (or some other large, fixed number) for the sorting of the data.
2. Compute the progress increment for one data block: increment = sort_steps / data_blocks, where sort_steps is the total number of sorting steps (100) and data_blocks is the total number of data blocks, obtained by counting the data blocks in each group and summing them. For example, with 55 data blocks and 100 steps, increment = 100 / 55 -> increment = 1.818181.
3. Call the progress function inside the sort() call, advancing by increment every time.

This would make the progress smoother, but it means the increment value is no longer an integer, it's a float (you could round the value when calling the progress function) if the callback function really expects ints (but then again, why should it?).
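The fractional-increment idea above can be sketched roughly like this (made-up numbers and function names, not the actual asammdf sort() internals):

```python
def sort_with_progress(data_blocks, callback, sort_steps=100):
    """Report sort progress in `sort_steps` steps spread over all data blocks."""
    increment = sort_steps / data_blocks  # e.g. 100 / 55 ≈ 1.818181
    progress = 0.0
    for _ in range(data_blocks):
        # ... sort one data block here ...
        progress += increment
        # round so the callback still receives an integer
        callback(round(progress), sort_steps)

calls = []
sort_with_progress(55, lambda cur, total: calls.append(cur))
# the callback fires once per data block and the last value reaches 100
```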

eblis added a commit to eblis/asammdf that referenced this issue Sep 27, 2022
Added progress information while sorting all the groups.
 - Added 102 extra steps to the load progress: 100 for sorting and 2 after sorting is completed.
 - While sorting, we first compute a step from the total number of sorting steps (100), the number of groups, and the total number of items, and then advance the progress by that step for each item sorted.
 - The callback is only invoked when the current step count (as an integer) exceeds the previous one. This avoids calling the callback function with floating-point values and avoids calling it repeatedly with the same current value of the progress bar, i.e. we only call it when the progress has actually advanced.
@eblis
Copy link
Author

eblis commented Sep 27, 2022

I have created a pull request with my proposed solution implemented.
I've tweaked the solution a bit so that it calls the callback function with integers instead of floats, and only calls it when the progress (as an int) has actually increased.
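The integer-only tweak can be sketched as: keep the float accumulator internally, but only invoke the callback when the integer part grows (illustrative code, not the exact PR implementation):

```python
def sort_with_int_progress(data_blocks, callback, sort_steps=100):
    """Call `callback` only when the integer progress value increases."""
    increment = sort_steps / data_blocks
    progress = 0.0
    last_reported = -1
    for _ in range(data_blocks):
        # ... sort one data block here ...
        progress += increment
        current = int(progress)
        if current > last_reported:  # skip duplicate integer values
            last_reported = current
            callback(current, sort_steps)

calls = []
sort_with_int_progress(200, lambda cur, total: calls.append(cur), sort_steps=100)
# with more blocks than steps, each integer value is reported exactly once
```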


eblis commented Oct 6, 2022

You can use the following script to check the progress updates.

import functools
import pathlib

import asammdf
from tqdm import tqdm


def load_progress(progress_bar: tqdm, current, total):
    if progress_bar.total != total:
        progress_bar.clear()
        progress_bar.reset(total=total)
    progress_bar.update(1)


file = pathlib.Path("d:/tmp/mf4/scramble.scrambled.mf4")
loading = tqdm(total=100, leave=False, desc=f"Loading {file.name}")

progress_function = functools.partial(load_progress, loading)
with asammdf.MDF(file, callback=progress_function) as mf4_file:
    loading.close()  # done with load progress

danielhrisca added a commit that referenced this issue Oct 7, 2022
@danielhrisca (Owner)

In version 7.3.0 you will have to change the argument name to progress:

with asammdf.MDF(file, progress=progress_function) as mf4_file:
    loading.close()  # done with load progress
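If you need code that works both before and after the rename, one option is to inspect the installed signature and pass whichever keyword it accepts. A sketch, assuming only the keyword name changed (`open_with_progress` and the stand-in class are made up for illustration):

```python
import inspect

def open_with_progress(MDF, file, progress_function):
    # asammdf 7.3.0 renamed the MDF keyword from `callback` to `progress`;
    # pick whichever name the installed version's __init__ accepts.
    params = inspect.signature(MDF.__init__).parameters
    keyword = "progress" if "progress" in params else "callback"
    return MDF(file, **{keyword: progress_function})

# usage with a stand-in class mimicking the pre-7.3.0 signature
# (the real call would pass asammdf.MDF):
class FakeOldMDF:
    def __init__(self, file, callback=None):
        self.file, self.callback = file, callback

mdf = open_with_progress(FakeOldMDF, "test.mf4", print)
```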
