
Time overhead of Alazar measurements #113

Open
emartinez opened this issue Jun 12, 2018 · 11 comments

Comments

@emartinez
Contributor

emartinez commented Jun 12, 2018

@jenshnielsen @ThorvaldLarsen @nataliejpg See below a profile of a measurement I am doing with the Alazar card. The bottom line is:

  • 126.496 s are spent acquiring, of which:
  • 96.024 s are spent at _call_dll (actual measurement time)
  • 12.384 s are spent at pre_start_capture, most of which is a call to demodulator.py:53(__init__)
  • 13.981 s are spent at post_acquire (mostly at sample_to_volt_u12 and demodulate)

For reference, a naïve implementation of the demodulation routine (a single matrix product) takes a couple hundred ms. Some suggestions for optimization are:

  • Call the Demodulator constructor in the AlazarChannel constructor instead of the acquisition code.
  • Optimize sample_to_volt_u12? It's not clear to me what exactly takes that long in that method.
  • Have a low-overhead alternative to Demodulator, for people who do not need the additional features.
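As a point of reference for the naïve implementation mentioned above, a single-matrix-product demodulation without filtering can be sketched roughly like this (a hypothetical sketch; the array shapes, function names, and the lack of a low-pass filter are assumptions, not the actual Demodulator code):

```python
import numpy as np

def naive_demodulate(samples, demod_freq, sample_rate):
    """Demodulate a (records, n_samples) array by a single matrix product.

    Hypothetical sketch: project the data onto reference cosine/sine
    vectors and average over the sample axis -- no low-pass filtering.
    """
    n = samples.shape[-1]
    t = np.arange(n) / sample_rate
    cos_ref = np.cos(2 * np.pi * demod_freq * t)
    sin_ref = np.sin(2 * np.pi * demod_freq * t)
    # one matrix-vector product per quadrature, then average over samples
    i = samples @ cos_ref / n
    q = samples @ sin_ref / n
    return np.sqrt(i**2 + q**2), np.arctan2(q, i)
```

Note that this omits the low-pass filtering step, which (as discussed further down the thread) is where most of the demodulation time actually goes.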

1586515 function calls (1582362 primitive calls) in 138.488 seconds

Ordered by: cumulative time
List reduced from 1184 to 100 due to restriction <100>

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 138.488 138.488 {built-in method builtins.exec}
1 0.003 0.003 138.488 138.488 &lt;string&gt;:1(&lt;module&gt;)
1 0.013 0.013 138.486 138.486 scan.py:57(do2d)
42/40 0.000 0.000 126.577 3.164 parameter.py:252(__call__)
128/9 0.001 0.000 126.499 14.055 parameter.py:330(get_wrapper)
1 0.000 0.000 126.496 126.496 alazar_multidim_parameters.py:99(get_raw)
1 0.005 0.005 126.496 126.496 ATS.py:616(acquire)
211 96.024 0.455 96.024 0.455 ATS.py:915(_call_dll)
1 0.575 0.575 13.981 13.981 ATSChannelController.py:264(post_acquire)
1 1.254 1.254 13.405 13.405 ATSChannelController.py:295(handle_alazar_channel)
1 0.375 0.375 12.384 12.384 ATSChannelController.py:196(pre_start_capture)
1 9.690 9.690 11.986 11.986 demodulator.py:53(__init__)
138 9.091 0.066 9.091 0.066 {method 'commit' of 'sqlite3.Connection' objects}
112 0.002 0.000 8.151 0.073 sqlite_base.py:237(atomic_transaction)
2 0.004 0.002 7.287 3.644 measurements.py:184(flush_data_to_database)
2 0.000 0.000 7.284 3.642 data_set.py:357(add_results)
2 0.003 0.002 7.278 3.639 sqlite_base.py:397(insert_many_values)
1 0.000 0.000 7.163 7.163 measurements.py:287(__exit__)
1 0.026 0.026 5.597 5.597 ATSChannelController.py:366(_to_volts)
1 5.570 5.570 5.570 5.570 acq_helpers.py:4(sample_to_volt_u12)
1 4.288 4.288 5.400 5.400 demodulator.py:90(demodulate)
100 4.088 0.041 4.088 0.041 ATSChannelController.py:254(handle_buffer)
1 0.000 0.000 3.530 3.530 measurements.py:241(__enter__)
4 3.408 0.852 3.408 0.852 numeric.py:1076(outer)
4 0.000 0.000 2.708 0.677 data_set.py:241(add_parameter)
4 0.000 0.000 2.705 0.676 sqlite_base.py:1101(add_parameter)
161 0.001 0.000 1.865 0.012 sqlite_base.py:214(transaction)
161 1.742 0.011 1.864 0.012 {method 'execute' of 'sqlite3.Cursor' objects}
24 0.000 0.000 1.628 0.068 contextlib.py:85(__exit__)
56 0.000 0.000 1.628 0.029 {built-in method builtins.next}
46 0.000 0.000 1.628 0.035 sqlite_base.py:264(atomic)
2495 1.162 0.000 1.162 0.000 {method 'reduce' of 'numpy.ufunc' objects}
2 0.000 0.000 1.155 0.577 fromnumeric.py:2806(mean)
2 0.000 0.000 1.154 0.577 _methods.py:53(_mean)
4 0.000 0.000 1.081 0.270 sqlite_base.py:1140(_add_parameters_to_layout_and_deps)

@Dominik-Vogel
Contributor

Thank you @emartinez for this nice timing overview.
It would be really interesting to see the number of samples, records, and buffers, so that we can estimate the lower limits of the timing.
I fully agree that moving the allocation of the demodulation reference to a place where it is only called once is a very good idea and can save us 10% of the time.
The other 10%, spent on the actual demodulation, is I think harder to address.
It is also unclear to me what takes that long in sample_to_volt; it is all numpy functions and should therefore run fast. It is basically an np.right_shift, a multiplication, and a conversion to np.float64.
The demodulator is also implemented with scipy and effectively runs optimized C code. Comparing it to the bare matrix multiplication is not fair, as the multiplication is the fast part of the demodulation. The slow part is the low-pass filter that follows (which is crucial).
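The arithmetic described above could look roughly like this (a sketch assuming 12-bit samples left-aligned in unsigned 16-bit words and a symmetric input range; the constants, function name, and scaling are assumptions, not the actual sample_to_volt_u12):

```python
import numpy as np

def sample_to_volt_u12_sketch(raw, bps=12, input_range_volts=1.0):
    """Convert unsigned 16-bit Alazar samples to volts.

    Sketch of the operations mentioned above: a right shift to drop the
    padding bits, then an offset and a scale. All constants are assumptions.
    """
    shifted = np.right_shift(raw, 16 - bps)       # 12-bit codes, 0..4095
    code_zero = (2 ** bps - 1) / 2.0              # mid-scale code
    code_range = (2 ** bps - 1) / 2.0
    # convert to float64 and scale to +/- input_range_volts
    return input_range_volts * (shifted.astype(np.float64) - code_zero) / code_range
```

If the real implementation looks like this, the time may well be going into the intermediate array allocations rather than the arithmetic itself; doing the operations in place (e.g. via the `out=` arguments of the numpy ufuncs) would be one thing to try.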

I would suggest that we focus on fixing the allocation as a first step as this is comparably easy and it is certain to yield improvements.

@nataliejpg
Collaborator

Thanks for this, Esteban. Moving the call to the demodulator out of pre_start_capture seems like a really obvious and easy fix. I'd be interested to know how long the filtering takes compared to the matrix multiplication, and whether there is any reason why turning off the filtering (and instead just taking the average) takes longer via the demodulator than via a by-hand matrix multiplication (where by "by hand" I don't actually mean by hand).

It would also be really good to map out the different ways of taking data and to get some duration estimates, as well as a feeling for the dead time during the measurement and how much data we are taking. My thinking is in the direction of working out the likelihood of being able to demodulate on the fly, and of course also the memory limitations. We could work out a plan for how to tackle this maybe next week or the week after (and work out who does what)?

@jenshnielsen
Contributor

It's not quite as simple as moving the init of the demodulation: you cannot know which demod frequencies are used until get is called. This is not to say that it can't be done, but we would need to automatically reinitialize the demodulator whenever the demod frequencies are changed.

@nataliejpg
Collaborator

The demod freq of a channel is a parameter of it though, isn't it? So we could change the set_cmd to update or reinitialize the demodulator at that point rather than on acquire? Or as part of prepare channel? What do you think? Also, I noticed that from the syntax it looks like the demodulator still thinks it can demodulate at multiple frequencies, but since each channel has at most one demodulation frequency this will never happen. Is that true?
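The set_cmd approach proposed here amounts to caching the demodulator on the channel and rebuilding it only when the frequency changes. A minimal sketch of the pattern, stripped of the qcodes Parameter machinery (all names here are hypothetical, not the actual AlazarChannel API):

```python
class ChannelSketch:
    """Hypothetical sketch: cache the demodulator on the channel and
    rebuild it only when demod_freq changes, not on every acquisition."""

    def __init__(self):
        self._demod_freq = None
        self._demodulator = None

    def set_demod_freq(self, freq):
        # this would be the body of the parameter's set_cmd:
        # rebuild the cached demodulator only on an actual change
        if freq != self._demod_freq:
            self._demod_freq = freq
            self._demodulator = self._make_demodulator(freq)

    def _make_demodulator(self, freq):
        # stand-in for the expensive Demodulator(...) construction
        return {"freq": freq}

    def acquire(self):
        # pre_start_capture no longer pays the construction cost
        return self._demodulator
```

Setting the same frequency twice reuses the cached object; only a changed value triggers the expensive reconstruction.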

@jenshnielsen
Contributor

@nataliejpg Yes, that's what I suggested.

> Also I noticed that from the syntax it looks like demodulator still thinks it can demodulate at multiple frequencies but since each channel has max one demodulation frequency this will never happen, is that true?

No, that is wrong AFAIK. There is only one demodulator, which does the demodulation for all the channels at the same time.

@ThorvaldLarsen
Collaborator

Thanks for the clarification, I wasn't aware a single demodulator did everything.

@Dominik-Vogel the slowest part of the demodulator is not the scipy filter (it doesn't even show up in the list above); rather, of the 5.4 s, 65% of the time is spent in np.outer(), which expands the data to the same size as the demodulation matrix (the cosines and sines).
We currently have about 20% overhead on measurements. I think optimizing the matrix multiplication (removing np.outer) and moving as much as possible from pre_start_capture to prepare_channel will get us below 10%. This would also make the code more readable: since everything is now defined by channels, it would make sense to have the demodulation live at the channel level too.
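If np.outer is only used to tile a reference vector up to the size of the data, numpy broadcasting produces the same result without the explicit tiling call (a sketch; the actual array shapes inside the demodulator are assumptions):

```python
import numpy as np

# assumed shapes: (records, samples) data and a (samples,) reference vector
rng = np.random.default_rng(0)
data = rng.standard_normal((500, 2048))
ref = np.cos(2 * np.pi * 0.01 * np.arange(2048))

# outer-product style: explicitly tiles the reference to the data's shape
mixed_outer = data * np.outer(np.ones(data.shape[0]), ref)

# broadcasting: ref is stretched along the record axis implicitly
mixed_bcast = data * ref

assert np.allclose(mixed_outer, mixed_bcast)
```

Broadcasting skips the allocation and fill of the tiled reference matrix, which is exactly the kind of intermediate that dominates when the data arrays are large.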

Also of note: one of the reasons for Wolfgang/Gijs to come to CPH on Friday is to see what we can learn from each other about how best to take data with the Alazar card. So it might be useful to think about getting the Alazar channels into main qcodes.

@ThorvaldLarsen
Collaborator

ThorvaldLarsen commented Jun 13, 2018

Just to follow up on the point from @Dominik-Vogel about how many buffers/records/samples are used. The overhead is, not surprisingly, very dependent on the number of averages and on whether the data is averaged over buffers (this is cheap, as it is done on the fly during the measurement and there is less data to demodulate).
From this it is clear that the filtering is the most expensive part of the demodulation. Also, depending on the specifics of how the measurement is implemented, there are different parts that could be optimized. In the long term it might be interesting to look at doing more inside the handle_buffer function, as this runs on the fly without interrupting the measurement (as long as it is faster than the buffers are filled).

To give a few numbers, with averages over buffers and samples:

  • 2000 buffers, 501 records, 2176 samples, 51 times (steps in a do1d)
    Gave an overhead of 3.2%, with about half of that being live plotting.

  • 200 buffers, 501 records, 2176 samples, 251 times
    Gave an overhead of 33.1%, with about half of that being live plotting.
    Digging into the main numbers for the 33% overhead:

Function | Time | Note
--- | --- | ---
total time | 837.0 s | minimum measurement runtime: 628.7 s
ATS.py:616(acquire) | 744 s | total time at the Alazar
base.py:259(update) | 65.1 s | updating the live plot
parameter.py:383(set_wrapper) | 16.2 s | setting the Decadac

in ATS.py:616(acquire):

Function | Time
--- | ---
ATS.py:616(_call_dll) | 394 s
ATS_ctrl (pre_start_capture) | 27 s
ATS.py (handle_buffer) | 251 s
ATS_ctrl (post_acquire) | 66 s

in ATS_ctrl (post_acquire):

Function | Time
--- | ---
ATS_ctrl (_to_volts) | 4.4 s
demodulator (filter_win) | 48.8 s
numeric.py (outer) | 2.0 s
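The idea floated above of doing more inside handle_buffer (so that the work overlaps with acquisition dead time) can be sketched as a running sum that is finalized in post_acquire (a hypothetical sketch; the real controller's method signatures and buffer layout may differ):

```python
import numpy as np

class AveragingControllerSketch:
    """Hypothetical sketch: accumulate a per-buffer running sum so the
    averaging cost is paid while the card is still acquiring, leaving
    only one cheap division for post_acquire."""

    def __init__(self, shape):
        self._sum = np.zeros(shape, dtype=np.float64)
        self._count = 0

    def handle_buffer(self, buffer):
        # called once per filled buffer, concurrently with acquisition
        self._sum += buffer
        self._count += 1

    def post_acquire(self):
        # finalize: a single division instead of averaging the full dataset
        return self._sum / self._count
```

This only works as long as each handle_buffer call finishes before the next buffer fills, which is exactly the constraint mentioned above.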

@nataliejpg
Collaborator

So more time in handle_buffer than in post_acquire!? That's interesting. Next time I get on a fridge I'll compare filter on to filter off (but within the demodulator), and try to make some timer for the dead time that we could be using to demodulate on the fly. :) Thanks @ThorvaldLarsen

@ThorvaldLarsen
Collaborator

I am not entirely sure what is happening during the calls to handle_buffer and ATS _call_dll. Neither of them takes long enough to complete the full measurement, so the Alazar must be acquiring data during both function calls.

@jenshnielsen
Contributor

The actual data acquisition happens asynchronously while the copying happens in handle_buffer, so it's a bit more complicated. Also @ThorvaldLarsen, I fixed the table.

@nataliejpg
Collaborator

Ah OK, that makes more sense, thanks @jenshnielsen.
