Try preventing square overflow and give debug info otherwise #224

Merged
taldcroft merged 1 commit into master from handle-square-overflow on Oct 25, 2021

taldcroft
Member

Description

This is an attempt to squash the intermittent "overflow encountered in square" warning that shows up in our ska job watch. I did uncover one actual problem: the square was computed using the MSID dtype instead of float64. Maybe that is the cause, but if not, I added some debug code that might let us track down the source of the issue.

Testing

  • Passes unit tests on macOS
  • Functional testing

Functional testing

This is very difficult to truly test with this exact code because replicating the warning requires running an update on a full eng archive. I believe that the warning is being emitted into STDERR in a place that is not co-located with the STDOUT logging, so I have no idea which MSID is actually the problem.

With that in mind, I wrote a mock script that has essentially the same code:

from cheta import fetch
import numpy as np
import warnings

# Representative Msid object
msid = fetch.Msid('aoattqt1', '2021:001', '2021:001:00:01:00')

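# Weighted mean of the first three samples, using unit time weights
vals = msid.vals[:3]
dts = np.ones(len(vals), dtype=np.float64)
sum_dts = np.sum(dts)
mean = np.sum(dts * vals) / sum_dts

# Inject a huge value so that squaring overflows even in float64
vals[0] = 1e200

# Ensure float64 before squaring
vals_minus_mean = vals.astype(np.float64) - mean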

with warnings.catch_warnings(record=True) as warns:
    sigma_sq = np.sum(dts * vals_minus_mean**2) / sum_dts
    if warns:
        print(repr(warns[0].message))
        print(f'{msid=}')
        print(f'{np.max(np.abs(vals_minus_mean))=}')
        print(f'{vals_minus_mean.dtype=}')

Running this gives:

(ska3) ➜  eng_archive git:(handle-square-overflow) python test_catch_overflow_warning.py 
RuntimeWarning('overflow encountered in square')
msid=<Msid start=2021:001:00:00:00.000 stop=2021:001:00:01:00.000 len=58 dtype=float64>
np.max(np.abs(vals_minus_mean))=1e+200
vals_minus_mean.dtype=dtype('float64')
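
For anyone without access to the eng archive, the same pattern can be sketched without cheta. This is only an illustration under stated assumptions: `weighted_variance` is a hypothetical helper name, the input arrays are synthetic, and the `simplefilter("always")` call is an addition that is not in the mock script above.

```python
import warnings

import numpy as np


def weighted_variance(vals, dts):
    """Time-weighted variance computed in float64, with overflow debug info.

    Hypothetical helper for illustration only; not the actual cheta code.
    Returns the variance and a list of warning messages seen while squaring.
    """
    vals = np.asarray(vals)
    dts = np.asarray(dts, dtype=np.float64)
    sum_dts = np.sum(dts)
    mean = np.sum(dts * vals) / sum_dts

    # Ensure float64 before squaring, whatever the input dtype was
    vals_minus_mean = vals.astype(np.float64) - mean

    with warnings.catch_warnings(record=True) as warns:
        warnings.simplefilter("always")  # record every warning in this block
        sigma_sq = np.sum(dts * vals_minus_mean**2) / sum_dts

    return sigma_sq, [str(w.message) for w in warns]


# Synthetic data with an injected 1e200 to force a float64 overflow
sigma_sq, messages = weighted_variance(np.array([1.0, 2.0, 1e200]), np.ones(3))
print(sigma_sq, messages)
```

Returning the recorded messages instead of printing them inside the function keeps the sketch easy to check; the PR itself prints the debug info.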

@taldcroft
Member Author

On reflection I figured out a way to find the bad values, since they result in an inf in the stds attribute of the daily stats. The source is actual bad values of OBC telemetry like AOSUNER3=-5.32e+32 at 2021:256:12:23:44.333. Confirmed that this is in MAUDE as well.
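
A rough sketch of that search, assuming the daily stats are available as plain NumPy arrays. The `times` and `stds` arrays below are synthetic stand-ins with made-up values, not real telemetry:

```python
import numpy as np

# Synthetic stand-ins for the times and stds of the daily stats (made-up values)
times = np.array([0.0, 86400.0, 172800.0, 259200.0])
stds = np.array([0.01, np.inf, 0.02, 0.015])

# A non-finite std marks a day whose variance calculation overflowed
bad = ~np.isfinite(stds)
bad_times = times[bad]
print(bad_times)
```

Each flagged day can then be fetched at full resolution to locate the offending sample.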

@jeanconn
Contributor

I'm assuming you ran the mock code on the discovered bad values?

@taldcroft
Member Author

> I'm assuming you ran the mock code on the discovered bad values?

Not exactly. Note that the mock code injects a 1e200 into the dataset to force an overflow. The code in this PR does two things: ensure float64 and print debug info for an overflow. It turns out that the "ensure float64" step will prevent all known occurrences of the overflow, which happen in float32 (the square of any finite float32 value always fits in a float64).

So AFAIK the new debug code will never run, but maybe there are float64's that overflow? If that happens, I expect the debug code to work based on the mock code.
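
The float32 claim can be checked numerically with a small sketch that uses only the dtype limits reported by `np.finfo` (no telemetry involved):

```python
import numpy as np

f32_max = np.finfo(np.float32).max  # about 3.4e38
f64_max = np.finfo(np.float64).max  # about 1.8e308

# Squared in float64, the largest float32 is about 1.2e77, far below f64_max
sq64 = np.float64(f32_max) ** 2
print(sq64 < f64_max)

# Even a deviation from the mean of twice f32_max squares safely in float64
sq64_dev = (2.0 * np.float64(f32_max)) ** 2
print(np.isfinite(sq64_dev))

# Squared in float32, the same value overflows to inf
with np.errstate(over="ignore"):
    sq32 = np.square(np.array([f32_max], dtype=np.float32))
print(np.isinf(sq32[0]))
```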

Contributor

@jeanconn jeanconn left a comment

For the float32's it was helpful for me to just see

In [30]: np.sum(dts * (vals - mean) ** 2) / sum_dts                                                                      
<ipython-input-30-ce6820353470>:1: RuntimeWarning: overflow encountered in square
  np.sum(dts * (vals - mean) ** 2) / sum_dts
Out[30]: inf

In [31]: np.sum(dts * (vals.astype(np.float64) - mean) ** 2) / sum_dts                                                   
Out[31]: 6.081264884868255e+64

for that AOSUNER3 case. Good to go.

@taldcroft taldcroft merged commit 084fdc6 into master Oct 25, 2021
@taldcroft taldcroft deleted the handle-square-overflow branch October 25, 2021 20:13
@javierggt javierggt mentioned this pull request Aug 3, 2022