-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data Quality Issue Arises When Using tinterpol #966
Comments
Thanks for letting us know! Is there a time range I can use with that notebook to reproduce the issue? Are you saying that tinterpol is returning negative values, even when the inputs are all non-negative? I suppose this could happen even with linear interpolation, if data is being extrapolated outside the time range of the input data. We are currently working on merging several of the pyspedas/pytplot interpolation methods into a single routine,. and allowing the user to specify the desired behavior via keyword arguments. It sounds like this example would make a good test case. |
I am able to recreate the error with MMS FPI data in the range ["2018-07-01/01:00", "2018-07-02/00:10"] Yes, all inputs are nonnegative (albeit looking at it again now it appears most of the issue data points are zeros) and tinterpol returns a negative value. |
Hmm, actually I'm seeing some large negative values in the Tpara and Tperp variables. I've changed the time range to this:
Then, after the "Extract data values" step, I added a cell with:
And the result I'm seeing is:
There seem to be about ten(ish) negative values at various indices in both the Tpara and Tperp input variables. So I'm not surprised that there are negative values in the interpolated data. Is it possible we're looking at different data...are you still working with MMS1 in your version of the notebook? |
Looking at mine again I am also getting some very large negative numbers and I was looking at fast data not burst. If I look at the noninterpolated temps from burst data I see the following.
|
I've managed to recreate my original issue (with number density instead of temperature). With the same time range and the following code
and looking at my output
Every time it happens the corresponding term in the non-interpolated (des_numb.y) data is a zero. |
Yes, I can reproduce the issue with the density variables. It's weird...as you say, the input data value is 0, the time it's interpolating to is an exact match between the two variables, yet the interpolated value is ever so slightly negative. The actual interpolation is being done with the xarray.interp() method (which itself seems to be calling scipy.interp1d). I'll have to step through it a few more times, to see if anything weird is happening to the times or data values before they get handed off to xarray.interp(). The timestamps are being passed as np.datetime64[ns] objects -- maybe xarray is doing some sort of lossy conversion? Meanwhile, until we find a solution (or at least an explanation) for the weird interpolation results, I think it might be a good idea to implement a sanitization step (perhaps using pytplot.clip() or pyspedas.yclip()) to ensure no negative values are being passed to other libraries that can't handle them. But I don't think this is something that should happen within the interpolation routine itself. |
I think I might have found the issue! I have the mms-examples repo loaded into a separate virtual environment from my pyspedas development environment. When I tried turning the notebook example into a unit test in my pyspedas environment, I couldn't reproduce the negative values in mms1_des_numberdensity_brst-itrp...the minimum values were 0.0, just as we would have expected. The mms-examples environment I was using had some older versions of some of the package dependencies....in particular, the one I think was causing the problem was the cdflib package. cdflib had a major version release a while back, but there were some breaking changes, so we kept the cdflib version pinned at < 1.0.0 until the pyspedas and pytplot-mpl-temp code could be updated. Now pyspedas and pytplot-mpl-temp can work with the latest cdflib versions, so the cdflib < 1.0.0 restriction has been removed. Updating pyspedas or pytplot-mpl-temp versions probably would not have automatically updated the cdflib package. So, I suspect you still might have cdflib 0.4.9 in your environment. More recent versions of cdflib have changed the handling of CDF timestamps to have nanosecond rather than microsecond precision. Could you make a note of what versions of pyspedas, pytplot-mpl-temp, and cdflib you're running in the environment where you're seeing the interpolation problem? Then, try updating to the latest available versions of all three packages, and see if the issue goes away (and I'm pretty sure it will). If not, let me know, and we'll keep digging. |
Apologies I was on vacation and off my computer for a few days. My packages are as follows pyspedas 1.5.17 I still updated all packages just to be safe and the issue is still occuring. Adding another wrinkle to the issue, earlier you mentioned that the times are an exact match between the two variables, that is true for the "fast" data. However, that is not true for the "brst" data and the interp issue still arises. The "brst" data times are slightly separated and there are also more of these negative values occuring. Interestingly too is that the negative values are not occuring at the same indices in the 'brst' data as they are in the 'fast' data.
and
|
Thanks for the additional info -- guess I'll reopen this an take another look. |
I can't reproduce the issue. Here's my test case:
This test passes, with no negative values found. Are we both still looking at MMS1, loaded with the same options? Here's more info about my environment: Python: 3.9.18 |
xarray 2023.6.0 Funny, the issue disappeared by including the
line in the FPI load statement. Can't seem to see any differences that may cause that. The times are still slightly different and I believe that center measurement only affects the time point of the measurements, correct? |
Thanks for checking! I can reproduce the issue if I set center_measurement to False. I changed the debugging loop to this:
and here's the output I'm getting for the first negative value:
Note that I'm using pytplot.get instead of pytplot.get_data here: this preserves the original np.datetime64 time values and the astropy units added in cdf_to_tplot, matching what xarray.interp() is working on internally. I guess either of those factors could be contributing to the unexpected result we're seeing. Next step will be to construct a standalone example that only uses numpy, xarray, and/or scipy to reproduce the issue, and see where that leads us. |
OK, I'm able to reproduce the issue purely using numpy, xarray, and scipy. Here's a test case I extracted from the MMS data:
Output: A similar example calling scipy.interpolate.interp1d directly (after converting the np.datetime64 objects to float64) gives the same result. (xarray is calling interp1d internally). If I change the type of values_input from float32 to float64, we get the expected result of 0. scipy.interpolate.interp is called out in the scipy documentation as a "legacy" routine, so it's unlikely they'll be interested in fixing this issue. I wouldn't hold my breath waiting for xarray to switch to a different interpolator, either. numpy.interp seems to do the right thing in this specific case, but it only does 1-D piecewise linear interpolation, and doesn't offer any nonlinear or nearest-neighbor options. The workaround, for now, is basically what I suggested before: sanitize the interpolated data before passing it to anything that expects non-negative, non-nan, or other restricted domains. We'll be sure to take this case into account as we develop our one-size-fits-all, general-purpose interpolation routine for pyspedas, but it's going to take a while before we get there.... |
Reopening here to track the issue I filed on the xarray repo. They're going to look into using a different scipy interpolation method that doesn't suffer from this issue. |
One of the xarray developers has responded to the issue I filed over there. They point out that using method='slinear' in xarray.interp() causes scipy to use a slighty different algorithm (spline interpolation with k=1), which seems less susceptible to the issue where output values slightly differ from input values when the input and output timestamps are identical. tinterpol() already supports method='slinear' . It just passes the argument as-is to xarray.interp(), and eventually to scipy.interp1d(). We should add a note to the tinterpol() documentation describing the issue, and when it might be appropriate to use 'slinear' over 'linear'. |
I've added a note about 'linear' vs. 'slinear' interpolation to the docstring of tinterpol(), and added a reference to scipy interp1d in the readthedocs documentation. The changes have already been pushed out to pyspedas.readthedocs.io, and will appear in the next pyspedas release, most likely within the next week or so. I would suggest using 'slinear' in the tinterpol calls for your plasmapy application....it should fix the issue with the negative values that were causing problems downstream. |
Following mostly the example mms-examples/advanced/Plasma calculations with PlasmaPy.ipynb, when applying tinterpol to electron temperture it sometimes created negative values.
I was able to track the issue to a specific FPI data quality flag bit 7 (low electron number density). This data flag is used when (after subtracting the spacecraft photoelectron signal) the ambient number density is extremely low (i.e. nonphysical). This low density, when used to produce the electron temperature produces very low temperature values. Than applying tinterpol to these very low tempertures can produce negative (very near zero) values. These negative values crash plasmapy.formulary.dimensionless.beta in calculating the plasma beta.
Easiest way to fix this would be include a check in tinterpol that takes the absolute value of any negative (very near zero) values. Could also make them exactly zero but I am unsure as to if plasmapy.formulary.dimensionless.beta can handle zero. Also should probably include some type of warning if this is preformed.
The text was updated successfully, but these errors were encountered: