-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
variableOffsetPP bug #734
Comments
The code clearly assumes time_deployment_start == time_turned_on. Maybe it was the case before. Your solution is fine, but will still break if the instrument is only turned on after 25% of the time series. Likely never the case but still possible to raise an error.
I completely agree. I would ignore all the information in the UI window. The logic for the difference here doesn't make any sense. I wonder if we should get rid of it. In particular, I can't fathom how one would set an offset/scale - for the whole time series - using as an informational basis the difference of two instrument values, over a day of sampling, which is arbitrarily set to a day after 25% of the series. This is even worse because the UI is not interactive and if you play with the offset/values nothing happens, so no clue about the actual transformation outcomes until you got to the main window. My rationale of why this routine exists is for the basic transformation of data or simple recalibration (e.g. CPHL), where people already fitted a model to the data and know well enough what they are doing (they already visualized, fitted a model, etc). Thinking a bit more, even the min/max/mean are useless here since they cannot be used alone to define any linear parameter. For example, even in the simplest case - say we know a variable cannot be negative -to just choose an offset to get rid of negative values is a dubious step. You need more information, more interactivity, more visualisation to make the decision. Hence, most of the information in the window is incomplete at best. |
I am using it to apply a depth offset to aquadopp instruments where I don't have enough other instruments with good pressure to infer a depth. Getting the timeDriftPP values right also requires a few restarts to make sure the correct values are applied in the PP routine. |
Exactly. On one hand, when you need to use the routine, you already loaded the variables, and you already know the parameters by using/fitting the data somewhere else. In this case, the UI information is useless - you already made a decision. On the other, when you don't know yet that you need to scale the variable (first load), the information is useless too because you can't decide just on that information. This is even worse, given the data state is close to raw (e.g. include out of water data).
Yeap - I know your pain :) We shouldn't allow data transformation before looking at the data or doing quality control. The workflow related to these PP routines is clearly broken. The right thing to do is:
I have raised this in the past, but it's always hard to commit to it given the backlog of things and the difficulty to estimate the effort required. |
When re-processing some older data, I have an instrument which didn't switch on until some time after deployment. When I used the variableOffsetPP routine, it failed at line 194 for this instrument because timeForDiff is set to the time _deployment_start value.
Looking at the comments in this code, it does suggest that the timeForDiff value comes from 'differences over a day from ¼ of deployment'. So I'm not sure why the timeForDiff is then reset to time_deployment_start.
I have commented out lines 175-188.
I think the time selection step for getting this information could be problematic in many instances when an instrument fails or has no data on that particular day.
The gui window that pops up with a summary of the min/max/mean and differences from adjacent instruments could do with some work. It is confusing (min and max and mean appear to be for the whole deployment while the diff is for just the day ¼ of the way in). I'm not sure of the value of this information from my perspective.
However, the routine is useful for correcting offsets and drifts.
We could avoid all potential bugs with the time period selection by not including it. How many others use this routine?
The text was updated successfully, but these errors were encountered: