Different size of matrix using create_float_source for raw and adjusted data #218

Open
kamwal opened this issue May 5, 2022 · 11 comments

Labels: argo-core (About core variables (P, T, S)) · argo-deep (About deep variables (anything below 2000db)) · forQCexpert (Argo QC expertise is required) · invalid (This doesn't seem right) · stale (No activity over the last 90 days)

Comments


kamwal commented May 5, 2022

I have found differences in the size of the matrices generated by the ds.argo.create_float_source code, which produces the Wong .mat source files for OWC analysis. The sizes differ between the .mat matrix for raw data and the one for adjusted data:

    ds.argo.create_float_source(force='raw')
    ds.argo.create_float_source(force='adjusted')

The equivalent Matlab code for generating the float source gives the same output size for raw and adjusted data: https://github.com/euroargodev/dm_floats/blob/master/src/ow_source/create_float_source.m

WMO floats where the issue has been detected: 3901928, 3900797, 3900799.
The size mismatch between the raw and adjusted matrices leads to problems when extracting differences and comparing the two datasets during quality checks of the adjusted data.
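
For reference, a minimal way to reproduce the mismatch (a sketch: it assumes create_float_source writes one <WMO>.mat file per float into the given directory, with the usual OWC source variable names such as SAL, and reads the files back with scipy):

    import scipy.io
    from argopy import DataFetcher as ArgoDataFetcher

    WMO = 3901928
    ds = ArgoDataFetcher(src='gdac', mode='expert').float(WMO).load().data

    # Write the OWC source files for both data selections
    ds.argo.create_float_source('float_source_raw', force='raw')
    ds.argo.create_float_source('float_source_adj', force='adjusted')

    # Compare the shape of the salinity matrix in the two .mat files
    m_raw = scipy.io.loadmat('float_source_raw/%i.mat' % WMO)
    m_adj = scipy.io.loadmat('float_source_adj/%i.mat' % WMO)
    print(m_raw['SAL'].shape, m_adj['SAL'].shape)  # shapes differ for this float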

I am using argopy v0.1.11.

kamwal added the invalid, argo-core, and argo-deep labels on May 5, 2022
gmaze (Member) commented May 16, 2022

Hi @kamwal
Could you please share the files generated with the float source Matlab code here?

gmaze (Member) commented May 16, 2022

@kamwal
I looked at the output for WMO = 3901928.
The difference is whether data from the last 8 profiles are selected or not:
[Screenshot from 2022-05-16 at 08:53 comparing the raw and adjusted data selections]

And if I look at the netcdf file content with:

from argopy import DataFetcher as ArgoDataFetcher

WMO = 3901928
# Fetch the full GDAC data for this float, in expert mode
argo_loader = ArgoDataFetcher(src='gdac', cache=True, mode='expert', dataset='phy').float(WMO)
ds = argo_loader.load().data
dsp = ds.argo.point2profile()  # reshape the collection of points into profiles
dsp.where(dsp['CYCLE_NUMBER'] == 164, drop=True)  # inspect cycle 164

I see that the data mode is Delayed and the adjusted salinity is full of NaNs with QC=4;
that's why the adjusted option does not select these profiles.
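
This can be checked quickly, reusing dsp from the snippet above (a sketch, assuming the expert-mode PSAL_ADJUSTED variable and the N_LEVELS vertical dimension):

    # Cycles whose adjusted salinity is entirely NaN
    bad = dsp['PSAL_ADJUSTED'].isnull().all(dim='N_LEVELS')
    print(dsp['CYCLE_NUMBER'].where(bad, drop=True).values)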
So I guess the question is rather: why does the Matlab code select these?

kamwal (Author) commented May 16, 2022

3901928.zip

Thanks for looking at this.

I think it is done to avoid any issues with a size mismatch of the matrix across parameters. For some floats, QC=4 is applied not to all parameters (PRES, PSAL, TEMP) as here, but only to one parameter, e.g. PSAL.
Having the same matrix size for raw and adjusted data makes further comparison of the two datasets easier.
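
For instance, one could check whether QC=4 hits all parameters or only salinity for a given cycle, reusing the dsp dataset from above (a sketch; variable names assume argopy's expert mode):

    import numpy as np

    # Per-parameter adjusted QC values for cycle 164
    prof = dsp.where(dsp['CYCLE_NUMBER'] == 164, drop=True)
    for v in ['PRES_ADJUSTED_QC', 'TEMP_ADJUSTED_QC', 'PSAL_ADJUSTED_QC']:
        print(v, np.unique(prof[v].values))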

gmaze (Member) commented May 16, 2022

After discussion with @cabanesc, this appears to be motivated by the post-analysis use of the .mat source files: the D netcdf files are created for the profiles present in the source files!
Hence there are no D files for profiles not reported in the source file (even if they are full of NaNs).

I don't know how OWC handles this, but we could fix argopy to report as many profiles as before the filtering, with the corresponding fields full of NaNs.
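
For what it's worth, here is a toy illustration of that padding idea (not argopy code; the variable and dimension names are made up for the example): the filtered dataset is reindexed onto the full profile list, so dropped profiles come back as all-NaN rows.

    import numpy as np
    import xarray as xr

    # A toy "filtered" source dataset missing profiles 3 and 4
    filtered = xr.Dataset(
        {'SAL': (('N_PROF', 'N_LEVELS'), np.random.rand(3, 5))},
        coords={'PROFILE_NO': ('N_PROF', [1, 2, 5])},
    )
    full_profiles = [1, 2, 3, 4, 5]
    padded = (filtered
              .swap_dims({'N_PROF': 'PROFILE_NO'})
              .reindex(PROFILE_NO=full_profiles))  # missing profiles become NaN rows
    print(padded['SAL'].shape)  # (5, 5): same size whatever the filtering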

kamwal (Author) commented Jun 8, 2022

Yes, thanks, that would be very helpful.

gmaze added this to the "Go from alpha to beta" milestone on Jun 8, 2022
gmaze (Member) commented Jun 8, 2022

@kamwal note that I have no idea when I'll be able to fix this ...

github-actions bot commented Sep 7, 2022

This issue was automatically marked as stale because it has not seen any activity in 90 days

github-actions bot added the stale label on Sep 7, 2022
gmaze removed the stale label on Sep 23, 2022
gmaze (Member) commented Nov 2, 2022

> I don't know how OWC handles this, but we could fix argopy to report as many profiles as before the filtering, with the corresponding fields full of NaNs.

Although, after some thought, I'm not sure anymore that this is the way to go, since this approach mixes up the matrix content with different file uses (OWC analysis vs D file production). Basically, I have cold feet about reproducing a flawed workflow inherited from the Matlab software.

gmaze added the forQCexpert label on Nov 2, 2022
github-actions bot commented Jan 31, 2023
This issue was automatically marked as stale because it has not seen any activity in 90 days

github-actions bot added the stale label on Jan 31, 2023
github-actions bot commented Apr 13, 2024
This issue was closed automatically because it has not seen any activity in 365 days

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Apr 13, 2024
gmaze reopened this on Apr 15, 2024
github-actions bot removed the stale label on Apr 16, 2024
github-actions bot commented Aug 20, 2024
This issue was automatically marked as stale because it has not seen any activity in 90 days

github-actions bot added the stale label on Aug 20, 2024