Value Error of conflicting sizes for dimension 'time' and 'distance over ground' #116

Closed
joslater opened this issue Sep 22, 2022 · 16 comments · Fixed by #118

Comments

@joslater

Running the example-slocum scripts on the example data raises this error: "ValueError: conflicting sizes for dimension 'time': length 3487 on 'distance_over_ground' and length 3489 on {'time': 'time'}." I have not modified the example scripts at all, and I have the required packages installed. Attached is a screenshot of the terminal after running process_deploymentRealTime.py.
[Image: Screen Shot 2022-09-22 at 12 36 20 PM]

@jklymak
Member

jklymak commented Sep 22, 2022

Hi @joslater, I recently had similar issues and fixed them in #113 to #115. Are you able to try the main branch? (Download the code into a pyglider directory, and then run pip install -e . to do an "editable" pip install.)
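
A quick way to confirm which copy of the package Python actually imports is to ask importlib. This is a sketch using only the standard library; describe_install is a hypothetical helper, and it assumes the import name and the distribution name are both pyglider.

```python
import importlib.util
from importlib import metadata


def describe_install(package_name):
    """Return (version, location) for an importable package, or None if it is absent."""
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return None
    try:
        # Assumes the distribution name matches the import name.
        version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. a standard-library module).
        version = None
    return version, spec.origin
```

Calling describe_install("pyglider") after an editable install should typically report a location inside the downloaded source directory rather than inside site-packages.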

@joslater
Author

Ah I see, my installation was not "editable" and therefore did not reflect the recent change. Thank you for your help!

@jklymak
Member

jklymak commented Sep 22, 2022

Does it work for you now? Useful to know that applies to other derived quantities....

@joslater
Author

It works perfectly now! Thanks again!

@joslater
Author

I may have spoken too soon. After updating, the example data and two of my other datasets worked great. I then tried one more dataset and ran into the same error, except that the variable causing it was 'conductivity' instead of 'distance_over_ground'.
[Image: Screen Shot 2022-09-22 at 2 55 38 PM]

@jklymak
Member

jklymak commented Sep 23, 2022

OK, right now we are doing

if len(time) < ds.sizes['time']:
  ...

but maybe it should just be a strict not equals? Can you change that line in your dev version to if len(time) != ds.sizes['time'], and if that works, open a PR with the change?
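
To make the difference between the two comparisons concrete, here is a minimal sketch. The helper lengths_conflict and its arguments are hypothetical and not part of pyglider; the lengths 3487 and 3489 are borrowed from the error message at the top of this issue.

```python
def lengths_conflict(n_sensor_times, n_dataset_times, strict=True):
    """Report whether a sensor's time vector disagrees with the dataset's time dimension.

    strict=False mirrors the current `<` check; strict=True mirrors the proposed `!=`.
    """
    if strict:
        return n_sensor_times != n_dataset_times
    return n_sensor_times < n_dataset_times


# A sensor with fewer samples than the dataset is caught by both checks.
print(lengths_conflict(3487, 3489, strict=False), lengths_conflict(3487, 3489))
# A sensor with more samples than the dataset is caught only by the strict check.
print(lengths_conflict(3489, 3487, strict=False), lengths_conflict(3489, 3487))
```

With `<`, a sensor that returns more samples than the dataset holds slips through unnoticed; `!=` flags a mismatch in either direction.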

jklymak reopened this Sep 23, 2022

@joslater
Author

I tried making that change and now get this error: "ValueError: cannot reindex or align along dimension 'time' because the (pandas) index has duplicate values".

@jklymak
Member

jklymak commented Sep 23, 2022

Does the data set have sci_m_present_time? Also, I suggest you look at the values of time that are returned from the conductivity sensor and see if they make sense; I'd just add print statements just above this point, but you could also save or plot time and ds.time. I've not seen a case where there were duplicate values - I imagine there is a way to reindex dropping the duplicate indices, but it would be good to understand what is happening.
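
One way to do that inspection is to summarise the time vector before looking at individual values. This is a library-independent sketch; summarize_times is a hypothetical helper, and it assumes the times have first been converted to a plain Python list (for a NumPy array, for example, with time.tolist()).

```python
from collections import Counter


def summarize_times(times):
    """Count samples, unique values, repeated values, and backward steps in a time list."""
    counts = Counter(times)
    # Map each repeated time stamp to the number of times it occurs.
    duplicates = {t: n for t, n in counts.items() if n > 1}
    # Count adjacent pairs where time decreases.
    n_backwards = sum(1 for a, b in zip(times, times[1:]) if b < a)
    return {
        "n": len(times),
        "n_unique": len(counts),
        "duplicates": duplicates,
        "n_backwards": n_backwards,
    }


# Made-up values: repeats at the start and at the end of the record.
print(summarize_times([0.0, 0.0, 1.0, 2.0, 2.0, 2.0]))
```

If n and n_unique differ, the duplicates entry shows which time stamps repeat and how often, which helps tell a few stray repeats from a block of identical values at either end.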

What I think is happening is that you don't have sci_m_present_time, and the science computer has more data than the nav computer, and dbdreader is interpolating the times, but at the end and beginning just tacking on a bunch of duplicate times. That should be easy to work around, but it would be nice to understand the exact nature of the problem before throwing data out....
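
If the duplicates do turn out to be padding, the workaround could be as simple as keeping only the first occurrence of each time stamp. The sketch below is not pyglider or dbdreader code; keep_first_mask is a hypothetical helper that builds a boolean mask from a plain list of times.

```python
def keep_first_mask(times):
    """Return a list of booleans that is True only at the first occurrence of each time."""
    seen = set()
    mask = []
    for t in times:
        mask.append(t not in seen)
        seen.add(t)
    return mask


# Made-up values: duplicated time stamps tacked on at the start and the end.
times = [5, 5, 5, 6, 7, 8, 8]
mask = keep_first_mask(times)
deduplicated = [t for t, keep in zip(times, mask) if keep]
print(deduplicated)
```

The same mask would have to be applied to every variable that shares the time dimension, and it silently discards the data attached to the dropped samples, so it is worth checking what those samples contain first.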

@joslater
Author

I believe the dataset does have sci_m_present_time. I haven't had a chance to dig into the times yet, but early next week I will be able to do that and confirm that sci_m_present_time is present.

@jklymak
Member

jklymak commented Sep 23, 2022

@joslater sure, no rush - you could also share the offending data set. Finally, we have occasionally just skipped the first few files because the glider was sitting on the dock anyway or something.

@joslater
Author

I haven't been able to find anything out of the ordinary. I can share the dataset with you. How would you like me to share it, and would you like just the delayed raw data and cache files?

@jklymak
Member

jklymak commented Sep 28, 2022

Share in whatever way is easiest for you other than email 😀

@joslater
Author

Here is the link to a private OneDrive folder with the data: https://wmedu-my.sharepoint.com/:f:/g/personal/joslater_wm_edu/Erri1Opekx1GmDcjzDssA7oBto1RqKOC0J_bUjfNGZU_JA?email=jklymak%40gmail.com&e=CDj0pG Let me know if this works!

@jklymak
Member

jklymak commented Sep 29, 2022

OK, may take a bit of re-architecting, but I think this is relatively easy to fix. Do you have a yaml file that works for this?

@joslater
Author

Awesome! I added a yaml file to the shared folder. Let me know if there's anything else I can do!

jklymak linked a pull request Oct 2, 2022 that will close this issue

@jklymak
Member

jklymak commented Oct 2, 2022

@joslater this seems to get past the first step in your files. Your files seem to have numerous issues with them. You may need to pre-process the initial time series file before moving onto the gridding steps.
