Question on Slicing Data! #4

Closed
ElizabethP712 opened this issue Jul 24, 2023 · 7 comments

Comments

@ElizabethP712
Collaborator

Question on Slicing Data Tables

So I have been able to figure out how to slice the profiles of a float based on the times they ascended. I was able to do it in a roundabout way with the .isel command to slice the profiles into the ones with the correct times similar to SMILE after doing trial and error, but I would like to automate the process with code.

I figured that instead of .isel, I should use .sel, which selects a range of coordinate values (which is what I thought time counts as in the point2profile version of the ARGO data) rather than positions along a dimension. However, I ran into an error and was wondering if you could help troubleshoot (and/or suggest some alternative approaches).

Here is the code that works:
WorkingCodeSlicing

Here is the code that is giving me issues:
IssueCodeSlicing
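
For readers without the screenshots, here is a minimal sketch of the kind of selection being described. It is not the code from the notebook: the dataset is synthetic, the variable name `TEMP`, the dates, and the window bounds are all assumptions chosen for illustration, and only the `N_PROF`/`N_LEVELS`/`TIME` layout mirrors the thread. It contrasts hand-picked positions passed to `.isel` with positions computed from the `TIME` values, which is one way to automate the step without needing an index.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a profile dataset (assumed layout, not real Argo data).
times = pd.date_range("2023-01-01", periods=6, freq="10D")
ds = xr.Dataset(
    {"TEMP": (("N_PROF", "N_LEVELS"), np.arange(24.0).reshape(6, 4))},
    coords={
        "N_PROF": np.arange(6),
        "N_LEVELS": np.arange(4),
        "TIME": ("N_PROF", times),  # one timestamp per profile
    },
)

# Manual approach: integer positions found by trial and error.
manual = ds.isel(N_PROF=slice(2, 5))

# Automated approach: derive the positions from the TIME values themselves.
start, end = np.datetime64("2023-01-15"), np.datetime64("2023-02-15")
time_values = ds["TIME"].values
in_window = (time_values >= start) & (time_values <= end)
automated = ds.isel(N_PROF=np.flatnonzero(in_window))

print(automated["N_PROF"].values)
```

Because this only uses `.isel` with computed positions, it should work whether or not `TIME` has an index.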

Thanks!

@ElizabethP712
Collaborator Author

Oh, also I am using the ArgoDataFetcher as my source.
argo_loader = ArgoDataFetcher(src='erddap', parallel=True, qc=1)

@emiliom
Owner

emiliom commented Jul 25, 2023

Do you have a notebook on the repo that you can point me to?

In the meantime, after some research on this, here's a suggestion. Try doing data1 = data1.set_xindex("TIME") before your sliced_data statement. If it works, I'll explain.
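
A minimal sketch of what this suggestion does, on synthetic data rather than the notebook's dataset. The variable name `TEMP`, the dates, and the slice bounds are assumptions for illustration; `data1` and `sliced_data` follow the names used above, and `set_xindex` needs a reasonably recent xarray release.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the profile dataset (assumed layout, not real Argo data).
times = pd.date_range("2023-01-01", periods=6, freq="10D")
data1 = xr.Dataset(
    {"TEMP": (("N_PROF", "N_LEVELS"), np.arange(24.0).reshape(6, 4))},
    coords={
        "N_PROF": np.arange(6),
        "N_LEVELS": np.arange(4),
        "TIME": ("N_PROF", times),  # non-dimension coordinate along N_PROF
    },
)

# Without an index on TIME, label-based selection is expected to fail.
# The exact exception type may vary between xarray versions.
try:
    data1.sel(TIME=slice("2023-01-15", "2023-02-15"))
except (KeyError, ValueError) as err:
    print("sel failed without an index:", err)

# Build an index from the existing TIME coordinate, then select by label.
indexed = data1.set_xindex("TIME")
sliced_data = indexed.sel(TIME=slice("2023-01-15", "2023-02-15"))

print(sliced_data.sizes["N_PROF"])
```

Note that `set_xindex` returns a new object, which is why the result is assigned rather than relying on an in-place change.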

@ElizabethP712
Collaborator Author

Yes, I just uploaded a cleaner version of the notebook I was using (there were plenty of other things I was trying that were not working, so let me know if you want access to that one too). It should be called "GitHubIssues(Clean Version)"! I will go ahead and try what you gave me as well.

@ElizabethP712
Collaborator Author

That worked! I noticed that "TIME" was added to the indexes now. From my own research online, I have come to understand indexes as the data that is filtered or limited along a certain dimension of the dataset (which is how the data is organized). But if xindex is now TIME, what would N_PROF and N_LEVELS be? Is x related at all to the coordinates x,y,z or dimensions?

@emiliom
Owner

emiliom commented Jul 25, 2023

> That worked!

Woo-hoo!! FYI I didn't know about set_xindex. I actually had a need for something like it myself a few days ago, but didn't land on that solution. I will definitely use it in the future.

> I noticed that "TIME" was added to the indexes now. From my own research online, I have come to understand indexes as the data that is filtered or limited along a certain dimension of the dataset (which is how the data is organized). But if xindex is now TIME, what would N_PROF and N_LEVELS be? Is x related at all to the coordinates x,y,z or dimensions?

First, the x doesn't refer to an x coordinate or dimension, though that's a very reasonable assumption. Instead, it seems to stand for "Xarray-compatible index". See https://docs.xarray.dev/en/stable/generated/xarray.DataArray.set_xindex.html

Your understanding of indexes is on the right track. More specifically, indexes are set up for dimension coordinates to enable convenient and fast filtering using .sel. I've learned that an index is automatically set up for each coordinate that has a dimension with a matching name, such as N_PROF. Those are the ones shown in bold or with stars in your screenshots. Indexes for "non-dimension" coordinates are not generated automatically; .set_xindex creates them manually.
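
To make the distinction concrete, here is a small sketch on synthetic data (the variable name `TEMP` and the dates are assumptions, not taken from the notebook). It lists which coordinates carry an index before and after `set_xindex`, and shows that `N_PROF` and `N_LEVELS` keep their own automatically created indexes alongside the new `TIME` one.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the profile dataset (assumed layout, not real Argo data).
times = pd.date_range("2023-01-01", periods=6, freq="10D")
ds = xr.Dataset(
    {"TEMP": (("N_PROF", "N_LEVELS"), np.arange(24.0).reshape(6, 4))},
    coords={
        "N_PROF": np.arange(6),      # dimension coordinate: indexed automatically
        "N_LEVELS": np.arange(4),    # dimension coordinate: indexed automatically
        "TIME": ("N_PROF", times),   # non-dimension coordinate: no index by default
    },
)

before = sorted(ds.xindexes)
indexed = ds.set_xindex("TIME")
after = sorted(indexed.xindexes)

print(before)  # ['N_LEVELS', 'N_PROF']
print(after)   # ['N_LEVELS', 'N_PROF', 'TIME']

# Dimension coordinates support label-based .sel without any extra setup.
# Label slices are inclusive, so labels 1, 2 and 3 are selected here.
by_profile = ds.sel(N_PROF=slice(1, 3))
print(by_profile.sizes["N_PROF"])
```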

@emiliom
Owner

emiliom commented Jul 25, 2023

> Yes, I just uploaded a cleaner version of the notebook I was using (there were plenty of other things I was trying that were not working, so let me know if you want access to that one too). It should be called "GitHubIssues(Clean Version)"! I will go ahead and try what you gave me as well.

Thanks. Very nice and clean notebook, and kudos for running it "top to bottom" before sharing it (i.e., cell numbers start at 1 and increment sequentially)!

@ElizabethP712
Collaborator Author

I am glad I could help uncover a line of code that was already available. I do intend to upload more notebooks (I have a lot of them), but I need to clean up and shorten many of them, since they have random explorations thrown in.
