Observations to/from Pandas DataFrame #542
Comments
I have some data conversion utilities over here, which will do part of what you want: https://github.com/pyinat/pyinaturalist-convert (full docs here). It's in a separate library because it has quite a few extra dependencies.
Example of loading observations into a dataframe:
from pyinaturalist import get_observations
from pyinaturalist_convert import to_dataframe
response = get_observations(user_id='jkcook', page='all')
df = to_dataframe(response)
P.S., there's a "secret" namespace alias:
from pyinat import get_observations, to_dataframe
That flattens out several pieces of nested data (photos, identifications, taxon, etc.) to make it a bit easier to work with and save in a tabular format. From there, you can export and re-read it in whatever data format you prefer. Personally I've found parquet to be the most useful for observation data (in terms of performance and disk usage for larger datasets).
Example:
import pandas as pd
df.to_parquet('observations.parquet')
df = pd.read_parquet('observations.parquet')
I don't yet have any features for turning that back into a format for creating/updating observations, but that's something I could add. I may not have time to work on that this week, but I do have some ideas, so I'll get back to you on that. |
@JWCook Whoa, thanks! I am going to explore more. Perhaps just documenting the availability of these pandas features would be enough for now. |
Let me know if that dataframe format isn't exactly what you need. So far that library has mainly been tailored for my own usage, but I'm definitely willing to make changes there to accommodate other use cases. As for docs, I was thinking of putting together a tutorial notebook that uses both of these libraries... but it seems like every time I start on that, I find something else I want to fix or polish first before showing it off! |
@JWCook Perhaps the https://github.com/pyinat/pyinaturalist-convert project could be renamed pyinaturalist-utils, with all extraneous special-case features such as this one added to it. What do you reckon? |
@JWCook Recommend closing this issue as well. |
Please provide helper functions to quickly turn the results of get_observations into a pandas dataframe. A similar function could let users quickly feed the values of a dataframe into create_observations.
Rationale: Pandas is the Swiss Army knife of data processing, and it would enable experienced users to analyze data extracted from iNaturalist. This feature would help users who have found the current helper functions (for creating histograms, etc.) too limiting for their use cases.
Use case
Problem: Extract your observations and sort them based on observed_at and created_at to find any duplicates.
Another use case would be to quickly create a large number of observations from a CSV/XLS file.
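The duplicate check described above could be sketched roughly like this. It is only an illustration: the sample rows are made up, the column names (observed_at and created_at as written above, plus a taxon_name column) are assumptions that may not match what to_dataframe actually produces, and treating "same taxon at the same observed time" as a duplicate is just one possible rule.

```python
import pandas as pd

# Made-up sample data; in practice this would come from to_dataframe(response).
# Column names are assumptions and may differ in the real dataframe.
df = pd.DataFrame(
    {
        "id": [101, 102, 103, 104],
        "taxon_name": [
            "Danaus plexippus",
            "Apis mellifera",
            "Danaus plexippus",
            "Bombus impatiens",
        ],
        "observed_at": [
            "2022-05-01 10:00",
            "2022-05-01 11:30",
            "2022-05-01 10:00",
            "2022-05-02 09:15",
        ],
        "created_at": [
            "2022-05-01 18:00",
            "2022-05-01 18:05",
            "2022-05-03 08:00",
            "2022-05-02 20:00",
        ],
    }
)
df["observed_at"] = pd.to_datetime(df["observed_at"])
df["created_at"] = pd.to_datetime(df["created_at"])

# Sort so that likely duplicates end up next to each other
df = df.sort_values(["observed_at", "created_at"]).reset_index(drop=True)

# Flag every row that shares a taxon and observation time with another row
duplicates = df[df.duplicated(subset=["taxon_name", "observed_at"], keep=False)]
print(duplicates[["id", "taxon_name", "observed_at", "created_at"]])
```

With keep=False, every member of a duplicate group is kept in the result, which makes it easier to review the pair by hand before deleting anything.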
I think pandas dataframes are ideal for bulk uploads, as you can track which observations have been uploaded and restart from the index number of a given row.
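As a rough sketch of that idea, an "uploaded" column can record progress so that a rerun picks up where the last one stopped. Everything here is hypothetical: upload_pending and fake_upload are made-up helpers, the column names are placeholders, and the real upload step (for example a wrapper around create_observations) is deliberately left out.

```python
import pandas as pd


def upload_pending(df, upload_row):
    """Upload rows not yet marked as uploaded; return how many succeeded.

    upload_row is a hypothetical callable standing in for whatever
    actually creates the observation.
    """
    count = 0
    for index, row in df[~df["uploaded"]].iterrows():
        try:
            upload_row(row)
        except Exception as error:
            # Stop here; finished rows stay marked, so a rerun resumes
            print(f"Stopped at row {index}: {error}")
            break
        df.loc[index, "uploaded"] = True
        count += 1
    return count


# Made-up data; in practice this might come from pd.read_csv(...)
df = pd.DataFrame(
    {
        "species_guess": ["Monarch", "Honey bee", "Bumble bee"],
        "observed_on": ["2022-05-01", "2022-05-01", "2022-05-02"],
    }
)
df["uploaded"] = False

# Simulated uploader that fails once, on its second call
attempts = {"count": 0}


def fake_upload(row):
    attempts["count"] += 1
    if attempts["count"] == 2:
        raise ConnectionError("simulated network failure")


first_run = upload_pending(df, fake_upload)   # uploads row 0, stops at row 1
second_run = upload_pending(df, fake_upload)  # resumes from row 1
```

Saving the dataframe between runs would keep the uploaded column across sessions, so an interrupted upload could be resumed later.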