Observations to/from Pandas DataFrame #542

Closed
arky opened this issue Jan 20, 2024 · 5 comments
Labels
enhancement New feature or request

@arky
arky commented Jan 20, 2024

Please provide helper functions to quickly turn the results of get_observations into a pandas DataFrame. A similar function could let users quickly feed the values of a DataFrame into create_observation.

Rationale: Pandas is the Swiss Army knife of data processing. It would enable experienced users to analyze data extracted from iNaturalist, and it would help users who find the current helper functions (for creating histograms, etc.) too limiting for their use cases.

Use case

Problem: Extract your observations and sort them based on observed_at and created_at to find any duplicates.

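from pyinaturalist import get_observations

# Replace with your own username
USERNAME = 'jkcook'

response = get_observations(user_id=USERNAME, page='all')
# Observation_df.from_json_list is the proposed helper; it does not exist yet
my_observations = Observation_df.from_json_list(response)
my_observations  # This is now a pandas DataFrame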
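
Once the observations are in a DataFrame (however it is built), the duplicate check described above could look roughly like this. This is only a sketch on made-up rows: the column names `id`, `observed_at` and `created_at` are assumptions taken from the wording of the use case, not a confirmed schema.

```python
import pandas as pd

# Made-up rows standing in for a DataFrame of observations.
# Column names are assumptions for illustration only.
df = pd.DataFrame(
    {
        "id": [103, 101, 104, 102],
        "observed_at": [
            "2024-01-06 14:00",
            "2024-01-05 09:30",
            "2024-01-07 08:15",
            "2024-01-05 09:30",
        ],
        "created_at": [
            "2024-01-06 20:00",
            "2024-01-05 18:00",
            "2024-01-07 09:00",
            "2024-01-05 18:02",
        ],
    }
)
df["observed_at"] = pd.to_datetime(df["observed_at"])
df["created_at"] = pd.to_datetime(df["created_at"])

# Sort so that potential duplicates end up next to each other
df = df.sort_values(["observed_at", "created_at"]).reset_index(drop=True)

# keep=False flags every row that shares its observed_at value with another row
possible_duplicates = df[df.duplicated(subset=["observed_at"], keep=False)]
print(possible_duplicates)
```

Matching on the exact timestamp is the simplest possible rule; a real check might also compare location or taxon before treating two rows as duplicates.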

Another use case would be to quickly create a large number of observations from a CSV or XLS file.

import pandas

df = pandas.read_csv('myfile.csv')
# from_df is the proposed parameter; it does not exist yet
create_observation(
    access_token=token,
    from_df=df,
)

I think pandas DataFrames are ideal for bulk uploads, as you can track which observations have been uploaded and restart from the index of the row where you left off.
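
The tracking-and-restarting idea can be sketched without any new library feature. In the sketch below, `upload_pending`, the `uploaded_id` column and `fake_upload` are all hypothetical names; `upload` is any callable that takes a row's fields as keyword arguments and returns the new observation's ID. In real use it might be a small wrapper around `create_observation` with an access token, assuming the column names match that function's parameters.

```python
import pandas as pd


def upload_pending(df, upload, status_col="uploaded_id"):
    """Upload every row that has no ID recorded yet; return how many succeeded."""
    if status_col not in df.columns:
        df[status_col] = pd.NA

    count = 0
    # Only rows without a recorded ID are attempted, so a rerun resumes the work
    for idx in df.index[df[status_col].isna()]:
        # Drop the tracking column and any empty cells before uploading
        params = df.loc[idx].drop(status_col).dropna().to_dict()
        try:
            df.at[idx, status_col] = upload(**params)
        except Exception as exc:  # broad on purpose: keep going after one bad row
            print(f"Row {idx} failed: {exc}")
            continue
        count += 1
    return count


def fake_upload(**params):
    """Stand-in for a real API call; rejects rows without a species_guess."""
    if "species_guess" not in params:
        raise ValueError("species_guess is required")
    fake_upload.next_id += 1
    return fake_upload.next_id


fake_upload.next_id = 0

df = pd.DataFrame(
    {
        "species_guess": ["Monarch", None, "Red-tailed Hawk"],
        "observed_on_string": ["2024-01-05", "2024-01-06", "2024-01-07"],
    }
)

first_pass = upload_pending(df, fake_upload)   # the row with no species fails
df.loc[1, "species_guess"] = "Painted Lady"    # fix the bad row
second_pass = upload_pending(df, fake_upload)  # only the fixed row is retried
print(df)
```

Saving the DataFrame back to disk after each pass would let an interrupted upload be resumed in a later session.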

arky added the enhancement label Jan 20, 2024
@JWCook
Member

JWCook commented Jan 23, 2024

I have some data conversion utilities over here, which will do part of what you want: https://github.com/pyinat/pyinaturalist-convert (full docs here). It's in a separate library because it has quite a few extra dependencies.

Example of loading observations into a dataframe:

from pyinaturalist import get_observations
from pyinaturalist_convert import to_dataframe

response = get_observations(user_id='jkcook', page='all')
df = to_dataframe(response)

P.S., there's a "secret" namespace alias pyinat that includes modules from both libraries (if installed):

from pyinat import get_observations, to_dataframe

to_dataframe flattens out several pieces of nested data (photos, identifications, taxon, etc.) to make it a bit easier to work with and save in a tabular format. From there, you can export and re-read it in whatever data format you prefer. Personally, I've found parquet to be the most useful for observation data (in terms of performance and disk usage for larger datasets). Example:

import pandas as pd

df.to_parquet('observations.parquet')
df = pd.read_parquet('observations.parquet')
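
To give a rough idea of what flattening nested data means here, the sketch below uses `pandas.json_normalize` on two made-up records. It is only an illustration of the general technique: the field names and values are invented, and this is not necessarily how to_dataframe is implemented or what its column names look like.

```python
import pandas as pd

# Made-up records with nested dicts, loosely shaped like API results
records = [
    {"id": 1, "taxon": {"id": 1001, "name": "Danaus plexippus"}, "user": {"login": "example_user"}},
    {"id": 2, "taxon": {"id": 1002, "name": "Buteo jamaicensis"}, "user": {"login": "example_user"}},
]

# Each nested key becomes its own column, joined to its parent key with "."
flat = pd.json_normalize(records, sep=".")
print(flat.columns.tolist())
print(flat)
```

A flat table like this can be written directly to CSV or parquet, which is not true of a column holding nested dicts.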

I don't yet have any features for turning that back into a format for creating/updating observations, but that's something I could add. I may not have time to work on that this week, but I do have some ideas, so I'll get back to you on that.

@arky
Author

arky commented Jan 23, 2024

@JWCook Whoa, thanks! I am going to explore this more.

Perhaps just documenting the availability of these pandas features would be enough for now.

@JWCook
Member

JWCook commented Jan 23, 2024

Let me know if that dataframe format isn't exactly what you need. So far that library has mainly been tailored for my own usage, but I'm definitely willing to make changes there to accommodate other use cases.

As for docs, I was thinking of putting together a tutorial notebook that uses both of these libraries... but it seems like every time I start on that, I find something else I want to fix or polish first before showing it off!

@arky
Author

arky commented Jan 24, 2024

@JWCook Perhaps the https://github.com/pyinat/pyinaturalist-convert project could be renamed to pyinaturalist-utils, and all extraneous special-case features such as this one could be added to it.

What do you reckon?

@arky
Author

arky commented Mar 8, 2024

@JWCook Recommend closing this issue as well.

JWCook closed this as completed Mar 9, 2024