-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Preserve attrs in to_dataframe() #5335
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @snowman2 !
I added one question.
xarray/tests/test_dataarray.py
Outdated
attrs={"long_name": "Description of data array", "_FillValue": -1}, | ||
) | ||
df = arr.to_dataframe() | ||
assert df[df.columns[0]].attrs == { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these set per column or for the whole dataframe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both actually. It is on the dataframe and the series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we test for both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a test for both in xarray/tests/test_dataset.py
. I don't believe it is applicable in xarray/tests/test_dataarray.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive me if I'm confusing things.
How does pandas handle attrs? Here, it looks like it's the same on both the series and the dataframe — is that always the case? Or do we need to test both?
In [8]: df.attrs = dict(a=2)
In [9]: df
Out[9]:
test
y x
1 2 0.0
3 0.0
4 0.0
5 0.0
6 0.0
2 2 0.0
3 0.0
4 0.0
5 0.0
6 0.0
3 2 0.0
3 0.0
4 0.0
5 0.0
6 0.0
4 2 0.0
3 0.0
4 0.0
5 0.0
6 0.0
5 2 0.0
3 0.0
4 0.0
5 0.0
6 0.0
In [10]: df.attrs
Out[10]: {'a': 2}
In [11]: df['test'].attrs
Out[11]: {'a': 2}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more tests to ensure the expected behavior occurs.
xarray/tests/test_dataset.py
Outdated
) | ||
df = ds.to_dataframe() | ||
assert df.attrs == {"test": "test"} | ||
assert df[df.columns[0]].attrs == { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, it looks like it's the same on both the series and the dataframe — is that always the case?
I hadn't checked that behavior. It is different here. 🤷♂️
Was there any reason this stalled? It looked like a good start! |
I believe the issue is that |
They were thinking of removing it at one point: pandas-dev/pandas#52166, also dask/dask#11146 perhaps we should punt until someone really really wants it? |
Yes, looks like the conclusion from the pandas issue is they want to keep it but the support is spotty. Probably we close this unless someone comes to save it, but I would vote to merge a PR that did this — I can't see a downside... |
hi, I just saw that this discussion has been picking up. I work on the framework mentioned in this comment pandas-dev/pandas#52166 (comment) and we would be very happy if the dataframe-level
Would this PR preserve the |
pre-commit run --all-files
whats-new.rst