ENH: Preserve attrs in to_dataframe() #5335

Open
wants to merge 2 commits into
base: main

Conversation

snowman2
Contributor

Collaborator

@max-sixty max-sixty left a comment

Thanks @snowman2 !

I added one question.

attrs={"long_name": "Description of data array", "_FillValue": -1},
)
df = arr.to_dataframe()
assert df[df.columns[0]].attrs == {
Collaborator

Are these set per column or for the whole dataframe?

Contributor Author

Both actually. It is on the dataframe and the series.

Collaborator

Should we test for both?

Contributor Author

I have a test for both in xarray/tests/test_dataset.py. I don't believe it is applicable in xarray/tests/test_dataarray.py.

Collaborator

@max-sixty max-sixty May 19, 2021

Forgive me if I'm confusing things.

How does pandas handle attrs? Here, it looks like it's the same on both the series and the dataframe — is that always the case? Or do we need to test both?

In [8]: df.attrs = dict(a=2)

In [9]: df
Out[9]:
     test
y x
1 2   0.0
  3   0.0
  4   0.0
  5   0.0
  6   0.0
2 2   0.0
  3   0.0
  4   0.0
  5   0.0
  6   0.0
3 2   0.0
  3   0.0
  4   0.0
  5   0.0
  6   0.0
4 2   0.0
  3   0.0
  4   0.0
  5   0.0
  6   0.0
5 2   0.0
  3   0.0
  4   0.0
  5   0.0
  6   0.0

In [10]: df.attrs
Out[10]: {'a': 2}

In [11]: df['test'].attrs
Out[11]: {'a': 2}
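
The session above can be reproduced on a smaller frame. This is a minimal sketch, not part of the PR: the frame shape and the `test` column name mirror the session, everything else is an assumption for illustration. It prints what a selected column reports rather than asserting it, since that value comes from pandas' internal attrs propagation and may differ between pandas versions.

```python
import pandas as pd

# Small MultiIndex frame shaped like the one in the session (fewer rows).
index = pd.MultiIndex.from_product([[1, 2], [2, 3]], names=["y", "x"])
df = pd.DataFrame({"test": 0.0}, index=index)

# DataFrame-level metadata: a plain dict stored on the frame itself.
df.attrs = {"a": 2}
print("frame attrs: ", df.attrs)

# Whatever pandas propagates to a Series selected from the frame.
# Not asserted here, because propagation is version-dependent.
print("column attrs:", df["test"].attrs)
```

If the two printed dicts match, the column is only reflecting the frame-level attrs, which is not the same as the column carrying metadata of its own.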

Contributor Author

Added more tests to ensure the expected behavior occurs.

)
df = ds.to_dataframe()
assert df.attrs == {"test": "test"}
assert df[df.columns[0]].attrs == {
Contributor Author

Here, it looks like it's the same on both the series and the dataframe — is that always the case?

I hadn't checked that behavior. It is different here. 🤷‍♂️

@dcherian
Contributor

dcherian commented May 19, 2021

Thanks @snowman2

If you're interested, see #3497 for the inverse problem of using pandas attrs when constructing Xarray objects (in a future PR) :)

@max-sixty
Collaborator

Was there any reason this stalled? It looked like a good start!

@max-sixty max-sixty added the "plan to close" label (May be closeable, needs more eyeballs) Aug 28, 2024
@keewis
Collaborator

keewis commented Aug 28, 2024

I believe the issue is that pandas.DataFrame does not support column attrs (or did not? I didn't check whether that changed since then). DataFrame-level attrs should work, though.
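
One way to check this against an installed pandas is sketched below. `column_attrs_survive` is a hypothetical helper written for this illustration, not a pandas or xarray function, and the result is deliberately printed rather than asserted because it may depend on the pandas version.

```python
import pandas as pd


def column_attrs_survive(df: pd.DataFrame, column: str, attrs: dict) -> bool:
    """Set attrs on a selected column, then re-select it and compare.

    Hypothetical helper: returns True only if the attrs assigned to the
    selected Series are what a fresh selection of the same column reports.
    """
    selected = df[column]
    selected.attrs = dict(attrs)
    return df[column].attrs == attrs


df = pd.DataFrame({"test": [0.0, 1.0]})
df.attrs = {"long_name": "whole frame"}  # DataFrame-level attrs

result = column_attrs_survive(df, "test", {"units": "m"})
print("pandas version:", pd.__version__)
print("column-level attrs survive re-selection:", result)
print("frame-level attrs:", df.attrs)
```

The frame-level dict is untouched by the column assignment, which matches the expectation that DataFrame-level attrs are the more dependable of the two.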

@dcherian
Contributor

They were thinking of removing it at one point: pandas-dev/pandas#52166, also dask/dask#11146

perhaps we should punt until someone really really wants it?

@max-sixty
Collaborator

Yes, looks like the conclusion from the pandas issue is they want to keep it but the support is spotty.

Probably we close this unless someone comes to save it, but I would vote to merge a PR that did this — I can't see a downside...
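
Until something like this is merged, the DataFrame-level part can be approximated on the user side. The following is a minimal sketch, assuming only that the object exposes `.attrs` and `.to_dataframe()` the way xarray's Dataset and DataArray do. `to_dataframe_keep_attrs` is a hypothetical name and is not part of xarray or of this PR.

```python
import pandas as pd


def to_dataframe_keep_attrs(obj) -> pd.DataFrame:
    """Convert obj to a DataFrame and copy its top-level attrs onto it.

    Hypothetical helper: obj is assumed to provide .to_dataframe() and
    .attrs (as xarray objects do). Only DataFrame-level attrs are set;
    per-variable attrs are not copied onto columns.
    """
    df = obj.to_dataframe()
    # Shallow copy so later edits to df.attrs do not rebind obj.attrs.
    df.attrs = dict(obj.attrs)
    return df
```

With an xarray Dataset `ds`, the call would be `to_dataframe_keep_attrs(ds)`. Variable-level attrs are left out on purpose, given the doubts raised above about column-level support in pandas.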

@giovp

giovp commented Sep 15, 2024

Hi, I just saw that this discussion has picked up again. I work on the framework mentioned in this comment, pandas-dev/pandas#52166 (comment), and we would be very happy if DataFrame-level attrs were added back to dask DataFrames. We don't use column-level attrs, but we do use DataFrame-level attrs. Currently, our workaround for using the latest dask is to ask users to change the config like so:

dask.config.set({'dataframe.query-planning': False})

Would this PR preserve the attrs also in the dask-expr backend?

@headtr1ck headtr1ck added the "needs discussion" label and removed the "plan to close" label (May be closeable, needs more eyeballs) Oct 14, 2024

Successfully merging this pull request may close these issues.

ENH: Preserve attrs when converting to pandas dataframe
6 participants