ValueError: 'index=True' is only valid when 'orient' is 'split', 'table', 'index', or 'columns' #6197

Closed
exs-avianello opened this issue Aug 31, 2023 · 3 comments · Fixed by #6200
@exs-avianello

Describe the bug

Saving a dataset with .to_json() fails with a ValueError since the latest pandas release (2.1.0).

The pandas 2.1.0 release notes include:

Improved error handling when using DataFrame.to_json() with incompatible index and orient arguments (GH 52143)

i.e. an error is now raised for invalid combinations of index and orient.

This means that unfortunately the custom logic at this line might sometimes lead to contradictions:

index = self.to_json_kwargs.pop("index", False if orient in ["split", "table"] else True)

For example, in the default case orient="records", this logic sets index=True, which now raises a ValueError.
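One way to avoid the contradiction is to stop forcing index=True and only set index when it is known to be accepted. The following is a minimal sketch under that assumption, not necessarily the change merged in the linked fix; the helper name `resolve_to_json_kwargs` is hypothetical, and the only default it assumes is orient="records", as described above.

```python
from typing import Any, Dict

# orient values for which pandas accepts an explicit index=False
# both before and after the 2.1.0 release.
ORIENTS_ALLOWING_INDEX_FALSE = ("split", "table")


def resolve_to_json_kwargs(user_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build kwargs for DataFrame.to_json without
    producing an invalid index/orient combination."""
    kwargs = dict(user_kwargs)  # do not mutate the caller's dict
    orient = kwargs.setdefault("orient", "records")

    # Only choose a default for index where index=False is valid.
    # Otherwise leave index out entirely so pandas applies its own default.
    if "index" not in kwargs and orient in ORIENTS_ALLOWING_INDEX_FALSE:
        kwargs["index"] = False
    return kwargs


print(resolve_to_json_kwargs({}))
print(resolve_to_json_kwargs({"orient": "split"}))
print(resolve_to_json_kwargs({"orient": "split", "index": True}))
```

The resulting dict can be passed as `df.to_json(path, **kwargs)`. An index value supplied by the caller is left untouched, so an invalid combination chosen explicitly by the user still surfaces the pandas error.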

Steps to reproduce the bug

import datasets


if __name__ == '__main__':

    dataset = datasets.Dataset.from_dict({"A": [1, 2, 3], "B": [4, 5, 6]})
    dataset.to_json("dataset.json")
>>>
ValueError: 'index=True' is only valid when 'orient' is 'split', 'table', 'index', or 'columns'.

Expected behavior

The dataset is successfully saved as .json

Environment info

python >= 3.9
pandas >= 2.1.0

@albertvillanova albertvillanova self-assigned this Aug 31, 2023
@albertvillanova
Member

Thanks for reporting. We are investigating it.

@albertvillanova
Member

This issue is caused by the latest pandas release, 2.1.0 (released yesterday, Aug 30).

See: https://github.com/huggingface/datasets/actions/runs/6035484010/job/16375932085?pr=6198

@albertvillanova
Member

albertvillanova commented Sep 1, 2023

People using previous releases of datasets should pin pandas in their local environment:

python -m pip install 'pandas<2.1.0'
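To check whether a given environment needs the pin, the installed pandas version can be inspected. This is a rough sketch using only the standard library; the function names are hypothetical, and it assumes that every pandas release from 2.1.0 onward triggers the error with the affected datasets releases.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def major_minor(version_string: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.1.0'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def needs_pin(version_string: str) -> bool:
    """True if this pandas version is 2.1 or newer (assumed to be affected)."""
    return major_minor(version_string) >= (2, 1)


def installed_pandas_version() -> Optional[str]:
    """Return the installed pandas version, or None if pandas is missing."""
    try:
        return version("pandas")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pandas_version()
    if found is None:
        print("pandas is not installed")
    elif needs_pin(found):
        print(f"pandas {found} found; consider pinning 'pandas<2.1.0'")
    else:
        print(f"pandas {found} found; no pin needed")
```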
