ENH: Add headers parameter to read_json and read_csv #36754

Closed
Antetokounpo wants to merge 7 commits

@Antetokounpo Antetokounpo commented Sep 30, 2020

This adds the option to specify HTTP headers when reading a CSV or JSON file from a URL in Python 3.
Let me know if new tests are needed.

pep8speaks commented Sep 30, 2020

Hello @Antetokounpo! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-03 17:20:18 UTC

@rhshadrach added the Enhancement, IO CSV (read_csv, to_csv), and IO JSON (read_json, to_json, json_normalize) labels on Oct 1, 2020
return urllib.request.urlopen(*args, **kwargs)
# Request class is only available in Python3, which
# allows headers to be specified
if hasattr(urllib.request, "Request"):
Member

pandas no longer supports Python 2.

Author

I'll remove the check then.

@@ -176,6 +183,7 @@ def get_filepath_or_buffer(
compression: CompressionOptions = None,
mode: ModeVar = None, # type: ignore[assignment]
storage_options: StorageOptions = None,
headers: dict = {},
Member

Can we be more specific using Dict? What types are the keys and values?

Author

These are HTTP headers, so the keys should be str. I think the type of the values can vary.
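
One possible answer to the typing question, as a minimal sketch only: it assumes header values are coerced to strings before being sent, and it uses `None` instead of `{}` as the default so that calls do not share one mutable dict. `normalize_headers` is a hypothetical helper for illustration, not part of pandas or of this PR.

```python
from typing import Dict, Optional


def normalize_headers(headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a fresh dict mapping header names to string values.

    Hypothetical helper: assumes every value can be sent as str(value).
    """
    if headers is None:
        # A new dict per call, so callers never share state through the default.
        return {}
    return {str(name): str(value) for name, value in headers.items()}
```

With this shape, the annotation documents the expected key and value types, and a caller who passes nothing gets an empty, independent dict.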

Member

@WillAyd WillAyd left a comment

Can you add tests for the change? That is typically the first thing we look for as reviewers.

Also, I'm not sure about adding this to the API. What values would a user typically provide here? Documenting that in the docstrings would certainly help.

@@ -148,9 +148,9 @@ def urlopen(*args, **kwargs):
Lazy-import wrapper for stdlib urlopen, as that imports a big chunk of
the stdlib.
"""
import urllib.request
from urllib.request import Request, urlopen as _urlopen
Member

Why was this changed?

Author

I find it makes the code more concise, but if you think the old way is better I can change it back.

Member

WillAyd commented Oct 23, 2020

I'm not sure we would add new arguments to the function signatures like this. There have been some updates to the original issue, and I think we would prefer to document how this can be handled instead of changing these functions. Can you adjust the PR accordingly?
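
A documentation-only approach along these lines might look like the sketch below: the caller fetches the URL with the standard library, sets the headers on a `urllib.request.Request`, and hands the downloaded bytes to pandas as a buffer. The function names `fetch_with_headers` and `read_csv_with_headers` are hypothetical, and the sketch assumes pandas is installed; it is not the wording or code that was eventually documented.

```python
import io
import urllib.request


def fetch_with_headers(url, headers):
    """Download `url` with custom HTTP headers and return the body as a buffer.

    Hypothetical helper for illustration; not part of pandas.
    """
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return io.BytesIO(response.read())


def read_csv_with_headers(url, headers, **read_csv_kwargs):
    """Parse a remote CSV fetched with custom headers (assumes pandas is installed)."""
    import pandas as pd  # imported lazily so the fetch helper works without pandas

    # read_csv accepts file-like objects, so no change to its signature is needed.
    return pd.read_csv(fetch_with_headers(url, headers), **read_csv_kwargs)
```

A call might look like `read_csv_with_headers(url, {"User-Agent": "my-app/1.0"})`, with an `Authorization` header added where a service requires one. The same buffer can be passed to `pd.read_json` instead.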

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Nov 23, 2020
Member

arw2019 commented Nov 27, 2020

Closing in favor of #37966

@arw2019 arw2019 closed this Nov 27, 2020
Labels: Enhancement, IO CSV (read_csv, to_csv), IO JSON (read_json, to_json, json_normalize), Stale

Linked issue: ENH: Change Pandas User-Agent and add possibility to set custom http_headers to pd.read_* functions

5 participants