read_csv from HTTPs + basic-auth + custom port throws an error (urlopen error) #16716

skynss · 2017-06-17T00:35:07Z

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.read_csv('https://user-name:pwd@my.fqdn.com:8080/get_content.csv')
# fails with urlopen error [Errno 11003] getaddrinfo failed Python2.7 as well as Python 3.x

Problem description

HTTPS basic auth is very common. This URL format works in Excel, other text editors, etc. This url works in requests library. It seems like the scenario doesnt work because the underlying urlopen doesnt work

# I think it fails because the underlying code here fails in pandas/io/common.py:
from urllib2 import urlopen as _urlopen
b = _urlopen('https://un-name:pwd@my.fqdn.com:8080/get_content.csv')
# fails with urlopen error [Errno 11003] getaddrinfo failed

see stackoverflow issue urllib basic auth
Only way to overcome this is using requests + StringIO ?

Expected Output

be able to get a CSV loaded dataframe

Output of `pd.show_versions()`

python: 2.7.13.final.0 python-bits: 64 pandas: 0.20.1

The text was updated successfully, but these errors were encountered:

gfyoung · 2017-06-18T20:20:09Z

HTTPS basic auth is very common

Apparently not with pandas 😉

But joking aside, that's an awkward problem to have (can't replicate since I have no endpoint against which to test this). Also, this is not easy to test without publicly providing credentials to an endpoint.

I read the SO post, and I was wondering: can you confirm that requests was the only way for you to get the request to work? I read a bunch of solutions in that discussion, so I just wanted to check.

If that is the case, I suppose we could implement a wrapper that makes the request and then returns the response content in place of urlopen, though doing it in a clean way will be the challenge. Not to mention, we would need to explicitly add requests as a Python 2.x dependency.

skynss · 2017-06-18T21:00:39Z

With requests and StringIO ot works well. I am travelling today but over next 1-2 days i can post a reproducable test url or simple reprocode. I suspect it repros with http and no real auth.. because i suspect this is simply a parsing error in hostname and port even before outbound call initiates. Can confirm in next 1-2 days when i am back unless you beat me to it

gfyoung · 2017-06-18T21:15:55Z

Yeah if you could provide repro code that would be useful since it is hard to reproduce without our own endpoint. However, if you could confirm none of the other solutions proposed in the SO work besides the requests ones, that would also be useful.

skynss · 2017-06-19T07:35:52Z

It appears to me that an existing url is not needed, because the issue is just in the parsing of the URL. eg:

df = pd.read_csv('http://username:pwd@cnn.com:8080/get_content.csv')
# returns URLError: <urlopen error [Errno 11003] getaddrinfo failed>
df = pd.read_csv('http://username:pwd@cnn.com/get_content.csv')
# returns InvalidURL: nonnumeric port: 'pwd@cnn.com' ( failing to find port number)

However, simplest way to fake it is:
Create a text file: aaa.csv and put some csv contents

with open('/path1/aaa.csv', 'w') as f:
  f.write( 'animal,bird\ncat,pigeon\nmonkey,swan')

cd /path1
run

python -m SimpleHTTPServer 8080

Now the url http://localhost:8080/aaa.csv should exist

import requests
import pandas as pd
u1 = 'http://localhost:8080/aaa.csv'
print( requests.get(u1).text) # as expected prints contents of aaa.csv
df = pd.read_csv(u1)  # as expected, loads contents of aaa.csv into df

u2 = 'http://uname:pwd@localhost:8080/aaa.csv'
print( requests.get(u2).text) # as expected prints contents of aaa.csv (ignores unnecessary uname pwd)
df = pd.read_csv(u2)  # URLError: <urlopen error [Errno 11003] getaddrinfo failed>

I'll provide a working positive test case url later tomorrow once I set it up. It would be good to engineer a fix to allow self-signed certs for testing too - eg: requests has verify=False. Helpful for testing.

TomAugspurger · 2017-06-19T11:14:28Z

So since we can't use requests, we would follow https://stackoverflow.com/a/4188709/1889400

detect and extract the username / password in the url
setup a urllib.Request context for the (stripped) url in

pandas/pandas/io/common.py

Line 197 in 125c414

if _is_url(filepath_or_buffer):
encode the user / password and add to the Authorization header
urlopen the request context

gfyoung · 2017-06-19T14:46:51Z

@TomAugspurger :

Why can't we use requests ?
Does this work with the localhost repro example that @skynss provided?

TomAugspurger · 2017-06-19T14:50:27Z

Why can't we use requests ?

This alone isn't worth adding it as a dependency I think.

gfyoung · 2017-06-19T14:52:36Z

This alone isn't worth adding it as a dependency I think.

That's very different from "we can't use requests"

For starters, it's builtin with Python 3.x, so we would only be adding it as a dependency for Python 2.x. However, if we could find a way to use just urlopen, that would be nice though.

TomAugspurger · 2017-06-19T14:53:54Z

Does this work with the localhost repro example that @skynss provided?

Can't really test that since I'd have to write the implementation first :) I think it'll work though.

For starters, it's builtin with Python 3.x,

I don't think it is... It's documented as the recommended way for making high-level http calls though (but still requires a separate install).

gfyoung · 2017-06-19T14:56:15Z

I don't think it is... It's documented as the recommended way for making high-level http calls though (but still requires a separate install).

Oh right, I stand corrected. 😄

skynss · 2017-06-19T20:15:41Z

Here are working urls using self signed certs. Would be good to get it to work with self-signed certs too as that can be very useful especially in testing. Please let me know once you are done testing.. so I can shut down the demo because thats all it is up for.

import pandas as pd
import requests
from io import StringIO

# both urls use self signed cert. Both will remain working for few days
u1 = 'https://pandasusr:pandaspwd@pandastest.mooo.com:5000/aaa.csv' # non default ssl port 
u2 = 'https://pandasusr:pandaspwd@pandastest.mooo.com/aaa.csv' # default ssl port


r1 = requests.get(u1, verify=False)
print(r1.text) # prints ok
df1 = pd.read_csv(StringIO(r1.text))  # works

r2 = requests.get(u1, verify=False)
print(r2.text) # prints ok
df2 = pd.read_csv(StringIO(r2.text)) # works

# without requests
df1 = pd.read_csv(u1) # URLError: <urlopen error [Errno 11003] getaddrinfo failed>

df2 = pd.read_csv(u2)  # InvalidURL: nonnumeric port: 'pandaspwd@pandastest.mooo.com'

skynss · 2017-06-30T21:26:45Z

Is anyone going to use the live endpoints referred above (and below) to repro and test? If not, I will shut down the server in the next 4 days. I havent seen anyone attempt to use it over past 11 days.


u1 = 'https://pandasusr:pandaspwd@pandastest.mooo.com:5000/aaa.csv' # non default ssl port 
u2 = 'https://pandasusr:pandaspwd@pandastest.mooo.com/aaa.csv' # default ssl port

gfyoung · 2017-06-30T21:29:25Z

@skynss : Sorry about that! I imagine that outside work has caught up with a bunch of us (including myself). I'll see if I can look at it later today.

gfyoung · 2017-07-01T10:24:15Z

@skynss : Can replicate the issues you were experiencing.

skynss · 2017-07-02T04:47:28Z

Please run the source code I pasted above.. and I can replicate the issues I am experiencing.

gfyoung · 2017-07-02T04:48:33Z

Please run the source code I pasted above.. and I can replicate the issues I am experiencing.

Did you read the comment I made above here?

skynss · 2017-07-02T23:05:15Z

I am not sure which comment you are referring to.. as the link doesnt work. But trying my best to answer..

I read the SO post, and I was wondering: can you confirm that requests was the only way for you to get the request to work? I read a bunch of solutions in that discussion, so I just wanted to check

The only way I know that worked for me is 1) get txt content 2) load it in StringIO 3) give the StringIO buffer to pandas to read. For step 1) to get text content, I imagine any method would work. I used requests and that worked. And the repro code above follows that step. And I just verified that if I copy paste the repro code, it replicates the issue.

If I didnt answer your question, kindly re-state your question.

gfyoung · 2017-07-03T01:16:18Z

@skynss : Not sure why the link doesn't work. It's just a URL to an earlier comment I made. However, I'm wondering if you can access those files by passing into urlopen authentication as @TomAugspurger mentioned here.

A solution that doesn't use requests would be to do that instead, so that works for you, we could go that way.

skynss · 2017-07-03T02:04:11Z

The following working code does not depend on requests library. Just copy / paste and run to verify.

import urllib2, base64, ssl
from urlparse import urlparse
#from io import StringIO # python 3.x
from StringIO import StringIO
import pandas as pd

def split_uname_from_url(url_with_uname):
	o = urlparse( url_with_uname)
	uname = o.username
	pwd = o.password
	# create url without username and pwd
        usrch = '{}:{}@{}'.format( uname, pwd, o.hostname)
	url_no_usrpwd = url_with_uname.replace( usrch , o.hostname)
	return uname, pwd, url_no_usrpwd

def get_https_basic_auth_ignore_invalid_cert( url_with_uname, verify_ssl=True):
	uname, pwd, url_no_usrpwd = split_uname_from_url(url_with_uname)
	print('Calling [{}] -- uname:[{}] -- pwd[{}]'.format(url_no_usrpwd, uname, pwd))
	request = urllib2.Request( url_no_usrpwd )
	base64string = base64.encodestring('%s:%s' % (uname, pwd)).replace('\n', '')
	request.add_header("Authorization", "Basic %s" % base64string)   
	# I hope pandas can support self signed certs too 
        # because it is very difficult to get official SSL certs in testing scenarios
	if verify_ssl:
		result = urllib2.urlopen(request) 
	else: # in case of self signed SSL certificates. 
		result = urllib2.urlopen(request, context=ssl._create_unverified_context() )
	txt = result.read()
	return txt


url_with_uname = 'https://pandasusr:pandaspwd@pandastest.mooo.com:5000/aaa.csv'
csv_txt = get_https_basic_auth_ignore_invalid_cert( url_with_uname, verify_ssl=False) 
df = pd.read_csv( StringIO(csv_txt.strip()) ) # forgot to close the StringIO buffer

gfyoung · 2017-07-07T07:53:35Z

@skynss @gfyoung : This works, but it's very Python 2-oriented and would need to be generalized for Python 3. I think if you can do that, that would make things a lot easier to incorporate.

skynss · 2017-07-07T17:48:17Z

The following code:

Works on py 2.x py 3.x (verified on 3.6.x)
Is live working demo that you can instantly test by simply copy and paste.
does not depend on requests library
works with both valid signed SSL certs as well as allow bypass of verification for self-signed SSL certs

import sys
import ssl

def split_uname_from_url(url_with_uname):
    try:
        from urlparse import urlparse
    except:
        from urllib.parse import urlparse
    o = urlparse( url_with_uname)
    uname = o.username
    pwd = o.password
    # create url without username and pwd   
    usrch = '{}:{}@{}'.format( uname, pwd, o.hostname)
    url_no_usrpwd = url_with_uname.replace( usrch , o.hostname)
    return uname, pwd, url_no_usrpwd

def get_https_basic_auth_ignore_invalid_cert( url_with_uname, verify_ssl=True):
    uname, pwd, url_no_usrpwd = split_uname_from_url(url_with_uname)
    print('Calling [{}] -- uname:[{}] -- pwd[{}]'.format(url_no_usrpwd, uname, pwd))
    if sys.version_info[0] < 3:
        fn= get_py2_https_basic_auth_ignore_invalid_cert
    else:
        fn = get_py3_https_basic_auth_ignore_invalid_cert
    return fn( uname, pwd, url_no_usrpwd, verify_ssl=verify_ssl)

def get_py2_https_basic_auth_ignore_invalid_cert( uname, pwd, url_no_usrpwd, verify_ssl=True):
    import urllib2, base64
    request = urllib2.Request( url_no_usrpwd )
    base64string = base64.encodestring('%s:%s' % (uname, pwd)).replace('\n', '')
    request.add_header("Authorization", "Basic %s" % base64string)   
    # I hope pandas can support self signed certs too 
    if verify_ssl:
        result = urllib2.urlopen(request) 
    else: # in case of self signed SSL certificates. 
        result = urllib2.urlopen(request, context=ssl._create_unverified_context() )
    return result.read()

def get_py3_https_basic_auth_ignore_invalid_cert( uname, pwd, url_no_usrpwd, verify_ssl=True):
    import urllib.request
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, url_no_usrpwd, uname, pwd)
    authhandler = urllib.request.HTTPBasicAuthHandler(passman)
    if verify_ssl:
        opener = urllib.request.build_opener(authhandler)
    else:
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        opener = urllib.request.build_opener(authhandler, urllib.request.HTTPSHandler(context=context))
    urllib.request.install_opener(opener)
    res = urllib.request.urlopen(url_no_usrpwd)
    return res.read().decode('utf-8')


def csv_to_df(csv_txt, **kwargs):
    '''
    @param csv_txt: text of csv rows.
    @param kwargs: to pass to pd.read_csv
    @return df
    '''
    import pandas as pd
    try: 
        from StringIO import StringIO #python2.7
    except:  
        from io import StringIO #python3.x. 
    buf = None
    df = None
    try:
        buf = StringIO(csv_txt)
        df = pd.read_csv(buf, **kwargs)
    finally:
        if buf:
            try:
                buf.close()
            except:
                pass
    return df

    
url_with_uname = 'https://pandasusr:pandaspwd@pandastest.mooo.com:5000/aaa.csv'
csv_txt = get_https_basic_auth_ignore_invalid_cert( url_with_uname, verify_ssl=False) 
df = csv_to_df( csv_txt.strip() ) 
print(df.to_string(index=False))

gfyoung · 2017-07-07T17:50:16Z

@skynss : Awesome! Thanks for doing this (I can check this later today). Now that we have something that acts as a workaround, I think the next step is seeing whether you can incorporate parts of this into the existing codebase. Want to give that a shot?

skynss · 2017-07-07T21:03:28Z

@gfyoung I am in midst of travel. Feel free to check in incorporate, i wont get chance to look until couple of days.

skynss · 2017-07-08T21:38:05Z

The following is updated code which makes it easier to merge into
pandas/io/commons.py and call _urlopen with new params
I am not sure how to capture verify_ssl=False user input from read_csv to
allow user to bypass SSL cert verification.

import sys
is_py3 = sys.version_info[0] >= 3 # replace with 'compat.PY3'

## BEGIN SECTION modifications to pandas/io/common.py
import ssl
import base64


if is_py3:
    from urllib.parse import urlparse as parse_url
    from urllib.request import urlopen, build_opener, install_opener, \
            HTTPPasswordMgrWithDefaultRealm, HTTPBasicAuthHandler, HTTPSHandler
    _urlopen = urlopen  
else:
    from urlparse import urlparse as parse_url
    from urllib2 import urlopen as _urlopen
    from urllib2 import Request
    from contextlib import contextmanager, closing  # noqa
    from functools import wraps  # noqa

    # @wraps(_urlopen)
    @contextmanager
    def urlopen(*args, **kwargs):
        with closing(_urlopen(*args, **kwargs)) as f:
            yield f


def split_uname_from_url(url_with_uname):
    o = parse_url( url_with_uname)
    usrch = '{}:{}@{}'.format( o.username, o.password, o.hostname)
    url_no_usrpwd = url_with_uname.replace( usrch , o.hostname)
    return o.username, o.password, url_no_usrpwd

def get_urlopen_args( url_with_uname, verify_ssl=True):
    uname, pwd, url_no_usrpwd = split_uname_from_url(url_with_uname)
    print('Calling [{}] -- uname:[{}] -- pwd[{}]'.format(url_no_usrpwd, uname, pwd))
    if is_py3:
        fn= get_urlopen_args_py3
    else:
        fn = get_urlopen_args_py2
    req, kwargs = fn( uname, pwd, url_no_usrpwd, verify_ssl=verify_ssl)
    return req, kwargs

def get_urlopen_args_py2( uname, pwd, url_no_usrpwd, verify_ssl=True):
    req = Request( url_no_usrpwd )
    base64string = base64.encodestring('{}:{}'.format(uname, pwd)).replace('\n', '')
    req.add_header("Authorization", "Basic {}".format( base64string) )   
    # I hope pandas can support self signed certs too 
    kwargs = {}
    if not verify_ssl:
        kwargs['context'] = ssl._create_unverified_context()
    return req, kwargs

def get_urlopen_args_py3( uname, pwd, url_no_usrpwd, verify_ssl=True):
    passman = HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, url_no_usrpwd, uname, pwd)
    authhandler = HTTPBasicAuthHandler(passman)
    if verify_ssl:
        opener = build_opener(authhandler)
    else:
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        opener = build_opener(authhandler, HTTPSHandler(context=context))
    install_opener(opener)
    return url_no_usrpwd, {}

## END SECTION modifications to pandas/io/common.py

def call_urlopen( url_with_uname, verify_ssl=False):
    # in get_filepath_or_buffer prior to calling _urlopen get params
    # not sure where to obtain verify_ssl from pd.read_csv 
    req, kwargs = get_urlopen_args(url_with_uname, verify_ssl) 
    resp = _urlopen(req , **kwargs)
    return resp_to_csv(resp)

    
def resp_to_csv(resp):
    csv = resp.read()
    if is_py3:
        csv = csv.decode('utf-8')
    return csv

    
def csv_to_df(csv_txt, **kwargs):
    '''
    @param csv_txt: text of csv rows.
    @param kwargs: to pass to pd.read_csv
    @return df
    '''
    import pandas as pd
    try: 
        from StringIO import StringIO #python2.7
    except:  
        from io import StringIO #python3.x. 
    buf = None
    df = None
    try:
        buf = StringIO(csv_txt)
        df = pd.read_csv(buf, **kwargs)
    finally:
        if buf:
            try:
                buf.close()
            except:
                pass
    return df

url_with_uname = 'https://pandasusr:pandaspwd@pandastest.mooo.com:5000/aaa.csv'
csv_txt = call_urlopen( url_with_uname, verify_ssl=False) 
df = csv_to_df( csv_txt.strip() ) 
print(df.to_string(index=False))

I am not quiet familiar with process involved around contributing to pandas - so please feel free to take over.

jreback · 2017-07-08T21:39:33Z

http://pandas.pydata.org/pandas-docs/stable/contributing.html

skynss · 2017-07-13T03:19:16Z

@jreback Thx. Followed it. @gfyoung I forked and modified the codebase with modifications to the best of my ability. here it is: https://github.com/skynss/pandas/tree/basic-auth-https-self-signed

I don't know how to check in live test scenario so I am going to leave that out.
Can one of you folks help verify I am ready? Here is verification that tests read_csv, read_json, read_excel and read_html
It supports having username and password either contained within a url or passed separately as params to read_csv.. etc functions. Usually username, pwd will be passed as parameter.. so I decided to add that support.

# pip install --upgrade https://github.com/skynss/pandas/archive/basic-auth-https-self-signed.zip
# live working test that tests both scenarios:
# pd.read_csv('https://uname:pwd@fqdn:<port>/fname.csv', verify_ssl=False)
# pd.read_csv('https://fqdn:<port>/fname.csv', username='uname', password='pwd', verify_ssl=False)

import pandas as pd

uname='pandasusr'
pwd='pandaspwd'
url = 'https://{}pandastest.mooo.com:5000/'
verify_ssl=False

def get_df(url, uname, pwd, verify_ssl, pd_read_fn, fname):
  furl = url + fname
  kwargs = {}
  if uname:
    kwargs['username']=uname
  if pwd:
    kwargs['password']=pwd
  if verify_ssl is not None:
    kwargs['verify_ssl']=verify_ssl
  print('\n' +furl)
  df = pd_read_fn(furl, **kwargs)
  if type(df) is list: # html
    df = df[0]
  print(df.to_string(index=False))
  print(df.to_json())

fparams = [ (pd.read_csv,   'aaa.csv'),
             (pd.read_json,  'jdoc.json'),
             (pd.read_excel, 'ex_doc.xlsx'),
             (pd.read_html,  'html_file.html') ]

for pd_read_fn, fname in fparams:
  u = url.format('{}:{}@'.format(uname, pwd))
  get_df( u, None, None, verify_ssl, pd_read_fn, fname) #1 url with username/pwd as part of url
  u2 = url.format('')
  get_df( u2, uname, pwd, verify_ssl, pd_read_fn, fname) # url with username/pwd  as params

gfyoung · 2017-07-13T03:38:57Z

@skynss : Thanks for doing this! A couple of points:

To truly verify that your code works, I imagine that we would need to set up our own endpoint with credentials that we are comfortable sharing since it will only be storing (hopefully) non-sensitive data.
It's unfortunate that you have to duplicate this logic of username, password, and SSL verification across all of these methods. I wonder if there might be an easier way to abstract this similar to what requests does by just accepting tuple instead of separate parameters.
You will definitely need to condense your get_urlopen_args functions. Having three functions for one thing will not work for any reviewer. Try coming up with ways to streamline your code and look for places where you truly need to differentiate between Python 2 and 3.

One place to examine: why can't you use the Request class in Python 3? It still exists under urllib.Request. Whether you looked into it or not, an explanation in this issue would be useful.

skynss · 2017-07-13T05:36:56Z

@gfyoung Thx - implemented the changes. Please view. I had tried the 'Request' from py3 already and it didnt work. But I kept the py3 code the way it is because it seems extensible and correct way long term. I changed auth to match requests lib, and kept verify_ssl separate just like requests.
Here is updated test code that uses the auth pair

import pandas as pd

uname='pandasusr'
pwd='pandaspwd'
url = 'https://{}pandastest.mooo.com:5000/'
verify_ssl=False

def get_df(url, uname, pwd, verify_ssl, pd_read_fn, fname):
  furl = url + fname
  kwargs = {}
  if uname or pwd:
    kwargs['auth']=(uname, pwd)
  if verify_ssl is not None:
    kwargs['verify_ssl']=verify_ssl
  print('\n' +furl)
  df = pd_read_fn(furl, **kwargs)
  if type(df) is list: # html
    df = df[0]
  print(df.to_string(index=False))
  print(df.to_json())

fparams = [ (pd.read_csv, 'aaa.csv'), (pd.read_json, 'jdoc.json'), (pd.read_excel, 'ex_doc.xlsx'), (pd.read_html, 'html_file.html') ]

for pd_read_fn, fname in fparams:
  u = url.format('{}:{}@'.format(uname, pwd))
  get_df( u, None, None, verify_ssl, pd_read_fn, fname)
  u2 = url.format('')
  get_df( u2, uname, pwd, verify_ssl, pd_read_fn, fname)

gfyoung · 2017-07-13T05:55:31Z

@skynss : Cool! Thanks for the auth inputs. I think mirroring the requests library is a good idea because it should be more intuitive for people given that requests is pretty widely used. Nevertheless:

What was the reason for removing your test test_split_url_extract_uname_pwd ? I think you should be able to test that functionality.
I'm not convinced by your response regarding the Request class for two reasons:

Your comment in the code does not make mention it not working. You don't need to add this in the comments, but what happens when you used the Request object in Python 3? From what I understand, the change from Python 2 to 3 was largely a refactoring. Thus, I would be surprised if that implementation failed for Python 3.
I don't agree with your argument about the Python 3 implementation being more extensible. The main issue I have is the fact you have to install an opener as the global default via install_opener. Why do we need to change the user's Python environment to make this work? The Python 2 implementation is self-contained, and in the interests of modularity, that would be preferred.

Could you add your test code that you provide in this issue discussion as a file on your branch? It won't get merged, but as this is not a PR yet, you can just put it anywhere for now. 😄

skynss · 2017-07-13T09:44:16Z

@gfyoung

I was on the fence if the test was meaningful. I was trying to compact the code into 1 function. But added it back
you are right.. Modified. Much simpler and cleaner code now. I thought it didnt work initially with similar code earlier.. but it is now.
added test code to test_basic_auth_self_signed.py

TomAugspurger · 2017-07-13T12:00:45Z

Sorry I haven't been following this closely, but

I don't know how to check in live test scenario so I am going to leave that out.

We don't need to test this against a live server. We just need to ensure that we structure the request properly, and mock the actual GET request is made. Then we can verify that the right bits of information were in the get.

@skynss when you're able, could you submit a pull request? It'll make reviewing this easier.

gfyoung · 2017-07-13T14:56:11Z

We don't need to test this against a live server. We just need to ensure that we structure the request properly, and mock the actual GET request is made.

@TomAugspurger : Do we mock any requests in our current tests? I think we should still test that it actually works against some endpoint. That's the best way to confirm that it works and is not just a special case for @skynss IMO. We could just decorate such a test with @tm.network.

gfyoung · 2017-07-13T14:58:41Z

@skynss when you're able, could you submit a pull request? It'll make reviewing this easier.

@skynss : I second @TomAugspurger on this now. The code looks in a lot better shape. We'll definitely make more changes to it, but it's at a stage where I think we can review it now. A couple of things you will need to do:

Remove the DO NOT MERGE file you added. Just keep that somewhere out of your branch (e.g. on your Desktop) for future use.
One of the major comments is going to be documentation. If you could, please write docstrings similar to the ones you'll find in the codebase with a one-line description, listing of parameters, description of the returned object, and any explicit errors raised if any.

TomAugspurger · 2017-07-13T16:03:16Z

Do we mock any requests in our current tests?

At least one in

pandas/pandas/tests/io/formats/test_printing.py

Line 125 in 9d13227

class TestTableSchemaRepr(object):

for mocking IPython.

gfyoung · 2017-07-13T16:11:07Z

@TomAugspurger : I looked at the code. How does it actually use the mock library?

TomAugspurger · 2017-07-13T16:15:26Z

@TomAugspurger : I looked at the code. How does it actually use the mock library?

Huh, it kind of looks like it's not actually used anymore :)

gfyoung · 2017-07-13T16:17:47Z

So my question still stands:

Do we mock any requests in our current tests?

A quick GitHub search suggests not AFAICT.

skynss · 2017-07-13T19:58:22Z

@TomAugspurger just created a pull request
@gfyoung Done

ron819 · 2018-08-06T08:33:50Z

There is also same request for read_json
#10526

mroeschke · 2023-03-29T18:57:38Z

Since #36688 has been addressed which should address this issue so closing

TomAugspurger added Enhancement IO CSV read_csv, to_csv Effort Low labels Jun 19, 2017

gfyoung mentioned this issue Jul 13, 2017

MAINT: Remove unused mock import #16908

Merged

skynss mentioned this issue Jul 13, 2017

BUG, ENH: Read Data From Password-Protected URL's and allow self signed SSL certs #16910

Closed

4 tasks

skynss mentioned this issue Jul 19, 2017

BUG: read_csv fails some http servers if port number is specified #17019

Closed

skynss mentioned this issue Jul 26, 2017

Use python-requests library, if installed, to support broad http(s) scenarios including Basic Authentication #17087

Closed

4 tasks

jorisvandenbossche mentioned this issue Sep 21, 2017

Add basic support for remote resources to read_file geopandas/geopandas#531

Merged

jreback added this to the Next Major Release milestone Apr 11, 2018

ocefpaf mentioned this issue Jun 15, 2018

wrap urlopen with requests #21504

Closed

4 tasks

ocefpaf mentioned this issue Oct 9, 2019

use requests when it is installed #28874

Closed

4 tasks

jbrockmendel removed Difficulty Intermediate labels Oct 21, 2019

jbrockmendel added the IO Network Local or Cloud (AWS, GCS, etc.) IO Issues label Dec 11, 2019

KristinaRiemer mentioned this issue Oct 16, 2020

Convert one python notebook to script genophenoenvo/terraref-datasets#109

Closed

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

mroeschke closed this as completed Mar 29, 2023

read_csv from HTTPs + basic-auth + custom port throws an error (urlopen error) #16716

read_csv from HTTPs + basic-auth + custom port throws an error (urlopen error) #16716

Comments

skynss commented Jun 17, 2017

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

gfyoung commented Jun 18, 2017 • edited Loading

skynss commented Jun 18, 2017

gfyoung commented Jun 18, 2017 • edited Loading

skynss commented Jun 19, 2017

TomAugspurger commented Jun 19, 2017

gfyoung commented Jun 19, 2017 • edited Loading

TomAugspurger commented Jun 19, 2017 • edited by jorisvandenbossche Loading

gfyoung commented Jun 19, 2017

TomAugspurger commented Jun 19, 2017

gfyoung commented Jun 19, 2017

skynss commented Jun 19, 2017

skynss commented Jun 30, 2017

gfyoung commented Jun 30, 2017

gfyoung commented Jul 1, 2017

skynss commented Jul 2, 2017

gfyoung commented Jul 2, 2017

skynss commented Jul 2, 2017 • edited Loading

gfyoung commented Jul 3, 2017

skynss commented Jul 3, 2017 • edited Loading

gfyoung commented Jul 7, 2017

skynss commented Jul 7, 2017

gfyoung commented Jul 7, 2017 • edited Loading

skynss commented Jul 7, 2017

skynss commented Jul 8, 2017

jreback commented Jul 8, 2017

skynss commented Jul 13, 2017

gfyoung commented Jul 13, 2017 • edited Loading

skynss commented Jul 13, 2017

gfyoung commented Jul 13, 2017 • edited Loading

skynss commented Jul 13, 2017

TomAugspurger commented Jul 13, 2017

gfyoung commented Jul 13, 2017

gfyoung commented Jul 13, 2017

TomAugspurger commented Jul 13, 2017

gfyoung commented Jul 13, 2017

TomAugspurger commented Jul 13, 2017

gfyoung commented Jul 13, 2017

skynss commented Jul 13, 2017

ron819 commented Aug 6, 2018

mroeschke commented Mar 29, 2023

Output of `pd.show_versions()`

gfyoung commented Jun 18, 2017 •

edited

Loading

gfyoung commented Jun 18, 2017 •

edited

Loading

gfyoung commented Jun 19, 2017 •

edited

Loading

TomAugspurger commented Jun 19, 2017 •

edited by jorisvandenbossche

Loading

skynss commented Jul 2, 2017 •

edited

Loading

skynss commented Jul 3, 2017 •

edited

Loading

gfyoung commented Jul 7, 2017 •

edited

Loading

gfyoung commented Jul 13, 2017 •

edited

Loading

gfyoung commented Jul 13, 2017 •

edited

Loading