Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_csv from HTTPs + basic-auth + custom port throws an error (urlopen error) #16716

Closed
skynss opened this issue Jun 17, 2017 · 40 comments
Closed
Labels
Enhancement IO CSV read_csv, to_csv IO Network Local or Cloud (AWS, GCS, etc.) IO Issues

Comments

@skynss
Copy link

skynss commented Jun 17, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.read_csv('https://user-name:pwd@my.fqdn.com:8080/get_content.csv')
# fails with urlopen error [Errno 11003] getaddrinfo failed Python2.7 as well as Python 3.x

Problem description

HTTPS basic auth is very common. This URL format works in Excel, other text editors, etc. This url works in requests library. It seems like the scenario doesnt work because the underlying urlopen doesnt work

# I think it fails because the underlying code here fails in pandas/io/common.py:
from urllib2 import urlopen as _urlopen
b = _urlopen('https://un-name:pwd@my.fqdn.com:8080/get_content.csv')
# fails with urlopen error [Errno 11003] getaddrinfo failed

see stackoverflow issue urllib basic auth
Only way to overcome this is using requests + StringIO ?

Expected Output

be able to get a CSV loaded dataframe

Output of pd.show_versions()

python: 2.7.13.final.0 python-bits: 64 pandas: 0.20.1
@gfyoung
Copy link
Member

gfyoung commented Jun 18, 2017

HTTPS basic auth is very common

Apparently not with pandas 😉

But joking aside, that's an awkward problem to have (can't replicate since I have no endpoint against which to test this). Also, this is not easy to test without publicly providing credentials to an endpoint.

I read the SO post, and I was wondering: can you confirm that requests was the only way for you to get the request to work? I read a bunch of solutions in that discussion, so I just wanted to check.

If that is the case, I suppose we could implement a wrapper that makes the request and then returns the response content in place of urlopen, though doing it in a clean way will be the challenge. Not to mention, we would need to explicitly add requests as a Python 2.x dependency.

@skynss
Copy link
Author

skynss commented Jun 18, 2017

With requests and StringIO ot works well. I am travelling today but over next 1-2 days i can post a reproducable test url or simple reprocode. I suspect it repros with http and no real auth.. because i suspect this is simply a parsing error in hostname and port even before outbound call initiates. Can confirm in next 1-2 days when i am back unless you beat me to it

@gfyoung
Copy link
Member

gfyoung commented Jun 18, 2017

Yeah if you could provide repro code that would be useful since it is hard to reproduce without our own endpoint. However, if you could confirm none of the other solutions proposed in the SO work besides the requests ones, that would also be useful.

@skynss
Copy link
Author

skynss commented Jun 19, 2017

It appears to me that an existing url is not needed, because the issue is just in the parsing of the URL. eg:

df = pd.read_csv('http://username:pwd@cnn.com:8080/get_content.csv')
# returns URLError: <urlopen error [Errno 11003] getaddrinfo failed>
df = pd.read_csv('http://username:pwd@cnn.com/get_content.csv')
# returns InvalidURL: nonnumeric port: 'pwd@cnn.com' ( failing to find port number)

However, simplest way to fake it is:
Create a text file: aaa.csv and put some csv contents

with open('/path1/aaa.csv', 'w') as f:
  f.write( 'animal,bird\ncat,pigeon\nmonkey,swan')

cd /path1
run

python -m SimpleHTTPServer 8080

Now the url http://localhost:8080/aaa.csv should exist

import requests
import pandas as pd
u1 = 'http://localhost:8080/aaa.csv'
print( requests.get(u1).text) # as expected prints contents of aaa.csv
df = pd.read_csv(u1)  # as expected, loads contents of aaa.csv into df

u2 = 'http://uname:pwd@localhost:8080/aaa.csv'
print( requests.get(u2).text) # as expected prints contents of aaa.csv (ignores unnecessary uname pwd)
df = pd.read_csv(u2)  # URLError: <urlopen error [Errno 11003] getaddrinfo failed>

I'll provide a working positive test case url later tomorrow once I set it up. It would be good to engineer a fix to allow self-signed certs for testing too - eg: requests has verify=False. Helpful for testing.

@TomAugspurger
Copy link
Contributor

So since we can't use requests, we would follow https://stackoverflow.com/a/4188709/1889400

  1. detect and extract the username / password in the url
  2. setup a urllib.Request context for the (stripped) url in
    if _is_url(filepath_or_buffer):
  3. encode the user / password and add to the Authorization header
  4. urlopen the request context

@gfyoung
Copy link
Member

gfyoung commented Jun 19, 2017

@TomAugspurger :

  1. Why can't we use requests ?

  2. Does this work with the localhost repro example that @skynss provided?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 19, 2017

Why can't we use requests ?

This alone isn't worth adding it as a dependency I think.

@gfyoung
Copy link
Member

gfyoung commented Jun 19, 2017

This alone isn't worth adding it as a dependency I think.

That's very different from "we can't use requests"

For starters, it's builtin with Python 3.x, so we would only be adding it as a dependency for Python 2.x. However, if we could find a way to use just urlopen, that would be nice though.

@TomAugspurger
Copy link
Contributor

Does this work with the localhost repro example that @skynss provided?

Can't really test that since I'd have to write the implementation first :) I think it'll work though.

For starters, it's builtin with Python 3.x,

I don't think it is... It's documented as the recommended way for making high-level http calls though (but still requires a separate install).

@gfyoung
Copy link
Member

gfyoung commented Jun 19, 2017

I don't think it is... It's documented as the recommended way for making high-level http calls though (but still requires a separate install).

Oh right, I stand corrected. 😄

@skynss
Copy link
Author

skynss commented Jun 19, 2017

Here are working urls using self signed certs. Would be good to get it to work with self-signed certs too as that can be very useful especially in testing. Please let me know once you are done testing.. so I can shut down the demo because thats all it is up for.

import pandas as pd
import requests
from io import StringIO

# both urls use self signed cert. Both will remain working for few days
u1 = 'https://pandasusr:pandaspwd@pandastest.mooo.com:5000/aaa.csv' # non default ssl port 
u2 = 'https://pandasusr:pandaspwd@pandastest.mooo.com/aaa.csv' # default ssl port


r1 = requests.get(u1, verify=False)
print(r1.text) # prints ok
df1 = pd.read_csv(StringIO(r1.text))  # works

r2 = requests.get(u1, verify=False)
print(r2.text) # prints ok
df2 = pd.read_csv(StringIO(r2.text)) # works

# without requests
df1 = pd.read_csv(u1) # URLError: <urlopen error [Errno 11003] getaddrinfo failed>

df2 = pd.read_csv(u2)  # InvalidURL: nonnumeric port: 'pandaspwd@pandastest.mooo.com'

@skynss
Copy link
Author

skynss commented Jun 30, 2017

Is anyone going to use the live endpoints referred above (and below) to repro and test? If not, I will shut down the server in the next 4 days. I havent seen anyone attempt to use it over past 11 days.


u1 = 'https://pandasusr:pandaspwd@pandastest.mooo.com:5000/aaa.csv' # non default ssl port 
u2 = 'https://pandasusr:pandaspwd@pandastest.mooo.com/aaa.csv' # default ssl port

@gfyoung
Copy link
Member

gfyoung commented Jun 30, 2017

@skynss : Sorry about that! I imagine that outside work has caught up with a bunch of us (including myself). I'll see if I can look at it later today.

@gfyoung
Copy link
Member

gfyoung commented Jul 1, 2017

@skynss : Can replicate the issues you were experiencing.

@skynss
Copy link
Author

skynss commented Jul 2, 2017

Please run the source code I pasted above.. and I can replicate the issues I am experiencing.

@gfyoung
Copy link
Member

gfyoung commented Jul 2, 2017

Please run the source code I pasted above.. and I can replicate the issues I am experiencing.

Did you read the comment I made above here?

@skynss
Copy link
Author

skynss commented Jul 2, 2017

I am not sure which comment you are referring to.. as the link doesnt work. But trying my best to answer..

I read the SO post, and I was wondering: can you confirm that requests was the only way for you to get the request to work? I read a bunch of solutions in that discussion, so I just wanted to check

The only way I know that worked for me is 1) get txt content 2) load it in StringIO 3) give the StringIO buffer to pandas to read. For step 1) to get text content, I imagine any method would work. I used requests and that worked. And the repro code above follows that step. And I just verified that if I copy paste the repro code, it replicates the issue.

If I didnt answer your question, kindly re-state your question.

@gfyoung
Copy link
Member

gfyoung commented Jul 3, 2017

@skynss : Not sure why the link doesn't work. It's just a URL to an earlier comment I made. However, I'm wondering if you can access those files by passing into urlopen authentication as @TomAugspurger mentioned here.

A solution that doesn't use requests would be to do that instead, so that works for you, we could go that way.

@skynss
Copy link
Author

skynss commented Jul 3, 2017

The following working code does not depend on requests library. Just copy / paste and run to verify.

import urllib2, base64, ssl
from urlparse import urlparse
#from io import StringIO # python 3.x
from StringIO import StringIO
import pandas as pd

def split_uname_from_url(url_with_uname):
	o = urlparse( url_with_uname)
	uname = o.username
	pwd = o.password
	# create url without username and pwd
        usrch = '{}:{}@{}'.format( uname, pwd, o.hostname)
	url_no_usrpwd = url_with_uname.replace( usrch , o.hostname)
	return uname, pwd, url_no_usrpwd

def get_https_basic_auth_ignore_invalid_cert( url_with_uname, verify_ssl=True):
	uname, pwd, url_no_usrpwd = split_uname_from_url(url_with_uname)
	print('Calling [{}] -- uname:[{}] -- pwd[{}]'.format(url_no_usrpwd, uname, pwd))
	request = urllib2.Request( url_no_usrpwd )
	base64string = base64.encodestring('%s:%s' % (uname, pwd)).replace('\n', '')
	request.add_header("Authorization", "Basic %s" % base64string)   
	# I hope pandas can support self signed certs too 
        # because it is very difficult to get official SSL certs in testing scenarios
	if verify_ssl:
		result = urllib2.urlopen(request) 
	else: # in case of self signed SSL certificates. 
		result = urllib2.urlopen(request, context=ssl._create_unverified_context() )
	txt = result.read()
	return txt


url_with_uname = 'https://pandasusr:pandaspwd@pandastest.mooo.com:5000/aaa.csv'
csv_txt = get_https_basic_auth_ignore_invalid_cert( url_with_uname, verify_ssl=False) 
df = pd.read_csv( StringIO(csv_txt.strip()) ) # forgot to close the StringIO buffer

@gfyoung
Copy link
Member

gfyoung commented Jul 7, 2017

@skynss @gfyoung : This works, but it's very Python 2-oriented and would need to be generalized for Python 3. I think if you can do that, that would make things a lot easier to incorporate.

@skynss
Copy link
Author

skynss commented Jul 7, 2017

The following code:

  • Works on py 2.x py 3.x (verified on 3.6.x)
  • Is live working demo that you can instantly test by simply copy and paste.
  • does not depend on requests library
  • works with both valid signed SSL certs as well as allow bypass of verification for self-signed SSL certs
import sys
import ssl

def split_uname_from_url(url_with_uname):
    try:
        from urlparse import urlparse
    except:
        from urllib.parse import urlparse
    o = urlparse( url_with_uname)
    uname = o.username
    pwd = o.password
    # create url without username and pwd   
    usrch = '{}:{}@{}'.format( uname, pwd, o.hostname)
    url_no_usrpwd = url_with_uname.replace( usrch , o.hostname)
    return uname, pwd, url_no_usrpwd

def get_https_basic_auth_ignore_invalid_cert( url_with_uname, verify_ssl=True):
    uname, pwd, url_no_usrpwd = split_uname_from_url(url_with_uname)
    print('Calling [{}] -- uname:[{}] -- pwd[{}]'.format(url_no_usrpwd, uname, pwd))
    if sys.version_info[0] < 3:
        fn= get_py2_https_basic_auth_ignore_invalid_cert
    else:
        fn = get_py3_https_basic_auth_ignore_invalid_cert
    return fn( uname, pwd, url_no_usrpwd, verify_ssl=verify_ssl)

def get_py2_https_basic_auth_ignore_invalid_cert( uname, pwd, url_no_usrpwd, verify_ssl=True):
    import urllib2, base64
    request = urllib2.Request( url_no_usrpwd )
    base64string = base64.encodestring('%s:%s' % (uname, pwd)).replace('\n', '')
    request.add_header("Authorization", "Basic %s" % base64string)   
    # I hope pandas can support self signed certs too 
    if verify_ssl:
        result = urllib2.urlopen(request) 
    else: # in case of self signed SSL certificates. 
        result = urllib2.urlopen(request, context=ssl._create_unverified_context() )
    return result.read()

def get_py3_https_basic_auth_ignore_invalid_cert( uname, pwd, url_no_usrpwd, verify_ssl=True):
    import urllib.request
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, url_no_usrpwd, uname, pwd)
    authhandler = urllib.request.HTTPBasicAuthHandler(passman)
    if verify_ssl:
        opener = urllib.request.build_opener(authhandler)
    else:
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        opener = urllib.request.build_opener(authhandler, urllib.request.HTTPSHandler(context=context))
    urllib.request.install_opener(opener)
    res = urllib.request.urlopen(url_no_usrpwd)
    return res.read().decode('utf-8')


def csv_to_df(csv_txt, **kwargs):
    '''
    @param csv_txt: text of csv rows.
    @param kwargs: to pass to pd.read_csv
    @return df
    '''
    import pandas as pd
    try: 
        from StringIO import StringIO #python2.7
    except:  
        from io import StringIO #python3.x. 
    buf = None
    df = None
    try:
        buf = StringIO(csv_txt)
        df = pd.read_csv(buf, **kwargs)
    finally:
        if buf:
            try:
                buf.close()
            except:
                pass
    return df

    
url_with_uname = 'https://pandasusr:pandaspwd@pandastest.mooo.com:5000/aaa.csv'
csv_txt = get_https_basic_auth_ignore_invalid_cert( url_with_uname, verify_ssl=False) 
df = csv_to_df( csv_txt.strip() ) 
print(df.to_string(index=False))

@gfyoung
Copy link
Member

gfyoung commented Jul 7, 2017

@skynss : Awesome! Thanks for doing this (I can check this later today). Now that we have something that acts as a workaround, I think the next step is seeing whether you can incorporate parts of this into the existing codebase. Want to give that a shot?

@skynss
Copy link
Author

skynss commented Jul 7, 2017

@gfyoung I am in midst of travel. Feel free to check in incorporate, i wont get chance to look until couple of days.

@skynss
Copy link
Author

skynss commented Jul 8, 2017

The following is updated code which makes it easier to merge into
pandas/io/commons.py and call _urlopen with new params
I am not sure how to capture verify_ssl=False user input from read_csv to
allow user to bypass SSL cert verification.

import sys
is_py3 = sys.version_info[0] >= 3 # replace with 'compat.PY3'

## BEGIN SECTION modifications to pandas/io/common.py
import ssl
import base64


if is_py3:
    from urllib.parse import urlparse as parse_url
    from urllib.request import urlopen, build_opener, install_opener, \
            HTTPPasswordMgrWithDefaultRealm, HTTPBasicAuthHandler, HTTPSHandler
    _urlopen = urlopen  
else:
    from urlparse import urlparse as parse_url
    from urllib2 import urlopen as _urlopen
    from urllib2 import Request
    from contextlib import contextmanager, closing  # noqa
    from functools import wraps  # noqa

    # @wraps(_urlopen)
    @contextmanager
    def urlopen(*args, **kwargs):
        with closing(_urlopen(*args, **kwargs)) as f:
            yield f


def split_uname_from_url(url_with_uname):
    o = parse_url( url_with_uname)
    usrch = '{}:{}@{}'.format( o.username, o.password, o.hostname)
    url_no_usrpwd = url_with_uname.replace( usrch , o.hostname)
    return o.username, o.password, url_no_usrpwd

def get_urlopen_args( url_with_uname, verify_ssl=True):
    uname, pwd, url_no_usrpwd = split_uname_from_url(url_with_uname)
    print('Calling [{}] -- uname:[{}] -- pwd[{}]'.format(url_no_usrpwd, uname, pwd))
    if is_py3:
        fn= get_urlopen_args_py3
    else:
        fn = get_urlopen_args_py2
    req, kwargs = fn( uname, pwd, url_no_usrpwd, verify_ssl=verify_ssl)
    return req, kwargs

def get_urlopen_args_py2( uname, pwd, url_no_usrpwd, verify_ssl=True):
    req = Request( url_no_usrpwd )
    base64string = base64.encodestring('{}:{}'.format(uname, pwd)).replace('\n', '')
    req.add_header("Authorization", "Basic {}".format( base64string) )   
    # I hope pandas can support self signed certs too 
    kwargs = {}
    if not verify_ssl:
        kwargs['context'] = ssl._create_unverified_context()
    return req, kwargs

def get_urlopen_args_py3( uname, pwd, url_no_usrpwd, verify_ssl=True):
    passman = HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, url_no_usrpwd, uname, pwd)
    authhandler = HTTPBasicAuthHandler(passman)
    if verify_ssl:
        opener = build_opener(authhandler)
    else:
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        opener = build_opener(authhandler, HTTPSHandler(context=context))
    install_opener(opener)
    return url_no_usrpwd, {}

## END SECTION modifications to pandas/io/common.py

def call_urlopen( url_with_uname, verify_ssl=False):
    # in get_filepath_or_buffer prior to calling _urlopen get params
    # not sure where to obtain verify_ssl from pd.read_csv 
    req, kwargs = get_urlopen_args(url_with_uname, verify_ssl) 
    resp = _urlopen(req , **kwargs)
    return resp_to_csv(resp)

    
def resp_to_csv(resp):
    csv = resp.read()
    if is_py3:
        csv = csv.decode('utf-8')
    return csv

    
def csv_to_df(csv_txt, **kwargs):
    '''
    @param csv_txt: text of csv rows.
    @param kwargs: to pass to pd.read_csv
    @return df
    '''
    import pandas as pd
    try: 
        from StringIO import StringIO #python2.7
    except:  
        from io import StringIO #python3.x. 
    buf = None
    df = None
    try:
        buf = StringIO(csv_txt)
        df = pd.read_csv(buf, **kwargs)
    finally:
        if buf:
            try:
                buf.close()
            except:
                pass
    return df

url_with_uname = 'https://pandasusr:pandaspwd@pandastest.mooo.com:5000/aaa.csv'
csv_txt = call_urlopen( url_with_uname, verify_ssl=False) 
df = csv_to_df( csv_txt.strip() ) 
print(df.to_string(index=False))

I am not quiet familiar with process involved around contributing to pandas - so please feel free to take over.

@jreback
Copy link
Contributor

jreback commented Jul 8, 2017

@skynss
Copy link
Author

skynss commented Jul 13, 2017

@jreback Thx. Followed it. @gfyoung I forked and modified the codebase with modifications to the best of my ability. here it is: https://github.com/skynss/pandas/tree/basic-auth-https-self-signed

I don't know how to check in live test scenario so I am going to leave that out.
Can one of you folks help verify I am ready? Here is verification that tests read_csv, read_json, read_excel and read_html
It supports having username and password either contained within a url or passed separately as params to read_csv.. etc functions. Usually username, pwd will be passed as parameter.. so I decided to add that support.

# pip install --upgrade https://github.com/skynss/pandas/archive/basic-auth-https-self-signed.zip
# live working test that tests both scenarios:
# pd.read_csv('https://uname:pwd@fqdn:<port>/fname.csv', verify_ssl=False)
# pd.read_csv('https://fqdn:<port>/fname.csv', username='uname', password='pwd', verify_ssl=False)

import pandas as pd

uname='pandasusr'
pwd='pandaspwd'
url = 'https://{}pandastest.mooo.com:5000/'
verify_ssl=False

def get_df(url, uname, pwd, verify_ssl, pd_read_fn, fname):
  furl = url + fname
  kwargs = {}
  if uname:
    kwargs['username']=uname
  if pwd:
    kwargs['password']=pwd
  if verify_ssl is not None:
    kwargs['verify_ssl']=verify_ssl
  print('\n' +furl)
  df = pd_read_fn(furl, **kwargs)
  if type(df) is list: # html
    df = df[0]
  print(df.to_string(index=False))
  print(df.to_json())

fparams = [ (pd.read_csv,   'aaa.csv'),
             (pd.read_json,  'jdoc.json'),
             (pd.read_excel, 'ex_doc.xlsx'),
             (pd.read_html,  'html_file.html') ]

for pd_read_fn, fname in fparams:
  u = url.format('{}:{}@'.format(uname, pwd))
  get_df( u, None, None, verify_ssl, pd_read_fn, fname) #1 url with username/pwd as part of url
  u2 = url.format('')
  get_df( u2, uname, pwd, verify_ssl, pd_read_fn, fname) # url with username/pwd  as params

@gfyoung
Copy link
Member

gfyoung commented Jul 13, 2017

@skynss : Thanks for doing this! A couple of points:

  1. To truly verify that your code works, I imagine that we would need to set up our own endpoint with credentials that we are comfortable sharing since it will only be storing (hopefully) non-sensitive data.

  2. It's unfortunate that you have to duplicate this logic of username, password, and SSL verification across all of these methods. I wonder if there might be an easier way to abstract this similar to what requests does by just accepting tuple instead of separate parameters.

  3. You will definitely need to condense your get_urlopen_args functions. Having three functions for one thing will not work for any reviewer. Try coming up with ways to streamline your code and look for places where you truly need to differentiate between Python 2 and 3.

One place to examine: why can't you use the Request class in Python 3? It still exists under urllib.Request. Whether you looked into it or not, an explanation in this issue would be useful.

@skynss
Copy link
Author

skynss commented Jul 13, 2017

@gfyoung Thx - implemented the changes. Please view. I had tried the 'Request' from py3 already and it didnt work. But I kept the py3 code the way it is because it seems extensible and correct way long term. I changed auth to match requests lib, and kept verify_ssl separate just like requests.
Here is updated test code that uses the auth pair

import pandas as pd

uname='pandasusr'
pwd='pandaspwd'
url = 'https://{}pandastest.mooo.com:5000/'
verify_ssl=False

def get_df(url, uname, pwd, verify_ssl, pd_read_fn, fname):
  furl = url + fname
  kwargs = {}
  if uname or pwd:
    kwargs['auth']=(uname, pwd)
  if verify_ssl is not None:
    kwargs['verify_ssl']=verify_ssl
  print('\n' +furl)
  df = pd_read_fn(furl, **kwargs)
  if type(df) is list: # html
    df = df[0]
  print(df.to_string(index=False))
  print(df.to_json())

fparams = [ (pd.read_csv, 'aaa.csv'), (pd.read_json, 'jdoc.json'), (pd.read_excel, 'ex_doc.xlsx'), (pd.read_html, 'html_file.html') ]

for pd_read_fn, fname in fparams:
  u = url.format('{}:{}@'.format(uname, pwd))
  get_df( u, None, None, verify_ssl, pd_read_fn, fname)
  u2 = url.format('')
  get_df( u2, uname, pwd, verify_ssl, pd_read_fn, fname)

@gfyoung
Copy link
Member

gfyoung commented Jul 13, 2017

@skynss : Cool! Thanks for the auth inputs. I think mirroring the requests library is a good idea because it should be more intuitive for people given that requests is pretty widely used. Nevertheless:

  1. What was the reason for removing your test test_split_url_extract_uname_pwd ? I think you should be able to test that functionality.

  2. I'm not convinced by your response regarding the Request class for two reasons:

  • Your comment in the code does not make mention it not working. You don't need to add this in the comments, but what happens when you used the Request object in Python 3? From what I understand, the change from Python 2 to 3 was largely a refactoring. Thus, I would be surprised if that implementation failed for Python 3.

  • I don't agree with your argument about the Python 3 implementation being more extensible. The main issue I have is the fact you have to install an opener as the global default via install_opener. Why do we need to change the user's Python environment to make this work? The Python 2 implementation is self-contained, and in the interests of modularity, that would be preferred.

  1. Could you add your test code that you provide in this issue discussion as a file on your branch? It won't get merged, but as this is not a PR yet, you can just put it anywhere for now. 😄

@skynss
Copy link
Author

skynss commented Jul 13, 2017

@gfyoung

  1. I was on the fence if the test was meaningful. I was trying to compact the code into 1 function. But added it back
  2. you are right.. Modified. Much simpler and cleaner code now. I thought it didnt work initially with similar code earlier.. but it is now.
  3. added test code to test_basic_auth_self_signed.py

@TomAugspurger
Copy link
Contributor

Sorry I haven't been following this closely, but

I don't know how to check in live test scenario so I am going to leave that out.

We don't need to test this against a live server. We just need to ensure that we structure the request properly, and mock the actual GET request is made. Then we can verify that the right bits of information were in the get.

@skynss when you're able, could you submit a pull request? It'll make reviewing this easier.

@gfyoung
Copy link
Member

gfyoung commented Jul 13, 2017

We don't need to test this against a live server. We just need to ensure that we structure the request properly, and mock the actual GET request is made.

@TomAugspurger : Do we mock any requests in our current tests? I think we should still test that it actually works against some endpoint. That's the best way to confirm that it works and is not just a special case for @skynss IMO. We could just decorate such a test with @tm.network.

@gfyoung
Copy link
Member

gfyoung commented Jul 13, 2017

@skynss when you're able, could you submit a pull request? It'll make reviewing this easier.

@skynss : I second @TomAugspurger on this now. The code looks in a lot better shape. We'll definitely make more changes to it, but it's at a stage where I think we can review it now. A couple of things you will need to do:

  1. Remove the DO NOT MERGE file you added. Just keep that somewhere out of your branch (e.g. on your Desktop) for future use.

  2. One of the major comments is going to be documentation. If you could, please write docstrings similar to the ones you'll find in the codebase with a one-line description, listing of parameters, description of the returned object, and any explicit errors raised if any.

@TomAugspurger
Copy link
Contributor

Do we mock any requests in our current tests?

At least one in

class TestTableSchemaRepr(object):
for mocking IPython.

@gfyoung
Copy link
Member

gfyoung commented Jul 13, 2017

@TomAugspurger : I looked at the code. How does it actually use the mock library?

@TomAugspurger
Copy link
Contributor

@TomAugspurger : I looked at the code. How does it actually use the mock library?

Huh, it kind of looks like it's not actually used anymore :)

@gfyoung
Copy link
Member

gfyoung commented Jul 13, 2017

So my question still stands:

Do we mock any requests in our current tests?

A quick GitHub search suggests not AFAICT.

@skynss
Copy link
Author

skynss commented Jul 13, 2017

@TomAugspurger just created a pull request
@gfyoung Done

@ron819
Copy link

ron819 commented Aug 6, 2018

There is also same request for read_json
#10526

@jbrockmendel jbrockmendel added the IO Network Local or Cloud (AWS, GCS, etc.) IO Issues label Dec 11, 2019
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@mroeschke
Copy link
Member

Since #36688 has been addressed which should address this issue so closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO CSV read_csv, to_csv IO Network Local or Cloud (AWS, GCS, etc.) IO Issues
Projects
None yet
7 participants