OAC csv parsing error #1401

Closed
bsipocz opened this issue Apr 3, 2019 · 2 comments · Fixed by #2423

Comments

@bsipocz
Member

bsipocz commented Apr 3, 2019

The cause of this issue is the HTML tag in the alias field, which doesn't parse properly when using the csv data format.

Suggested workaround: remove the data_format kwarg, as it shouldn't matter to the end user what server output format we parse internally.

Cross-reference: astrocatalogs/OACAPI#11
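
As a rough illustration of why the parsing format matters here: JSON escapes embedded double quotes, so an alias containing an HTML anchor tag survives a round trip. The payload below is a made-up dictionary (the nesting and key names are assumptions, not the real OAC response layout), so treat this as a sketch only.

```python
import json

# Hypothetical payload; the structure is illustrative, not real OAC output.
alias_with_tag = '<a id="PSNJ09554214+6940260">PSN J09554214+6940260</a>'
payload = json.dumps({"SN2014J": {"alias": ["SN2014J", alias_with_tag]}})

# The serializer escapes the inner double quotes as \" so decoding is unambiguous.
decoded = json.loads(payload)
aliases = decoded["SN2014J"]["alias"]

print(payload)
print(aliases[1] == alias_with_tag)
```

The same string dropped into a CSV field without doubling its quotes has no such protection, which is the failure mode described above.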

==============

There are a few new(ish) test failures with OAC: https://travis-ci.org/astropy/astroquery/jobs/512564435#L4631

______________________ TestOACClass.test_query_object_csv ______________________
self = <astroquery.oac.tests.test_oac_remote.TestOACClass object at 0x7f8ef4fdfb00>
    def test_query_object_csv(self):
>       phot = OAC.query_object(event='SN2014J')
astroquery/oac/tests/test_oac_remote.py:32: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
astroquery/utils/class_or_instance.py:25: in f
    return self.fn(obj, *args, **kwds)
astroquery/utils/process_asyncs.py:29: in newmethod
    result = self._parse_result(response, verbose=verbose)
astroquery/oac/core.py:487: in _parse_result
    output_response = self._format_output(raw_output)
astroquery/oac/core.py:451: in _format_output
    output = json.loads(raw_output)
/home/travis/miniconda/envs/test/lib/python3.7/json/__init__.py:348: in loads
    return _default_decoder.decode(s)
/home/travis/miniconda/envs/test/lib/python3.7/json/decoder.py:337: in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <json.decoder.JSONDecoder object at 0x7f8f03188fd0>
s = 'event,escapevelocity,stellarclass,lumdist,ra,references,maxappmag,hostdec,host,instruments,sources,redshift,hostra,ma...y et al.,Steve J. Fossey",,2014/01/21,300,,,"+69:40:25.9,+69:40:26.00,+69:40:26.0",,e,-18.4,,0.1358,SN2014J,"Ia,Ia-HV"'
idx = 0
    def raw_decode(self, s, idx=0):
        """Decode a JSON document from ``s`` (a ``str`` beginning with
        a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.
    
        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.
    
        """
        try:
            obj, end = self.scan_once(s, idx)
        except StopIteration as err:
>           raise JSONDecodeError("Expecting value", s, err.value) from None
E           json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
/home/travis/miniconda/envs/test/lib/python3.7/json/decoder.py:355: JSONDecodeError
----------------------------- Captured stdout call -----------------------------
The API did not return a valid CSV output! 
Outputing JSON-compliant dictionary instead.
____________________ TestOACClass.test_query_region_box_csv ____________________
self = <astroquery.oac.tests.test_oac_remote.TestOACClass object at 0x7f8ef400cb38>
    def test_query_region_box_csv(self):
        phot = OAC.query_region(coordinates=self.test_coords,
                                width=self.test_width,
>                               height=self.test_height)
astroquery/oac/tests/test_oac_remote.py:53: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
astroquery/utils/class_or_instance.py:25: in f
    return self.fn(obj, *args, **kwds)
astroquery/utils/process_asyncs.py:29: in newmethod
    result = self._parse_result(response, verbose=verbose)
astroquery/oac/core.py:487: in _parse_result
    output_response = self._format_output(raw_output)
astroquery/oac/core.py:451: in _format_output
    output = json.loads(raw_output)
/home/travis/miniconda/envs/test/lib/python3.7/json/__init__.py:348: in loads
    return _default_decoder.decode(s)
/home/travis/miniconda/envs/test/lib/python3.7/json/decoder.py:337: in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <json.decoder.JSONDecoder object at 0x7f8f03188fd0>
s = 'event,escapevelocity,stellarclass,lumdist,ra,references,maxappmag,hostdec,host,instruments,sources,redshift,hostra,ma...y et al.,Steve J. Fossey",,2014/01/21,300,,,"+69:40:25.9,+69:40:26.00,+69:40:26.0",,e,-18.4,,0.1358,SN2014J,"Ia,Ia-HV"'
idx = 0
    def raw_decode(self, s, idx=0):
        """Decode a JSON document from ``s`` (a ``str`` beginning with
        a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.
    
        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.
    
        """
        try:
            obj, end = self.scan_once(s, idx)
        except StopIteration as err:
>           raise JSONDecodeError("Expecting value", s, err.value) from None
E           json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
/home/travis/miniconda/envs/test/lib/python3.7/json/decoder.py:355: JSONDecodeError
----------------------------- Captured stdout call -----------------------------
The API did not return a valid CSV output! 
Outputing JSON-compliant dictionary instead.
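
The JSONDecodeError itself can be reproduced without the network. Judging from the traceback and the captured stdout, the CSV text ends up being passed to json.loads; the snippet below is a minimal sketch of that last step only, using a made-up two-line CSV string rather than the real response.

```python
import json

# Made-up CSV text standing in for the server response.
raw_output = 'event,alias\nSN2014J,"SN2014J,iPTF14jj"'

try:
    json.loads(raw_output)
    message = None
except json.JSONDecodeError as err:
    # CSV text does not start with a valid JSON value, so decoding fails at char 0.
    message = str(err)

print(message)
```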
@bsipocz
Member Author

bsipocz commented Apr 3, 2019

cc @guillochon

@pllim
Member

pllim commented Mar 6, 2020

I came across this while investigating #1672. Here is more info:

import csv
import json
from astroquery.oac import OAC as self

request_payload = self._args_to_payload('SN2014J', None, None, None, 'csv')  # OAC is imported under the name `self` above
response = self._request('GET', self.URL, data=json.dumps(request_payload), timeout=self.TIMEOUT, headers=self.HEADERS)
raw_output = response.text
split_output = raw_output.splitlines()
columns = list(csv.reader([split_output[0]], delimiter=',', quotechar='"'))[0]
rows = split_output[1:]
test_row = list(csv.reader([rows[0]], delimiter=',', quotechar='"'))[0]
>>> len(columns)
38
>>> len(test_row)
39
>>> raw_output
'event,download,escapevelocity,lumdist,claimedtype,redshift,instruments,hostdec,discoverer,stellarclass,spectraltype,galactocentricvelocity,spectralink,maxappmag,dec,catalog,ebv,maxdate,propermotiondec,photolink,velocity,maxabsmag,propermotionra,name,hostoffsetdist,masses,xraylink,host,ra,boundprobability,hostoffsetang,hostra,alias,sources,color,discoverdate,radiolink,references\nSN2014J,e,,2.998,"Ia,Ia-HV","0.000677,0.000739,0.000841","IRAC (I1, I2), UVOT (W2, W1, M2, U, B, V), VLA, CLEAR, U, B, g, g\', V, r, r\', R, i, i\', I, z, z\'",+69:40:47,"Fossey et al.,Steve J. Fossey",,,,"78,-8.49,351",8.98,"+69:40:25.9,+69:40:26.0",sne,0.1358,2014/01/31,,"1482,-15.1,442",300,-18.4,,SN2014J,0.8813,,,"NGC 3034,M82","09:55:42.12,09:55:42.14",,55.63,09:55:52,"SN2014J,PSN J09554214+6940260,<a id="PSNJ09554214+6940260">PSN J09554214+6940260</a>,iPTF14jj",,,2014/01/21,0,"2012PASP..124..668Y,2015arXiv151006596G,2014AJ....148....1Z,2018PASP..130f4101V,2016MNRAS.457.1000S"'
>>> columns
['event',
 'download',
 'escapevelocity',
 'lumdist',
 'claimedtype',
 'redshift',
 'instruments',
 'hostdec',
 'discoverer',
 'stellarclass',
 'spectraltype',
 'galactocentricvelocity',
 'spectralink',
 'maxappmag',
 'dec',
 'catalog',
 'ebv',
 'maxdate',
 'propermotiondec',
 'photolink',
 'velocity',
 'maxabsmag',
 'propermotionra',
 'name',
 'hostoffsetdist',
 'masses',
 'xraylink',
 'host',
 'ra',
 'boundprobability',
 'hostoffsetang',
 'hostra',
 'alias',
 'sources',
 'color',
 'discoverdate',
 'radiolink',
 'references']
>>> rows
['SN2014J,e,,2.998,"Ia,Ia-HV","0.000677,0.000739,0.000841","IRAC (I1, I2), UVOT (W2, W1, M2, U, B, V), VLA, CLEAR, U, B, g, g\', V, r, r\', R, i, i\', I, z, z\'",+69:40:47,"Fossey et al.,Steve J. Fossey",,,,"78,-8.49,351",8.98,"+69:40:25.9,+69:40:26.0",sne,0.1358,2014/01/31,,"1482,-15.1,442",300,-18.4,,SN2014J,0.8813,,,"NGC 3034,M82","09:55:42.12,09:55:42.14",,55.63,09:55:52,"SN2014J,PSN J09554214+6940260,<a id="PSNJ09554214+6940260">PSN J09554214+6940260</a>,iPTF14jj",,,2014/01/21,0,"2012PASP..124..668Y,2015arXiv151006596G,2014AJ....148....1Z,2018PASP..130f4101V,2016MNRAS.457.1000S"']

The output is clearly CSV, not JSON, but because of the length mismatch (38 columns vs. 39 fields in the row), you get this error. Hope this helps.
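
The extra field can be reproduced offline with a toy row. This is a minimal sketch with made-up column names and values (not real OAC output): the inner double quotes of the anchor tag are not doubled, so csv.reader closes the quoted field early and the trailing alias becomes a field of its own.

```python
import csv

# Toy header and row; an HTML attribute introduces unescaped quotes in "alias".
header = 'event,alias,discoverdate'
row = 'SN2014J,"SN2014J,<a id="PSNJ">PSN J</a>,iPTF14jj",2014/01/21'

columns = next(csv.reader([header], delimiter=',', quotechar='"'))
fields = next(csv.reader([row], delimiter=',', quotechar='"'))

print(len(columns), len(fields))
for field in fields:
    print(repr(field))
```

Comparing len(fields) against len(columns) for every row is a cheap way to spot this kind of malformed quoting before building a table from the data.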

bsipocz changed the title from "OAC test errors with JSONDecodeError" to "OAC csv parsing error" on Jun 1, 2022