TypeError when inserting uncertainties ufloat object into a DataFrame #20993

Closed
KevinStrobel opened this issue May 9, 2018 · 1 comment
Labels
Dtype Conversions (Unexpected or buggy dtype conversions); ExtensionArray (Extending pandas with custom dtypes or arrays); Usage Question

Comments

@KevinStrobel

Code Sample

from uncertainties import ufloat, UFloat
import pandas as pd

def test(name, key, value):
    dframe = pd.DataFrame()
    dframe.loc[name, key] = value
    return dframe

test('000_osci', 'laser_%', ufloat(99, 4))

Problem description

Dear pandas developers,

I encountered a problem when inserting ufloat objects (from uncertainties) into pandas DataFrames.
This problem occurred when upgrading from pandas 0.20.3 to 0.22.0.

It seems that these objects are no longer accepted as valid values for insertion into the DataFrame.
I ran this code on my Ubuntu 17.10 system (pandas 0.20.3), where it works, and on a separate workstation running Debian Buster (pandas 0.22.0), where it raises a TypeError.
Details are shown below.

Downgrading pandas on the workstation is not an option.
So either a fix in a new version of pandas, or any help on how to import pandas from a local folder, would be amazing.

Best regards,
Kevin

Output for pandas 0.20.3

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8

pandas: 0.20.3
pytest: 3.2.1
pip: 10.0.1
setuptools: 38.6.0
Cython: 0.26.1
numpy: 1.14.0
scipy: 0.19.1
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.4.1
dateutil: 2.7.0
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.0
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: None
pandas_gbq: None
pandas_datareader: None

In [2]: test('000_osci', 'laser_%', ufloat(99, 4))
Out[2]: 
         laser_%
000_osci  99+/-4

# Upgrading on the Ubuntu system led to the same issue as shown below

Output for pandas 0.22.0

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.candidate.1
python-bits: 64
OS: Linux
OS-release: 4.16.0-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: 1.7.4
patsy: 0.4.1+dev
dateutil: 2.6.1
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.3
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.2.5
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

In [4]: test('000_osci', 'laser_%', ufloat(99, 4))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-a04eaba2194c> in <module>()
----> 1 test('000_osci', 'laser_%', ufloat(99, 4))

<ipython-input-1-9598b1bd3e67> in test(name, key, value)
      1 def test(name, key, value):
      2     dframe = pd.DataFrame()
----> 3     dframe.loc[name, key] = value
      4     return dframe

/usr/lib/python3/dist-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    192             key = com._apply_if_callable(key, self.obj)
    193         indexer = self._get_setitem_indexer(key)
--> 194         self._setitem_with_indexer(indexer, value)
    195 
    196     def _has_valid_type(self, k, axis):

/usr/lib/python3/dist-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    359                         new_indexer = convert_from_missing_indexer_tuple(
    360                             indexer, self.obj.axes)
--> 361                         self._setitem_with_indexer(new_indexer, value)
    362 
    363                         return self.obj

/usr/lib/python3/dist-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    309                 val = list(value.values()) if isinstance(value,
    310                                                          dict) else value
--> 311                 take_split_path = not blk._can_hold_element(val)
    312 
    313         if isinstance(indexer, tuple) and len(indexer) == len(self.obj.axes):

/usr/lib/python3/dist-packages/pandas/core/internals.py in _can_hold_element(self, element)
   1836         tipo = maybe_infer_dtype_type(element)
   1837         if tipo is not None:
-> 1838             return (issubclass(tipo.type, (np.floating, np.integer)) and
   1839                     not issubclass(tipo.type, (np.datetime64, np.timedelta64)))
   1840         return (

TypeError: issubclass() arg 1 must be a class
@jreback
Contributor

jreback commented May 10, 2018

The uncertainty object looks almost, but not quite, like a float; at this point pandas is testing whether it is actually a float scalar. I would say the interface of uncertainties is slightly odd here, as what it exposes as the dtype type is a function, which is not duck-like enough (compare this to, say, a decimal.Decimal, which actually would be coerced to a float). So is pandas wrong here? Well, it's trying its best to deal with things.

ipdb> l
   2036 
   2037 class FloatBlock(FloatOrComplexBlock):
   2038     __slots__ = ()
   2039     is_float = True
   2040 
   2041     def _can_hold_element(self, element):
   2042         tipo = maybe_infer_dtype_type(element)
   2043         if tipo is not None:
-> 2044             return (issubclass(tipo.type, (np.floating, np.integer)) and
   2045                     not issubclass(tipo.type, (np.datetime64, np.timedelta64)))
   2046         return (

ipdb> p tipo.type
<function AffineScalarFunc.dtype.<lambda> at 0x1120b0158>
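
The failure mode above can be reproduced without pandas or uncertainties. The following is a minimal sketch, not the pandas implementation: `not_a_class` is a hypothetical stand-in for the function that the debugger shows as `tipo.type`, and `can_hold_as_float` is an illustrative helper that guards the `issubclass` call. Plain `float` and `int` are used in place of the NumPy scalar types to keep the example dependency-free.

```python
# Hypothetical stand-in for `tipo.type`: a function, not a class.
not_a_class = lambda: None

# issubclass() insists that its first argument is a class, so this raises.
try:
    issubclass(not_a_class, (float, int))
except TypeError as exc:
    message = str(exc)
    print(message)


def can_hold_as_float(type_attr):
    """Illustrative guard: only call issubclass() on real classes."""
    if not isinstance(type_attr, type):
        # Anything that is not a class cannot be a float-like scalar type.
        return False
    return issubclass(type_attr, (float, int))


print(can_hold_as_float(float))        # a real class passes the guard
print(can_hold_as_float(not_a_class))  # a function is rejected without raising
```

A guard of this kind would turn the TypeError into a plain "cannot hold this element" answer; whether pandas should do so is a separate design question.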

In any event, you need to hold these in object-dtype columns.

In [1]: from uncertainties import ufloat, UFloat
   ...: import pandas as pd
   ...: 
   ...: def test(name, key, value):
   ...:     dframe = pd.DataFrame()
   ...:     dframe[key] = pd.Series([value], [name], dtype=object)
   ...:     return dframe
   ...: 
   ...: test('000_osci', 'laser_%', ufloat(99, 4))
   ...: 
Out[1]: 
         laser_%
000_osci  99+/-4

Better yet would be to write an ExtensionArray to have first-class support in pandas.

@jreback jreback closed this as completed May 10, 2018
@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Usage Question ExtensionArray Extending pandas with custom dtypes or arrays. labels May 10, 2018
@jreback jreback added this to the won't fix milestone May 10, 2018
@TomAugspurger TomAugspurger modified the milestones: won't fix, No action Jul 6, 2018