DataFrame with PeriodIndex causes KeyError on get_value #15268

Closed
devanl opened this issue Jan 30, 2017 · 5 comments
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type Usage Question

Comments

@devanl

devanl commented Jan 30, 2017

Code Sample, a copy-pastable example if possible

from pandas import to_datetime, period_range, DataFrame
import pandas as pd

print(pd.__version__)

start_of_time = to_datetime('2016-10-17 01:16:39.133000')
end_of_time = to_datetime('2017-01-04 23:58:37.905000')
avs_date_range = period_range(start_of_time, end_of_time, freq='D')

bins = DataFrame(dict(foo=[0] * len(avs_date_range), bar=[0] * len(avs_date_range)),
                 index=avs_date_range)

current = range(10)

for idx, bin in bins.iterrows():
    for i in range(6):
        bins.set_value(idx, 'foo', bin['foo'] + 1)

    f_count = bins.get_value(idx, 'foo')
    bins.set_value(idx, 'bar', len(current) - f_count)

print(bins)

Problem description

If I comment out the setting of the index, this works as expected. With the PeriodIndex defined, get_value raises a KeyError.

Output of pd.show_versions()

0.19.1
Traceback (most recent call last):
  File "pandas\index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas\index.c:4289)
  File "pandas\src\hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:8534)
TypeError: an integer is required

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 5.0.1\helpers\pydev\pydevd.py", line 2403, in
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files (x86)\JetBrains\PyCharm 5.0.1\helpers\pydev\pydevd.py", line 1794, in run
    launch(file, globals, locals) # execute the script
  File "C:\Program Files (x86)\JetBrains\PyCharm 5.0.1\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/dlippman/src/ETDataView/test.py", line 19, in
    f_count = bins.get_value(idx, 'foo')
  File "C:\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 1900, in get_value
    return engine.get_value(series.get_values(), index)
  File "pandas\index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas\index.c:3567)
  File "pandas\index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas\index.c:3250)
  File "pandas\index.pyx", line 163, in pandas.index.IndexEngine.get_loc (pandas\index.c:4373)
KeyError: Period('2016-10-17', 'D')

@jreback
Contributor

jreback commented Jan 30, 2017

.set_value is a fairly raw low-level non-public interface. Use .loc.

Further, what you are doing is quite non-performant; iterating over the rows is not recommended.

In [15]: bins
Out[15]: 
            bar  foo
2016-10-17    0    0
2016-10-18    0    0
2016-10-19    0    0
2016-10-20    0    0
2016-10-21    0    0
...         ...  ...
2016-12-31    0    0
2017-01-01    0    0
2017-01-02    0    0
2017-01-03    0    0
2017-01-04    0    0

[80 rows x 2 columns]

In [16]: bins.loc[bins.index[-1], 'foo'] = 1

In [17]: bins
Out[17]: 
            bar  foo
2016-10-17    0    0
2016-10-18    0    0
2016-10-19    0    0
2016-10-20    0    0
2016-10-21    0    0
...         ...  ...
2016-12-31    0    0
2017-01-01    0    0
2017-01-02    0    0
2017-01-03    0    0
2017-01-04    0    1

[80 rows x 2 columns]
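A minimal sketch of the advice above, using only label-based access and whole-column arithmetic instead of a row loop. The increment of 6 and the formula for bar are assumptions about what the original loop was meant to compute, not something confirmed in this thread.

```python
import pandas as pd

# Same 80-day PeriodIndex as in the report (dates truncated to days).
index = pd.period_range("2016-10-17", "2017-01-04", freq="D")
bins = pd.DataFrame({"foo": 0, "bar": 0}, index=index)

current = range(10)

# Whole-column arithmetic replaces the per-row loop.
# ASSUMPTION: the intent was to add 6 to every "foo" value.
bins["foo"] = bins["foo"] + 6
bins["bar"] = len(current) - bins["foo"]

# Label-based scalar read on a PeriodIndex, in place of get_value.
first = bins.index[0]
f_count = bins.at[first, "foo"]

# Label-based scalar write, in place of set_value.
bins.loc[bins.index[-1], "foo"] = 1
```

Here .at is the scalar accessor and .loc the general label-based one; both accept the Period label directly.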

@jreback jreback closed this as completed Jan 30, 2017
@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type Usage Question labels Jan 30, 2017
@jreback jreback added this to the won't fix milestone Jan 30, 2017
@jorisvandenbossche
Member

It is get_value that raises the error, not set_value.

I agree that you don't need to use this method in the current example, but still, it is a public, documented method that in this case completely fails to do what its documentation says it should. According to the docstring it takes row and column labels, but passing those fails:

In [155]: bins.get_value(bins.index[0], bins.columns[0])
...
KeyError: Period('2016-10-17', 'D')

Shouldn't we just fix this? Or update the documentation to discourage its usage? (or both)

@devanl
Author

devanl commented Jan 30, 2017

This is probably not the place, but could you please explain a better way to populate each of the columns, based on conditional analysis of external timestamped data that is counted for each of the periods in the PeriodIndex?

@jorisvandenbossche
Member

@devanl this is better asked on StackOverflow (and be sure to give a reproducible example there with a clear expected result, as this is currently not fully clear to me)
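For the counting question above, one possible approach is sketched here under stated assumptions: the events table, its timestamp and ok columns, and the rule that ok rows count toward foo and the rest toward bar are all hypothetical, since the thread does not describe the real data. The idea is to convert each timestamp to its daily period, count per period, and align the counts to the PeriodIndex.

```python
import pandas as pd

# Target bins: one row per day.
index = pd.period_range("2016-10-17", "2016-10-21", freq="D")

# HYPOTHETICAL external data: timestamps plus a boolean condition.
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-10-17 01:16:39",
        "2016-10-17 09:00:00",
        "2016-10-18 12:30:00",
        "2016-10-20 23:59:59",
    ]),
    "ok": [True, False, True, True],
})

# Map every timestamp to the daily period that contains it.
day = events["timestamp"].dt.to_period("D")

# Count events per period, split by the condition.
foo = day[events["ok"]].value_counts()
bar = day[~events["ok"]].value_counts()

# Align the counts to the full PeriodIndex; days without events get 0.
bins = pd.DataFrame({
    "foo": foo.reindex(index, fill_value=0),
    "bar": bar.reindex(index, fill_value=0),
})
```

This avoids both the row loop and scalar setters; the per-day counts are computed once and aligned by label.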

@jreback
Contributor

jreback commented Jan 30, 2017

These are effectively internal methods and should actually be deprecated. I thought we did this quite a while back.


4 participants