BUG ?: method .at[idx, "XXX"] generates InvalidIndexError in 1.4.0 or 1.4.1 but not in 1.3.5 #46036

thomas-lacroix · 2022-02-17T15:18:40Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

This bug happens with my real-life data. Unfortunately, I am unable to reproduce it with mock data modeled on that data; the following code works fine:

import pandas as pd
print(pd.__version__)
df = pd.DataFrame([[1, 1, 129, 'WP_158394508.1', '+', 132906, 134099, 397, '-', 'Tyrosine integrase', 'Tyrosine integrase', 'Tyrosine integrase', 'WP_011835230', 357.0, 326.0, 25.46, 5.270e-15, 69.3, 72.29, 88.8, 'Integrase', 'ICE', 'Tn5252', 'pRS01', '-', '-', '-', '-', '-', 'validated', 'no', 'Tyrosine integrase', 'Tyrosine integrase', 'Phage_integrase', 172.0, 4.600e-40, 3.300e-40, 123.3, 0.6, 123.8, 0.6, 98.84, 43.58, 'yes']], index=[0], columns=['hit_blast', 'hit_HMM', 'CDS_num', 'CDS', 'CDS_strand', 'CDS_start', 'CDS_end', 'CDS_length', 'Is_pseudo', 'CDS_Protein_type', 'CDS_Protein_type_blast', 'Blast_description', 'Query_blast', 'Query_blast_length', 'Ali_length', 'Ali_Identity_perc', 'E-value_blast', 'Bitscore_blast', 'CDS_coverage_blast', 'Query_blast_coverage', 'Query_blast_Protein_type', 'Associated_element_type', 'ICE_superfamily', 'ICE_family', 'IME_family', 'Relaxase_family_domain', 'Relaxase_family_MOB', 'Coupling_type', 'False_positives', 'SP_blast_validation', 'Use_annotation', 'Profile_Protein_type', 'Profile_description', 'Profile_name', 'Profile_length', 'i-Evalue_hmm', 'E-value_hmm', 'Score_hmm', 'Bias_hmm', 'Global_score', 'Global_bias', 'HMM_coverage', 'CDS_coverage_hmm', 'Possible_SP'])
idx = 0
df.at[idx, "False_positives"] = "-"

Issue Description

Upgrading to pandas version 1.4.0 or 1.4.1 causes a call to .at[idx, "XXX"] to raise an InvalidIndexError:

Traceback (most recent call last):
  File "XXX.py", line XXX, in XXX
    data.at[idx, "False_positives"] = "-"
  File "lib/python3.9/site-packages/pandas/core/indexing.py", line 2274, in __setitem__
    return super().__setitem__(key, value)
  File "/python3.9/site-packages/pandas/core/indexing.py", line 2229, in __setitem__
    self.obj._set_value(*key, value=value, takeable=self._takeable)
  File "/python3.9/site-packages/pandas/core/frame.py", line 3869, in _set_value
    loc = self.index.get_loc(index)
  File "/python3.9/site-packages/pandas/core/indexes/range.py", line 388, in get_loc
    self._check_indexing_error(key)
  File "/python3.9/site-packages/pandas/core/indexes/base.py", line 5637, in _check_indexing_error
    raise InvalidIndexError(key)
pandas.errors.InvalidIndexError: Int64Index([0], dtype='int64')

This error does not occur with pandas version 1.3.5. Can you help figure this out?

Expected Behavior

No exception should be raised.

Installed Versions

import pandas as pd
pd.show_versions()

INSTALLED VERSIONS

commit : 06d2301
python : 3.9.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.0-27-generic
Version : #29~20.04.1-Ubuntu SMP Fri Jan 14 00:32:30 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.1
numpy : 1.22.2
pytz : 2021.3
dateutil : 2.8.2
pip : 22.0.3
setuptools : 59.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
zstandard : None


phofl · 2022-02-18T10:08:59Z

Hi,

Thanks for your report. I am sorry, but we can't really help you if you cannot provide a reproducible example.

thomas-lacroix · 2022-02-18T10:46:57Z

I understand. I am not even sure whether my issue is due to a bug or whether I need to adapt my code somehow. I probably cannot make a short reproducible snippet because I am not a pandas expert. If you are interested, I could give you a series of command lines that reproduce the issue, but it would not be just a Python snippet: it would download and execute our tool.
I don't think it is normal that a line of code that produced no warning with version 1.3.5 produces an error with version 1.4.0. The only change I make to trigger the error is conda install -c conda-forge pandas=1.4.0; if I revert with conda install -c conda-forge pandas=1.3.5, the error goes away on the same dataset. Looking at the list of changes for 1.4.0, I couldn't find anything related to the .at method that I need to change in my code. I was hoping to find someone who knows the changes between those two versions of pandas and can understand what is going on by looking at the stack trace. But I think you are right, maybe Stack Overflow is more appropriate for that.

phofl · 2022-02-18T14:31:58Z

This is hard to judge without knowing the data. Depending on the content of a DataFrame, the expected behavior might be different. Feel free to ping me if you are able to create an example. Otherwise, Stack Overflow might help, as you suggested.

adamzev · 2022-03-21T20:28:17Z

We are having the same issue.

Here is a reproducible example:

import pandas as pd
data_df = pd.DataFrame(data={'name':['a','a', 'b','a'], 'combine_id': [None, None, None, None]})
target_id_dataframe = pd.DataFrame(data={'index':[0,1, 3], 'other_col':[1,2,3]})
data_df.at[target_id_dataframe['index'], "combine_id"]= 7

The code throws an error because the combine_id column already exists. There's no error if it doesn't exist yet.

The example completes without an error in 1.3.5 but throws the error in 1.4.1.

Here's the traceback

InvalidIndexError                         Traceback (most recent call last)
Input In [19], in <cell line: 1>()
----> 1 data_df.at[target_id_dataframe['index'], "combine_id"]= 7

File ~/.local/lib/python3.9/site-packages/pandas/core/indexing.py:2274, in _AtIndexer.__setitem__(self, key, value)
   2271     self.obj.loc[key] = value
   2272     return
-> 2274 return super().__setitem__(key, value)

File ~/.local/lib/python3.9/site-packages/pandas/core/indexing.py:2229, in _ScalarAccessIndexer.__setitem__(self, key, value)
   2226 if len(key) != self.ndim:
   2227     raise ValueError("Not enough indexers for scalar access (setting)!")
-> 2229 self.obj._set_value(*key, value=value, takeable=self._takeable)

File ~/.local/lib/python3.9/site-packages/pandas/core/frame.py:3869, in DataFrame._set_value(self, index, col, value, takeable)
   3867 else:
   3868     series = self._get_item_cache(col)
-> 3869     loc = self.index.get_loc(index)
   3871 # setitem_inplace will do validation that may raise TypeError
   3872 #  or ValueError
   3873 series._mgr.setitem_inplace(loc, value)

File ~/.local/lib/python3.9/site-packages/pandas/core/indexes/range.py:388, in RangeIndex.get_loc(self, key, method, tolerance)
    386         except ValueError as err:
    387             raise KeyError(key) from err
--> 388     self._check_indexing_error(key)
    389     raise KeyError(key)
    390 return super().get_loc(key, method=method, tolerance=tolerance)

File ~/.local/lib/python3.9/site-packages/pandas/core/indexes/base.py:5637, in Index._check_indexing_error(self, key)
   5633 def _check_indexing_error(self, key):
   5634     if not is_scalar(key):
   5635         # if key is not a scalar, directly raise an error (the code below
   5636         # would convert to numpy arrays and raise later any way) - GH29926
-> 5637         raise InvalidIndexError(key)

Update:
Switching from .at to .loc fixes our issue, so perhaps this is an old bug that has since been fixed: we had been using .at to update multiple values, but shouldn't have been?

Looking at pandas/core/indexes/range.py:388 in get_loc, _check_indexing_error throws an InvalidIndexError, which isn't handled. If a KeyError were thrown instead (I'm not sure if that was the old behavior), it would be handled and passed off to .loc (pandas/pandas/core/frame.py, line 3906 in 6033ed4: self.loc[index, col] = value).
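For reference, a minimal sketch of the .loc version of the example above. It reuses the same two frames from the report; the only change is the accessor, on the understanding that .loc accepts a list-like of row labels while .at is intended for a single row/column label pair.

```python
import pandas as pd

# Same frames as in the reproducible example above
data_df = pd.DataFrame(
    data={"name": ["a", "a", "b", "a"], "combine_id": [None, None, None, None]}
)
target_id_dataframe = pd.DataFrame(data={"index": [0, 1, 3], "other_col": [1, 2, 3]})

# .loc takes a list-like of row labels, so rows 0, 1 and 3 are updated in one call
data_df.loc[target_id_dataframe["index"], "combine_id"] = 7

print(data_df)
```

Row 2 is not listed in target_id_dataframe["index"], so its combine_id is left untouched.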

Enterprise-D · 2022-07-14T23:26:14Z

I have this problem, too. Indices from .index will trigger the same error when using df.at[indices,...]. After rolling back from 1.4.2 to 1.3.5, the problem disappeared.

phofl · 2022-07-15T00:16:47Z

Can you give us a reproducible example?

Enterprise-D · 2022-07-15T02:09:49Z

import pandas as pd
full_table_file = '../Data/protein_domain_data/cddid.tbl'
min_domain_length = 30

df = pd.read_csv(full_table_file, sep='\t', header=None, index_col=0)
print(df.shape)
###Filter tiny little families to make life easier
df = df[df[4]>min_domain_length]
print(df.shape)
df.head()

###Identify the domains related to the following search terms
case_insensitive_search_terms = ['integrase', 'excisionase', 'recombinase',
'transposase', 'lysogen', 'temperate']
case_sensitive_search_terms = ['parA|ParA|parB|ParB']

for search_term in case_insensitive_search_terms:
    indices = df[df[3].str.contains(search_term, case=False)==True].index
    df[search_term] = 0
    df.at[indices, search_term] = 1

for search_term in case_sensitive_search_terms:
    indices = df[df[3].str.contains(search_term, case=True)==True].index
    df[search_term] = 0
    df.at[indices, search_term] = 1

python 3.8
pandas 1.4.2
macOS 12.4 on M1 Pro

you will also need the table file https://www.icloud.com/iclouddrive/02cY4kezLwkVj5mV620oXpj0g#cddid
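Since the table file has to be downloaded separately, here is a smaller sketch of the same flagging loop on mock data. The four rows below are invented stand-ins for cddid.tbl (only columns 3 and 4 are modeled), and the loop uses a boolean mask with .loc in place of .at, which is an assumption about what the snippet is trying to achieve rather than a confirmed fix.

```python
import pandas as pd

# Hypothetical stand-in for cddid.tbl: column 3 is the description, column 4 the length
df = pd.DataFrame(
    {
        3: ["Phage integrase family", "ParA ATPase", None, "DNA transposase"],
        4: [120, 200, 45, 310],
    },
    index=[101, 102, 103, 104],
)

for search_term in ["integrase", "transposase"]:
    # na=False treats missing descriptions as "no match" instead of NaN
    mask = df[3].str.contains(search_term, case=False, na=False)
    df[search_term] = 0
    # .loc accepts a boolean mask, so many rows can be set at once
    df.loc[mask, search_term] = 1

print(df)
```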

phofl · 2022-07-18T00:53:06Z

@adamzev .at is only meant for single values, so there are no guarantees for multiple values.

@Enterprise-D: We need a minimal and reproducible example, see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

thomas-lacroix · 2022-07-18T10:45:57Z

From what I understand, for pandas version 1.4.0 and up, the .at method fails with an InvalidIndexError when the row key is an index list, even one holding a single value. Switching to the .loc method for index lists of size 1 or more should work. See the answer from Mark Greenwood at https://stackoverflow.com/questions/71293357/upgrading-to-pandas-version-1-4-0-or-1-4-1-causes-a-call-to-the-method-atidx/71545633?noredirect=1#comment126506658_71545633
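The key shown in the original traceback is Int64Index([0]) rather than the integer 0, which suggests the row key was an Index of length 1 and not a scalar. The sketch below illustrates that reading; the two-column frame and the filter that produces idx are assumptions for illustration, not the actual code from the report.

```python
import pandas as pd

# Hypothetical reduced frame; the real data has many more columns
df = pd.DataFrame({"CDS": ["WP_158394508.1"], "False_positives": ["x"]})

# Filtering and then taking .index yields an Index object, even for a single match
idx = df[df["CDS"] == "WP_158394508.1"].index
print(type(idx).__name__, pd.api.types.is_scalar(idx))

# Option 1: unwrap the single label so .at receives a true scalar
row_label = idx[0]
df.at[row_label, "False_positives"] = "-"

# Option 2: keep the Index and use .loc, which accepts list-like keys
df.loc[idx, "False_positives"] = "-"
```

Either option avoids handing a non-scalar key to .at; option 2 also keeps working if the filter matches more than one row.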


thomas-lacroix added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 17, 2022

phofl added Indexing Related to indexing on series/frames, not to indexes themselves Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 18, 2022

jreback added this to the No action milestone Feb 18, 2022

jreback closed this as completed Feb 18, 2022

pierremillard added a commit to NMRTeamTBI/MultiNMRFit that referenced this issue Jan 4, 2023

switch from .at to .loc to update parameters dataframe (issue pandas-…

0a69746

…dev/pandas#46036)

pierremillard added a commit to NMRTeamTBI/MultiNMRFit that referenced this issue Jan 4, 2023

switch from .at to .loc to update parameters dataframe (issue pandas-…

5289bd4

…dev/pandas#46036)
