BUG: DataFrame.shift with axis=1 shifts object columns to the next column with object dtype #26929

nchrisr · 2019-06-18T17:32:55Z

Code Sample

import pandas as pd
import StringIO

# Store the csv string in a variable 
gps_string = """
"2010-01-12 18:00:00","$GPGGA","180439","7249.2150","N","11754.4238","W","2","10","0.9","-8.1","M","-12.4","M","","*57","","",""
"2010-01-12 17:30:00","$GPGGA","173439","7249.2160","N","11754.4233","W","2","11","0.8","-4.5","M","-12.4","M","","*5B","","",""
"2010-01-12 17:00:00","$GPGGA","170439","7249.2152","N","11754.4235","W","2","11","0.8","-3.1","M","-12.4","M","","*5C","","",""
"2010-01-12 16:30:00","N","11754.4210","W","2","09","1.1","-13.1","M","-12.4","M","","*6C","","","","","",""
"2010-01-12 16:00:00","N","11754.4229","W","2","10","0.9","-2.9","M","-12.4","M","","*53","","","","","",""
"2010-01-12 15:30:00","N","11754.4269","W","2","09","0.8","-4.3","M","-12.4","M","","*54","","","","","",""
"2010-01-12 15:00:00","N","11754.4267","W","2","10","0.8","-1.6","M","-12.4","M","","*56","","","","","",""
"2010-01-12 14:30:00","$GPGGA","143439","7249.2152","N","11754.4253","W","2","11","0.7","-4.3","M","-12.4","M","","*56","","",""
"2010-01-12 14:00:00","N","11754.4245","W","2","10","0.9","-7.0","M","-12.4","M","","*50","","","","","",""
"2010-01-12 13:30:00","$GPGGA","133439","7249.2134","N","11754.4243","W","2","11","0.7","-10.7","M","-12.4","M","","*61","","",""
"2010-01-12 13:00:00","N","11754.4245","W","2","10","0.8","-5.5","M","-12.4","M","","*56","","","","","",""
"2010-01-12 12:30:00","N","11754.4226","W","2","10","0.9","-7.1","M","-12.4","M","","*59","","","","","",""
"2010-01-12 12:00:00","N","11754.4238","W","2","10","0.8","-6.5","M","-12.4","M","","*51","","","","","",""
"2010-01-12 11:30:00","N","11754.4227","W","2","10","0.8","0.1","M","-12.4","M","","*73","","","","","",""
"2010-01-12 11:00:00","-7.4","M","-12.4","M","","*5F","","","","","","","","","","","",""
"2010-01-12 10:30:00","N","11754.4271","W","2","08","1.1","-8.4","M","-12.4","M","","*5A","","","","","","" """

# Read the csv string into a dataframe.
gps_df = pd.read_csv(StringIO.StringIO(gps_string), header=None, index_col=0)
rows_to_shift = gps_df[gps_df[15].isnull()].index
gps_df.loc[rows_to_shift] = gps_df.loc[rows_to_shift].shift(periods=1, axis=1)
# create this file to see the outcome
gps_df.to_csv("f.csv")

Problem description

I am trying to use the Dataframe. shift() function to move certain rows of data into their correct columns, and the Dataframe.shift() function is doing some weird things, it is creating an empty column of null(s) and moves one of the columns to the end of the dataframe.

Screenshots

Original data:

Output after execution of the code

As seen above, the data that was originally in column 10 has been move to column 15 for some reason. Also a column with null values has been created in column 6.

I expect that the data should be moved to the right by one step, that is the data in each column should move to the column to the left of it, and the current behaviour is confusing, based on the documentation of what this function should do.

My expected output:

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.14.final.0
python-bits: 32
OS: Windows

The text was updated successfully, but these errors were encountered:

TomAugspurger · 2019-06-18T18:06:08Z

Haven't read through everything but is this example fully minimal? At a glance, it seems overly complex to demonstrate the issue. http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

nchrisr · 2019-06-18T18:37:39Z

@TomAugspurger I have minimized the example as much as possible, but it is a problem that requires detailed description, that is why it seems complex. The code generates a dataframe from a string and this is why it looks long, in reality the first 15 lines of the code are assigning a dataset to a variable, so it can easily be copied and pasted. I have looked through and I believe every piece of information on there is necessary to demonstrate and recreate the issue.

TomAugspurger · 2019-06-18T18:41:39Z

OK. Let us know if you figure anything out. I don't plan to look at this any more closely in the near term.

nchrisr · 2019-06-18T21:35:58Z

@TomAugspurger I Made some changes to the issue to make it more concise, I have been testing this out for the past couple of days and have not been able to find a solution to this issue.

mroeschke · 2019-06-18T22:50:33Z

Generally, it looks like shifting object columns will automatically shift to the next column that had an object dtype.

In [43]: df = pd.DataFrame([[2, 9.0, 'a'], ['M', 8.0, 'b']])

In [44]: df
Out[44]:
   0    1  2
0  2  9.0  a
1  M  8.0  b

In [45]: df.shift(1, axis=1)
Out[45]:
     0   1  2
0  NaN NaN  2
1  NaN NaN  M

nchrisr · 2019-06-19T15:18:53Z

I have found a work around for this problem while this bug is being fixed, just in case anyone had a similar issue.

I convert all the data in the fields to strings and then perform the shift, after which I convert the data back into a csv string, and then read this string into a dataframe again, in order to maintain their previous data types.

import pandas as pd
import StringIO

gps_string = """
"2010-01-12 18:00:00","$GPGGA","180439","7249.2150","N","11754.4238","W","2","10","0.9","-8.1","M","-12.4","M","","*57","","",""
"2010-01-12 17:30:00","$GPGGA","173439","7249.2160","N","11754.4233","W","2","11","0.8","-4.5","M","-12.4","M","","*5B","","",""
"2010-01-12 17:00:00","$GPGGA","170439","7249.2152","N","11754.4235","W","2","11","0.8","-3.1","M","-12.4","M","","*5C","","",""
"2010-01-12 16:30:00","N","11754.4210","W","2","09","1.1","-13.1","M","-12.4","M","","*6C","","","","","",""
"2010-01-12 16:00:00","N","11754.4229","W","2","10","0.9","-2.9","M","-12.4","M","","*53","","","","","",""
"2010-01-12 15:30:00","N","11754.4269","W","2","09","0.8","-4.3","M","-12.4","M","","*54","","","","","",""
"2010-01-12 15:00:00","N","11754.4267","W","2","10","0.8","-1.6","M","-12.4","M","","*56","","","","","",""
"2010-01-12 14:30:00","$GPGGA","143439","7249.2152","N","11754.4253","W","2","11","0.7","-4.3","M","-12.4","M","","*56","","",""
"2010-01-12 14:00:00","N","11754.4245","W","2","10","0.9","-7.0","M","-12.4","M","","*50","","","","","",""
"2010-01-12 13:30:00","$GPGGA","133439","7249.2134","N","11754.4243","W","2","11","0.7","-10.7","M","-12.4","M","","*61","","",""
"2010-01-12 13:00:00","N","11754.4245","W","2","10","0.8","-5.5","M","-12.4","M","","*56","","","","","",""
"2010-01-12 12:30:00","N","11754.4226","W","2","10","0.9","-7.1","M","-12.4","M","","*59","","","","","",""
"2010-01-12 12:00:00","N","11754.4238","W","2","10","0.8","-6.5","M","-12.4","M","","*51","","","","","",""
"2010-01-12 11:30:00","N","11754.4227","W","2","10","0.8","0.1","M","-12.4","M","","*73","","","","","",""
"2010-01-12 11:00:00","-7.4","M","-12.4","M","","*5F","","","","","","","","","","","",""
"2010-01-12 10:30:00","N","11754.4271","W","2","08","1.1","-8.4","M","-12.4","M","","*5A","","","","","",""

 """

gps_df = pd.read_csv(StringIO.StringIO(gps_string), header=None, index_col=0)
rows_to_shift = gps_df[gps_df[15].isnull()].index  # get the indexes to shift
gps_df_all_strings = gps_df.astype(str)  # Convert all the data to be of type str (string)

# Shift the data
gps_df_all_strings.loc[rows_to_shift] = gps_df_all_strings.loc[rows_to_shift].shift(periods=1, axis=1)
s = gps_df_all_strings.to_csv(header=None)  # Put shifted csv data into a string after shifting.
new_gps_df = pd.read_csv(StringIO.StringIO(s), header=None, index_col=0)  # re read csv data.

And0k · 2020-06-11T09:10:58Z

string dtype dataframe does not respond to axis

df = pd.DataFrame([['2', '3', '4'], ['K', 'L', 'M']], dtype='string')
df.shift(1, axis=1)
Out: 
      0     1     2
0  <NA>  <NA>  <NA>
1     2     3     4

example workaround (that works for any positive shifts shift_positive):

shift_positive = 1
df.iloc[:, shift_positive:] = df.slice_shift(shift_positive, axis=1)
df.iloc[:, :shift_positive] = pd.NA
df
Out: 
      0  1  2
0  <NA>  2  3
1  <NA>  K  L

jbrockmendel · 2020-09-26T22:58:41Z

closed by #35578

mroeschke added the Bug label Jun 18, 2019

mroeschke changed the title ~~Potential Bug with Dataframe.shift when used with Dataframe.loc~~ BUG: DataFrame.shift with axis=1 shifts object columns to the next column with object dtype Jun 18, 2019

TomAugspurger mentioned this issue Sep 17, 2019

SparseDataFrame.shift does not respond to axis #6427

Closed

mroeschke added the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Jun 11, 2020

jbrockmendel added the Multi-Block Issues caused by the presence of multiple Blocks label Sep 21, 2020

jbrockmendel closed this as completed Sep 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: DataFrame.shift with axis=1 shifts object columns to the next column with object dtype #26929

BUG: DataFrame.shift with axis=1 shifts object columns to the next column with object dtype #26929

nchrisr commented Jun 18, 2019 •

edited

Loading

INSTALLED VERSIONS

TomAugspurger commented Jun 18, 2019

nchrisr commented Jun 18, 2019 •

edited

Loading

TomAugspurger commented Jun 18, 2019

nchrisr commented Jun 18, 2019 •

edited

Loading

mroeschke commented Jun 18, 2019

nchrisr commented Jun 19, 2019

And0k commented Jun 11, 2020 •

edited

Loading

jbrockmendel commented Sep 26, 2020

BUG: DataFrame.shift with axis=1 shifts object columns to the next column with object dtype #26929

BUG: DataFrame.shift with axis=1 shifts object columns to the next column with object dtype #26929

Comments

nchrisr commented Jun 18, 2019 • edited Loading

Code Sample

Problem description

Output of pd.show_versions()

INSTALLED VERSIONS

TomAugspurger commented Jun 18, 2019

nchrisr commented Jun 18, 2019 • edited Loading

TomAugspurger commented Jun 18, 2019

nchrisr commented Jun 18, 2019 • edited Loading

mroeschke commented Jun 18, 2019

nchrisr commented Jun 19, 2019

And0k commented Jun 11, 2020 • edited Loading

jbrockmendel commented Sep 26, 2020

nchrisr commented Jun 18, 2019 •

edited

Loading

Output of `pd.show_versions()`

nchrisr commented Jun 18, 2019 •

edited

Loading

nchrisr commented Jun 18, 2019 •

edited

Loading

And0k commented Jun 11, 2020 •

edited

Loading