Allow setting a column with a scalar and no index (revert #16823) #17894

Closed
toobaz opened this issue Oct 16, 2017 · 0 comments · Fixed by #17902
Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves
toobaz commented Oct 16, 2017

Extract of discussion from #16823

Code Sample, a copy-pastable example if possible

In [2]: df = pd.DataFrame()

In [3]: df['dummy'] = 1
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-3122daad6dab> in <module>()
----> 1 df['dummy'] = 1

/home/nobackup/repo/pandas/pandas/core/frame.py in __setitem__(self, key, value)
   2515         else:
   2516             # set column
-> 2517             self._set_item(key, value)
   2518 
   2519     def _setitem_slice(self, key, value):

/home/nobackup/repo/pandas/pandas/core/frame.py in _set_item(self, key, value)
   2584         """
   2585 
-> 2586         self._ensure_valid_index(value)
   2587         value = self._sanitize_column(key, value)
   2588         NDFrame._set_item(self, key, value)

/home/nobackup/repo/pandas/pandas/core/frame.py in _ensure_valid_index(self, value)
   2561             if not is_list_like(value):
   2562                 # GH16823, Raise an error due to loss of information
-> 2563                 raise ValueError('If using all scalar values, you must pass'
   2564                                  ' an index')
   2565             try:

ValueError: If using all scalar values, you must pass an index

Problem description

Previously, the above would just add a new (obviously empty) column.

@jreback objects that if this is allowed, then we should also allow initialization with only scalars (as in pd.DataFrame({'a' : 1, 'b' : 2})).

I'm not 100% sure of what @jorisvandenbossche suggests, but he agrees with me that the current state is inconsistent.

My view is that previously things were just (almost) fine:

  • at initialization, a DataFrame needs to have an index. You can avoid providing one explicitly only if it can be automatically built from the values you pass (i.e. 1-dimensional objects of the same length, or a single 2-dimensional block of data). Scalars clearly do not satisfy this requirement, so the constructor will raise if passed only scalars (but pd.DataFrame({'A' : range(3), 'B' : 23}) works, which is cool).
  • at assignment, there is already an index, and in particular, when assigning an entire column you know you'll never alter the index. More specifically, when you assign a scalar to a column, you know it will alter all existing rows, which means "none" if the index is empty. And if the column does not exist, it will just be added, clearly empty as well.

In both cases, scalars/empty indexes represent no exception to the general behavior.
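
To make the two cases concrete, here is a minimal sketch of the behavior argued for above. It assumes a pandas version in which the revert has landed (this issue is on the 0.21.0 milestone); on the 0.21.0rc1 build reported below, the final assignment raises instead. The variable names are only illustrative.

```python
import pandas as pd

# Initialization with only scalars: there is nothing to build an index from,
# so the constructor raises.
try:
    pd.DataFrame({'a': 1, 'b': 2})
    constructor_raised = False
except ValueError:
    constructor_raised = True

# Initialization with a 1-dimensional object plus a scalar: the index is
# built from the range, and the scalar is broadcast over it.
mixed = pd.DataFrame({'A': range(3), 'B': 23})

# Assignment: the (empty) index already exists, so the scalar fills
# "all existing rows", i.e. none, and the column is simply added.
df = pd.DataFrame()
df['dummy'] = 1

print(constructor_raised)          # True
print(mixed.shape)                 # (3, 2)
print(list(df.columns), len(df))   # ['dummy'] 0
```

In neither case is the scalar treated specially: it is broadcast over whatever index exists, which for an empty frame means zero rows.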

For consistency, we might want to fix the following too:

In [2]: pd.DataFrame().loc[1] = 0
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-20bba0809d0b> in <module>()
----> 1 pd.DataFrame().loc[1] = 0

/home/nobackup/repo/pandas/pandas/core/indexing.py in __setitem__(self, key, value)
    192             key = com._apply_if_callable(key, self.obj)
    193         indexer = self._get_setitem_indexer(key)
--> 194         self._setitem_with_indexer(indexer, value)
    195 
    196     def _has_valid_type(self, k, axis):

/home/nobackup/repo/pandas/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    421                     # no columns and scalar
    422                     if not len(self.obj.columns):
--> 423                         raise ValueError("cannot set a frame with no defined "
    424                                          "columns")
    425 

ValueError: cannot set a frame with no defined columns

but I will detail this in a separate issue.

Expected Output

None, but a new column "dummy" is added to df.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 5bf7f9a
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.21.0rc1+18.g5bf7f9a4f
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

@toobaz toobaz changed the title Allow setting a column with a scalar and no index should raise (revert #16823) Allow setting a column with a scalar and no index (revert #16823) Oct 16, 2017
@jorisvandenbossche jorisvandenbossche added API Design Indexing Related to indexing on series/frames, not to indexes themselves labels Oct 16, 2017
@jorisvandenbossche jorisvandenbossche added this to the 0.21.0 milestone Oct 16, 2017
jreback added a commit to jreback/pandas that referenced this issue Oct 17, 2017
jreback added a commit that referenced this issue Oct 18, 2017
…h no index ( #16823) (#16968)" (#17902)

* Revert "ERR: Raise ValueError when setting scalars in a dataframe with no index ( #16823) (#16968)"

This reverts commit f9ba6fe.

* TST: expicit test on setting scalars on empty frame

closes #17894