TST/CLN: break up & parametrize tests for df.set_index #22236

Merged (13 commits) on Sep 15, 2018
191 changes: 191 additions & 0 deletions pandas/tests/frame/conftest.py
@@ -0,0 +1,191 @@
import pytest
Member

A couple of things about this file:

  • Do you use all of these fixtures in your changes?
  • I would prefer the naming to be a little more consistent, e.g.:
      • You have the word "frame" in some of your fixture names but not others
      • You have underscores between words in some names but not others

Contributor Author

For this PR, I turned an often-used DF into a fixture - together with the other attributes of TestData. @jreback then told me to start a conftest.py and put it there, together with the other TestData-attributes used in this module (frame, mixed_frame).

I translated all the attributes into conftest.py without renaming them, so that they can be replaced on a per-module basis as laid out in #22471. The names of the fixtures are clearly suboptimal, but following up on fixturizing the other modules would be much harder if I started renaming now.

Member

That's fair. Let's save for a follow-up then.

Contributor Author

@h-vetinari Sep 11, 2018

@gfyoung:

Since the names have now changed after all, I thought I'd ask for your opinion on the naming/consistency.

Member

@h-vetinari : This looks great!

Contributor Author

@gfyoung
Thanks. I just added a few more consistency fixes (squashed to simplify reviewing); only the first three fixture names were affected.


import numpy as np

from pandas import compat
import pandas.util.testing as tm
from pandas import DataFrame, date_range, NaT


@pytest.fixture
def float_frame():
    """
    Fixture for DataFrame of floats with index of unique strings
    Columns are ['A', 'B', 'C', 'D'].
    """
    return DataFrame(tm.getSeriesData())


@pytest.fixture
def float_frame2():
    """
    Fixture for DataFrame of floats with index of unique strings
    Columns are ['D', 'C', 'B', 'A']
    """
    return DataFrame(tm.getSeriesData(), columns=['D', 'C', 'B', 'A'])


@pytest.fixture
def int_frame():
    """
    Fixture for DataFrame of ints with index of unique strings
    Columns are ['A', 'B', 'C', 'D']
    """
    df = DataFrame({k: v.astype(int)
                    for k, v in compat.iteritems(tm.getSeriesData())})
    # force these all to int64 to avoid platform testing issues
    return DataFrame({c: s for c, s in compat.iteritems(df)}, dtype=np.int64)


@pytest.fixture
def datetime_frame():
    """
    Fixture for DataFrame of floats with DatetimeIndex
    Columns are ['A', 'B', 'C', 'D']
    """
    return DataFrame(tm.getTimeSeriesData())


@pytest.fixture
def float_string_frame():
    """
    Fixture for DataFrame of floats and strings with index of unique strings
    Columns are ['A', 'B', 'C', 'D', 'foo'].
    """
    df = DataFrame(tm.getSeriesData())
    df['foo'] = 'bar'
    return df


@pytest.fixture
def mixed_float_frame():
    """
    Fixture for DataFrame of different float types with index of unique strings
    Columns are ['A', 'B', 'C', 'D'].
    """
    df = DataFrame(tm.getSeriesData())
    df.A = df.A.astype('float16')
    df.B = df.B.astype('float32')
    df.C = df.C.astype('float64')
    return df


@pytest.fixture
def mixed_float_frame2():
    """
    Fixture for DataFrame of different float types with index of unique strings
    Columns are ['A', 'B', 'C', 'D'].
    """
    df = DataFrame(tm.getSeriesData())
    df.D = df.D.astype('float16')
    df.C = df.C.astype('float32')
    df.B = df.B.astype('float64')
    return df


@pytest.fixture
def mixed_int_frame():
    """
    Fixture for DataFrame of different int types with index of unique strings
    Columns are ['A', 'B', 'C', 'D'].
    """
    df = DataFrame({k: v.astype(int)
                    for k, v in compat.iteritems(tm.getSeriesData())})
    df.A = df.A.astype('uint8')
    df.B = df.B.astype('int32')
    df.C = df.C.astype('int64')
    df.D = np.ones(len(df.D), dtype='uint64')
    return df


@pytest.fixture
def mixed_type_frame():
    """
    Fixture for DataFrame of float/int/string columns with RangeIndex
    Columns are ['a', 'b', 'c', 'float32', 'int32'].
    """
    return DataFrame({'a': 1., 'b': 2, 'c': 'foo',
                      'float32': np.array([1.] * 10, dtype='float32'),
                      'int32': np.array([1] * 10, dtype='int32')},
                     index=np.arange(10))


@pytest.fixture
def timezone_frame():
    """
    Fixture for DataFrame of date_range Series with different time zones
    Columns are ['A', 'B', 'C']; some entries are missing
    """
    df = DataFrame({'A': date_range('20130101', periods=3),
                    'B': date_range('20130101', periods=3,
                                    tz='US/Eastern'),
                    'C': date_range('20130101', periods=3,
                                    tz='CET')})
    df.iloc[1, 1] = NaT
    df.iloc[1, 2] = NaT
    return df


@pytest.fixture
def empty_frame():
    """
    Fixture for empty DataFrame
    """
    return DataFrame({})


@pytest.fixture
def datetime_series():
    """
    Fixture for Series of floats with DatetimeIndex
    """
    return tm.makeTimeSeries(nper=30)


@pytest.fixture
def datetime_series_short():
    """
    Fixture for Series of floats with DatetimeIndex
    """
    return tm.makeTimeSeries(nper=30)[5:]


@pytest.fixture
def simple_frame():
    """
    Fixture for simple 3x3 DataFrame
    Columns are ['one', 'two', 'three'], index is ['a', 'b', 'c'].
    """
    arr = np.array([[1., 2., 3.],
                    [4., 5., 6.],
                    [7., 8., 9.]])

    return DataFrame(arr, columns=['one', 'two', 'three'],
                     index=['a', 'b', 'c'])


@pytest.fixture
def frame_of_index_cols():
    """
    Fixture for DataFrame of columns that can be used for indexing
    Columns are ['A', 'B', 'C', 'D', 'E']; 'A' & 'B' contain duplicates (but
    are jointly unique), the rest are unique.
    """
    df = DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar'],
                    'B': ['one', 'two', 'three', 'one', 'two'],
                    'C': ['a', 'b', 'c', 'd', 'e'],
                    'D': np.random.randn(5),
                    'E': np.random.randn(5)})
    return df
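

The PR title mentions parametrizing the `df.set_index` tests, so a short sketch of how a test module might consume `frame_of_index_cols` may help. This is not taken from the PR: the test name, the parameter choices, and the plain helper `make_frame_of_index_cols` (added only so the data can also be built outside pytest) are assumptions made for illustration. The fixture is redefined locally to keep the example self-contained; in the real layout it would come from conftest.py.

```python
import numpy as np
import pytest
from pandas import DataFrame


def make_frame_of_index_cols():
    # Hypothetical helper: same data layout as the fixture in conftest.py.
    return DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar'],
                      'B': ['one', 'two', 'three', 'one', 'two'],
                      'C': ['a', 'b', 'c', 'd', 'e'],
                      'D': np.random.randn(5),
                      'E': np.random.randn(5)})


@pytest.fixture
def frame_of_index_cols():
    return make_frame_of_index_cols()


@pytest.mark.parametrize('drop', [True, False])
@pytest.mark.parametrize('keys', ['A', ['A', 'B']])
def test_set_index_keeps_or_drops_keys(frame_of_index_cols, keys, drop):
    df = frame_of_index_cols
    result = df.set_index(keys, drop=drop)

    key_list = keys if isinstance(keys, list) else [keys]
    # The index levels are named after the key columns.
    assert list(result.index.names) == key_list
    # With drop=True the key columns leave the frame; otherwise they stay.
    for key in key_list:
        assert (key in result.columns) != drop
    # set_index returns a new frame by default; the input is unchanged.
    assert list(df.columns) == ['A', 'B', 'C', 'D', 'E']
```

Stacking the two `parametrize` decorators runs the test once per combination of `keys` and `drop`, and each run receives a freshly built frame because the fixture uses pytest's default function scope.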