DOC: Updated the DataFrame.assign docstring #21917

Merged 88 commits on Sep 22, 2018
Commits
58942fc
Working on the assign docstring
datapythonista Jul 22, 2018
de61b38
DOC: cont'd simplified examples in DataFrame.assign docstring
aeltanawy Jul 22, 2018
ef49f88
DOC: adjusted docstring examples in DataFrame.assign to illustrate py…
aeltanawy Sep 4, 2018
1fa9bc5
DOC: Adjusted DataFrame.assign docstring
aeltanawy Sep 9, 2018
4cb55a4
DOC: adjusted the grammer in DataFrame.assign docstring.
aeltanawy Sep 11, 2018
7c7bb7a
Fixed loffset with numpy timedelta (#22482)
discort Sep 4, 2018
d96a334
CLN: Rename 'n' to 'repeats' in .repeat methods (#22574)
gfyoung Sep 4, 2018
607d646
DOC: Updating DataFrame.merge docstring (#22141)
elmq0022 Sep 4, 2018
1b11063
TST: Add capture_stderr decorator to test_validate_docstrings (#22543)
WillAyd Sep 4, 2018
3141dfe
BLD: Fix openpyxl to 2.5.5 (#22601)
gfyoung Sep 5, 2018
66d376d
Use dispatch_to_series where possible (#22572)
jbrockmendel Sep 5, 2018
2168e4a
BUG: resample with TimedeltaIndex, fenceposts are off (#22488)
discort Sep 5, 2018
6693d9a
DOC: Update link and description of the Spyder IDE in Ecosystem docs …
CAM-Gerlach Sep 5, 2018
4ed3760
DOC: Improve the docstring of DataFrame.equals() (#22539)
seantchan Sep 5, 2018
25030e2
TST: fixturize series/test_alter_axes.py (#22526)
h-vetinari Sep 5, 2018
bdca5e9
TST: restructure internal extension arrays tests (split between /arra…
jorisvandenbossche Sep 6, 2018
6c7c975
TST: Fix skipping test due to lack of connectivity (#22598)
rhysparry Sep 6, 2018
2d21d9b
API: Add CalendarDay ('CD') offset (#22288)
mroeschke Sep 7, 2018
9b92446
CLN/DEPR: removed deprecated as_indexer arg from str.match() (#22626)
HyunTruth Sep 7, 2018
ec1f7eb
BUG: NaN should have pct rank of NaN (#22600)
gfyoung Sep 8, 2018
1bfe0c4
Set hypothesis healthcheck (#22597)
alimcmaster1 Sep 8, 2018
0ac130d
Implement delegate_names to allow decorating delegated attributes (#2…
jbrockmendel Sep 8, 2018
1faac78
[PERF] use numexpr in dispatch_to_series (#22284)
jbrockmendel Sep 8, 2018
24501d9
Fix incorrect DTI/TDI indexing; warn before dropping tzinfo (#22549)
jbrockmendel Sep 8, 2018
52b1bf5
[CLN] More cython cleanups, with bonus type annotations (#22283)
jbrockmendel Sep 8, 2018
2e21bd0
move rename functionality out of internals (#21924)
jbrockmendel Sep 8, 2018
1a2b524
TST: Continue collecting arithmetic tests (#22559)
jbrockmendel Sep 8, 2018
09a3d6b
BUG: fix failing DataFrame.loc when indexing with an IntervalIndex (#…
sideeye Sep 8, 2018
128cbd9
DOC: Update `month_name` and `day_name` docstrings (#22544)
Peque Sep 8, 2018
f2af1c6
CLN: tests for str.cat (#22575)
h-vetinari Sep 8, 2018
338683e
DOC: Fix to_latex docstring. (#22516)
Moisan Sep 8, 2018
2fda626
TST: add test to io/formats/test_to_html.py to close GH6131 (#22588)
simonjayhawkins Sep 9, 2018
49b560e
DOC/CLN: small whatsnew fixes (#22659)
jschendel Sep 11, 2018
688c8a4
DOC: Add cross references to advanced.rst (#22671)
topper-123 Sep 12, 2018
f3b3694
DOC: Add section on MultiIndex.to_frame() ordering (#22674)
matthewgilbert Sep 12, 2018
6b3e3c2
TST: Avoid DeprecationWarnings (#22646)
jbrockmendel Sep 12, 2018
16725cf
TST: Collect/Use arithmetic test fixtures (#22645)
jbrockmendel Sep 12, 2018
2ec957b
pythonize cython code (#22638)
jbrockmendel Sep 12, 2018
9837dbc
API: register_extension_dtype class decorator (#22666)
TomAugspurger Sep 13, 2018
e371129
TST: Close ZipFile in compression test (#22679)
TomAugspurger Sep 13, 2018
788158d
CLN: Standardize searchsorted signatures (#22670)
gfyoung Sep 13, 2018
243a19e
DEPR: Removed styler shim (#22691)
TomAugspurger Sep 13, 2018
3445e19
TST Use pytest.raises instead of legacy constructs (#22681)
rth Sep 13, 2018
7d6f275
Fix test_sql pytest fixture warnings (#22515)
alimcmaster1 Sep 14, 2018
b151427
API: Add 'name' as argument for index 'to_frame' method (#22580)
henriqueribeiro Sep 14, 2018
dad9b7c
BUG: Incorrect addition of Week(weekday=6) to DatetimeIndex (#22695)
reidy-p Sep 14, 2018
fab723c
ASV: more for str.cat (#22652)
h-vetinari Sep 14, 2018
1761dbc
TST: Test for bug fixed during #22534 discussion (#22694)
jbrockmendel Sep 15, 2018
93628c5
Fix broken link in install.rst (#22716)
ratijas Sep 15, 2018
d950096
BUG: Make sure that sas7bdat parsers memory is initialized to 0 (#216…
troels Sep 15, 2018
831a527
API: Make .shift always copy (Fixes #22397) (#22517)
AaronCritchley Sep 15, 2018
2b81853
TST: Add test of DataFrame.xs() with duplicates (#13719) (#22294)
nmusolino Sep 15, 2018
e5d334f
DEPR: Standardize searchsorted signature (#22672)
gfyoung Sep 15, 2018
2ac80c4
TST/CLN: break up & parametrize tests for df.set_index (#22236)
h-vetinari Sep 15, 2018
a507946
TST: Mock clipboard IO (#22715)
TomAugspurger Sep 16, 2018
9fe3faf
removing superfluous reference to axis in Series.reorder_levels docst…
SandrineP Sep 17, 2018
7afa8a0
CLN/DOC: Refactor timeseries.rst intro and overview (#22728)
mroeschke Sep 17, 2018
006c013
CLN: Remove unused imports in pyx files (#22739)
mroeschke Sep 18, 2018
845b21a
CLN: Removes module pandas.json (#22737)
vitoriahmc Sep 18, 2018
3ec461f
TST/CLN: remove duplicate data file used in tests (unicode_series.csv…
simonjayhawkins Sep 18, 2018
9465a59
BUG: Some sas7bdat files with many columns are not parseable by read_…
troels Sep 18, 2018
bbf119d
DOC: improve doc string for .aggregate and .transform (#22641)
topper-123 Sep 18, 2018
48de0db
BUG: DataFrame.apply not adding a frequency if freq=None (#22150) (#2…
HannahFerch Sep 18, 2018
3c6ad7d
[ENH] pull in warning for dialect change from pandas-gbq. (#22557)
tswast Sep 18, 2018
4310671
DOC: Updating str_repeat docstring (#22571)
JesperDramsch Sep 18, 2018
49f7fc7
use fused types for reshape (#22454)
jbrockmendel Sep 18, 2018
c15d8c0
use fused types for parts of algos_common_helper (#22452)
jbrockmendel Sep 18, 2018
d03ef77
DOC: Updating the docstring of Series.str.extractall (#22565)
lucadonini96 Sep 18, 2018
52a480d
BUG: don't mangle NaN-float-values and pd.NaT (GH 22295) (#22296)
realead Sep 18, 2018
9935305
DOC: Expose ExcelWriter as part of the Generated API (#22359)
newinh Sep 18, 2018
bada277
Test in scripts/validate_docstrings.py that the short summary is alwa…
Moisan Sep 18, 2018
4f000f5
fix raise of TypeError when subtracting timedelta array (#22054)
illegalnumbers Sep 18, 2018
79b8763
Bug: Logical operator of Series with Index (#22092) (#22293)
makbigc Sep 18, 2018
1aaefe5
DOC: Fix Series nsmallest and nlargest docstring/doctests (#22731)
Moisan Sep 18, 2018
9fe0fbc
Fixturize tests/frame/test_api and tests/sparse/frame/test_frame (#22…
h-vetinari Sep 18, 2018
d64c0a8
BUG SeriesGroupBy.mean() overflowed on some integer array (#22653)
troels Sep 18, 2018
0ba7b16
TST: Fail on warning (#22699)
TomAugspurger Sep 18, 2018
73ff71e
BUG: Allow IOErrors when attempting to retrieve default client encodi…
JayOfferdahl Sep 19, 2018
b7d9884
API: Git version (#22745)
alimcmaster1 Sep 19, 2018
22b2e4a
DOC: add more links to the API in advanced.rst (#22746)
topper-123 Sep 19, 2018
27ea656
DOC: Fix DataFrame.to_xarray doctests and allow the CI to run it. (#2…
Moisan Sep 19, 2018
4a2a24c
Set up CI with Azure Pipelines (#22760)
azure-pipelines[bot] Sep 19, 2018
96b7d84
CI: Fix travis CI (#22765)
TomAugspurger Sep 19, 2018
113ff50
CI: Publish test summary (#22770)
TomAugspurger Sep 19, 2018
5474d32
BUG: Check types in Index.__contains__ (#22085) (#22602)
yeojin-dev Sep 19, 2018
6c765d3
Merge remote-tracking branch 'upstream/master' into doc
aeltanawy Sep 20, 2018
61e4dee
Merge remote-tracking branch 'upstream/master' into doc
aeltanawy Sep 20, 2018
ecfaf47
Removing -assign from pandas/ci/doctests.sh
aeltanawy Sep 21, 2018
2 changes: 1 addition & 1 deletion ci/doctests.sh
@@ -21,7 +21,7 @@ if [ "$DOCTEST" ]; then

     # DataFrame / Series docstrings
     pytest --doctest-modules -v pandas/core/frame.py \
-        -k"-assign -axes -combine -isin -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_dict -to_stata"
+        -k"-axes -combine -isin -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_dict -to_stata"

     if [ $? -ne "0" ]; then
         RET=1
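Each `-name` entry in the `-k` expression appears to exclude matching tests, so dropping `-assign` lets the `DataFrame.assign` docstring examples run in CI. A minimal sketch of what such a check amounts to, using the standard-library `doctest` module rather than pytest. `to_fahrenheit` and `check_docstring` are hypothetical names for illustration only and are not part of pandas or its CI scripts.

```python
import doctest


def to_fahrenheit(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit.

    Hypothetical helper, used only to illustrate a doctest.

    >>> to_fahrenheit(25.0)
    77.0
    >>> to_fahrenheit(17.0)
    62.6
    """
    return temp_c * 9 / 5 + 32


def check_docstring(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Pass globs explicitly so the examples can see the function by name.
    for test in finder.find(func, name=func.__name__,
                            globs={func.__name__: func}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


failed, attempted = check_docstring(to_fahrenheit)
```

If an example's printed output drifts from the docstring, `failed` becomes non-zero, which is the kind of regression the doctest run in `ci/doctests.sh` is meant to catch.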
70 changes: 28 additions & 42 deletions pandas/core/frame.py
@@ -3273,7 +3273,7 @@ def assign(self, **kwargs):

         Parameters
         ----------
-        kwargs : keyword, value pairs
+        **kwargs : dict of {str: callable or Series}
             The column names are keywords. If the values are
             callable, they are computed on the DataFrame and
             assigned to the new columns. The callable must not
@@ -3283,7 +3283,7 @@ def assign(self, **kwargs):

         Returns
         -------
-        df : DataFrame
+        DataFrame
             A new DataFrame with the new columns in addition to
             all the existing columns.
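
The Returns text says `assign` produces a new DataFrame. A small sketch of what that implies for the caller, assuming pandas is installed; it reuses the `temp_c` data from the docstring examples in this PR, and the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

# assign returns a new object; the original frame keeps its columns.
with_f = df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32)

original_columns = list(df.columns)   # still only 'temp_c'
new_columns = list(with_f.columns)    # 'temp_c' followed by 'temp_f'
```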

@@ -3303,48 +3303,34 @@ def assign(self, **kwargs):

         Examples
         --------
-        >>> df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
+        >>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
+        ...                   index=['Portland', 'Berkeley'])
+        >>> df
+                  temp_c
+        Portland    17.0
+        Berkeley    25.0

         Where the value is a callable, evaluated on `df`:
-
-        >>> df.assign(ln_A = lambda x: np.log(x.A))
-            A         B      ln_A
-        0   1  0.426905  0.000000
-        1   2 -0.780949  0.693147
-        2   3 -0.418711  1.098612
-        3   4 -0.269708  1.386294
-        4   5 -0.274002  1.609438
-        5   6 -0.500792  1.791759
-        6   7  1.649697  1.945910
-        7   8 -1.495604  2.079442
-        8   9  0.549296  2.197225
-        9  10 -0.758542  2.302585
-
-        Where the value already exists and is inserted:
-
-        >>> newcol = np.log(df['A'])
-        >>> df.assign(ln_A=newcol)
-            A         B      ln_A
-        0   1  0.426905  0.000000
-        1   2 -0.780949  0.693147
-        2   3 -0.418711  1.098612
-        3   4 -0.269708  1.386294
-        4   5 -0.274002  1.609438
-        5   6 -0.500792  1.791759
-        6   7  1.649697  1.945910
-        7   8 -1.495604  2.079442
-        8   9  0.549296  2.197225
-        9  10 -0.758542  2.302585
-
-        Where the keyword arguments depend on each other
-
-        >>> df = pd.DataFrame({'A': [1, 2, 3]})
-
-        >>> df.assign(B=df.A, C=lambda x:x['A']+ x['B'])
-           A  B  C
-        0  1  1  2
-        1  2  2  4
-        2  3  3  6
+        >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
+                  temp_c  temp_f
+        Portland    17.0    62.6
+        Berkeley    25.0    77.0
+
+        Alternatively, the same behavior can be achieved by directly
+        referencing an existing Series or sequence:
+        >>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
+                  temp_c  temp_f
+        Portland    17.0    62.6
+        Berkeley    25.0    77.0
+
+        In Python 3.6+, you can create multiple columns within the same assign
+        where one of the columns depends on another one defined within the same
+        assign:
+        >>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
+        ...           temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
+                  temp_c  temp_f  temp_k
+        Portland    17.0    62.6  290.15
+        Berkeley    25.0    77.0  298.15
         """
         data = self.copy()

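The final docstring example depends on keyword-argument order being preserved (Python 3.6+), so a later callable can read a column created earlier in the same call. A sketch of that behaviour with named functions instead of lambdas, assuming a pandas version that behaves as the new docstring describes; `to_f` and `to_k` are hypothetical helper names, not pandas API.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])


def to_f(frame):
    # Hypothetical helper: Celsius column -> Fahrenheit values.
    return frame['temp_c'] * 9 / 5 + 32


def to_k(frame):
    # Hypothetical helper: reads 'temp_f', which exists at this point only
    # because the temp_f keyword is evaluated before temp_k in the same call.
    return (frame['temp_f'] + 459.67) * 5 / 9


result = df.assign(temp_f=to_f, temp_k=to_k)
```

Swapping the keyword order would make `to_k` look up a `temp_f` column that has not been created yet, which is why the docstring ties this pattern to Python 3.6+.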