GitHub - eikevons/pandas-paddles: Access the parent pandas data frame in loc[], iloc[], assign(), and other pandas helpers

Pandas Paddles

Access the calling pandas data frame in loc[], iloc[], assign() and other methods with DF to write better chains of data frame operations, e.g.:

df = (
    df
    # Select all rows with column "x" < 2
    .loc[DF["x"] < 2]
    .assign(
        # Shift "x" by its minimum.
        y = DF["x"] - DF["x"].min(),
        # Clip "x" to its central 50% window. Note how DF is used
        # in the argument to `clip()`.
        z = DF["x"].clip(
            lower=DF["x"].quantile(0.25),
            upper=DF["x"].quantile(0.75)
        ),
    )
)

Overview

Motivation: Make chaining Pandas operations easier and bring functionality to Pandas similar to Spark's col() function or referencing columns in R's dplyr.
Install from PyPI with pip install pandas-paddles. Pandas versions 1+ (>=1,<3) are supported.
Documentation can be found at readthedocs.
Source code can be obtained from GitHub.
Changelog

Example: Create new column and filter

Instead of writing "traditional" Pandas like this:

import pandas as pd

df_in = pd.DataFrame({"x": range(5)})
df = df_in.copy()
df["y"] = df["x"] // 2
df = df.loc[df["y"] <= 1]
df
#    x  y
# 0  0  0
# 1  1  0
# 2  2  1
# 3  3  1

One can write:

from pandas_paddles import DF
df = (
  df_in
  .assign(y = DF["x"] // 2)
  .loc[DF["y"] <= 1]
)

This is especially handy when iterating on data frame manipulations interactively, e.g. in a notebook: the chain never refers to the variable name, so renaming df to df_out requires no changes inside it.
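
For comparison, plain pandas already accepts callables in assign() and loc[], so the same chain can be written with lambdas, and DF can be read as a shorthand for them. The sketch below uses only pandas (pandas-paddles is not imported), and the lambda parameter name d is arbitrary:

```python
import pandas as pd

df_in = pd.DataFrame({"x": range(5)})

# Each lambda receives the intermediate data frame at that step of the chain,
# which is the role DF plays in the pandas-paddles version.
df = (
    df_in
    .assign(y=lambda d: d["x"] // 2)
    .loc[lambda d: d["y"] <= 1]
)
print(df)
```

The result has the same four rows as the "traditional" version, and df_in is left unmodified because assign() returns a new data frame.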

You can also access all methods and attributes of the data frame from the context:

df = pd.DataFrame({
    "X": range(5),
    "y": ["1", "a", "c", "D", "e"],
})
df.loc[DF["y"].str.isupper() | DF["y"].str.isnumeric()]
#    X  y
# 0  0  1
# 3  3  D
df.loc[:, DF.columns.str.isupper()]
#    X
# 0  0
# 1  1
# 2  2
# 3  3
# 4  4

You can even use DF in the arguments to methods:

df = pd.DataFrame({
    "x": range(5),
    "y": range(2, 7),
})
df.assign(z = DF['x'].clip(lower=2.2, upper=DF['y'].median()))
#    x  y    z
# 0  0  2  2.2
# 1  1  3  2.2
# 2  2  4  2.2
# 3  3  5  3.0
# 4  4  6  4.0

For working with pd.Series there is the S object. It can be used similarly to DF:

s = pd.Series(range(5))
s[S < 3]
# 0    0
# 1    1
# 2    2
# dtype: int64
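
A reference to the calling series matters most in the middle of a chain, where the intermediate series has no name. As a pandas-only point of comparison (an illustration, not code from this project), a callable passed to loc[] receives that intermediate series:

```python
import pandas as pd

s = pd.Series(range(5))

# The callable receives the doubled series (0, 2, 4, 6, 8), not the original `s`.
result = s.mul(2).loc[lambda v: v < 5]
print(result.tolist())  # [0, 2, 4]
```

Assuming S behaves inside loc[] as it does in the indexing example above, the pandas-paddles spelling would be s.mul(2).loc[S < 5].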

Similar projects for pandas

siuba
- (+) active
- (-) new API to learn
pandas-ply
- (-) stale(?), last change 6 years ago
- (-) new API to learn
- (-) Symbol / pandas_ply.X works only with ply_* functions
pandas-select
- (+) no explicit df necessary
- (-) new API to learn
pandas-selectable
- (+) simple select accessor
- (-) usage inside chains is clumsy (needs explicit df):
```
((df
  .select.A == 'a')
  .select.B == 'b'
)
```
- (-) hard-coded str, dt accessor methods
- (?) composable?

Development

Development is containerized with Docker to separate it from the host system and improve reproducibility. No other prerequisites are needed on the host system.

Recommendation for Windows users: install WSL 2 (tested on Ubuntu 20.04), and for containerized workflows, Docker Desktop for Windows.

The common tasks are collected in the Makefile (see make help for a complete list):

Run the unit tests: make test, or make watch to rerun the tests continuously on code changes.
Build the documentation: make docs
TODO: Update the poetry.lock file: make lock
Add a dependency:
1. Start a shell in a new container.
2. Add dependency with poetry add in the running container. This will update poetry.lock automatically:
```
# 1. On the host system
% make shell
# 2. In the container instance:
I have no name!@7d0e85b3a303:/app$ poetry add --dev --lock falcon
```
Build the development image: make image (Note: the other targets should build it automatically.)
