Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Enable unary math operations for pandas, sqlite #1071

Closed
wants to merge 3 commits into from

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Jul 19, 2017

Implement decimal for pandas
Add SQLite unary ops
Fix operations in postgres that require numeric

@cpcloud cpcloud force-pushed the unary-ops branch 3 times, most recently from 0dc84c3 to 13a0422 Compare July 20, 2017 00:30
@cpcloud cpcloud self-assigned this Jul 20, 2017
@cpcloud cpcloud added the feature Features or general enhancements label Jul 20, 2017
@cpcloud cpcloud added this to the 0.11.3 milestone Jul 20, 2017
@cpcloud cpcloud force-pushed the unary-ops branch 5 times, most recently from d6c9085 to 3322844 Compare July 21, 2017 02:39
@cpcloud cpcloud requested a review from wesm July 21, 2017 15:37
@cpcloud
Copy link
Member Author

cpcloud commented Jul 21, 2017

@wesm can you take a look here? bit of a rabbit hole with decimals, otherwise just adding unary ops to series.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, but otherwise LGTM

if column_name in schema:
ibis_type = dt.validate_type(schema[column_name])
elif dtype == np.object_:
inferred_dtype = infer_dtype(df[column_name].dropna())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes. I guess we should make a NaN-friendly type inference function someplace (seems like an oversight in infer_dtype originally)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u post an issue in pandas tracker about this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I PR'd it :) pandas-dev/pandas#17066.

def execute_series_unary_op(op, data, scope=None):
function = getattr(np, type(op).__name__.lower())
if data.dtype == np.dtype(np.object_):
return data.apply(functools.partial(execute_node, op, scope=scope))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Series.apply different from Series.map (in behavior or performance)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think so, @jreback any idea here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it looks like Series.map accepts a dict, to support a simple CASE-statement-like operation, as well as callables, whereas apply only deals with callables. For callables, both methods use the same underlying function lib.map_infer to call the passed in callable in a Cython loop. Here we could use either since we're only dealing with callables.

def execute_series_log_with_base(op, data, base, scope=None):
if data.dtype == np.dtype(np.object_):
func = np.vectorize(functools.partial(execute_node, op, scope=scope))
return pd.Series(func(data, base), index=data.index, name=data.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this bit could be factored out into a helper function since it's repeated a couple times (in case it's useful in future execution rules)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good idea!


def _floor_divide(t, expr):
left, right = map(t.translate, expr.op().args)
return sa.func.floor(left / right)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does integer division in postgres yield doubles?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it yields integers. This is implemented so that it works regardless of the type of left and right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I was just curious =)

Implement decimal for pandas
Add SQLite unary ops
Fix operations in postgres that require numeric
@cpcloud
Copy link
Member Author

cpcloud commented Jul 27, 2017

Merging on green.

@cpcloud cpcloud closed this in 9882b5a Jul 27, 2017
@cpcloud cpcloud deleted the unary-ops branch July 27, 2017 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants