-
-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add local map operation #16
Conversation
def map(self, col, fn, stage_name: str): | ||
pass | ||
def map(self, col, fn, stage_name: str = None): | ||
return map(fn, col) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the order is reversed: fn
is passed before col
. That's how map
works in most languages, so current order of the arguments in PipelineOperations.map
may be confusing to people. Can we reverse it or should col
stay as the first argument? Same thing with filter: function is passed before iterable to Python's filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question! I think this is a common thing that different things are different in different languages/frameworks :)
For example in Spark and Beam collection is before fn in statement:
Beam: col | beam.Map(fn)
Spark: col.Map(fn)
Another thing is that for all PipelineOperations method col is always an argument, fn isn't (eg. in Keys()), so it's more uniform to have col always as the 1st argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
def map(self, col, fn, stage_name: str): | ||
pass | ||
def map(self, col, fn, stage_name: str = None): | ||
return map(fn, col) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question! I think this is a common thing that different things are different in different languages/frameworks :)
For example in Spark and Beam collection is before fn in statement:
Beam: col | beam.Map(fn)
Spark: col.Map(fn)
Another thing is that for all PipelineOperations method col is always an argument, fn isn't (eg. in Keys()), so it's more uniform to have col always as the 1st argument.
Description
Adds a method for local map operation. See comment about a file change below.
Affected Dependencies
Nope.
How has this been tested?
Tests added 🧪
The empty
__init__.py
file has been added because without it pytest (and probably other testing frameworks?) cannot findpipeline_dp
for importing (Python 3.8.5). I can imagine PyCharm users do not see the issue, but it will be needed for CI.Checklist