Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add local map operation #16

Merged
merged 1 commit into from
May 5, 2021
Merged

Conversation

replomancer
Copy link
Member

@replomancer replomancer commented May 4, 2021

Description

Adds a method for local map operation. See comment about a file change below.

Affected Dependencies

Nope.

How has this been tested?

Tests added 🧪
The empty __init__.py file has been added because without it pytest (and probably other testing frameworks?) cannot find pipeline_dp for importing (Python 3.8.5). I can imagine PyCharm users do not see the issue, but it will be needed for CI.

Checklist

@replomancer replomancer added the Type: New Feature ➕ Introduction of a completely new addition to the codebase label May 4, 2021
def map(self, col, fn, stage_name: str):
pass
def map(self, col, fn, stage_name: str = None):
return map(fn, col)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the order is reversed: fn is passed before col. That's how map works in most languages, so current order of the arguments in PipelineOperations.map may be confusing to people. Can we reverse it or should col stay as the first argument? Same thing with filter: function is passed before iterable to Python's filter.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question! I think this is a common thing that different things are different in different languages/frameworks :)

For example in Spark and Beam collection is before fn in statement:

Beam: col | beam.Map(fn)
Spark: col.Map(fn)

Another thing is that for all PipelineOperations method col is always an argument, fn isn't (eg. in Keys()), so it's more uniform to have col always as the 1st argument.

Copy link
Collaborator

@dvadym dvadym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

def map(self, col, fn, stage_name: str):
pass
def map(self, col, fn, stage_name: str = None):
return map(fn, col)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question! I think this is a common thing that different things are different in different languages/frameworks :)

For example in Spark and Beam collection is before fn in statement:

Beam: col | beam.Map(fn)
Spark: col.Map(fn)

Another thing is that for all PipelineOperations method col is always an argument, fn isn't (eg. in Keys()), so it's more uniform to have col always as the 1st argument.

@dvadym dvadym merged commit 6694a76 into OpenMined:main May 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: New Feature ➕ Introduction of a completely new addition to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants