Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MongoDB contrib : Next replaced with for loop to avoid propagation of StopIteration #2221

Merged
merged 9 commits into from
Sep 11, 2017

Conversation

kritchie
Copy link
Contributor

Description

Replaced next with for loop in MongoDB count target.
Also fixed comment in the function's documentation.

Motivation and Context

A specific usage caused a StopIteration to be propagated which cause the "output" to be always true even if the exist condition was not met in the MongoDB count target.

This line in the complete method always return true if a StopIteration is raised in the exist method :
return all(map(lambda output: output.exists(), outputs))

Because the all method of python returns True in case of an empty array.
(Maybe this is something that could also be adjusted? Just wondering)

Have you tested this? If so, how?

Using the following code, I've tested multiple runs with conditions :

  • Collection doesn't exists
  • Collection exists with count not matching
  • Collection exists with count matching

In all of the above, the pipeline worked and produced the desired result.
(Running when the count doesn't match or the collection doesn't exist, and not running if the count matches)

class MongoVerifyCount(luigi.Task):

    endpoint = luigi.Parameter(default='127.0.0.1')
    port = luigi.IntParameter(default=27017)
    database = luigi.Parameter()
    collection = luigi.Parameter()
    target_count = luigi.IntParameter()

    def output(self):
        return MongoCountTarget(mongo_client=MongoClient(self.endpoint, self.port),
                                index=self.database,
                                collection=self.collection,
                                target_count=self.target_count)

    def run(self):

        current_count = self.output().read()
        collection = MongoClient(self.endpoint, self.port)[self.database][self.collection]

        for i in range(self.target_count - current_count):
            collection.insert_one({'count': i})

Following this, here's a simple example of what was happening :
This will always result with :

Scheduled 1 tasks of which:

  • 1 present dependencies were encountered:
    • 1 ZeroTask()
import luigi

class ZeroTarget(luigi.Target):
    def exists(self):
        raise StopIteration

class ZeroTask(luigi.Task):
    def output(self):
        return ZeroTarget()

    def run(self):
        print('Zero !')

if __name__ == '__main__':
    luigi.run(main_task_cls=ZeroTask, local_scheduler=True)

@mention-bot
Copy link

@kritchie, thanks for your PR! By analyzing the history of the files in this pull request, we identified @MathiasDesch to be a potential reviewer.

@kritchie
Copy link
Contributor Author

kritchie commented Sep 7, 2017

Anything on this ?

@dlstadther
Copy link
Collaborator

@MathiasDesch As another MongoDB contributor, could you take a look here?

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't use MongoDB, but based on description this makes sense to me.

One comment about unnecessary (but not incorrect) final return statement.

return next(self.get_collection().aggregate([{'$group': {'_id': None, 'count': {'$sum': 1}}}])).get('count')
for res in self.get_collection().aggregate([{'$group': {'_id': None, 'count': {'$sum': 1}}}]):
return res.get('count', None)
return None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this last line is necessary. If the for loop isn't entered, the method by default return None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's true, I just wanted to be as explicit as possible and since we use the None value for comparison in the exist() method I though it made sense to add that line but I could be wrong.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Yeah, that's fine. Just commenting on it; not saying it was wrong.

@dlstadther dlstadther merged commit 1795eb1 into spotify:master Sep 11, 2017
This was referenced Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants