-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Criteo dataloader #642
Criteo dataloader #642
Conversation
Jeremr criteo dataloader addition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry that you have to bear the brunt of my grumpiness with how the data loading is setup =)
we can simplify this section and avoid creating unnecessary single use functions or convenience classes that make the code more brittle to changes to dependencies. this will also limit the test surface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check my comment for potential misbehavior that might be caused by removing the clean-up codes.
reco_utils/dataset/criteo.py
Outdated
|
||
|
||
@contextmanager | ||
def _real_path(path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@loomlike we should do DRY here. However, this code is a little bit different to what you have in movielens. I'm not sure how to homogenize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it. What about this:
- Put this
real_path
to url_utils (btw we talked about renaming this to dataset_utils, right?) - Use that function from criteo.py
- Refactor movielens to use the same function. Probablly, to check "zip" file name from outside of this function. - either you or I can refactor this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks reasonable to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. @loomlike if you are free today, feel free to push directly to the branch. If not I'll do it tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, let's merge this!
reco_utils/dataset/criteo.py
Outdated
|
||
|
||
@contextmanager | ||
def _real_path(path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks reasonable to me
Description
Criteo dataload in python and spark
Smoke and integration
Related Issues
#555
Checklist: