Update SASRec Notebook & List of Authors #1621
Conversation
Hey @aeroabir, I did another pass. Maybe the problem we are having with the long tests in the nightly builds is related to the hardcoded epoch values in the tests (see the sketch after this comment).
We are getting weird OOM errors in other parts of the code, see https://github.com/microsoft/recommenders/runs/5035292456?check_suite_focus=true
Not sure if this is related to this PR or to something we already have in staging.
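As an illustration of that idea, here is a minimal sketch of keeping the epoch count small and overridable in a test instead of hardcoding it; the fixture, environment variable, and test names are hypothetical and not taken from the recommenders test suite.

```python
import os

import pytest


@pytest.fixture
def sasrec_epochs():
    # Default to a single epoch for smoke tests; nightly builds can raise the
    # value through an environment variable if longer training is needed.
    # TEST_SASREC_EPOCHS is a hypothetical variable used for illustration.
    return int(os.environ.get("TEST_SASREC_EPOCHS", "1"))


def test_sasrec_training_runs(sasrec_epochs):
    # Placeholder assertion; a real test would train the model for
    # `sasrec_epochs` epochs and check its loss or metrics.
    assert sasrec_epochs >= 1
```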
This appears in other PRs too. I am not sure why, but the GPU hosts get stuck sometimes and some tox processes are left behind. I cleaned them up; a rough sketch of that kind of cleanup is below. Try rerunning the jobs.
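For reference, a minimal sketch of the kind of cleanup described above, assuming psutil is available on the agent; this is illustrative, not the exact procedure used on the build machines.

```python
import psutil

# Find lingering tox processes left over from previous test runs on the
# GPU agent and terminate them.
for proc in psutil.process_iter(["pid", "cmdline"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "tox" in cmdline:
        print(f"Terminating PID {proc.info['pid']}: {cmdline}")
        proc.terminate()
```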
Codecov Report
@@             Coverage Diff              @@
##           staging    #1621       +/-   ##
=============================================
+ Coverage     0.00%    57.97%    +57.97%
=============================================
  Files           88        88
  Lines         9091      9096         +5
=============================================
+ Hits             0      5273      +5273
- Misses           0      3823      +3823
LGTM, let's see if the nightly builds take less time after this change.
The Spark pipeline keeps failing at the tuning notebook. However, it passed when I ran it on a VM. It may be because the GitHub-hosted runners do not have enough resources.
Yeah, I think this is expected because the GitHub-hosted runners have little memory; however, there is also an OOM in the GPU ones. This is annoying, because the ADO pipelines for the notebook and unit tests pass. What are your thoughts here? Do you want to debug the GPU issue further, or do you think we can merge based on the ADO tests?
The GPU failures are due to our hosts, whose memory is occupied by tox processes left over from previous tests. Once you kill those processes, the pipeline can run again; however, this happens frequently and needs some investigation into what is going wrong. About Spark, I am seeing the tox command fail on a VM that has more than enough memory, while manually installing a conda environment and running the tests with pytest works fine (which is consistent with the ADO tests passing). I'd like to debug this a little further and will post any updates here.
Also, pytest needs 5 reruns before passing.
Ok, there was a bad configuration in the Spark cross validation test, which I fixed. The tests now pass, although there are still a couple of reruns. |
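For context on the reruns, a minimal sketch of how a flaky test can be retried automatically, assuming the pytest-rerunfailures plugin is installed; the test name and body are placeholders, not the actual Spark test.

```python
import pytest


@pytest.mark.flaky(reruns=5)
def test_spark_cross_validation_smoke():
    # A real test would run the Spark cross-validation step and check its
    # output; this placeholder only shows where the rerun marker goes.
    assert True
```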
Description
The SASRec notebook uses a function `data_process_with_time` that can be written in terms of the existing function `min_rating_filter_pandas`. The notebook is updated by removing `data_process_with_time` and using only `min_rating_filter_pandas`.
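A rough sketch of the replacement described above; the import path, column names, and toy data are assumptions for illustration, not taken from the notebook.

```python
import pandas as pd

# Assumed import path for min_rating_filter_pandas; adjust if it lives
# elsewhere in the recommenders package.
from recommenders.datasets.split_utils import min_rating_filter_pandas

# Toy interactions DataFrame standing in for the MovieLens-style data used by
# the SASRec notebook (column names are illustrative).
df = pd.DataFrame(
    {
        "userID": [1, 1, 1, 2, 2, 3],
        "itemID": [10, 11, 12, 10, 13, 10],
        "timestamp": [1, 2, 3, 1, 2, 1],
    }
)

# Keep only users with at least a minimum number of interactions, which is the
# part of data_process_with_time that min_rating_filter_pandas already covers.
filtered = min_rating_filter_pandas(
    df, min_rating=2, filter_by="user", col_user="userID", col_item="itemID"
)
print(filtered)
```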
Related Issues
Fixing the issue here
Checklist:
- The PR is made to the staging branch and not to the main branch.