Any plans for an entire goodreads user review dataset? #3
Comments
Hello @SantoshGuptaML, this is a project I would be interested in participating in! If you are still interested in scraping this data, please contact me. I would be happy to help you in this endeavor!
Hi @jonvlcs07, somehow I missed your reply. How can I contact you? Feel free to email me at SanGupta.ML@gmail.com
@SantoshGuptaML My apologies for the late reply. I mostly maintain this dataset in my limited spare time and do not have any plans to extend it. Do you need to keep this issue open in case other folks want to join your project?
Sure, sounds good. At the moment I'm just trying to contact jonvlcs07, as they expressed interest in the dataset development. I can't seem to find any contact info on their profile, but if convenient, feel free to pass them mine: SanGupta.ML@gmail.com
Reopening this issue. For anyone who's interested in collecting a more recent and comprehensive Goodreads review dataset, feel free to contact @SantoshGuptaML (SanGupta.ML@gmail.com)
Is there a more recent dataset update since 2017?
Hi @cantonalex - we (the original author group) don't have any plans to continue collecting the data at the moment. If you're interested in more recent data, you can probably contact @SantoshGuptaML about that.
@cantonalex @jonvlcs07 I started scraping the data. Could use some help, it's currently projected to finish in many years lol
The API has been discontinued, but it's actually faster to collect reviews from RSS feeds.
If there's interest, I started a script here
https://colab.research.google.com/drive/1uOyVlKaT4QFtce9yQpKj9hRtj5z8Uyta
It still needs work on confirming that it has gotten all the books from a user (I think there might be timeouts) and on handling books that have several versions/editions. But the biggest bottleneck is collecting reviews from all 100 million users.
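For anyone who wants to help with the "did we get all the books from a user" part, here is a minimal sketch of how the paging check could look. It is not the code from the Colab notebook. The tag names (`book_id`, `title`, `user_rating`, `user_review`) are assumptions about the feed layout, and `fetch_page` is a hypothetical caller-supplied function that returns the RSS text for a given page number, so the download, URL, and retry logic stay outside this snippet.

```python
import xml.etree.ElementTree as ET

# Assumed tag names inside each <item>; adjust to whatever the real feed contains.
FIELDS = ("book_id", "title", "user_rating", "user_review")


def parse_review_feed(xml_text):
    """Return one dict per <item> element found in an RSS document."""
    root = ET.fromstring(xml_text)
    reviews = []
    for item in root.iter("item"):
        # Missing tags become empty strings instead of raising.
        reviews.append({f: (item.findtext(f) or "").strip() for f in FIELDS})
    return reviews


def collect_user_reviews(fetch_page, max_pages=1000):
    """Page through one user's feed until a page adds nothing new.

    fetch_page(page) is a hypothetical callable returning RSS text for
    that page number. Reviews are deduplicated on the assumed book_id.
    """
    seen = {}
    for page in range(1, max_pages + 1):
        items = parse_review_feed(fetch_page(page))
        new = [r for r in items if r["book_id"] not in seen]
        if not new:
            # Empty page, or a page that only repeats known books: stop.
            break
        for r in new:
            seen[r["book_id"]] = r
    return list(seen.values())
```

Stopping on the first page that adds nothing new guards against feeds that repeat the last page instead of returning an empty one. Timeouts could be handled by wrapping `fetch_page` in a retry. Deduplicating on `book_id` does not merge different editions of the same book, so that issue would still need separate handling.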