
Any plans for an entire goodreads user review dataset? #3

Open
SantoshGuptaML opened this issue Feb 25, 2022 · 8 comments

Comments

@SantoshGuptaML

The Goodreads API has been discontinued, but it's actually faster to collect reviews from RSS feeds.

If there's interest, I've started a script here:

https://colab.research.google.com/drive/1uOyVlKaT4QFtce9yQpKj9hRtj5z8Uyta

It still needs work: confirming that it has retrieved all the books for a user (I think there may be timeouts), and handling books that have several versions/editions. But the biggest bottleneck is collecting reviews from all 100 million users.
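The thread doesn't describe how the Colab script works internally, so the following is only a minimal Python sketch, assuming a user's shelf is available as a standard RSS 2.0 feed whose URL you already have (no Goodreads URL pattern or feed field names are assumed here). It illustrates two of the points above: retrying requests that time out, and turning each `<item>` in the feed into a dict. `fetch_with_retries` and `parse_rss_items` are hypothetical helpers, and the multiple-editions problem is not handled.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional


def fetch_with_retries(url: str, retries: int = 3, timeout: float = 10.0,
                       backoff: float = 1.0,
                       fetch: Optional[Callable[[str, float], bytes]] = None) -> bytes:
    """Fetch a URL, retrying on timeouts/network errors with exponential backoff.

    `fetch` is a hypothetical injection point (useful for testing); by default
    the page is downloaded with urllib.request.urlopen.
    """
    if retries < 1:
        raise ValueError("retries must be >= 1")
    if fetch is None:
        def fetch(u: str, t: float) -> bytes:
            with urllib.request.urlopen(u, timeout=t) as resp:
                return resp.read()
    last_error: Optional[OSError] = None
    for attempt in range(retries):
        try:
            return fetch(url, timeout)
        except OSError as err:  # URLError and socket timeouts are OSError subclasses
            last_error = err
            if attempt < retries - 1:
                time.sleep(backoff * (2 ** attempt))  # wait 1x, 2x, 4x, ... backoff
    raise last_error


def parse_rss_items(xml_bytes: bytes) -> List[Dict[str, str]]:
    """Return one dict per RSS <item>, mapping each child tag to its stripped text."""
    root = ET.fromstring(xml_bytes)
    return [{child.tag: (child.text or "").strip() for child in item}
            for item in root.iter("item")]
```

Injecting `fetch` keeps the retry logic testable without network access; in real use you would leave it as `None` so `urllib.request.urlopen` is used. Note that namespaced child tags, if a feed uses any, come back in ElementTree's `{namespace}tag` form.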

@jonvlcs07

Hello @SantoshGuptaML, this is a project I would be interested in participating in! If you are still interested in scraping this data, please contact me; I would be happy to help you in this endeavor!

@SantoshGuptaML
Author

Hi @jonvlcs07, somehow I missed your reply. How can I contact you? Feel free to email me at SanGupta.ML@gmail.com.

@MengtingWan
Owner

@SantoshGuptaML My apologies for the late reply; I maintain this dataset mostly in my limited spare time and don't have any plans to extend it. Do you need to keep this issue open in case other folks want to join your project?

@SantoshGuptaML
Author


Sure, sounds good. At the moment I'm just trying to contact @jonvlcs07, as they expressed interest in the dataset development. I can't find any contact info on their profile, but if it's convenient, feel free to pass mine along: SanGupta.ML@gmail.com

@MengtingWan
Owner

MengtingWan commented Jun 3, 2023

Reopening this issue: anyone who's interested in collecting a more recent and comprehensive Goodreads review dataset, feel free to contact @SantoshGuptaML (SanGupta.ML@gmail.com).

MengtingWan reopened this Jun 3, 2023
@cantonalex

Is there a more recent dataset update since 2017?

@MengtingWan
Owner

Hi @cantonalex, we (the original author group) don't have any plans to continue collecting the data at the moment. If you're interested in more recent data, you could contact @SantoshGuptaML about that.

@Santosh-Gupta

@cantonalex @jonvlcs07 I've started scraping the data and could use some help; it's currently projected to finish in many years, lol.


5 participants