Forever Dreaming TV Transcript Dataset #2631

Merged
merged 3 commits into from
Apr 16, 2023

sedthh
Collaborator

@sedthh sedthh commented Apr 16, 2023

I have finally finished crawling foreverdreaming.com. The transcripts are about 5% of all the content on the website.
However, we have decided not to share the crawler notebook this time, because it would allow anyone to mirror all the contents on the website just by changing a few lines of code. The owner of foreverdreaming has invested a lot of time and resources into running their website and it simply would not be fair towards them.
We have discussed this on Discord.

The dataset is available at https://huggingface.co/datasets/sedthh/fd_dialogue

sedthh added 3 commits April 16, 2023 23:12
- added fd_dialogue to datasets
- fixed typo for Gutenber(g)
- readded missing index=False to README.md (even though there is a correct copy in the docs for datasets)
- added copy of dataset card from HF and mentioned the lack of crawler notebooks
- with --all-files
@olliestanley olliestanley changed the title from "Forver Dreaming TV Transcript Dataset" to "Forever Dreaming TV Transcript Dataset" on Apr 16, 2023
Collaborator

@olliestanley olliestanley left a comment

Thank you!

@olliestanley olliestanley merged commit 3fe7c44 into LAION-AI:main Apr 16, 2023