Example code and data for "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014.
- The book: "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014 (book copyright Manning Publications Co., all rights reserved)
- The support site: GitHub WinVector/zmPDSwR
- Chapter 8: Using Unsupervised Methods
Book-Crossing dataset mined by Cai-Nicolas Ziegler, DBIS Freiburg original link http://www.informatik.uni-freiburg.de/~cziegler/BX/
Collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems. Contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.
Freely available for research use when acknowledged with the following reference (further details on the dataset are given in this publication):
Improving Recommendation Lists Through Topic Diversification, Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen; Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To appear.
http://www.informatik.uni-freiburg.de/~cziegler/BX/WWW-2005-Preprint.pdf
- bxBooks.RData : R-binary version of Book-Crossing dataset.
- bookdata.tsv.gz : gzipped tab-separated file containing customer book ratings by title and numerical rating
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
- read_bookcrossing.R : script to read in original data files and create bxBooks.RData
- create_bookdata.R : script to create the data file bookdata.tsv