Bookdata

Example code and data for "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014.

The code and data in this directory support examples from:

  • Chapter 8: Using Unsupervised Methods

Original data:

Book-Crossing dataset, mined by Cai-Nicolas Ziegler, DBIS Freiburg. Original link: http://www.informatik.uni-freiburg.de/~cziegler/BX/

Collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems. Contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.

Freely available for research use when acknowledged with the following reference (further details on the dataset are given in this publication):

Improving Recommendation Lists Through Topic Diversification, Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen; Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To appear.

http://www.informatik.uni-freiburg.de/~cziegler/BX/WWW-2005-Preprint.pdf

Derived works (no claim of license on these):

  • bxBooks.RData : R-binary version of Book-Crossing dataset.
  • bookdata.tsv.gz : gzipped tab-separated file of customer book ratings, giving the book title and the numerical rating
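
As a minimal sketch of how the two derived files might be opened in R: the helper names `load_rdata` and `read_ratings` are hypothetical, and the sketch assumes that bookdata.tsv.gz has a header row and that both files sit in the working directory. The object names stored in bxBooks.RData are not documented here, so the sketch asks `load()` to report them rather than guessing.

```r
# Load an .RData file into its own environment and report what it contains.
# load() returns the names of the objects it restored, so nothing about the
# contents of the file needs to be known in advance.
load_rdata <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)
  list(names = object_names, env = env)
}

# Read a gzipped tab-separated file into a data frame.
# read.delim() decompresses .gz files transparently.
# ASSUMPTION: the file has a header row; pass header = FALSE if it does not.
# quote = "" is an illustrative choice so that quote characters inside book
# titles are treated as ordinary text.
read_ratings <- function(path, header = TRUE) {
  read.delim(path, header = header, sep = "\t", quote = "",
             stringsAsFactors = FALSE)
}

# Usage: only runs if the files are present in the working directory.
if (file.exists("bxBooks.RData")) {
  bx <- load_rdata("bxBooks.RData")
  print(bx$names)   # names of the objects stored in the file
}
if (file.exists("bookdata.tsv.gz")) {
  ratings <- read_ratings("bookdata.tsv.gz")
  str(ratings)      # inspect column names and types before relying on them
}
```

Loading into a separate environment keeps the restored objects from overwriting anything already in the workspace; use `get()` on `bx$env` to pull out the objects you need.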

Our additional documentation, notes, code, and example data:

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

  • read_bookcrossing.R : script to read in original data files and create bxBooks.RData
  • create_bookdata.R : script to create the data file bookdata.tsv