LightFM Dataset helper

A lightweight Python package that prepares DataFrames (loaded from CSV files, for example) for the LightFM module, to make training easier.

Install

Install from the Python Package Index (PyPI):

pip install lightfm-dataset-helper

or install manually from a released wheel.

Example

Import the module:

from lightfm_dataset_helper.lightfm_dataset_helper import DatasetHelper

Preparing the Dataframe and the required info

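Loading the CSV files

# Use pandas to load the CSV files
import pandas as pd

def read_csv(filename):
    # on_bad_lines="skip" (pandas 1.3+) replaces error_bad_lines=False, which was removed in pandas 2.0
    return pd.read_csv(filename, sep=";", on_bad_lines="skip", encoding="latin-1", low_memory=False)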

books = read_csv("Data/BX-Books.csv")
users = read_csv("Data/BX-Users.csv")
ratings = read_csv("Data/BX-Book-Ratings.csv")
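On pandas 1.3 and later, `on_bad_lines="skip"` takes over the role of `error_bad_lines=False`. The following is a minimal sketch of how malformed rows are dropped while loading; the in-memory sample and its values are hypothetical and only mimic the `;`-separated layout of the BX files. Reading `ISBN` as a string is an extra precaution assumed here so that leading zeros survive.

```python
import io

import pandas as pd

# Hypothetical sample in the same ";"-separated layout as the BX files.
raw = io.StringIO(
    '"User-ID";"ISBN";"Book-Rating"\n'
    '"1";"0001";"5"\n'
    '"2";"0002";"0";"unexpected extra field"\n'  # malformed: 4 fields instead of 3
    '"3";"0003";"8"\n'
)

# Skip rows with too many fields instead of raising a parser error,
# and keep ISBN as text so that leading zeros are preserved.
sample = pd.read_csv(raw, sep=";", on_bad_lines="skip", dtype={"ISBN": str})

print(sample)
```

Only the two well-formed rows remain in `sample`; the malformed row is dropped silently, so it is worth comparing row counts against the raw file when data loss matters.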

Column Definitions

items_column = "ISBN"
user_column = "User-ID"
ratings_column = "Book-Rating"

items_feature_columns = [
    "Book-Title",
    "Book-Author",
    "Year-Of-Publication",
    "Publisher",
]

user_features_columns = ["Location", "Age"]
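Because these names are plain strings, a typo only surfaces later as a `KeyError`. A small pre-flight check can catch that early. This is a sketch under assumptions: `missing_columns` is a hypothetical helper that is not part of the package, and the empty `books` frame below merely stands in for the real one.

```python
import pandas as pd


def missing_columns(frame, required):
    """Return the names in `required` that are absent from `frame`, in order."""
    return [name for name in required if name not in frame.columns]


# Stand-in frame that lacks one of the expected feature columns.
books = pd.DataFrame(columns=["ISBN", "Book-Title", "Book-Author", "Publisher"])

items_column = "ISBN"
items_feature_columns = [
    "Book-Title",
    "Book-Author",
    "Year-Of-Publication",
    "Publisher",
]

missing = missing_columns(books, [items_column] + items_feature_columns)
print(missing)
```

An empty list means every configured column is present; here the check reports the one column the stand-in frame lacks.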

Optional: testing on a small amount of data (500 ratings)

# Cut the ratings down to 500 rows to save time, and keep only the books and users those ratings refer to
Test_amount = 500
ratings = ratings[:Test_amount]
books = books[books[items_column].isin(ratings[items_column])]
users = users[users[user_column].isin(ratings[user_column])]
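The filtering above drops books and users that no retained rating mentions. It does not guarantee the reverse: a rating may still point at an ISBN or user that is missing from the other tables. The sketch below shows one way to spot such rows on toy data; the frames and the `orphans` name are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the three tables.
ratings = pd.DataFrame(
    {
        "User-ID": [1, 1, 2, 3],
        "ISBN": ["a", "b", "a", "zzz"],
        "Book-Rating": [5, 0, 7, 9],
    }
)
books = pd.DataFrame({"ISBN": ["a", "b", "c"], "Book-Title": ["A", "B", "C"]})
users = pd.DataFrame({"User-ID": [1, 2, 3, 4], "Age": [30, None, 25, 41]})

# Same filtering as above: keep only rows referenced by the ratings.
books_small = books[books["ISBN"].isin(ratings["ISBN"])]
users_small = users[users["User-ID"].isin(ratings["User-ID"])]

# Ratings whose book or user is unknown to the filtered tables.
known = ratings["ISBN"].isin(books_small["ISBN"]) & ratings["User-ID"].isin(
    users_small["User-ID"]
)
orphans = ratings[~known]

print(orphans)
```

In this toy data the rating for ISBN `zzz` has no matching book, so it ends up in `orphans`.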

Creating the helper instance

Feed the DataFrames to the helper and run the routine:

dataset_helper_instance = DatasetHelper(
    users_dataframe=users,
    items_dataframe=books,
    interactions_dataframe=ratings,
    item_id_column=items_column,
    items_feature_columns=items_feature_columns,
    user_id_column=user_column,
    user_features_columns=user_features_columns,
    interaction_column=ratings_column,
    clean_unknown_interactions=True,
)

# Run the routine.
# You can also run the steps separately, one by one; routine() simply wraps the whole flow.
dataset_helper_instance.routine()
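To make the goal of this step concrete: LightFM works on sparse matrices addressed by contiguous integer positions, so raw identifiers such as `User-ID` and `ISBN` have to be mapped to row and column indices first. The following is a minimal, library-free sketch of that idea, not the helper's actual implementation; `build_index` and the sample rows are hypothetical.

```python
def build_index(ids):
    """Map each distinct raw ID to a contiguous integer index, in first-seen order."""
    index = {}
    for raw_id in ids:
        index.setdefault(raw_id, len(index))
    return index


# Hypothetical (user, item, rating) rows.
rows = [
    (10, "isbn-x", 0),
    (11, "isbn-y", 5),
    (10, "isbn-y", 3),
]

user_index = build_index(user for user, _, _ in rows)
item_index = build_index(item for _, item, _ in rows)

# (row, column, weight) triples: the raw material for a sparse
# users-by-items interaction matrix and its matching weight matrix.
triples = [(user_index[u], item_index[i], float(r)) for u, i, r in rows]

print(user_index)
print(item_index)
print(triples)
```

The same mappings are needed again at prediction time to translate model output back into the original IDs.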

Feeding the dataset to the LightFM class

from lightfm import LightFM

model = LightFM(no_components=24, loss="warp", k=15)
model.fit(
    interactions=dataset_helper_instance.interactions,
    sample_weight=dataset_helper_instance.weights,
    item_features=dataset_helper_instance.item_features_list,
    user_features=dataset_helper_instance.user_features_list,
    verbose=True,
    epochs=10,
    num_threads=20,
)

The model fits successfully. With verbose=True, the output is:

Epoch 0
Epoch 1
.
.
.
Epoch 8
Epoch 9

Used Dataset

This example uses the Book-Crossing books dataset.

The Book-Crossing dataset comprises 3 tables.

BX-Users
Contains the users. Note that user IDs (`User-ID`) have been anonymized and map to integers. Demographic data is provided (`Location`, `Age`) if available. Otherwise, these fields contain NULL-values.

BX-Books
Books are identified by their respective ISBN. Invalid ISBNs have already been removed from the dataset. Moreover, some content-based information is given (`Book-Title`, `Book-Author`, `Year-Of-Publication`, `Publisher`), obtained from Amazon Web Services. Note that in case of several authors, only the first is provided. URLs linking to cover images are also given, appearing in three different flavours (`Image-URL-S`, `Image-URL-M`, `Image-URL-L`), i.e., small, medium, large. These URLs point to the Amazon web site.

BX-Book-Ratings
Contains the book rating information. Ratings (`Book-Rating`) are either explicit, expressed on a scale from 1-10 (higher values denoting higher appreciation), or implicit, expressed by 0.
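Since a `Book-Rating` of 0 marks an implicit interaction rather than a low score, it can be useful to look at the two kinds of ratings separately before deciding how to use them as weights. The sketch below shows the split on toy data; the frame and its values are hypothetical.

```python
import pandas as pd

# Toy ratings table in the BX-Book-Ratings layout.
ratings = pd.DataFrame(
    {
        "User-ID": [1, 1, 2, 3],
        "ISBN": ["a", "b", "a", "c"],
        "Book-Rating": [0, 8, 10, 0],
    }
)

# Explicit ratings use the 1-10 scale; implicit ones are stored as 0.
explicit = ratings[ratings["Book-Rating"] > 0]
implicit = ratings[ratings["Book-Rating"] == 0]

# Averages are only meaningful over the explicit part.
mean_explicit = explicit["Book-Rating"].mean()

print(len(explicit), len(implicit), mean_explicit)
```

Averaging over all rows would pull the mean toward 0 and misrepresent how much readers liked the books they scored.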

MIT license

Med-ELOMARI/LightFM-Dataset-Helper
