kahnvex/hidi

HiDi: Pipelines for Latent Factor Modeling

HiDi is a library for high-dimensional latent factor modeling for collaborative filtering applications.

Read the full documentation.

How Do I Use It?

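The following pipeline reads user-item data from a CSV file, derives latent factors for the items, and writes them to a CSV file: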

from hidi import inout, clean, matrix, pipeline


# CSV file with link_id and item_id columns
in_files = ['hidi/examples/data/user-item.csv']

# File to write output data to
outfile = 'latent-factors.csv'

transforms = [
    inout.ReadTransform(in_files),      # Read data from disk
    clean.DedupeTransform(),            # Dedupe it
    matrix.SparseTransform(),           # Make a sparse user*item matrix
    matrix.SimilarityTransform(),       # To item*item similarity matrix
    matrix.SVDTransform(),              # Perform SVD dimensionality reduction
    matrix.ItemsMatrixToDFTransform(),  # Make a DataFrame with an index
    inout.WriteTransform(outfile)       # Write results to csv
]

pl = pipeline.Pipeline(transforms)
pl.run()
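
To make the data flow through the pipeline more concrete, here is a rough NumPy-only sketch of the same sequence of steps (dedupe, user*item matrix, item*item similarity, SVD). It is not HiDi's implementation: the function names are hypothetical, the matrix is dense rather than sparse, and cosine similarity is an assumption, so HiDi's own transforms may compute something different.

```python
import numpy as np


def dedupe(pairs):
    """Drop repeated (user, item) pairs, keeping first-seen order."""
    return list(dict.fromkeys(pairs))


def to_user_item_matrix(pairs):
    """Build a dense 0/1 user*item matrix and the item labels of its columns."""
    users = sorted({u for u, _ in pairs})
    items = sorted({i for _, i in pairs})
    u_idx = {u: k for k, u in enumerate(users)}
    i_idx = {i: k for k, i in enumerate(items)}
    m = np.zeros((len(users), len(items)))
    for u, i in pairs:
        m[u_idx[u], i_idx[i]] = 1.0
    return m, items


def item_similarity(m):
    """Cosine similarity between item columns (assumed measure), item*item."""
    norms = np.linalg.norm(m, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty columns
    unit = m / norms
    return unit.T @ unit


def latent_factors(sim, k):
    """Keep the top-k singular directions, scaled by their singular values."""
    u, s, _ = np.linalg.svd(sim)
    return u[:, :k] * s[:k]


# Toy data: (user, item) pairs with one duplicate row.
pairs = [("u1", "a"), ("u1", "b"), ("u1", "a"),
         ("u2", "a"), ("u2", "b"), ("u3", "c")]

m, items = to_user_item_matrix(dedupe(pairs))
sim = item_similarity(m)
factors = latent_factors(sim, k=2)  # one row of 2 factors per item
```

In this toy data, items a and b are seen by exactly the same users, so they end up with identical factor rows, while item c lands elsewhere in the factor space. That is the property a recommender built on latent factors relies on.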

Setup

Requirements

HiDi is tested against CPython 2.7, 3.4, 3.5, and 3.6. It may work with other versions of CPython.

Installation

To install HiDi, run:

$ pip install hidi

Run the Tests

$ pip install tox
$ tox
