Personalized Category Suggestions for eCommerce Type-Ahead
This repo contains working code from our blog post Building personalized category suggestions with Ludwig. By leveraging Ludwig's capabilities, we implement an encoder-decoder architecture that provides personalized and dynamic category suggestions to augment a type-ahead API.
A typical type-ahead experience looks like this:
What we are trying to build is a smarter system, one that suggests different categories depending on contextual factors as well (e.g. the products the user has interacted with):
The blog post and code are inspired by our research paper presented at ACL 2020: How to Grow a (Product) Tree.
The code has been written for Python 3.7: the provided requirements.txt can be used with a virtualenv to run the project in a separate virtual environment. Credentials and global parameters can be set with the standard .env file (.env.local is provided as a template), and they are available in the pipeline script through dotenv.
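As a minimal sketch of this setup (assuming python-dotenv, with a hypothetical variable name), loading the .env file in a script could look like this:

```python
# Minimal sketch of loading credentials from a .env file with python-dotenv.
# The variable name below (MY_WAREHOUSE_USER) is a hypothetical placeholder.
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the project root into the environment

# after loading, parameters are available through standard os.environ lookups
warehouse_user = os.environ.get("MY_WAREHOUSE_USER")
```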
We provide two main scripts to test out our models for category prediction in type-ahead:
- a simplified but realistic end-to-end "stateless" pipeline, creating all input features and a Ludwig-friendly dataset from scratch from raw data;
- a stand-alone folder with a minimal Ludwig script, in case you already have embeddings and data rows ready for the model.
By running model_pipeline.py, a local Luigi pipeline executes a DAG comprising four tasks:
- prod2vec training: product embeddings are trained from browsing data and stored locally as text in the Glove format;
- dataset preparation: extract data from search logs and prepare a csv with three columns, "query" (the input query), "skus_in_session" (product identifiers for in-session interactions: view, add, etc.), "path" (the target taxonomy path). "skus_in_session" and "path" are sequences, so they are saved as tokens separated by a space;
- Ludwig training: define the deep learning model and feed it to Ludwig for training and local persistence (a minimal config sketch is shown after this list);
- Ludwig testing: load the model from storage, test it on held-out data and print out summary statistics.
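To make the Ludwig training step more concrete, a model definition along these lines could be fed to Ludwig. This is only a sketch: the feature names match the dataset columns above, but the encoder/decoder choices, the embeddings path and the exact keys are assumptions that may differ from the repo and across Ludwig versions.

```python
# Sketch of a possible Ludwig model definition for the encoder-decoder setup
# described above. Encoder/decoder settings, the embeddings path and the exact
# training call are assumptions, not the repo's actual configuration.
from ludwig.api import LudwigModel

model_definition = {
    "input_features": [
        # the typed query is encoded as text
        {"name": "query", "type": "text", "encoder": "rnn"},
        # in-session product identifiers are a space-separated sequence,
        # initialized with the prod2vec embeddings trained in the first task
        {
            "name": "skus_in_session",
            "type": "sequence",
            "encoder": "rnn",
            "pretrained_embeddings": "prod2vec_embeddings.txt",  # hypothetical path
        },
    ],
    "output_features": [
        # the target taxonomy path is decoded one node at a time
        {"name": "path", "type": "sequence", "decoder": "generator"}
    ],
}

model = LudwigModel(model_definition)
# depending on the Ludwig version, training is e.g. model.train(dataset="dataset.csv")
```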
By using Luigi, we wrap this DAG in a convenient flow that saves us time if we need to re-run the pipeline from a particular step, and ensure consistency if we perform a clean run.
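For reference, such a DAG can be expressed with standard Luigi idioms; the sketch below uses hypothetical task names, targets and file paths, and does not mirror model_pipeline.py exactly.

```python
# Hedged sketch of how the four tasks could be chained with Luigi.
# Task and target names are hypothetical; see model_pipeline.py for the real DAG.
import luigi


class TrainProd2Vec(luigi.Task):
    def output(self):
        return luigi.LocalTarget("prod2vec_embeddings.txt")

    def run(self):
        pass  # train product embeddings from browsing data and save them as text


class PrepareDataset(luigi.Task):
    def requires(self):
        return TrainProd2Vec()

    def output(self):
        return luigi.LocalTarget("dataset.csv")

    def run(self):
        pass  # build the query / skus_in_session / path CSV from search logs


class TrainLudwigModel(luigi.Task):
    def requires(self):
        return PrepareDataset()

    def output(self):
        return luigi.LocalTarget("results")

    def run(self):
        pass  # define the Ludwig model and train it on the prepared dataset


class TestLudwigModel(luigi.Task):
    def requires(self):
        return TrainLudwigModel()

    def run(self):
        pass  # load the trained model, score held-out data, print summary stats


if __name__ == "__main__":
    # a local scheduler is enough to run the whole DAG end-to-end; tasks whose
    # outputs already exist are skipped, which is what lets us resume mid-pipeline
    luigi.build([TestLudwigModel()], local_scheduler=True)
```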
Please note that the data retrieval functions in data_service.py and prod2vec_train.py are just stubs: in our original repository they contained our Snowflake-based code to load behavioral and search data from our warehouse. Modify them with your own logic to extract behavioral and search data so that downstream tasks can run seamlessly (we left a small Snowflake client in the repo for convenience).
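If you don't have a warehouse handy, a drop-in replacement for a stub could simply read local files; the function name and return shape below are assumptions, so adapt them to the actual signatures in data_service.py.

```python
# Hypothetical replacement for a data retrieval stub: instead of querying
# Snowflake, read search interactions from a local CSV. The function name and
# the returned row format are assumptions, not the repo's actual interface.
import csv


def load_search_interactions(path="search_logs.csv"):
    with open(path, newline="", encoding="utf-8") as f:
        # each row is expected to contain a query, the products interacted
        # with in the session, and the clicked product's identifier
        return list(csv.DictReader(f))
```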
The folder ludwig_playground contains *.local files that show sample datasets and sample ancillary files.
The folder data contains catalog.csv.local, a sample CSV file representing product information (identifiers, images, taxonomy path): it may be useful to have a product lookup if your search logs (e.g. products clicked after a search) report product identifiers and you need to join products with paths to prepare the final dataset.
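As a sketch of that join (the column names "sku", "path" and "clicked_sku" are assumptions, so check catalog.csv.local and your own logs for the real schema), pandas makes the lookup straightforward:

```python
# Sketch of joining search logs with the catalog to recover taxonomy paths.
# Column names and file paths are assumptions for illustration only.
import pandas as pd

catalog = pd.read_csv("data/catalog.csv.local")  # product identifiers -> taxonomy path
search_logs = pd.read_csv("search_logs.csv")     # hypothetical export of your search logs

# attach the taxonomy path of the clicked product to each search row
dataset = search_logs.merge(
    catalog[["sku", "path"]], left_on="clicked_sku", right_on="sku", how="inner"
)
dataset[["query", "skus_in_session", "path"]].to_csv("dataset.csv", index=False)
```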
If you already have embeddings ready (stored in a tab-separated text file, as in the "Glove format") and a dataset file, you can put them in the ludwig_playground folder and play directly with the Ludwig code with no other dependency: ludwig_playground.py has some global variables you can set to re-run training, or to just run a trained model on new input rows. The *.local files in the folder show the accepted format for a dataset and an embedding file to run the Ludwig code.
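For instance, scoring new rows with an already trained model follows Ludwig's standard load-and-predict pattern; the sketch below assumes Ludwig's Python API with hypothetical paths and column values, and the exact predict arguments vary with the Ludwig version.

```python
# Sketch of running a trained model on new input rows with Ludwig's Python API.
# Paths and column values are hypothetical; exact argument names depend on the
# Ludwig version you installed.
import pandas as pd
from ludwig.api import LudwigModel

model = LudwigModel.load("results/model")  # directory produced by training

new_rows = pd.DataFrame(
    [{"query": "running shoes", "skus_in_session": "sku_123 sku_456"}]
)
predictions = model.predict(new_rows)  # in some versions: model.predict(dataset=new_rows)
print(predictions)
```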
This repo is a joint effort of Jacopo, Bingqing and Marie.
We wish to thank our friend Piero Molino, Ludwig's creator, for showing us how to re-write our model (SessionPath) with Ludwig.
If you find this repo (and the ideas in it) useful for your research, please cite our work:
@inproceedings{tagliabue-etal-2020-grow,
title = "How to Grow a (Product) Tree: Personalized Category Suggestions for e{C}ommerce Type-Ahead",
author = "Tagliabue, Jacopo and
Yu, Bingqing and
Beaulieu, Marie",
booktitle = "Proceedings of The 3rd Workshop on e-Commerce and NLP",
month = jul,
year = "2020",
address = "Seattle, WA, USA",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.ecnlp-1.2",
doi = "10.18653/v1/2020.ecnlp-1.2",
pages = "7--18",
abstract = "In an attempt to balance precision and recall in the search page, leading digital shops have been effectively nudging users into select category facets as early as in the type-ahead suggestions. In this work, we present SessionPath, a novel neural network model that improves facet suggestions on two counts: first, the model is able to leverage session embeddings to provide scalable personalization; second, SessionPath predicts facets by explicitly producing a probability distribution at each node in the taxonomy path. We benchmark SessionPath on two partnering shops against count-based and neural models, and show how business requirements and model behavior can be combined in a principled way.",
}
The arXiv version is available here.
The code in this repo is freely available and provided "as is" as covered by the MIT License.