This is a C++ framework for various weighted network embedding techniques. We currently release the command line interface for the following models:
- DeepWalk
- Walklets
- LINE(Large-scale Information Network Embedding)
- HPE (Heterogeneous Preference Embedding)
- APP (Asymmetric Proximity Preserving graph embedding)
- MF (Matrix Factorization)
- BPR (Bayesian Personalized Ranking)
- WARP-like
- HOP-REC
- CSE (named nemf & nerank in cli)
In the near future, we will redesign the framework to provide solid APIs for fast development of different network embedding techniques.
- g++ > 4.9 (on macOS, an OpenMP-enabled compiler is required, e.g. brew reinstall gcc6 --without-multilib)
$ git clone https://github.com/cnclabs/smore
$ cd smore
$ make
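An optional sanity check before running make (not part of the project's build steps): confirm that g++ accepts OpenMP, since on macOS the default g++ is often Apple clang, which does not.
echo 'int main(){return 0;}' | g++ -fopenmp -fsyntax-only -x c++ - && echo "OpenMP-enabled g++ found"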
Given a network input:
userA itemA 3
userA itemC 5
userB itemA 1
userB itemB 5
userC itemA 4
The model learns a representation for each vertex; the first line of the output records the number of vertices and the embedding dimension:
6 5
userA 0.0815412 0.0205459 0.288714 0.296497 0.394043
itemA -0.207083 -0.258583 0.233185 0.0959801 0.258183
itemC 0.0185886 0.138003 0.213609 0.276383 0.45732
userB -0.0137994 -0.227462 0.103224 -0.456051 0.389858
itemB -0.317921 -0.163652 0.103891 -0.449869 0.318225
userC -0.156576 -0.3505 0.213454 0.10476 0.259673
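For example, an output of this shape can be reproduced on the toy network above (a minimal sketch; net.txt and rep.txt are placeholder file names, and the learned values will differ between runs):
./cli/deepwalk -train net.txt -save rep.txt -dimensions 5
head -n 1 rep.txt    # header line: 6 vertices, 5 dimensions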
Run the executable directly to see its usage, for example:
./cli/deepwalk
./cli/walklets
./cli/line
./cli/hpe
./cli/app
./cli/mf
./cli/bpr
./cli/warp
./cli/hoprec
Then you will see an options description like:
Options Description:
-train <string>
Train the Network data
-save <string>
Save the representation data
-dimensions <int>
Dimension of vertex representation; default is 64
-undirected <int>
Whether the edge is undirected; default is 1
-negative_samples <int>
Number of negative examples; default is 5
-window_size <int>
Size of skip-gram window; default is 5
-walk_times <int>
Times of being the starting vertex; default is 10
-walk_steps <int>
Step of random walk; default is 40
-threads <int>
Number of training threads; default is 1
-alpha <float>
Init learning rate; default is 0.025
Usage:
./deepwalk -train net.txt -save rep.txt -undirected 1 -dimensions 64 -walk_times 10 -walk_steps 40 -window_size 5 -negative_samples 5 -alpha 0.025 -threads 1
This shell script obtains the vertex representations of the YouTube-links dataset.
cd example
sh train_youtube.sh
Increasing the number of threads in train_youtube.sh can speed up the process.
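For instance, from inside the example directory you can locate the thread setting before editing it by hand (a sketch; the exact flag the script forwards may differ):
grep -n "threads" train_youtube.sh    # find the line that sets -threads, then raise its value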
- Running with a locally built image
- Building the Docker image from the provided Dockerfile:
docker build -t smore:latest .
- Running a container instantiated from the image:
docker run -it --name smore --rm -v "$PWD":/usr/local/smore/data smore:latest model_name -train training_dataset -save embedding [model_options]
- Example:
docker run -it --name smore --rm -v "$PWD":/usr/local/smore/data smore:latest hpe -train net.txt -save rep.txt
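Any other released model can be substituted for hpe in the same way, for instance (a sketch; net.txt and rep.txt are placeholder file names):
docker run -it --name smore --rm -v "$PWD":/usr/local/smore/data smore:latest deepwalk -train net.txt -save rep.txt -dimensions 64 -threads 4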
- Running with the published image
- Running smore.sh:
./smore.sh model_name -train training_dataset -save embedding [model_options]
- Example:
./smore.sh hpe -train net.txt -save rep.txt
You can find related work in the awesome-network-embedding repository.
@inproceedings{smore,
author = {Chen, Chih-Ming and Wang, Ting-Hsiang and Wang, Chuan-Ju and Tsai, Ming-Feng},
title = {SMORe: Modularize Graph Embedding for Recommendation},
year = {2019},
booktitle = {Proceedings of the 13th ACM Conference on Recommender Systems},
series = {RecSys ’19}
}
@article{pronet2017,
title={Vertex-Context Sampling for Weighted Network Embedding},
author={Chih-Ming Chen and Yi-Hsuan Yang and Yian Chen and Ming-Feng Tsai},
journal={arXiv preprint arXiv:1711.00227},
year={2017}
}
For HOP-REC & CSE, the field of each vertex must be assigned in "vertex field" form:
userA u
userB u
userC u
itemA i
itemB i
itemC i
itemD i
using the -field argument, as shown below.
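For example, with the field assignments above saved as field.txt (a minimal sketch; net.txt, field.txt, and rep.txt are placeholder file names):
./cli/hoprec -train net.txt -save rep.txt -field field.txt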