This repository contains all the code and scripts written for the MERCI project (ASPLOS '21) at the SNU Architecture and Code Optimization (ARC) Lab, 2021.
Please refer to the full paper at https://snu-arc.github.io/pubs/asplos21_merci.pdf.
Make sure that the data is present in the $HOME/MERCI/1_raw_data directory.
Download the reviews/metadata datasets from https://jmcauley.ucsd.edu/data/amazon/ to $HOME/MERCI/1_raw_data/amazon
e.g., $HOME/MERCI/1_raw_data/amazon/meta_Office_Products.json.gz, reviews_Office_Products.json.gz
Download the dblp dataset from http://networkrepository.com/ca-coauthors-dblp.php to $HOME/MERCI/1_raw_data/dblp
Download the lastfm dataset from http://millionsongdataset.com/lastfm/#getting to $HOME/MERCI/1_raw_data/lastfm
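A minimal sketch of preparing the raw-data directories before downloading (the wget URL below is a placeholder, not a real link; copy the actual download links from the dataset pages above):
$ mkdir -p $HOME/MERCI/1_raw_data/amazon $HOME/MERCI/1_raw_data/dblp $HOME/MERCI/1_raw_data/lastfm
$ cd $HOME/MERCI/1_raw_data/amazon
$ wget <url-of-meta_Office_Products.json.gz>   # placeholder; take the link from the dataset page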
control_dir_path.sh generates the data directories shown below.
$ ./control_dir_path.sh ${dataset} ${num_partition}
# e.g., ./control_dir_path.sh amazon_Office_Products 2748
$HOME/MERCI/1_raw_data
$HOME/MERCI/2_transactions/$dataset
$HOME/MERCI/3_train_test/$dataset
$HOME/MERCI/4_filtered/$dataset
$HOME/MERCI/5_patoh/$dataset/partition_$num_partition
$HOME/MERCI/6_evaluation_input/$dataset/partition_$num_partition
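In effect this amounts to a mkdir -p over the paths above; a minimal sketch, assuming the script does nothing beyond creating the directories:
$ dataset=amazon_Office_Products; num_partition=2748
$ mkdir -p $HOME/MERCI/2_transactions/$dataset \
           $HOME/MERCI/3_train_test/$dataset \
           $HOME/MERCI/4_filtered/$dataset \
           $HOME/MERCI/5_patoh/$dataset/partition_$num_partition \
           $HOME/MERCI/6_evaluation_input/$dataset/partition_$num_partition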
Process the raw data into transactions and train/test sets, and filter them accordingly.
$ cd scripts
# Amazon
$ python3 amazon_parse_divide_filter.py Office_Products
# Other datasets (dblp, lastfm)
$ ./lastfm_dblp.sh dblp
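A minimal sketch of preprocessing all three datasets in one go (it is an assumption that lastfm_dblp.sh also accepts lastfm as an argument, as its name suggests):
$ python3 amazon_parse_divide_filter.py Office_Products
$ ./lastfm_dblp.sh dblp
$ ./lastfm_dblp.sh lastfm   # assumption: the same script handles lastfm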
Partition the training dataset with the PaToH partitioner.
# Put the latest PaToH binary in bin/
$ cd scripts
$ ./run_patoh.sh ${dataset} ${num_partition}
# e.g., ./run_patoh.sh amazon_Office_Products 2748
Make sure PARTITION_SIZE in clustering.cc is set to the Max part weight reported by PaToH (128 in the example output below).
'Con - 1' Cost: 1904732
Part Weights : Min= 126 (0.007) Max= 128 (0.009)
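A quick way to pull the Max part weight out of the PaToH output; a minimal sketch, assuming the output above was captured in a hypothetical patoh.log:
$ grep 'Part Weights' patoh.log | sed 's/.*Max= *\([0-9]*\).*/\1/'
128
# Set PARTITION_SIZE in clustering.cc to this value before building below.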
$ mkdir bin
$ make
# Correlation-Aware Variable-Sized Clustering
$ ./bin/clustering -d ${dataset} -p ${num_partition}
# For the 'Remapped' configuration in the paper's results
$ ./bin/clustering -d ${dataset} -p ${num_partition} --remap-only
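For example, continuing with the dataset and partition count used above:
$ ./bin/clustering -d amazon_Office_Products -p 2748
$ ./bin/clustering -d amazon_Office_Products -p 2748 --remap-only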
Make sure PARTITION_SIZE here matches the PARTITION_SIZE set in clustering.cc. For the dblp dataset, BUF_SIZE in eval_merci.cc should be set to 1024.
$ mkdir bin
$ make all
# eval baseline
$ ./bin/eval_baseline -d ${dataset} -r ${repeat} -b ${batch_size} -c ${thread}
# eval merci
$ ./bin/eval_merci -d ${dataset} -p ${num_partition} --memory_ratio ${mem} -c ${thread} -r ${repeat}
# eval remap only
$ ./bin/eval_remap_only -d ${dataset} -p ${num_partition} -c ${thread} -r ${repeat}
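A concrete set of invocations, continuing the running example (the repeat count, batch size, thread count, and memory ratio below are illustrative values, not the paper's settings):
$ ./bin/eval_baseline -d amazon_Office_Products -r 4 -b 512 -c 16
$ ./bin/eval_merci -d amazon_Office_Products -p 2748 --memory_ratio 2 -c 16 -r 4
$ ./bin/eval_remap_only -d amazon_Office_Products -p 2748 -c 16 -r 4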
$ ./run_all.sh
To reproduce the results in the paper, we recommend setting up an m5.8xlarge instance (16 Intel Xeon Platinum 8259CL CPU cores with 128 GiB of DRAM) on Amazon Web Services (AWS) EC2.