
Official Repo for "Efficient task-specific data valuation for nearest neighbor algorithms"


DSoudis/KNN-PVLDB

 
 


Data Valuation

This repo is the official code base for the PVLDB paper "Efficient task-specific data valuation for nearest neighbor algorithms".


It contains scripts to calculate the exact Shapley value (exact_sp.py) and an approximate Shapley value based on locality-sensitive hashing, LSH (LSH_sp.py), for a KNN classifier.
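To give a feel for what the exact computation involves, here is a minimal sketch of the recursion stated in the paper for an unweighted KNN classifier and a single test point: sort the training points by distance to the test point, assign the farthest point the value 1[label matches] / N, then walk toward the nearest point. This is not the code in exact_sp.py; the function name `knn_shapley_single`, its arguments, and the plain-list data layout are assumptions made for illustration.

```python
def knn_shapley_single(train_x, train_y, test_x, test_y, k):
    """Shapley value of each training point for one test point (unweighted KNN).

    Hypothetical helper for illustration only; not part of this repo's API.
    """
    n = len(train_x)
    # Squared Euclidean distance gives the same neighbor ordering as Euclidean.
    dist = [sum((a - b) ** 2 for a, b in zip(x, test_x)) for x in train_x]
    order = sorted(range(n), key=lambda i: dist[i])  # nearest first
    # match[r] is 1 if the r-th nearest training point has the test label.
    match = [1.0 if train_y[i] == test_y else 0.0 for i in order]

    values = [0.0] * n
    # Base case: the farthest training point.
    s = match[-1] / n
    values[order[-1]] = s
    # Recursion from the second-farthest point down to the nearest one.
    for rank in range(n - 2, -1, -1):
        i = rank + 1  # 1-based position in the sorted order
        s += (match[rank] - match[rank + 1]) / k * min(k, i) / i
        values[order[rank]] = s
    return values


if __name__ == "__main__":
    # Toy 1-D data, assumed for the example.
    xs = [[0.0], [1.0], [2.0], [3.0]]
    ys = [1, 0, 1, 0]
    print(knn_shapley_single(xs, ys, test_x=[0.1], test_y=1, k=2))
```

The values returned for one test point sum to the KNN utility of the full training set (the fraction of the K nearest neighbors whose label matches the test label), which is a quick sanity check. For a whole test set, the per-test-point values would be averaged.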

We also provide two examples showing how to calculate the exact Shapley value (exact_sp_example.py) and the approximate Shapley value (LSH_sp_example.py) on the CIFAR-10 dataset.

In the reproduction folder, we provide the Jupyter notebooks for three datasets (CIFAR-10, ImageNet, and YFCC100M). They record our experimental results and are meant to help reproduce our experiments.

For example: [image: result]

If you have any questions about our code, please do not hesitate to ask in the issues. Thanks!
