Achieving Fairness at No Utility Cost via Data Reweighing with Influence

Code for the ICML 2022 paper: Achieving Fairness at No Utility Cost via Data Reweighing with Influence.
https://proceedings.mlr.press/v162/li22p/li22p.pdf


Please consider citing our paper if you find our research helpful :)

Reference

@InProceedings{pmlr-v162-li22p,
  title     = {Achieving Fairness at No Utility Cost via Data Reweighing with Influence},
  author    = {Li, Peizhao and Liu, Hongfu},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {12917--12930},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/li22p/li22p.pdf},
  url       = {https://proceedings.mlr.press/v162/li22p.html},
  abstract  = {With the fast development of algorithmic governance, fairness has become a compulsory property for machine learning models to suppress unintentional discrimination. In this paper, we focus on the pre-processing aspect for achieving fairness, and propose a data reweighing approach that only adjusts the weight for samples in the training phase. Different from most previous reweighing methods which usually assign a uniform weight for each (sub)group, we granularly model the influence of each training sample with regard to fairness-related quantity and predictive utility, and compute individual weights based on influence under the constraints from both fairness and utility. Experimental results reveal that previous methods achieve fairness at a non-negligible cost of utility, while as a significant advantage, our approach can empirically release the tradeoff and obtain cost-free fairness for equal opportunity. We demonstrate the cost-free fairness through vanilla classifiers and standard training processes, compared to baseline methods on multiple real-world tabular datasets. Code available at https://github.com/brandeis-machine-learning/influence-fairness.}
}

Setup

Dataset

Download the data to ./data and arrange the folder structure as follows:

./data
├── adult
│   ├── adult.data
│   ├── adult.test
│   └── meta.json
├── compas
│   ├── compas-scores-two-years.csv
│   ├── idx.json
│   └── meta.json
└── german
    ├── german.data
    ├── idx.json
    └── meta.json

Adult: https://archive.ics.uci.edu/ml/datasets/adult
Compas: https://github.com/propublica/compas-analysis/
German: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
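
As a quick sanity check of the layout above, you can list which of the expected files are present (a minimal sketch; the file names are taken from the tree above, and the snippet is not part of the repository):

from pathlib import Path

# Expected files per dataset, matching the folder tree above.
EXPECTED = {
    "adult": ["adult.data", "adult.test", "meta.json"],
    "compas": ["compas-scores-two-years.csv", "idx.json", "meta.json"],
    "german": ["german.data", "idx.json", "meta.json"],
}

root = Path("./data")
for dataset, files in EXPECTED.items():
    for name in files:
        path = root / dataset / name
        print(f"{'ok' if path.is_file() else 'MISSING':7s} {path}")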

Gurobi

We use Gurobi to solve linear programs.

Install Gurobi for Python: https://support.gurobi.com/hc/en-us/articles/360044290292-How-do-I-install-Gurobi-for-Python-
We use an Individual Academic License to access the full features of Gurobi. If you are eligible, please follow https://www.gurobi.com/academia/academic-program-and-licenses/ to obtain a gurobi.lic.
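
To verify that gurobipy and your license are working before running the experiments, a tiny LP like the following should solve to optimality (a smoke test only; this snippet is not part of the repository):

import gurobipy as gp
from gurobipy import GRB

# Smoke test: maximize x + 2y subject to x + y <= 4, with x, y >= 0.
# The optimum is x = 0, y = 4, with objective value 8.
m = gp.Model("smoke_test")
x = m.addVar(lb=0.0, name="x")
y = m.addVar(lb=0.0, name="y")
m.setObjective(x + 2 * y, GRB.MAXIMIZE)
m.addConstr(x + y <= 4.0, name="budget")
m.optimize()
print(f"status={m.Status}, objective={m.ObjVal}")  # expect status=2 (OPTIMAL), objective=8.0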

Experiment

Run logistic regression on the Adult dataset with Equal Opportunity as the fairness metric:

python main.py --dataset adult --metric eop --beta 0.5 --gamma 0.2

You should see output similar to the following:

Loading Adult dataset..
Dataset statistic - #total: 45222; #train: 22622; #val.: 7540; #test: 15060; #dim.: 102

Maximum fairness promotion: -0.71638; Maximum utility promotion: -241.51456
Set parameter Username
Academic license - for non-commercial use only - expires 2022-12-01
Warning for adding constraints: zero or small (< 1e-13) coefficients, ignored
Gurobi Optimizer version 9.5.0 build v9.5.0rc5 (linux64)
Thread count: 10 physical cores, 20 logical processors, using up to 20 threads
Optimize a model with 2 rows, 22622 columns and 44984 nonzeros
Model fingerprint: 0x87f80f7c
Coefficient statistics:
  Matrix range     [1e-13, 2e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 1e+00]
  RHS range        [9e-02, 5e+01]

Concurrent LP optimizer: primal simplex, dual simplex, and barrier
Showing barrier log only...

Presolve removed 0 rows and 8933 columns
Presolve time: 0.02s
Presolved: 2 rows, 13689 columns, 27378 nonzeros

Ordering time: 0.00s

Barrier statistics:
 AA' NZ     : 1.000e+00
 Factor NZ  : 3.000e+00 (roughly 5 MB of memory)
 Factor Ops : 5.000e+00 (less than 1 second per iteration)
 Threads    : 1

                  Objective                Residual
Iter       Primal          Dual         Primal    Dual     Compl     Time
   0   6.67687894e+09 -2.82255343e+11  1.19e+00 6.04e+05  7.28e+08     0s
   1   5.71993064e+07 -2.49057677e+11  1.12e-02 8.38e+06  1.35e+07     0s
   2   1.58465157e+05 -4.06195963e+10  3.36e-05 1.07e-06  1.49e+06     0s
   3   7.21509681e+03 -6.37481572e+06  0.00e+00 5.96e-07  2.33e+02     0s
   4   7.05936180e+03 -1.56293507e+05  0.00e+00 3.73e-07  5.97e+00     0s
   5   5.47000038e+03 -2.77631209e+04  0.00e+00 6.61e-08  1.21e+00     0s
   6   4.54852224e+03 -1.00437285e+04  0.00e+00 5.96e-08  5.33e-01     0s

Barrier performed 6 iterations in 0.03 seconds (0.03 work units)
Barrier solve interrupted - model solved by another algorithm


Solved with dual simplex
Solved in 7 iterations and 0.03 seconds (0.02 work units)
Optimal objective  2.497175044e+02
Total removal: 249.71750; Ratio: 1.104%

Fairness loss: 0.17708 -> 0.08392; Utility loss: 3050.91919 -> 3027.89156
------------------------------ Results on val
Grp. 0 - #instance: 2446; #pos. pred: 357; Acc.: 0.899428
Grp. 1 - #instance: 5094; #pos. pred: 1684; Acc.: 0.797801
Overall acc.: 0.830769; Demographic parity: 0.184632; Equal opportunity: 0.007115
------------------------------ Results on test
Grp. 0 - #instance: 4913; #pos. pred: 750; Acc.: 0.889477
Grp. 1 - #instance: 10147; #pos. pred: 3229; Acc.: 0.795210
Overall acc.: 0.825963; Demographic parity: 0.165566; Equal opportunity: -0.002712
Total time: 2.64818s
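
To explore the fairness-utility trade-off across parameter settings, you can sweep --beta and --gamma with a small driver script (a hypothetical helper, not part of the repository; it only shells out to main.py with the flags shown above):

import itertools
import subprocess

# Hypothetical grids for the --beta and --gamma flags; adjust to taste.
betas = [0.25, 0.5, 0.75]
gammas = [0.1, 0.2, 0.4]

for beta, gamma in itertools.product(betas, gammas):
    cmd = ["python", "main.py", "--dataset", "adult", "--metric", "eop",
           "--beta", str(beta), "--gamma", str(gamma)]
    print(">>>", " ".join(cmd))
    subprocess.run(cmd, check=True)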
