In our project on the ProPublica COMPAS dataset, we focused on predicting recidivism within two years. After analyzing the dataset, we removed irrelevant columns, encoded the categorical data, and selected the "two_year_recid" variable as our target. We then analyzed and normalized the features and selected 14 strongly correlated features covering age, charge degree, race, score text, sex, priors count, and length of stay. We replaced missing values and removed rows with irrelevant dates. With Gradient Boosting as our best-performing classifier, we achieved higher accuracy, precision, and AUC, and fewer false positives, than the other classifiers we compared.
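The encoding and normalization steps can be illustrated with a minimal sketch. It assumes one-hot encoding and min-max scaling, since the summary does not say which encoding or normalization was used. The column names are assumed from the feature list, and the three records are hypothetical toy values, not rows from the COMPAS dataset.

```python
# Hypothetical toy records, NOT real COMPAS rows; column names are assumptions.
rows = [
    {"age": 25, "priors_count": 3, "sex": "Male", "c_charge_degree": "F"},
    {"age": 41, "priors_count": 0, "sex": "Female", "c_charge_degree": "M"},
    {"age": 33, "priors_count": 7, "sex": "Male", "c_charge_degree": "M"},
]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


def min_max_scale(rows, column):
    """Rescale a numeric column linearly to the range [0, 1]."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid division by zero for a constant column
    return [{**row, column: (row[column] - low) / span} for row in rows]


# Assumed choice: one-hot encode the categoricals, then min-max scale the numerics.
features = rows
for column in ("sex", "c_charge_degree"):
    features = one_hot(features, column)
for column in ("age", "priors_count"):
    features = min_max_scale(features, column)

for row in features:
    print(row)
```

After this step every feature is numeric and the scaled columns lie in [0, 1], which is the form a classifier such as Gradient Boosting can consume directly.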
After preprocessing, we split the COMPAS dataset into training and testing sets, reserving 33% for testing. Gradient Boosting, chosen as the best classifier, yielded a confusion matrix with 560 True Positives, 837 True Negatives, 281 False Positives, and 359 False Negatives. We then calculated false positive rates by race: 35% for African Americans and 16% for Caucasians. These results suggest that the algorithm may be biased against African Americans, potentially leading to unjust sentencing and reinforcing racial biases within the criminal justice system. Another measure of bias, calibration, showed that the algorithm was similarly calibrated for both racial groups. However, even small amounts of bias can have significant societal implications. The opportunity cost metric, which considers false positive rates, is more appropriate in the domain of prisoner classification. The fair model, using fairlearn's equalized_odds_difference metric, reduced the disparity in false positive rates between African Americans and Caucasians. It achieved a lower accuracy and F1-score, but it addressed the significant disparity in false positive rates, treating both groups more equally and reducing the risk of unintended discrimination or biased outcomes.
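To make the arithmetic behind these figures concrete, the sketch below derives the standard metrics from the four reported confusion-matrix counts and shows how a false positive rate is computed separately for each group. The function names are illustrative, and the small label lists with groups "A" and "B" are hypothetical toy values, not COMPAS predictions. The per-group 35% and 16% rates cannot be recomputed here because the per-group counts are not reported.

```python
from collections import defaultdict

# Counts reported for the Gradient Boosting classifier on the test set.
TP, TN, FP, FN = 560, 837, 281, 359


def summarize(tp, tn, fp, fn):
    """Return standard metrics derived from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


def false_positive_rate_by_group(y_true, y_pred, groups):
    """Compute FP / (FP + TN) separately for each group label."""
    fp = defaultdict(int)
    tn = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:  # only actual negatives enter the false positive rate
            if pred == 1:
                fp[group] += 1
            else:
                tn[group] += 1
    return {g: fp[g] / (fp[g] + tn[g]) for g in set(fp) | set(tn)}


overall = summarize(TP, TN, FP, FN)
for name, value in overall.items():
    print(f"{name}: {value:.3f}")

# Hypothetical toy labels (1 = predicted or actual recidivism), NOT COMPAS data.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

rates = false_positive_rate_by_group(y_true, y_pred, groups)
fpr_gap = max(rates.values()) - min(rates.values())
print(rates, fpr_gap)
```

On the reported counts this gives an accuracy of about 68.6%, a precision of about 66.6%, and an overall false positive rate of about 25%, which sits between the two group rates quoted above. The gap between group false positive rates is one of the two quantities that an equalized-odds measure tracks. As documented, fairlearn's equalized_odds_difference reports the larger of the between-group gaps in true positive rate and false positive rate.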