
# Symbolic knowledge injection meets intelligent agents

Experiments for "Quality-of-Service Metrics for Intelligent Agents exploiting Symbolic Knowledge Injection via PSyKI" (JAAMAS).

## Reference paper

Andrea Agiollo, Andrea Rafanelli, Matteo Magnini, Giovanni Ciatto, Andrea Omicini. "[Symbolic knowledge injection meets intelligent agents: QoS metrics and experiments](https://doi.org/10.1007/s10458-023-09609-6)", in: *Auton. Agents Multi Agent Syst.* 37(2): 27 (2023).

Bibtex:

```bibtex
@article{DBLP:journals/aamas/AgiolloRMCO23,
  author       = {Andrea Agiollo and
                  Andrea Rafanelli and
                  Matteo Magnini and
                  Giovanni Ciatto and
                  Andrea Omicini},
  title        = {Symbolic knowledge injection meets intelligent agents: QoS metrics
                  and experiments},
  journal      = {Auton. Agents Multi Agent Syst.},
  volume       = {37},
  number       = {2},
  pages        = {27},
  year         = {2023},
  url          = {https://doi.org/10.1007/s10458-023-09609-6},
  doi          = {10.1007/S10458-023-09609-6},
  timestamp    = {Tue, 12 Sep 2023 07:57:44 +0200},
  biburl       = {https://dblp.org/rec/journals/aamas/AgiolloRMCO23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
```

## 1. Download datasets

Execute the command `python -m setup load_datasets [-f] [-o]` to download the datasets from the UCI website. By default, the command stores the original datasets in the `datasets` folder. If you specify:

- `-f y`, input features are binarised;
- `-o y`, output classes are mapped into numeric indices.

Datasets are not tracked by git, so you need to execute this command before doing anything else. To reproduce the experiments in the paper you should run the command with both options:

```shell
python -m setup load_datasets -f y -o y
```
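Conceptually, the two flags transform the data as follows. This is a minimal sketch of the idea, not the repository's actual preprocessing code; the helper names are invented for illustration:

```python
# Sketch of what -f (binarise input features) and -o (numeric output
# classes) conceptually do. Function names are illustrative only.

def binarise_feature(value, domain):
    """One-hot encode a categorical/ordinal value over its domain."""
    return [1 if value == v else 0 for v in domain]

def class_to_index(label, classes):
    """Map an output class label to a numeric index."""
    return classes.index(label)

# Example: an ordinal feature with domain {1, 2, 3} and a binary class.
print(binarise_feature(2, [1, 2, 3]))                        # -> [0, 1, 0]
print(class_to_index("malignant", ["benign", "malignant"]))  # -> 1
```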

**UPDATE! (23/04/2023)**

Recently, the UCI website updated one of the datasets that we use in the experiments. Therefore, to preserve reproducibility, we have added the preprocessed datasets to the repository, and there is no longer any need to execute the command `python -m setup load_datasets`.

### Breast cancer dataset (breast cancer)

It represents clinical data of patients. It consists of 9 categorical ordinal features:

  1. Clump Thickness
  2. Uniformity of Cell Size
  3. Uniformity of Cell Shape
  4. Marginal Adhesion
  5. Single Epithelial Cell Size
  6. Bare Nuclei
  7. Bland Chromatin
  8. Normal Nucleoli
  9. Mitoses

All features have integer values in the range [1, 10]. The class indicates whether the cancer is benign or malignant.

### Splice junction dataset (splice junction)

It represents DNA sequences. Each sequence consists of 60 bases. Each base takes one of the values a, c, g, t (adenine, cytosine, guanine, thymine). The class indicates whether a sequence activates a biological process: exon-intron, intron-exon, or none. The dataset comes with its own knowledge base. Both the dataset and the knowledge contain special symbols in addition to the 4 bases; such a symbol indicates that, for a particular position in the sequence, more than one of the 4 bases is allowed. For this reason, the dataset is binarised (one-hot encoded) so that DNA sequences are represented with just the 4 bases.
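The one-hot step above can be pictured like this. The sketch is illustrative, not the repository's encoder: the ambiguity codes shown are example symbols standing for several bases (as in the paragraph above), and the helper names are invented:

```python
# Sketch: one-hot encoding of DNA bases, where an ambiguity symbol
# (standing for more than one base) activates every base it allows.
BASES = ["a", "c", "g", "t"]

# Illustrative ambiguity codes: each maps to its set of allowed bases.
AMBIGUOUS = {"d": {"a", "g", "t"}, "n": {"a", "c", "g", "t"},
             "s": {"c", "g"}, "r": {"a", "g"}}

def encode_base(symbol):
    allowed = AMBIGUOUS.get(symbol, {symbol})
    return [1 if b in allowed else 0 for b in BASES]

def encode_sequence(seq):
    """Flatten a sequence into one 4-bits-per-position vector."""
    vec = []
    for s in seq:
        vec.extend(encode_base(s))
    return vec

print(encode_base("c"))  # -> [0, 1, 0, 0]
print(encode_base("s"))  # -> [0, 1, 1, 0]  (either c or g allowed)
```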

### Census income dataset (census income)

It represents general personal data together with the yearly income (below or above 50,000 USD). Features are continuous, categorical (nominal and ordinal), and binary.

  1. age, continuous (integers)
  2. workclass, nominal categorical
  3. fnlwgt (final weight), continuous
  4. education, nominal categorical
  5. education-num, ordinal categorical (integers)
  6. marital-status, nominal categorical
  7. occupation, nominal categorical
  8. relationship, nominal categorical
  9. race, nominal categorical
  10. sex, binary
  11. capital-gain, continuous
  12. capital-loss, continuous
  13. hours-per-week, continuous
  14. native-country, nominal categorical

## 2. Generate knowledge for the census income and breast cancer datasets

Knowledge is already provided for the splice junction dataset; for the census income and breast cancer datasets, instead, it must be generated. We provide the command:

```shell
python -m setup generate_missing_knowledge
```

for this purpose. The knowledge is generated by a classification decision tree trained on half of the training set. Make sure to download the datasets before running this command.
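The idea of distilling knowledge from a decision tree can be sketched as follows: each root-to-leaf path becomes one logic rule. This is a simplified illustration with a toy hand-built tree and invented feature names, not the repository's actual extraction code:

```python
# Sketch: turning a (toy, hand-built) decision tree into Prolog-like
# rules, one rule per root-to-leaf path. In the real setup the tree
# is trained on half of the training set.

toy_tree = {
    "feature": "clump_thickness", "threshold": 5,
    "left": {"label": "benign"},                      # value <= threshold
    "right": {"feature": "mitoses", "threshold": 3,   # value > threshold
              "left": {"label": "benign"},
              "right": {"label": "malignant"}},
}

def extract_rules(node, conditions=()):
    if "label" in node:                               # leaf: emit one rule
        body = ", ".join(conditions) or "true"
        return [f"class({node['label']}) :- {body}."]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], conditions + (f"{f} =< {t}",))
            + extract_rules(node["right"], conditions + (f"{f} > {t}",)))

for rule in extract_rules(toy_tree):
    print(rule)
# -> class(benign) :- clump_thickness =< 5.
# -> class(benign) :- clump_thickness > 5, mitoses =< 3.
# -> class(malignant) :- clump_thickness > 5, mitoses > 3.
```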

## 3. Grid Search

To obtain the best hyperparameters for the predictors, run the command:

```shell
python -m setup grid_search
```

This command will perform a grid search over the hyperparameters of the predictors (uneducated and educated) for each dataset. The grid search is performed on the number of layers and on the number of neurons per layer.

Note that this command will take a long time to complete.
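The search over depth and width can be sketched as an exhaustive sweep over the Cartesian product of the two grids. The grid values and the scoring function below are placeholders, not the ones used in the paper:

```python
import itertools

# Sketch of a grid search over network depth and width.
# Grids and scoring are illustrative placeholders.
layer_grid = [1, 2, 3]
neuron_grid = [16, 32, 64]

def evaluate(n_layers, n_neurons):
    """Placeholder: train a predictor and return validation accuracy.
    Here a dummy score that peaks at 2 layers of 32 neurons."""
    return 1.0 / (1 + abs(n_layers - 2) + abs(n_neurons - 32) / 32)

best = max(itertools.product(layer_grid, neuron_grid),
           key=lambda cfg: evaluate(*cfg))
print(best)  # -> (2, 32)
```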

## 4. Run experiments

To run the experiments, execute the command:

```shell
python -m setup run_experiments
```

This command will run the experiments for each dataset and for each predictor (uneducated and educated). The results will be stored in the `results` folder.

Note that this command will take a long time to complete.