🪐MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset
This is the official code and data repository for the paper: 🪐MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset.
The 🪐MARS benchmark and our best model checkpoints for the three tasks in 🪐MARS can be downloaded at this link.
Code for instructing ChatGPT to curate the 🪐MARS benchmark can be found in the benchmark_curation folder.
Code for evaluating language models on the 🪐MARS benchmark can be found in the evaluation folder.
Please use the BibTeX entry below to cite our paper:
@inproceedings{Wang2024MARSBT,
  title={MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset},
  author={Weiqi Wang and Yangqiu Song},
  year={2024},
  url={https://doi.org/10.48550/arXiv.2406.02106},
  doi={10.48550/arXiv.2406.02106}
}
The authors of this paper were supported by the NSFC Fund (U20B2053) from the National Natural Science Foundation of China, and by the RIF (R6020-19 and R6021-20) and the GRF (16211520 and 16205322) from the RGC of Hong Kong. We also gratefully acknowledge support from the UGC Research Matching Grants (RMGS20EG01-D, RMGS20CR11, RMGS20CR12, RMGS20EG19, RMGS20EG21, RMGS23CR05, RMGS23EG08).