You may want to check out our paper, "Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models".
We comprehensively evaluate the logical reasoning ability of LLMs on 25 different reasoning patterns spanning propositional, first-order, and non-monotonic logics. To enable systematic evaluation, we introduce LogicBench, a natural language question-answering dataset focusing on the use of a single inference rule. We conduct a detailed analysis of a range of LLMs, including GPT-4, ChatGPT, Gemini, Llama-2, and Mistral, using chain-of-thought prompting. Experimental results show that existing LLMs do not fare well on LogicBench; in particular, they struggle with instances involving complex reasoning and negations. We believe that our work and findings will facilitate future research on evaluating and enhancing the logical reasoning ability of LLMs.
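As a minimal illustration of the evaluation setup (not the exact prompts used in the paper), a zero-shot chain-of-thought query for a BQA instance might be assembled along the lines below; the field names mirror the JSON schema described later in this README, the example context/question is made up, and the actual model call is left to the LLM of your choice:

```python
# Hypothetical sketch of a zero-shot chain-of-thought (CoT) prompt for a BQA
# instance; the exact prompt wording used in the paper may differ.
def build_cot_prompt(context: str, question: str) -> str:
    return (
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer with 'yes' or 'no'. Let's think step by step."
    )

# Example usage with a made-up instance (not taken from LogicBench):
prompt = build_cot_prompt(
    context="If it rains, the ground gets wet. It rained.",
    question="Does this imply that the ground got wet?",
)
print(prompt)  # send this prompt to the LLM being evaluated
```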
Please see the ./data folder to access the LogicBench dataset.
License: MIT License
Scope of the dataset: LogicBench covers 25 inference rules/reasoning patterns spanning propositional, first-order, and non-monotonic logic.
We introduce two versions of our proposed dataset: LogicBench(Eval) and LogicBench(Aug). LogicBench(Eval) is a high-quality, human-verified evaluation dataset, whereas LogicBench(Aug) is a synthetically augmented version intended for training. LogicBench(Eval) consists of two types of tasks: (1) Binary Question-Answering (BQA) and (2) Multiple Choice Question-Answering (MCQA).
The data/ folder contains both versions of the dataset, organized as follows:
├── ...
├── data
│   ├── LogicBench(Aug)
│   │   ├── first_order_logic
│   │   ├── nm_logic
│   │   └── propositional_logic
│   └── LogicBench(Eval)
│       ├── BQA
│       │   ├── propositional_logic
│       │   ├── first_order_logic
│       │   └── nm_logic
│       └── MCQA
│           ├── propositional_logic
│           ├── first_order_logic
│           └── nm_logic
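As a quick sanity check after cloning, a sketch like the following can enumerate the inference-rule JSON files under each logic type; the layout follows the tree above, and no assumption is made about the per-rule file names beyond the *.json extension:

```python
# Sketch: list the inference-rule JSON files available under data/.
from pathlib import Path

DATA_ROOT = Path("data")  # adjust if the repository is cloned elsewhere

for json_file in sorted(DATA_ROOT.rglob("*.json")):
    # e.g. data/LogicBench(Eval)/BQA/propositional_logic/<rule>/... .json
    print(json_file.relative_to(DATA_ROOT))
```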
In each of these folders, the JSON file corresponding to an inference rule follows one of the two formats below. For LogicBench(Aug) and the BQA task:
{
    "type": "str",
    "axiom": "str",
    "samples": [
        {
            "id": "int",
            "context": "str",
            "qa_pairs": [
                {
                    "question": "str",
                    "answer": "str"
                },
                {
                    "question": "str",
                    "answer": "str"
                }
            ]
        },
        {
            "id": "int",
            "context": "str",
            "qa_pairs": [
                {
                    "question": "str",
                    "answer": "str"
                },
                {
                    "question": "str",
                    "answer": "str"
                }
            ]
        }
    ]
}
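A minimal sketch for reading a BQA-format file and iterating over its question-answer pairs is shown below; the glob pattern is a placeholder (any inference-rule file under LogicBench(Eval)/BQA or LogicBench(Aug) works), and the per-rule file names are not assumed:

```python
import glob
import json

# Placeholder: pick the first inference-rule JSON file under the BQA split.
files = glob.glob("data/LogicBench(Eval)/BQA/propositional_logic/**/*.json", recursive=True)

with open(files[0]) as f:
    data = json.load(f)

print(data["type"], data["axiom"])
for sample in data["samples"]:
    context = sample["context"]
    for qa in sample["qa_pairs"]:
        # Each qa_pair asks a yes/no question about what follows from the context.
        print(f"[{sample['id']}] {context} | {qa['question']} -> {qa['answer']}")
```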
For the MCQA task:
{
    "type": "str",
    "axiom": "str",
    "samples": [
        {
            "id": "int",
            "context": "str",
            "question": "str",
            "choices": "dict",
            "answer": "str"
        },
        {
            "id": "int",
            "context": "str",
            "question": "str",
            "choices": "dict",
            "answer": "str"
        }
    ]
}
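Analogously, an MCQA instance can be turned into a multiple-choice prompt as sketched below; the glob pattern is again a placeholder, and since the schema only specifies that "choices" is a dict, the sketch simply enumerates whatever key/value pairs it contains:

```python
import glob
import json

# Placeholder: pick the first inference-rule JSON file under the MCQA split.
files = glob.glob("data/LogicBench(Eval)/MCQA/propositional_logic/**/*.json", recursive=True)

with open(files[0]) as f:
    data = json.load(f)

for sample in data["samples"]:
    # Enumerate the candidate conclusions stored in the "choices" dict.
    options = "\n".join(f"{key}: {text}" for key, text in sample["choices"].items())
    prompt = (
        f"Context: {sample['context']}\n"
        f"Question: {sample['question']}\n"
        f"Choices:\n{options}\n"
        "Pick the correct choice."
    )
    print(prompt)
    print("Gold answer:", sample["answer"])
```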
If you are using our dataset, please cite our paper:
@article{parmar2024towards,
  title={Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models},
  author={Parmar, Mihir and Patel, Nisarg and Varshney, Neeraj and Nakamura, Mutsumi and Luo, Man and Mashetty, Santosh and Mitra, Arindam and Baral, Chitta},
  journal={arXiv preprint arXiv:2404.15522},
  year={2024}
}
- Hugging Face version of the LogicBench dataset for easy access
- Release of source code for data generation, inference, and analysis
- For help or issues using LogicBench, please submit a GitHub issue.
- Please contact Mihir Parmar (mparmar3@asu.edu) or Nisarg Patel (nppatel7@asu.edu) for communication related to LogicBench.