All datasets are structured in the following format:
Key | Type | Explanation |
---|---|---|
target | str | The target text for generation, i.e. the gold label |
input | list[str] | The list of inputs provided to the model. These can contain different tags depending on the type of input the dataset contains. The tags are explained below |
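As a quick sanity check, the schema in the table above can be tested against a record before use. The following is a minimal sketch, not part of the benchmark's own tooling: the helper name `validate_record` and the sample record are made up for illustration, and the code assumes a record has already been parsed into a Python `dict`.

```python
from typing import Any


def validate_record(record: Any) -> list[str]:
    """Return a list of schema problems; an empty list means the record matches the table."""
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems: list[str] = []
    # 'target' is documented as a single string (the gold text).
    if not isinstance(record.get("target"), str):
        problems.append("'target' must be a str")
    # 'input' is documented as a list of strings.
    inputs = record.get("input")
    if not isinstance(inputs, list):
        problems.append("'input' must be a list")
    elif not all(isinstance(item, str) for item in inputs):
        problems.append("every element of 'input' must be a str")
    return problems


# Hypothetical placeholder record, not taken from the datasets.
example = {"target": "Some gold text [0].", "input": ["<abs> An abstract. </abs>", ""]}
print(validate_record(example))
```

Returning a list of problems rather than raising on the first one makes it easier to report everything that is wrong with a record in one pass.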
The datasets include a few special tags that distinguish the context from the cited documents:
Tag | Meaning |
---|---|
`<title>` | The title of the cited paper |
`<abs>` | In the related work dataset, the abstract of the target paper; in the survey generation dataset, the abstract of the cited paper |
`<ctx_b>` | The context before the target text |
`<ctx_a>` | The context after the target text |
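To work with the tagged parts of an input string separately, the tags can be pulled apart with a regular expression. This is a minimal sketch under stated assumptions: it assumes every tag is closed with a matching `</tag>`, as `<abs> ... </abs>` is in the example record, and the helper name `split_tags` and the sample string are made up for illustration. Input strings without tags yield an empty list.

```python
import re

# Tag names come from the table above; the closing form </tag> is an assumption
# based on the <abs> ... </abs> pair seen in the example record.
TAG_PATTERN = re.compile(r"<(title|abs|ctx_b|ctx_a)>(.*?)</\1>", re.DOTALL)


def split_tags(text: str) -> list[tuple[str, str]]:
    """Return (tag, content) pairs for every tagged span found in one input string."""
    return [(m.group(1), m.group(2).strip()) for m in TAG_PATTERN.finditer(text)]


# Hypothetical placeholder string, not taken from the datasets.
sample = "<title> A title </title> <abs> An abstract. </abs>"
print(split_tags(sample))  # [('title', 'A title'), ('abs', 'An abstract.')]
```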
CiteBench consists of four datasets, each structured slightly differently. `benchmark.zip` contains only the parts that are present in all the datasets, e.g. the input abstracts and the target. `benchmark_with_context.zip` also includes the additional context that exists for some of the datasets.
An example from the Lu et al. dataset (the inputs are shortened to save space):
{
"target": "Within the MAS community, some work [1] has focused on how artificial AI-based learning agents would fare in communities of similar agents. For example, [2] and [3] show how agents can learn the capabilities of others via repeated interactions, but these agents do not learn to predict what actions other might take. Most of the work in MAS also fails to recognize the possible gains from using explicit agent models to predict agent actions. [0] is an exception and gives another approach for using nested agent models. However, they do not go so far as to try to quantify the advantages of their nested models or show how these could be learned via observations. We believe that our research will bring to the foreground some of the common observations seen in these research areas and help to clarify the implications and utility of learning and using nested agent models.",
"input": [
"<abs> We present our approach to the problem of how an agent,... agent populations influence system behavior. </abs>",
"In multi-agent environments, ... The article presents experimental results illustrating the agents' dynamic behavior.",
"I. Introduction, ... simultaneously reduces the gap between theory and practice.",
""
]
}