
Drop Support for Expected Results and Scores without Task-Definition Files? #439

Closed
PhilippWendler opened this issue Jul 12, 2019 · 8 comments

@PhilippWendler
Member

Currently, BenchExec has two modes for checking expected results and computing (SV-COMP) scores:

  1. If task-definition files are used, arbitrary property files can be used (since BenchExec 1.17).
  2. Without task-definition files a specific set of properties (mostly from SV-COMP) can be used and the expected result needs to be encoded in the file name.

The second mode is historical and no longer recommended. Its code complicates result handling in BenchExec, and removing it would simplify things.
Replicating old experiments (e.g., old SV-COMP instances) would become a little more difficult, but would still be possible (one could simply generate task-definition files for tasks in the old format). Furthermore, SV-COMP switched to task-definition files in 2019, so if we remove the legacy mode in 2020, two instances of SV-COMP with the new format will already have taken place.

To clarify: Tasks could still be defined without task-definition files; BenchExec would just no longer check the correctness of results or compute scores.
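
For illustration, a task definition in the new format is a small YAML file next to the input files; a minimal one looks roughly like the following sketch (file names and the property path are placeholders, not taken from an actual benchmark set):

```yaml
# example_locks.yml – sketch of a task-definition file (placeholder names)
format_version: '1.0'

# Input files of the task, relative to this file.
input_files: 'example_locks.c'

# Expected verdicts are stated per property here instead of being
# encoded in the file name as in the legacy mode.
properties:
  - property_file: ../properties/unreach-call.prp
    expected_verdict: true
```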

Please comment here if you still have a use case for the historical mode and migrating would create problems for you.

@PhilippWendler PhilippWendler added this to the Release 3.0 milestone Jul 12, 2019
@PhilippWendler PhilippWendler pinned this issue Jul 12, 2019
PhilippWendler added a commit that referenced this issue Jan 13, 2020
table-generator reads task-definition files, but this is not tested yet.
Furthermore, after #439 this test will be the only one where
table-generator generates the full set of statistics (per category).
@PhilippWendler PhilippWendler unpinned this issue May 8, 2020
@dbeyer
Member

dbeyer commented Jun 10, 2020

Question: For replicating old experiments, I could just use an old release of BenchExec without trouble, correct?

@PhilippWendler
Member Author

Several possibilities:

  • use an old version of BenchExec (but of course you will miss all the nice improvements of newer BenchExec, so I would not recommend that)
  • just run with the new version of BenchExec; you will only miss the checking of expected results and the score computation (during replication, I guess the main goal would be to compare against the old results anyway)
  • generate task-definition files for the old tasks with the script that we provide and run the experiment with these (this does not affect how the tool sees the tasks, so the results will still be comparable; a rough sketch of such a conversion is shown below)
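
For those curious what such a conversion amounts to, here is a rough Python sketch. This is not the actual script mentioned above; the naming pattern and paths are simplified assumptions (old SV-COMP file names could also encode several verdicts at once, which this sketch ignores):

```python
#!/usr/bin/env python3
"""Sketch: generate task-definition files for tasks in the old naming
scheme, where the expected verdict is encoded in the file name,
e.g. "example_true-unreach-call.c". Illustrative only."""

import re
import sys

# Simplified assumption: one verdict and one property per file name.
NAME_PATTERN = re.compile(r"_(true|false)-([a-z-]+)\.[ci]$")

def write_task_definition(task_file: str) -> None:
    match = NAME_PATTERN.search(task_file)
    if not match:
        sys.exit(f"no expected verdict encoded in {task_file!r}")
    verdict, prop = match.groups()

    # Write a task-definition file next to the input file.
    yml_file = task_file.rsplit(".", 1)[0] + ".yml"
    with open(yml_file, "w") as out:
        out.write("format_version: '1.0'\n")
        out.write(f"input_files: '{task_file}'\n")
        out.write("properties:\n")
        out.write(f"  - property_file: ../properties/{prop}.prp\n")
        out.write(f"    expected_verdict: {verdict}\n")

if __name__ == "__main__":
    for task in sys.argv[1:]:
        write_task_definition(task)
```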

@dbeyer
Member

dbeyer commented Jun 10, 2020

I am in favor of dropping support for expected results in file names.

@PhilippWendler
Member Author

This was already decided a few months ago :-)

@dbeyer
Member

dbeyer commented Jun 10, 2020

But not documented!

@dbeyer
Member

dbeyer commented Jun 10, 2020

... and the issue was not closed.

@dbeyer dbeyer closed this as completed Jun 10, 2020
@PhilippWendler
Member Author

Because the implementation is not finished yet. We will close it when this is done, by labeling the commit as usual.

@dbeyer
Member

dbeyer commented Jun 10, 2020

ok

PhilippWendler added a commit that referenced this issue Jul 3, 2020
…nerator

Part of #439.
For tasks that are not defined with YAML files,
table-generator so far parsed the file name to detect the expected
verdict. This commit removes that behavior.
The result is that for tables with such tasks,
there are no counts of true/false tasks
and no statistics for correct/wrong results.
PhilippWendler added a commit that referenced this issue Jul 3, 2020
Part of #439.

This means that all expected verdicts that are encoded in file names
(e.g., "_true-unreach-call") are now ignored.