
Drop Support for Expected Results and Scores without Task-Definition Files? #439

Closed
PhilippWendler opened this issue Jul 12, 2019 · 8 comments

@PhilippWendler
Member

Currently, BenchExec has two modes for checking expected results and computing (SV-COMP) scores:

  1. If task-definition files are used, arbitrary property files can be used (since BenchExec 1.17).
  2. Without task-definition files a specific set of properties (mostly from SV-COMP) can be used and the expected result needs to be encoded in the file name.

The second mode is historical and no longer recommended. Its code complicates result handling in BenchExec, and removing it would simplify things.
Replicating old experiments (e.g., old SV-COMP instances) would become a little more difficult, but would still be possible (one could simply generate task-definition files for tasks in the old format). Furthermore, SV-COMP switched to task-definition files in 2019, so if we remove the legacy mode in 2020, two instances of SV-COMP with the new format will already have taken place.

To clarify: Tasks could still be defined without task-definition files; BenchExec would just no longer check the correctness of results or compute scores.
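
For illustration, a task definition in the new format is a small YAML file next to the input files; a minimal one looks roughly like the following sketch (file names and the property path are placeholders, not taken from an actual benchmark set):

```yaml
# example_locks.yml – sketch of a task-definition file (placeholder names)
format_version: '1.0'

# Input files of the task, relative to this file.
input_files: 'example_locks.c'

# Expected verdicts are stated per property here instead of being
# encoded in the file name as in the legacy mode.
properties:
  - property_file: ../properties/unreach-call.prp
    expected_verdict: true
```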

Please comment here if you still have a use case for the historical mode and migrating would create problems for you.

@PhilippWendler PhilippWendler added this to the Release 3.0 milestone Jul 12, 2019
@PhilippWendler PhilippWendler pinned this issue Jul 12, 2019
PhilippWendler added a commit that referenced this issue Jan 13, 2020
table-generator reads task-definition files, but this is not tested yet.
Furthermore, after #439 this test will be the only one where
table-generator generates the full set of statistics (per category).
@PhilippWendler PhilippWendler unpinned this issue May 8, 2020
@dbeyer
Member

dbeyer commented Jun 10, 2020

Question: For replicating old experiments, I could just use an old release of BenchExec without trouble, correct?

@PhilippWendler
Member Author

Several possibilities:

  • use an old version of BenchExec (but of course you will miss all the nice improvements of newer BenchExec, so I would not recommend that)
  • just run with the new version of BenchExec; you will only miss the checking of expected results and the score computation (during replication, I guess the main goal would be to compare against the old results anyway)
  • generate task-definition files for the old tasks with the script that we provide and run the experiment with these (this does not affect how the tool sees the tasks, so the results will still be comparable; a rough sketch of such a conversion is shown below)
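
For those curious what such a conversion amounts to, here is a rough Python sketch. This is not the actual script mentioned above; the naming pattern and paths are simplified assumptions (old SV-COMP file names could also encode several verdicts at once, which this sketch ignores):

```python
#!/usr/bin/env python3
"""Sketch: generate task-definition files for tasks in the old naming
scheme, where the expected verdict is encoded in the file name,
e.g. "example_true-unreach-call.c". Illustrative only."""

import re
import sys

# Simplified assumption: one verdict and one property per file name.
NAME_PATTERN = re.compile(r"_(true|false)-([a-z-]+)\.[ci]$")

def write_task_definition(task_file: str) -> None:
    match = NAME_PATTERN.search(task_file)
    if not match:
        sys.exit(f"no expected verdict encoded in {task_file!r}")
    verdict, prop = match.groups()

    # Write a task-definition file next to the input file.
    yml_file = task_file.rsplit(".", 1)[0] + ".yml"
    with open(yml_file, "w") as out:
        out.write("format_version: '1.0'\n")
        out.write(f"input_files: '{task_file}'\n")
        out.write("properties:\n")
        out.write(f"  - property_file: ../properties/{prop}.prp\n")
        out.write(f"    expected_verdict: {verdict}\n")

if __name__ == "__main__":
    for task in sys.argv[1:]:
        write_task_definition(task)
```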

@dbeyer
Member

dbeyer commented Jun 10, 2020

I am in favor of dropping support for expected results in file names.

@PhilippWendler
Member Author

This was already decided a few months ago :-)

@dbeyer
Member

dbeyer commented Jun 10, 2020

But not documented!

@dbeyer
Member

dbeyer commented Jun 10, 2020

... and the issue was not closed.

@dbeyer dbeyer closed this as completed Jun 10, 2020
@PhilippWendler
Member Author

Because the implementation is not finished yet. We will close it when this is done, by labeling the commit as usual.

@dbeyer
Member

dbeyer commented Jun 10, 2020

ok

PhilippWendler added a commit that referenced this issue Jul 3, 2020
…nerator

Part of #439.
For tasks that are not defined with YAML files,
table-generator so far parsed the file name to detect the expected
verdict. This commit removes that behavior.
The result is that for tables with such tasks,
there are no counts of true/false tasks
and no statistics for correct/wrong results.
PhilippWendler added a commit that referenced this issue Jul 3, 2020
Part of #439.

This means that all expected verdicts that are encoded in file names
(e.g., "_true-unreach-call") are now ignored.