RFC: Automatic validation of solution's subtask status #34

edomora97 · 2022-02-27T19:51:23Z

I want to introduce in the italian_yaml format the association solution-subtask-status. This will allow task-maker to:

Automatically check that the solutions takes the points they deserve
Automatically compute a range for the task's time-limit

The status of a testcase can be one of the following:

Accepted (AC)
Wrong Answer (WA) (includes partial score, i.e. score < 1.0)
Time Limit Exceeded (TLE)
Memory Limit Exceeded (MLE)
Runtime Error (RE)

The status of a subtask is the set of statuses of its testcases.

There are many possibile ways to store this association:

A: Give a name to the subtasks (e.g. examples, quadratic, line, ...), and inside each solution list the sets of outcomes for the subtasks
B: List the solution names, for each subtask, for each status, inside the GEN
C: Create an extra file with this explicit association (e.g. a JSON file)

Solution A: Name the subtasks, tag the solutions

Inside gen/GEN, after the definition of a subtask, there is the name of the subtask with a STNAME directive. Inside each solution there are comments that are parsed by task-maker to extract the set of subtasks and statuses the solution should get.

Example

~~ gen/GEN ~~

#ST: 0
#STNAME: examples
#COPY: testo/test.input0.txt
#COPY: testo/test.input1.txt

#ST: 20
#STNAME: exponential
10 1000 1
10 1000 2

#ST: 50
#STNAME: quadratic
1000 1000 3
1000 1000 4

#ST: 30
#STNAME: linear
100000 1000 5
100000 1000 6

~~ sol/quadratic.cpp ~~

/*
 * @check-accepted: examples exponential quadratic
 * @check-time-limit: linear
 */
#include <iostream>
using namespace std;
int main() {
    // [...]
}

~~ sol/greedy.py ~~

# @check-wrong-answer: *
# @check-time-limit: linear

N = int(input())
# [...]

Notes:

@check-accepted = all the testcases of the subtask are AC
@check-time-limit = at least one testcase of the subtask is TLE (same for the other non-AC statuses)

Pros:

Solutions can be renamed without issues
Wildcards can be used to target all the subtasks
Subtasks can be reordered easily
GEN is not clogged of metadata and it's used only for testcase generation
Tags in the solution may work as documentation for the solution's complexity

Cons:

Renaming subtasks can be hard
Adding a subtask may require changing all the solution headers
Old task-maker versions don't recognize #STNAME and fail to parse gen/GEN

Open questions:

The set of subtasks is to be intended as "exact set" (those, and only those subtasks must have that status), or as "loose set" (all those subtasks must have that status, but maybe others)?
Format and name of #STNAME: name?
Format and names of @check-accepted/@check-...?

Solution B: List solution names inside `gen/GEN`

After the definition of a subtask in gen/GEN, list the solutions that should get a given status.

Example

~~ gen/GEN ~~

#ST: 0
#AC: soluzione.cpp quadratica.cpp
#WA: greedy.py
#COPY: testo/test.input0.txt
#COPY: testo/test.input1.txt

#ST: 20
#AC: soluzione.cpp quadratica.cpp
#WA: greedy.py
10 1000 1
10 1000 2

#ST: 50
#AC: soluzione.cpp quadratica.cpp
#WA: greedy.py
1000 1000 3
1000 1000 4

#ST: 30
#AC: soluzione.cpp
#WA: greedy.py
#TLE: greedy.py quadratica.cpp
100000 1000 5
100000 1000 6

Pros:

Subtasks can be reordered easily
Adding a subtask does not require changes in the solutions

Cons:

Renaming solutions can be hard
Wildcard cannot be used
GEN may be filled of solution names in long lines (which are hard to git-merge)
Hard to check if a solution has been listed or not
Old task-maker versions don't recognize #AC/#WA/... and fail to parse gen/GEN
GEN is used for both testcase generation and solution validation

Open questions:

#AC, #TLE, #WA, #MLE, #RE or something more verbose?
Spaces in the solution name? (in oii and ois repo there are no such solutions)

Solution C: Additional file

Write the association solution-subtask-status in a new file.

Pros:

Wildcards can be used
Adding a subtask does not require changes in the solutions
GEN is not clogged of metadata and it's used only for testcase generation
Changes to names can be applied with a find-and-replace within a single file

Cons:

Subtasks still have to be named to be referenced in this file
Since solution names have to be listed, it's hard to rename solutions
New file format to learn (and hate)

Open questions:

Format of this file?

The text was updated successfully, but these errors were encountered:

lucach · 2022-02-28T21:19:10Z

I'd cast a vote for proposal A, possibly with an implicit-AC semantics.

If a solution is expected to be accepted on all subtasks, unless stated otherwise, adding a new one potentially requires only to list its name in the solutions it affects.

In any case, even without this, I think that the costs are negligible compared to the benefits this proposal would bring.

edomora97 · 2022-03-01T10:10:09Z

Thanks Luca, and thanks also to all the others that spent some time answering my questions!

I'll recap here the various opinions I received:

	A	B	C
Alessandro	✅²	❌	❌
Dario O.	✅	✅	❌
Dario P.	✅²	❌⁸	❌
Edoardo	✅	❌¹	✅
Federico	❌	❌	✅†
Giorgio	✅	❌	❌
Luca C.	✅³	❌	❌
Luca V.	✅⁴	✅	❌
Marco	✅	❌¹	✅
Tommaso	❌	✅⁵	❌
William	❌	✅⁶	✅⁶⁷
	8	4	4

Feedback received:

gen/GEN is for testcase generation, with B it would also do solution's business
The cleanest among the three
possibly with an implicit-AC semantics
Giving a name to the subtask can be also useful in the future
Having everything in a single file can be handy
Almost never happens to rename a solution, especially if we introduce a standard naming convention (that may be enforced)
The format may be JSON with a schema that validates its content (and provide IntelliSense on vscode)
If we want to extend the format supporting also "the solution should take at least X points on this subtask" (for partially scored tasks), B would make the gen/GEN even messier

† Federico was kind enough to propose a format for C:

# Multiline comment with the description of the subtasks taken from gen/GEN and populated automatically
# e.g.
# ST 1: Examples (0 points)
# ST 2: N <= 100 (10 points)

                | ST 1 | ST 2 | ST 3  |
soluzione.cpp   | AC   | AC   | AC    |
greedy.py       | ?    | WA   | WA TL |
quadratica.cpp  | AC   | AC   | TL    |
new.cpp         | ?    | ?    | ?     |

This file is kept up-to-date with some tools provided by task-maker:

One that builds from scratch this file
One that extracts from the gen/GEN the information about the subtasks, and adds the solutions that are missing

If this file is broken (wrong format, not existing solution, invalid subtasks, ...) task-maker should not fail, but just warn.

This somewhat implies that the subtasks have to be named, for example using a #STDESCRIPTION: $N \le 100$. This also would enable us to generate the "Subtasks" section of the statement entirely from the gen/GEN.

edomora97 · 2022-03-01T20:42:42Z

It seems that Solution A has a lot more consensus than the other options. We will try to stick with A!

Here it is a somewhat formal specification of the format.

Definition of subtask's name

The grammar of the gen/GEN file (ref) is extended with the following syntax:

subtask_name = { "#STNAME:" ~ whitespace* ~ word ~ whitespace* }

Examples:

#STNAME: linear
#STNAME:without_space
#STNAME: with_comment  # comment

The subtask name must:

Be a valid UTF-8 XID identifier (according to https://unicode.org/reports/tr31/ with NFCK normalization)
Be at least 1 character long
Not contain any whitespace character or asterisk (*)
Not start with a dash (-)

Definition of solution checks

Inside each solution there could be lines defining the checks to perform on the results of that solution. The checks may be added inside comment lines. The checks have a format that is based on the following regex:

^
.*           # ignore any prefix (allowing to be in a comment of any language)
@check-      # signal the start of a check
(?P<result>accepted|wrong-answer|time-limit-exceeded|memory-limit-exceeded|runtime-error)
:
(?P<subtasks>
  (?:
    [\t ]*?  # spaces between subtask names
    [^\s]+?  # subtask name
  )*?        # allow a check without any subtask listed
)
[\t ]*?      # ignore spaces after the last subtask
$

Examples:

/*
 * // Check that the solution gets AC in examples and quadratic
 * @check-accepted: examples quadratic
 * // Checks may be repeated
 * @check-accepted: tree
 * @check-time-limit-exceeded: linear
 * // The same subtask can appear in many checks
 * @check-runtime-error: linear
 * // Spaces can be used as you wish (e.g. for alignment)
 * @check-runtime-error:crazy   with    spaces
 */

# // Asterisks can be used as wildcards
# @check-wrong-answer: *
# // Empty checks are also allowed
# @check-run-time-error:

The list of subtask names is split by whitespaces (spaces and tabs), and each subtask name is matched against the list of subtasks defined in the gen/GEN file. Asterisks in the subtask name in the check will match any sequence of characters when searching the gen/GEN (like a wildcard).

Note that the following format of comments is not supported since the list of subtask names is read until the end of the line:

/* @check-accepted: foo */

Checking logic

Given a solution S, each check is analyzed independently. If the check is @check-accepted, all the testcases of the listed subtasks must get AC. If the check is not @check-accepted, at least one testcase in each subtask must have the given result.

Example	Result	Logic
`@check-accepted: foo bar`	`accepted`	Check solution S gets AC on all the testcases of `foo` and `bar`
`@check-wrong-answer: foo bar`	`wrong-answer`	Check solution S gets WA in at least one testcase in `foo` and at least one in `bar`
`@check-time-limit-exceeded: foo bar`	`time-limit-exceeded`	Check solution S gets TLE in at least one testcase in `foo` and at least one in `bar`
`@check-memory-limit-exceeded: foo bar`	`memory-limit-exceeded`	Check solution S gets MLE in at least one testcase in `foo` and at least one in `bar`
`@check-runtime-error: foo bar`	`runtime-error`	Check solution S gets RE in at least one testcase in `foo` and at least one in `bar`

Sanity checks

Missing subtask names

If not all the subtasks have a name, emit a warning.

Solutions with no checks

If all the subtasks have a name, but at least a solution (that is not symlinked in att/, i.e. not a template) has no checks, emit a warning with the list of all such solutions.

Invalid check

Emit an error for all the malformed checks in the solutions. A malformed check is a line that contains @check-, but does not match the check regex.

Invalid subtask name

Emit an error for all the checks with a non-existent (or invalid, or non-matching) subtask name, including a list of the valid subtask names.

edomora97 added the enhancement New feature or request label Feb 27, 2022

edomora97 closed this as completed in 31d6a13 Mar 3, 2022

edomora97 mentioned this issue May 15, 2022

Tool for auto-tuning time limits #55

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Automatic validation of solution's subtask status #34

RFC: Automatic validation of solution's subtask status #34

edomora97 commented Feb 27, 2022 •

edited

Loading

lucach commented Feb 28, 2022

edomora97 commented Mar 1, 2022

edomora97 commented Mar 1, 2022 •

edited

Loading

RFC: Automatic validation of solution's subtask status #34

RFC: Automatic validation of solution's subtask status #34

Comments

edomora97 commented Feb 27, 2022 • edited Loading

Solution A: Name the subtasks, tag the solutions

Solution B: List solution names inside gen/GEN

Solution C: Additional file

lucach commented Feb 28, 2022

edomora97 commented Mar 1, 2022

edomora97 commented Mar 1, 2022 • edited Loading

Definition of subtask's name

Definition of solution checks

Checking logic

Sanity checks

edomora97 commented Feb 27, 2022 •

edited

Loading

Solution B: List solution names inside `gen/GEN`

edomora97 commented Mar 1, 2022 •

edited

Loading