Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Automatic validation of solution's subtask status #34

Closed
edomora97 opened this issue Feb 27, 2022 · 3 comments
Closed

RFC: Automatic validation of solution's subtask status #34

edomora97 opened this issue Feb 27, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@edomora97
Copy link
Collaborator

edomora97 commented Feb 27, 2022

I want to introduce in the italian_yaml format the association solution-subtask-status. This will allow task-maker to:

  • Automatically check that the solutions takes the points they deserve
  • Automatically compute a range for the task's time-limit

The status of a testcase can be one of the following:

  • Accepted (AC)
  • Wrong Answer (WA) (includes partial score, i.e. score < 1.0)
  • Time Limit Exceeded (TLE)
  • Memory Limit Exceeded (MLE)
  • Runtime Error (RE)

The status of a subtask is the set of statuses of its testcases.

There are many possibile ways to store this association:

  • A: Give a name to the subtasks (e.g. examples, quadratic, line, ...), and inside each solution list the sets of outcomes for the subtasks
  • B: List the solution names, for each subtask, for each status, inside the GEN
  • C: Create an extra file with this explicit association (e.g. a JSON file)

Solution A: Name the subtasks, tag the solutions

Inside gen/GEN, after the definition of a subtask, there is the name of the subtask with a STNAME directive. Inside each solution there are comments that are parsed by task-maker to extract the set of subtasks and statuses the solution should get.

Example

~~ gen/GEN ~~

#ST: 0
#STNAME: examples
#COPY: testo/test.input0.txt
#COPY: testo/test.input1.txt

#ST: 20
#STNAME: exponential
10 1000 1
10 1000 2

#ST: 50
#STNAME: quadratic
1000 1000 3
1000 1000 4

#ST: 30
#STNAME: linear
100000 1000 5
100000 1000 6
~~ sol/quadratic.cpp ~~

/*
 * @check-accepted: examples exponential quadratic
 * @check-time-limit: linear
 */
#include <iostream>
using namespace std;
int main() {
    // [...]
}
~~ sol/greedy.py ~~

# @check-wrong-answer: *
# @check-time-limit: linear

N = int(input())
# [...]

Notes:

  • @check-accepted = all the testcases of the subtask are AC
  • @check-time-limit = at least one testcase of the subtask is TLE (same for the other non-AC statuses)

Pros:

  • Solutions can be renamed without issues
  • Wildcards can be used to target all the subtasks
  • Subtasks can be reordered easily
  • GEN is not clogged of metadata and it's used only for testcase generation
  • Tags in the solution may work as documentation for the solution's complexity

Cons:

  • Renaming subtasks can be hard
  • Adding a subtask may require changing all the solution headers
  • Old task-maker versions don't recognize #STNAME and fail to parse gen/GEN

Open questions:

  • The set of subtasks is to be intended as "exact set" (those, and only those subtasks must have that status), or as "loose set" (all those subtasks must have that status, but maybe others)?
  • Format and name of #STNAME: name?
  • Format and names of @check-accepted/@check-...?

Solution B: List solution names inside gen/GEN

After the definition of a subtask in gen/GEN, list the solutions that should get a given status.

Example

~~ gen/GEN ~~

#ST: 0
#AC: soluzione.cpp quadratica.cpp
#WA: greedy.py
#COPY: testo/test.input0.txt
#COPY: testo/test.input1.txt

#ST: 20
#AC: soluzione.cpp quadratica.cpp
#WA: greedy.py
10 1000 1
10 1000 2

#ST: 50
#AC: soluzione.cpp quadratica.cpp
#WA: greedy.py
1000 1000 3
1000 1000 4

#ST: 30
#AC: soluzione.cpp
#WA: greedy.py
#TLE: greedy.py quadratica.cpp
100000 1000 5
100000 1000 6

Pros:

  • Subtasks can be reordered easily
  • Adding a subtask does not require changes in the solutions

Cons:

  • Renaming solutions can be hard
  • Wildcard cannot be used
  • GEN may be filled of solution names in long lines (which are hard to git-merge)
  • Hard to check if a solution has been listed or not
  • Old task-maker versions don't recognize #AC/#WA/... and fail to parse gen/GEN
  • GEN is used for both testcase generation and solution validation

Open questions:

  • #AC, #TLE, #WA, #MLE, #RE or something more verbose?
  • Spaces in the solution name? (in oii and ois repo there are no such solutions)

Solution C: Additional file

Write the association solution-subtask-status in a new file.

Pros:

  • Wildcards can be used
  • Adding a subtask does not require changes in the solutions
  • GEN is not clogged of metadata and it's used only for testcase generation
  • Changes to names can be applied with a find-and-replace within a single file

Cons:

  • Subtasks still have to be named to be referenced in this file
  • Since solution names have to be listed, it's hard to rename solutions
  • New file format to learn (and hate)

Open questions:

  • Format of this file?
@edomora97 edomora97 added the enhancement New feature or request label Feb 27, 2022
@lucach
Copy link

lucach commented Feb 28, 2022

I'd cast a vote for proposal A, possibly with an implicit-AC semantics.

If a solution is expected to be accepted on all subtasks, unless stated otherwise, adding a new one potentially requires only to list its name in the solutions it affects.

In any case, even without this, I think that the costs are negligible compared to the benefits this proposal would bring.

@edomora97
Copy link
Collaborator Author

Thanks Luca, and thanks also to all the others that spent some time answering my questions!

I'll recap here the various opinions I received:

A B C
Alessandro ✅²
Dario O.
Dario P. ✅² ❌⁸
Edoardo ❌¹
Federico ✅†
Giorgio
Luca C. ✅³
Luca V. ✅⁴
Marco ❌¹
Tommaso ✅⁵
William ✅⁶ ✅⁶⁷
8 4 4

Feedback received:

  1. gen/GEN is for testcase generation, with B it would also do solution's business
  2. The cleanest among the three
  3. possibly with an implicit-AC semantics
  4. Giving a name to the subtask can be also useful in the future
  5. Having everything in a single file can be handy
  6. Almost never happens to rename a solution, especially if we introduce a standard naming convention (that may be enforced)
  7. The format may be JSON with a schema that validates its content (and provide IntelliSense on vscode)
  8. If we want to extend the format supporting also "the solution should take at least X points on this subtask" (for partially scored tasks), B would make the gen/GEN even messier

† Federico was kind enough to propose a format for C:

# Multiline comment with the description of the subtasks taken from gen/GEN and populated automatically
# e.g.
# ST 1: Examples (0 points)
# ST 2: N <= 100 (10 points)

                | ST 1 | ST 2 | ST 3  |
soluzione.cpp   | AC   | AC   | AC    |
greedy.py       | ?    | WA   | WA TL |
quadratica.cpp  | AC   | AC   | TL    |
new.cpp         | ?    | ?    | ?     |

This file is kept up-to-date with some tools provided by task-maker:

  • One that builds from scratch this file
  • One that extracts from the gen/GEN the information about the subtasks, and adds the solutions that are missing

If this file is broken (wrong format, not existing solution, invalid subtasks, ...) task-maker should not fail, but just warn.

This somewhat implies that the subtasks have to be named, for example using a #STDESCRIPTION: $N \le 100$. This also would enable us to generate the "Subtasks" section of the statement entirely from the gen/GEN.

@edomora97
Copy link
Collaborator Author

edomora97 commented Mar 1, 2022

It seems that Solution A has a lot more consensus than the other options. We will try to stick with A!

Here it is a somewhat formal specification of the format.

Definition of subtask's name

The grammar of the gen/GEN file (ref) is extended with the following syntax:

subtask_name = { "#STNAME:" ~ whitespace* ~ word ~ whitespace* }

Examples:

#STNAME: linear
#STNAME:without_space
#STNAME: with_comment  # comment

The subtask name must:

  • Be a valid UTF-8 XID identifier (according to https://unicode.org/reports/tr31/ with NFCK normalization)
  • Be at least 1 character long
  • Not contain any whitespace character or asterisk (*)
  • Not start with a dash (-)

Definition of solution checks

Inside each solution there could be lines defining the checks to perform on the results of that solution. The checks may be added inside comment lines. The checks have a format that is based on the following regex:

^
.*           # ignore any prefix (allowing to be in a comment of any language)
@check-      # signal the start of a check
(?P<result>accepted|wrong-answer|time-limit-exceeded|memory-limit-exceeded|runtime-error)
:
(?P<subtasks>
  (?:
    [\t ]*?  # spaces between subtask names
    [^\s]+?  # subtask name
  )*?        # allow a check without any subtask listed
)
[\t ]*?      # ignore spaces after the last subtask
$

Examples:

/*
 * // Check that the solution gets AC in examples and quadratic
 * @check-accepted: examples quadratic
 * // Checks may be repeated
 * @check-accepted: tree
 * @check-time-limit-exceeded: linear
 * // The same subtask can appear in many checks
 * @check-runtime-error: linear
 * // Spaces can be used as you wish (e.g. for alignment)
 * @check-runtime-error:crazy   with    spaces
 */
# // Asterisks can be used as wildcards
# @check-wrong-answer: *
# // Empty checks are also allowed
# @check-run-time-error:

The list of subtask names is split by whitespaces (spaces and tabs), and each subtask name is matched against the list of subtasks defined in the gen/GEN file. Asterisks in the subtask name in the check will match any sequence of characters when searching the gen/GEN (like a wildcard).

Note that the following format of comments is not supported since the list of subtask names is read until the end of the line:

/* @check-accepted: foo */

Checking logic

Given a solution S, each check is analyzed independently. If the check is @check-accepted, all the testcases of the listed subtasks must get AC. If the check is not @check-accepted, at least one testcase in each subtask must have the given result.

Example Result Logic
@check-accepted: foo bar accepted Check solution S gets AC on all the testcases of foo and bar
@check-wrong-answer: foo bar wrong-answer Check solution S gets WA in at least one testcase in foo and at least one in bar
@check-time-limit-exceeded: foo bar time-limit-exceeded Check solution S gets TLE in at least one testcase in foo and at least one in bar
@check-memory-limit-exceeded: foo bar memory-limit-exceeded Check solution S gets MLE in at least one testcase in foo and at least one in bar
@check-runtime-error: foo bar runtime-error Check solution S gets RE in at least one testcase in foo and at least one in bar

Sanity checks

Missing subtask names

If not all the subtasks have a name, emit a warning.

Solutions with no checks

If all the subtasks have a name, but at least a solution (that is not symlinked in att/, i.e. not a template) has no checks, emit a warning with the list of all such solutions.

Invalid check

Emit an error for all the malformed checks in the solutions. A malformed check is a line that contains @check-, but does not match the check regex.

Invalid subtask name

Emit an error for all the checks with a non-existent (or invalid, or non-matching) subtask name, including a list of the valid subtask names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants