Rethinking the wording for assertion verdicts #945

Closed
jugglinmike opened this issue May 17, 2023 · 3 comments

@jugglinmike
Contributor

In this project, an assertion is judged in terms of an AT response to produce a "verdict." That verdict can be one of three values. The Working Mode uses one set of words to describe those values, and the app uses a different set of words:

| Working Mode term | ARIA-AT App term |
| --- | --- |
| supported | good output |
| not supported | no output |
| incorrectly supported | incorrect output |

This inconsistency can make it difficult to talk about the process and the implementation. Further, half of the terms are susceptible to misinterpretation:

  • "incorrectly supported" is something of an oxymoron. The word "incorrectly" subverts the meaning of "supported."
  • "no output" is too expansive. It's meant to describe the case where there is no output related to the assertion under consideration, but new Testers could be forgiven for thinking it only applies where there is no output at all (i.e. when the screen reader is completely silent).
  • "incorrect output" is likewise overly broad because it describes the "output" as a whole. For instance, given the output "bananas" and the assertion "the role 'button' is conveyed", a new Tester might reasonably say "incorrect output" because that output is incorrect.

I am proposing a new set of terms to be used by both the Working Mode and the App:

| Working Mode term | ARIA-AT App term | New term |
| --- | --- | --- |
| supported | good output | acceptable |
| not supported | no output | omitted |
| incorrectly supported | incorrect output | contradictory |
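
For illustration, the proposed renaming could be expressed as a small TypeScript sketch. The `AssertionVerdict` type and `appTermToProposed` constant are hypothetical names invented for this example, not identifiers from the ARIA-AT codebase:

```typescript
// Hypothetical sketch of the proposed verdict vocabulary; the type and
// constant names are illustrative, not identifiers from the ARIA-AT app.
type AssertionVerdict = 'acceptable' | 'omitted' | 'contradictory';

// How the current app terms would map onto the proposed terms.
const appTermToProposed: Record<string, AssertionVerdict> = {
  'good output': 'acceptable',
  'no output': 'omitted',
  'incorrect output': 'contradictory',
};
```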
@jugglinmike
Contributor Author

The Assistive Technology Automation Subgroup of the ARIA-AT Community Group just discussed this issue. Meeting minutes are available on w3.org and are also included below.

The full IRC log of that discussion

<Sam_Shaw> TOPIC: w3c/aria-at - #945 - Rethinking the wording for assertion verdicts
<Sam_Shaw> #945
<Sam_Shaw> jugglinmike: The working mode doesn't have the term "verdict" yet, but it's one we intend to add.
<Sam_Shaw> jugglinmike: The working mode refers to verdicts as supported, not supported, etc.
<Sam_Shaw> jugglinmike: automation refers to the verdicts as acceptable, omitted, contradictory
<Sam_Shaw> jugglinmike: I have a proposal for a new set of terms
<Sam_Shaw> correction: automation refers to the verdicts as good output, no output, incorrect output
<Sam_Shaw> the proposed new terms are acceptable, omitted, contradictory
<Sam_Shaw> js: I like the new terms you proposed. In terms of bubbling up the results, I wonder if no support, partial support, supported is clearer
<Sam_Shaw> MK: That's why I wanted to use numbers
<Sam_Shaw> MK: Partial support could mean anything between a little support to almost fully supported
<Sam_Shaw> JS: I agree but if something is 90% supported, the remaining 10% could still make it unusable
<Sam_Shaw> MK: I agree, unless we have multiple layers of assertions we don't need numbers. We also want to be diplomatic
<mmoss> present+
<Sam_Shaw> MK: I think your solution is pretty solid
<Sam_Shaw> MK: We just need to decide if we extend the use of these terms, or bubble them up
<Sam_Shaw> jugglinmike: Yes, we need to consider bubbling up: in the case where a feature is all supported except for one assertion, it's not supported. For verdicts that can be in three states, understanding why it's partially supported is tough. I'm not sure if bubbling can work if we are looking for a percent score
<Sam_Shaw> MK: Yeah, supported needs to be binary
<Sam_Shaw> JS: I think we need all three states
<Sam_Shaw> MK: What do the responses tell us? Either there is some support there or there isn't. Then the reason is that someone tried, or someone didn't try, to support it
<Sam_Shaw> MK: If you're measuring something using a percentage, then it needs to be binary
<Sam_Shaw> JS: For the reports, are there three levels or two of support?
<Sam_Shaw> MK: Any level of support beyond the assertion is a percentage.
<Sam_Shaw> MK: At the test level, at the AT level, all will be a percentage
<Sam_Shaw> MK: So we would say, using Mike's terminology: at the assertion level, if the response is omitted or contradictory, then that counts as a 0. If it's acceptable, then it counts as a 1.
<Sam_Shaw> MK: We could run other reports that say what percent is contradictory and what percent is omitted
<Sam_Shaw> MK: I don't know that we need to bubble up these terms in the reports we have now
<Sam_Shaw> MK: We don't need terms for the working mode, it's just level of support
<Sam_Shaw> jugglinmike: I do think the working mode uses "supported" and "not supported".
<Sam_Shaw> MK: I can get rid of that
<Sam_Shaw> MK: I have some other issues for the working mode, particularly 950, I think we need to work on another iteration of the working mode and share it with the community
<Sam_Shaw> MK: We could have a binary state for assertions, and get rid of contradictory
<Sam_Shaw> JS: I agree, but we should rewrite the terms
<Sam_Shaw> JS: Lets add this to the agenda for the CG meeting thursday
<Sam_Shaw> jugglinmike: What I'm hearing is, we like the terms I proposed, but we may not need three terms
<Sam_Shaw> JS: It will make the testing easier if we just have two states/terms
<Sam_Shaw> MK: Okay but if this task isn't on the critical path, I want to be conscious of that
<Sam_Shaw> JS: This could speed up the process
<Sam_Shaw> MK: But it's not a blocker, we can talk about enhancements in the near future
<Sam_Shaw> Michael Fairchild: Is there a third state where we publish a report with some of the data missing?
<Sam_Shaw> JS: Not really, but we need to consider this.
<Sam_Shaw> JS: If there is a situation where only 50% of tests have been completed, what does that look like for a percent supported?
<Sam_Shaw> MK: We made a decision to change the working mode, and to get rid of the three output terms
<Sam_Shaw> MK: The question before we change the UI is: do we go from 3 states to 2? Acceptable, not acceptable, contradictory
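
A minimal sketch of the rollup MK describes above, assuming each "acceptable" verdict scores 1 and each "omitted" or "contradictory" verdict scores 0; the function names here are hypothetical, not taken from the ARIA-AT app:

```typescript
// Hypothetical rollup sketch: an assertion scores 1 when acceptable
// and 0 when omitted or contradictory; every level above the
// assertion is reported as a percentage.
type Verdict = 'acceptable' | 'omitted' | 'contradictory';

function assertionScore(verdict: Verdict): number {
  return verdict === 'acceptable' ? 1 : 0;
}

// Percentage of passing assertions across a test (or an AT).
function supportPercentage(verdicts: Verdict[]): number {
  if (verdicts.length === 0) return 0;
  const passing = verdicts.reduce((sum, v) => sum + assertionScore(v), 0);
  return (100 * passing) / verdicts.length;
}

// Example: two acceptable verdicts out of three is ~66.7% support.
console.log(supportPercentage(['acceptable', 'omitted', 'acceptable']));
```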

@jugglinmike
Contributor Author

The Assistive Technology Automation Subgroup of the ARIA-AT Community Group just discussed this issue. Meeting minutes are available on w3.org and are also included below.

The full IRC log of that discussion

<jugglinmike> Topic: Wording of assertion verdicts
<jugglinmike> Matt_King: Rethinking the wording for assertion verdicts · Issue #945 · w3c/aria-at
<jugglinmike> github: #945
<jugglinmike> Matt_King: In this issue, jugglinmike suggests alternative words for the three assertion verdicts
<jugglinmike> Matt_King: When we discussed this, I talked myself into thinking that there isn't enough of a need to differentiate between kinds of assertion failures
<jugglinmike> Matt_King: And that it would simplify things meaningfully if we just classified assertions as either "passing" or "failing"
<jugglinmike> IsaDC: For the output, we will still know if it's "incorrect" or "no output" by judging from the AT Response
<jugglinmike> Matt_King: Yes, Testers will still report that there was no AT response. Although separately, we need to give Testers a normalized way to describe this (e.g. via a checkbox) rather than inventing their own representation of "no output" each time
<jugglinmike> jugglinmike: But remember that "no output" as a verdict for a specific assertion is sometimes used even when the AT does respond
<jugglinmike> jugglinmike: The "no output" assertion verdict is designed to be used when the output does not include the information being asserted
<jugglinmike> Matt_King: But "no output" and "incorrect output" are both failures, and I don't think tabulating them separately brings enough value to justify the complexity they require
<jugglinmike> Hadi: Are you suggesting that we remove the ability to describe unexpected responses?
<jugglinmike> Matt_King: No, we're keeping that
<jugglinmike> Hadi: I'm concerned that people reviewing the results will not be able to understand why the Tester assigned "fail"
<jugglinmike> Matt_King: The assertions are granular: there is a separate assertion for each ARIA property and attribute. This means implementers will be able to see precisely what aspect of a test failed
<jugglinmike> Hadi: Okay, that sounds good. And as long as we're keeping the ability to describe unexpected output with free text, I am supportive of this change
<jugglinmike> present+ Michael_Fairchild
<jugglinmike> Michael_Fairchild: I support this simplification, as well
<jugglinmike> present+ JoeHumbert
<jugglinmike> JoeHumbert: I do, too. Anything to reduce the number of options that the Testers must choose between
<jugglinmike> Matt_King: Sounds like we're in agreement. We'll have a separate discussion about when we make that change

@jugglinmike
Contributor Author

As documented above, we collectively agreed to simplify the assertion verdicts to allow only "pass" and "fail." That's a larger change which obviates this issue, so I've opened gh-961 to track it and to allow us to close this.
