Reports should show the license scores for findings #5128

Closed
PatteSI opened this issue Mar 3, 2022 · 6 comments
Labels: enhancement, reporter

Comments

@PatteSI

PatteSI commented Mar 3, 2022

As mentioned by @sschuberth, it would be nice if ORT users could also see the scores produced by ScanCode.
After the discussion in the ScanCode project about false positives, we realized that there is currently no way for an ORT user to see or use this value.
For example, the user should be able to set a minimum license score; a license finding with a score below that value would then only be shown as a hint in the reports.
This would be very helpful for teams that face a low risk of license violations but want to reduce the number of false positives.
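
To make the request concrete, here is a minimal sketch of the proposed behavior in Python. It is not ORT code: `LicenseFinding`, `split_by_min_score`, and the `min_score` parameter are hypothetical names chosen for illustration, and the score is assumed to be a number between 0 and 100.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseFinding:
    """Hypothetical, simplified stand-in for a scanner license finding."""
    license_id: str
    path: str
    score: float  # assumed to range from 0 to 100


def split_by_min_score(findings, min_score):
    """Split findings into regular findings and low-score hints.

    Findings scoring at or above min_score are reported as usual;
    the rest are kept, but only as hints.
    """
    confirmed = [f for f in findings if f.score >= min_score]
    hints = [f for f in findings if f.score < min_score]
    return confirmed, hints


findings = [
    LicenseFinding("Apache-2.0", "LICENSE", 100.0),
    LicenseFinding("proprietary-license", "src/util.c", 45.0),
]
confirmed, hints = split_by_min_score(findings, min_score=80.0)
```

Keeping the low-score findings as hints instead of dropping them means nothing is silently hidden from the report; the user only decides how prominently each finding is shown.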

@sschuberth
Member

> As mentioned by @sschuberth

But I also mentioned that this only makes sense if the ScanCode score was in any way meaningful. Which in my experience it isn't. Even for one-liners that contain the words "modified by the user" ScanCode is 100% confident that it's a proprietary-license. As long as this is the case, filtering by score doesn't help us.

@mnonnenmacher also told me that @pombredanne told him that license scores in general are snake oil. Maybe he can share here why he thinks so... probably due to unclear semantics.

sschuberth added the "scanner" label Mar 3, 2022
@PatteSI
Author

PatteSI commented Mar 3, 2022

I hope the scoring will be improved soon. I saw several examples of false positives in the mentioned thread where the score is way lower than 100%. It doesn't hurt to offer this functionality in the future and let the end user decide how to deal with it. But I agree it would make more sense if license scoring were better documented and generally worked better.

@pombredanne
Contributor

pombredanne commented Mar 4, 2022

> But I also mentioned that this only makes sense if the ScanCode score was in any way meaningful. Which in my experience it isn't. Even for one-liners that contain the words "modified by the user" ScanCode is 100% confident that it's a proprietary-license. As long as this is the case, filtering by score doesn't help us.

@sschuberth FWIW there is no one-liner with "modified by the user" text in the license detection rules. But there are indeed other rules that are weak license clues, and if their relevance is 100% (and therefore possibly their score), then this is a bug to fix.

If the license score is not meaningful in your case, please report issues as this would be a serious bug to fix.

> told him that license scores in general are snake oil.

I doubt that I ever said such a thing :] ... The license score may not be perfect, yet it has been designed carefully and tuned to represent the quality of a license match. It is based on:

  • how relevant is the matched license text or notice? This is a number between 0 and 100 which is based on the length of the matched rule and can be overridden for some license detection rules. For instance a short "GPLv3" rule has a relevance set manually to 100 because this is a 100% relevant and totally unambiguous GPL-3.0 license reference.

  • how long is the matched license text or notice? This is a number of matched words and is turned into a "coverage" ratio of: number of matched words / number of total words. We use two coverage ratios:

    • the "qcoverage" or query-side coverage that represents the number of matched words wrt. the number of words in the matched text region in the queried text,
    • and the "icoverage" or index-side coverage that represents the number of matched words wrt. the number of words in the matched indexed license rule text,
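
As a rough illustration of the two coverage ratios described above, here is a minimal Python sketch. It is not ScanCode's implementation: the function names are hypothetical, and the way `illustrative_score` combines the ratios with the relevance (a simple product) is only an assumption made for this example.

```python
def coverage(matched_words: int, total_words: int) -> float:
    """Return matched / total as a percentage between 0 and 100."""
    if total_words <= 0:
        return 0.0
    return 100.0 * matched_words / total_words


def illustrative_score(matched_words: int,
                       query_region_words: int,
                       rule_words: int,
                       relevance: float) -> float:
    """Combine both coverage ratios with the rule relevance (0 to 100).

    ASSUMPTION: multiplying the three factors is an illustrative choice,
    not the formula ScanCode actually uses.
    """
    # Query-side coverage: matched words vs. words in the matched query region.
    qcoverage = coverage(matched_words, query_region_words)
    # Index-side coverage: matched words vs. words in the indexed rule text.
    icoverage = coverage(matched_words, rule_words)
    return round((qcoverage / 100) * (icoverage / 100) * relevance, 2)


# A rule of 8 words is fully matched inside a 10-word query region.
qcov = coverage(8, 10)   # query-side coverage
icov = coverage(8, 8)    # index-side coverage
score = illustrative_score(8, 10, 8, relevance=100)
```

In this example the whole rule is matched, so the index-side coverage is complete, while two unmatched words inside the query region lower the query-side coverage.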

See these for details:

@pombredanne
Contributor

Note that we are also working on a closely related scoring improvement: the overall clarity of a package's licensing. See aboutcode-org/scancode-toolkit#2861 for the design by @DennisClark and aboutcode-org/scancode-toolkit#2875 for the implementation by @JonoYang, which is about to be merged into the develop branch.

This scoring is designed to determine if the licensing of a package is clearly documented or presents ambiguities.

@sschuberth
Member

See #5131 as a first step. However, the score is not displayed to the end user anywhere yet.

sschuberth added the "enhancement" and "reporter" labels and removed the "scanner" label Dec 19, 2022
sschuberth changed the title from "ORT should show the license score from ScanCode" to "Reports should show the license scores for findings" Feb 8, 2023
@sschuberth
Member

Closed as part of backlog grooming. Feel free to comment if you would like to contribute to this.

sschuberth closed this as not planned Jul 1, 2024