Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect SPDX Snippets #699

Merged
merged 4 commits into from
Apr 6, 2023
Merged

Detect SPDX Snippets #699

merged 4 commits into from
Apr 6, 2023

Conversation

mxmehl
Copy link
Member

@mxmehl mxmehl commented Mar 10, 2023

Fixes #696

This PR shall consist of two steps:

  1. Allow that the full file is analysed for license/copyright info if we have indicators that they contain a SPDX snippet
  2. Actually understand the licensing/copyright info in a snippet (specifically the SPDX-SnippetCopyrightText tag)

On 1: Based on the results of the performance comparison, I implemented a logic that first scans the file for a string (as bytes) (SPDX-SnippetBegin), and resets the reading buffer. If a match is found, the _HEADER_SIZE limit (currently 4096 bytes) is unset so the full file is fed into extract_spdx_info(decoded_text_from_binary()).

Amazingly, the performance ramifications are minimal! On multiple runs against the Linux kernel, the negative effect is ~0.1s or ~1.3%). This is what I would consider an acceptable performance loss, especially in comparison with the 140% loss we'd have if we just removed the limit.

On 2: Treat SPDX-SnippetCopyrightText the same as other (file-level) copyright tags. Of course it's not the same but for REUSE's purpose it should suffice.

However, we could check whether a snippet has licensing and copyright info. That may be a lot trickier as we would have to understand the boundaries of a snippet, and treat it isolatedly. I'm not sure whether it's worth to invest time into this now, or later. Interested in others' opinions.

…rching for SPDX info

If such an indicator is found, remove the search limit (currently 4096 bytes)
This treats the tag the same as other file-level copyright tags
@linozen
Copy link
Member

linozen commented Apr 6, 2023

LGTM 👍

@mxmehl
Copy link
Member Author

mxmehl commented Apr 6, 2023

However, we could check whether a snippet has licensing and copyright info. That may be a lot trickier as we would have to understand the boundaries of a snippet, and treat it isolatedly. I'm not sure whether it's worth to invest time into this now, or later. Interested in others' opinions.

Linus, Nico and I agreed that, while this would be interesting information, we don't plan to integrate this as of now. It could be a future improvement.

@mxmehl mxmehl marked this pull request as ready for review April 6, 2023 12:08
@mxmehl mxmehl merged commit 8f37460 into main Apr 6, 2023
@mxmehl mxmehl deleted the spdx-snippets branch April 6, 2023 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fully support SPDX snippets
2 participants