Fixes #696
This PR shall consist of two steps:

1. Detect snippets anywhere in a file, not only within the header limit
2. Detect copyright in snippets (`SPDX-SnippetCopyrightText` tag)

On 1: Based on the results of the performance comparison, I implemented logic that first scans the file, read as bytes, for the string `SPDX-SnippetBegin`, and then resets the read buffer. If a match is found, the `_HEADER_SIZE` limit (currently 4096 bytes) is lifted so that the full file is fed into `extract_spdx_info(decoded_text_from_binary())`.

Amazingly, the performance ramifications are minimal! Across multiple runs against the Linux kernel, the slowdown is ~0.1s, or ~1.3%. I would consider this an acceptable performance loss, especially compared with the 140% loss we'd have if we just removed the limit.
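To illustrate the scan-then-reset idea from step 1, here is a minimal sketch. It is not the code in this PR: `contains_snippet`, `read_relevant_bytes`, and `CHUNK_SIZE` are hypothetical names, and `HEADER_SIZE` only mirrors the 4096-byte limit mentioned above.

```python
import io
from typing import BinaryIO

HEADER_SIZE = 4096  # mirrors the header limit described above
SNIPPET_MARKER = b"SPDX-SnippetBegin"
CHUNK_SIZE = 1024 * 1024  # hypothetical chunk size for the scan


def contains_snippet(fp: BinaryIO, chunk_size: int = CHUNK_SIZE) -> bool:
    """Scan the stream for the marker, then restore the read position."""
    start = fp.tell()
    overlap = len(SNIPPET_MARKER) - 1
    tail = b""
    try:
        while True:
            chunk = fp.read(chunk_size)
            if not chunk:
                return False
            # Prepend the previous tail so a marker split across
            # chunk boundaries is still found.
            window = tail + chunk
            if SNIPPET_MARKER in window:
                return True
            tail = window[-overlap:]
    finally:
        fp.seek(start)  # reset the buffer for the real read


def read_relevant_bytes(fp: BinaryIO) -> bytes:
    """Read the whole file if it contains a snippet, else only the header."""
    if contains_snippet(fp):
        return fp.read()
    return fp.read(HEADER_SIZE)
```

Scanning raw bytes avoids decoding the whole file up front, which is presumably why the extra pass is cheap compared with always parsing the full text.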
On 2: Treat `SPDX-SnippetCopyrightText` the same as the other (file-level) copyright tags. Of course it's not the same, but for REUSE's purposes it should suffice.

However, we could check whether a snippet has both licensing and copyright info. That may be a lot trickier, as we would have to understand the boundaries of a snippet and treat it in isolation. I'm not sure whether it's worth investing time in this now or later. I'm interested in others' opinions.
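As a rough sketch of step 2, the snippet-level tag can be matched with the same pattern as the file-level one. The regex and the `extract_copyright_lines` helper below are assumptions made for illustration, not REUSE's actual patterns or API, and the sketch does not strip trailing comment closers such as `*/`.

```python
import re
from typing import List

# Hypothetical pattern: accept both the file-level and the snippet-level tag.
COPYRIGHT_PATTERN = re.compile(
    r"(?P<tag>SPDX-(?:File|Snippet)CopyrightText):[ \t]*(?P<statement>[^\n]*\S)"
)


def extract_copyright_lines(text: str) -> List[str]:
    """Return every copyright statement, regardless of which tag carries it."""
    return [match.group("statement") for match in COPYRIGHT_PATTERN.finditer(text)]
```

This deliberately ignores snippet boundaries: a statement found inside a snippet is reported as if it applied to the whole file, which is the simplification described above.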