Gitlab code quality fingerprint limitation: multiple errors with identical message #14732

fin-gal · 2024-12-02T14:18:50Z

Context

I'm rolling out GitLab Code Quality at the company I work for.
Code Quality relies on JSON reports in a specific format, where each error carries some metadata and a fingerprint.
My understanding is that in ruff, the fingerprint is a hash generated from:

The relative file path
The error message
A salt value

Relevant code is here.

The fingerprint is:

Generated
Checked against the fingerprints already generated, to see whether an identical one exists
If the fingerprint already exists: regenerated by hashing the path and message again, with the current fingerprint as the salt.

Past changes

In the past, the line number at which the error was reported was also included in the hash; thankfully, that was removed. If someone added code above an existing issue in a file, that issue's line number shifted and it got a new fingerprint, so the Code Quality widget in GitLab showed every affected issue as both fixed and new, duplicating everything and making the tool unusable.
Relevant issues/discussions on this subject:
#3996: initial discussion
#7203: issue solving the above problem
#7159

Current problem

This is a clear and welcome improvement. However, I see a shortcoming in this approach, one that is actually mentioned in the issue referenced above.

Let's assume a file contains several errors that all have the same error message. I'll shamelessly reuse the example by @MichaReiser from the thread linked above:

Let's say we have two unused variable diagnostics in a file:

x = 1
y = 2

There's a hash collision for x and y, so y performs a second hashing round. Now we fix the first violation:

x = 1
y = 2
print(x)

There's no longer a hash collision for y, meaning that the diagnostic for y now gets the hash (fingerprint) of the violation that used to be for x.

If I do this in GitLab Code Quality, assuming the report generated on my merge request's target branch contains these two errors, I might end up with a case where:

y's unused-variable error is marked as fixed even though it isn't
x's unused-variable error is still reported and not marked as fixed even though it is.
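The outcome can be reproduced with a simplified Python model of the example. It reuses the fingerprinting scheme as described in this issue with a SHA-256 stand-in hasher, and it assumes (1) both diagnostics share the placeholder message `unused variable`, as in the example, and (2) GitLab matches issues between the base and head reports by fingerprint alone. All helper names here are hypothetical.

```python
import hashlib

MSG = "unused variable"  # hypothetical placeholder shared by both diagnostics


def _hash(path: str, message: str, salt: int) -> int:
    # Deterministic 64-bit stand-in for the real hasher (an assumption).
    data = f"{salt}\x00{message}\x00{path}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def fingerprint_report(path: str, variables: list[str]) -> dict[int, str]:
    """Map fingerprint -> variable, one diagnostic per unused variable, in file order."""
    report: dict[int, str] = {}
    for var in variables:
        fp = _hash(path, MSG, 0)
        while fp in report:  # collision: rehash with the current fingerprint as salt
            fp = _hash(path, MSG, fp)
        report[fp] = var
    return report


def compare(base: dict[int, str], head: dict[int, str]) -> tuple[list[str], list[str]]:
    """Simplified model: issues are matched across reports by fingerprint only."""
    fixed = sorted(base[fp] for fp in base.keys() - head.keys())
    new = sorted(head[fp] for fp in head.keys() - base.keys())
    return fixed, new


base = fingerprint_report("app.py", ["x", "y"])  # target branch: x and y unused
head = fingerprint_report("app.py", ["y"])       # merge request: print(x) added

print(compare(base, head))  # (['y'], [])
# The fingerprint that belonged to x on the target branch now belongs to y.
fp_x = next(fp for fp, var in base.items() if var == "x")
print(head[fp_x])  # y
```

The fingerprint that disappears is the one y had in the base report, and x's old fingerprint survives because y inherited it, which matches the two symptoms listed above.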

You can imagine plenty of other scenarios along these lines. For example, if I fix several errors like the one above but introduce as many identical errors later in the same file, then by the same logic the fixed errors won't be marked as fixed, and the new ones won't be picked up and displayed as new errors.

Specific scenario

To bring this back to my specific scenario: while testing the feature, I added docstrings to functions that were missing them, then added new functions without docstrings, and ran into the same problem as above. Why? Because the message for the missing docstring rule for public functions (D103) doesn't include the function name, so every missing-docstring error in a file has the same message.
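Applying the same model to this D103 scenario shows why neither the fix nor the new error surfaces: with the function name absent from the message, the fingerprints depend only on the file path and how many D103 diagnostics the file contains. The function names, the file name, and the SHA-256 stand-in hasher are hypothetical; the message string is D103's, which does not name the function.

```python
import hashlib

# D103's message; it does not name the function.
D103_MSG = "Missing docstring in public function"


def _hash(path: str, message: str, salt: int) -> int:
    # Deterministic 64-bit stand-in for the real hasher (an assumption).
    data = f"{salt}\x00{message}\x00{path}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def d103_fingerprints(path: str, undocumented: list[str]) -> set[int]:
    """One fingerprint per undocumented function; the name never enters the hash."""
    seen: set[int] = set()
    for _name in undocumented:
        fp = _hash(path, D103_MSG, 0)
        while fp in seen:  # collision: rehash with the current fingerprint as salt
            fp = _hash(path, D103_MSG, fp)
        seen.add(fp)
    return seen


# Target branch: foo() and bar() lack docstrings.
base = d103_fingerprints("service.py", ["foo", "bar"])
# Merge request: foo() gets a docstring, but a new baz() is added without one.
head = d103_fingerprints("service.py", ["bar", "baz"])

print("fixed:", len(base - head), "new:", len(head - base))  # fixed: 0 new: 0
```

In this model, any change that keeps the number of identical diagnostics per file constant is invisible to a fingerprint-only comparison.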

I understand that the current implementation still works and that the examples above may be edge cases for most people, but I wanted to log this because it is a real limitation for our project.
I'd also like to know whether anyone has ideas on how this could be improved without re-adding the line number, which causes far more problems than the issue described above.


MichaReiser · 2024-12-02T14:29:18Z

Thanks for the great write up.

We're interested in anyone's ideas for an approach that doesn't rely on line numbers and isn't prone to the problem above. I tried to find guidance or best practices from GitLab but couldn't find any.

Do you use any other linters in your project that don't have the limitation you outlined above?

MichaReiser added the "great writeup" label (A wonderful example of a quality contribution) on Dec 2, 2024
