Gitlab code quality fingerprint limitation: multiple errors with identical message #14732

Open
fin-gal opened this issue Dec 2, 2024 · 1 comment
Labels
great writeup A wonderful example of a quality contribution


fin-gal commented Dec 2, 2024

Context

I'm setting up GitLab Code Quality at the company I work for.
Code Quality relies on JSON reports in a specific format, where each error carries some metadata and a fingerprint.
My understanding is that in Ruff, the fingerprint is a hash generated from:

  • The relative file path
  • The error message
  • A salt value
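
As a rough illustration of the three inputs listed above, here is a minimal Python sketch. It is not Ruff's implementation (which is written in Rust); the function name `fingerprint`, the choice of SHA-256, and the example path and message are assumptions made purely for illustration.

```python
import hashlib


def fingerprint(relative_path: str, message: str, salt: int = 0) -> str:
    """Hash the three inputs into a hex string (hypothetical helper, not Ruff's API)."""
    hasher = hashlib.sha256()
    # A separator after each part keeps ("ab", "c") distinct from ("a", "bc").
    for part in (relative_path, message, str(salt)):
        hasher.update(part.encode("utf-8"))
        hasher.update(b"\0")
    return hasher.hexdigest()


# Two diagnostics with the same path and message hash to the same value ...
a = fingerprint("src/app.py", "unused variable")
b = fingerprint("src/app.py", "unused variable")
# ... and only a different salt tells them apart.
c = fingerprint("src/app.py", "unused variable", salt=1)
```

Nothing about the position of the error in the file enters the hash, which is what makes the fingerprint stable when surrounding code moves.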

Relevant code is here.

The fingerprint is:

Past changes

In the past, the line number at which the error was reported was included in the hash, but thankfully it was removed. If someone added code above an existing issue in a file, the issue got a new fingerprint, and the Code Quality widget in GitLab showed every such issue as both fixed and new, duplicating everything and making the tool unusable.
Relevant issues/discussions on this subject:
#3996 : initial discussion
#7203 : issue solving the above problem
#7159
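
To make the old failure mode concrete, the sketch below hashes the line number together with the path and message. The helper name, the hash function, and the sample values are hypothetical; the only point is that inserting one line above an existing issue changes its fingerprint.

```python
import hashlib


def fingerprint_with_line(path: str, message: str, line: int) -> str:
    """Hypothetical line-sensitive fingerprint, for illustration only."""
    raw = f"{path}\0{message}\0{line}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


# The same issue before and after one unrelated line is added above it.
before = fingerprint_with_line("src/app.py", "unused variable", line=10)
after = fingerprint_with_line("src/app.py", "unused variable", line=11)

# The fingerprints differ, so a comparison by fingerprint would treat this
# as one fixed issue plus one new issue.
moved = before != after
```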

Current problem

This is all positive and a clearly welcome improvement. However, I see a shortcoming with this approach, one that is actually mentioned in the issue referenced above.

Let's assume we have a bunch of errors in a file that all have the same error message. I'll blatantly reuse the example by @MichaReiser from the thread linked above:

Let's say we have two unused variable diagnostics in a file:

x = 1
y = 2

There's a hash collision for x and y, so y performs a second hashing round. We now fix the first violation:

x = 1
y = 2
print(x)

There's no longer a hash collision for y, meaning that the diagnostic for y now gets the hash (fingerprint) of the violation that used to be for x.
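
The shift described in this example can be reproduced with a small sketch. The collision strategy here (bump the salt and re-hash until the fingerprint is unused) is an assumption based on the description above, and every name, the hash function, and the message text are hypothetical rather than taken from Ruff.

```python
import hashlib


def fingerprint(path: str, message: str, salt: int) -> str:
    raw = f"{path}\0{message}\0{salt}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def assign_fingerprints(path: str, diagnostics: list[tuple[str, str]]) -> dict[str, str]:
    """Map each diagnostic label to a unique fingerprint.

    `diagnostics` holds (label, message) pairs in file order. On a collision
    the salt is bumped and the hash recomputed (assumed strategy).
    """
    seen: set[str] = set()
    result: dict[str, str] = {}
    for label, message in diagnostics:
        salt = 0
        fp = fingerprint(path, message, salt)
        while fp in seen:
            salt += 1
            fp = fingerprint(path, message, salt)
        seen.add(fp)
        result[label] = fp
    return result


# First report: x and y are both flagged with the same message.
before = assign_fingerprints("a.py", [("x", "unused variable"), ("y", "unused variable")])
# Second report: x is fixed, only y is still flagged.
after = assign_fingerprints("a.py", [("y", "unused variable")])

# y now carries the fingerprint that x had in the first report.
y_inherits_x = after["y"] == before["x"]
```

The labels `x` and `y` exist only so the reader can follow which diagnostic is which; they are not part of the hash input, which is exactly why the two diagnostics cannot be told apart.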

If I do this in GitLab's Code Quality, assuming a report with these two errors had been generated on my target merge branch, I might end up with a case where:

  • y's unused error is now marked fixed even though it isn't
  • x's unused error is still present and not marked as fixed even though it is.

You can also imagine plenty of other scenarios where I fix a bunch of errors like the ones above but introduce other identical errors later in the code. Following the same logic, the fixed ones will not be marked as fixed, and the new ones won't even be picked up and displayed as new errors.
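
A sketch of that scenario, assuming the widget classifies issues by comparing the set of fingerprints in the target-branch report with the set in the merge-request report. The function name and the placeholder fingerprints `fp0` and `fp1` are hypothetical.

```python
def compare_reports(base: set[str], head: set[str]) -> tuple[set[str], set[str]]:
    """Return (fixed, new) fingerprints under a plain set comparison."""
    fixed = base - head
    new = head - base
    return fixed, new


# fp0 and fp1 stand for the salt-0 and salt-1 hashes of one identical
# message in one file.
base_report = {"fp0", "fp1"}  # two identical errors on the target branch

# The branch fixes both errors but adds two new occurrences of the same
# error, so the new report contains the very same fingerprints.
head_report = {"fp0", "fp1"}
fixed, new = compare_reports(base_report, head_report)

# Fixing both and adding only one new occurrence looks like a single fix.
fixed_partial, new_partial = compare_reports(base_report, {"fp0"})
```

In the first case both sets come back empty, so neither the two fixes nor the two new errors are visible. In the second case one fix is reported and the new error is not.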

Specific scenario

To bring this back to my specific scenario: I noticed problems while testing the feature. I would add docstrings to functions that were missing them, then implement new functions without a docstring, and end up with the same problem as above. Why is that? Because the missing-docstring rule for a public function (D103) doesn't mention the name of the function, for example, so all missing-docstring errors are identical.
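
For example, both functions in the sketch below would be flagged by D103 with the same message text. The `diagnostics` list is a hypothetical stand-in for a report, not Ruff's output format; the `function` key is there only to help the reader.

```python
def add(a, b):
    return a + b


def subtract(a, b):
    return a - b


# D103 reports the same text for both functions; the function name is not
# part of the message.
D103_MESSAGE = "Missing docstring in public function"
diagnostics = [
    {"function": "add", "message": D103_MESSAGE},
    {"function": "subtract", "message": D103_MESSAGE},
]

# Only one distinct message, so path + message alone cannot tell them apart.
distinct_messages = {d["message"] for d in diagnostics}
```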

I understand the current implementation is still functional and that the above examples may be fringe cases for most people, but I still wanted to log this because it is a limitation in our project.
I also wanted to know whether anyone has an idea of how this could be improved without re-adding the line number, which creates many more problems than the above.

MichaReiser added the great writeup label Dec 2, 2024
@MichaReiser
Member

Thanks for the great write-up.

We're interested in anyone's ideas for approaching this in a way that doesn't rely on line numbers and isn't prone to the above problem. I tried to find some guidance or best practices from GitLab but couldn't find any.

Do you use any other linters in your project that don't have the limitation you outlined above?
