Duplicated bug report in html parser #3510

Closed
vChavezB opened this issue Nov 18, 2021 · 10 comments · Fixed by #3524
Labels
bug 🐛 plist2html 🌏 tools 🛠️ Meta-tag for all the additional tools supplied with CodeChecker: plist2html, tu_collector, etc.

Comments

@vChavezB

Describe the bug
I updated a project whose CI uses a Docker container with CodeChecker v6.18.0 and found that the HTML parser creates duplicated bugs. The statistics show that v6.18.0 reports 47680 severities, while v6.17.0 reports 58 for the same source files.

I noticed that the duplicate notifications printed when calling CodeChecker parse used to come from parse.py:

[DEBUG][2021-11-18 12:44:02] {system} [1111] <139743470372672> - parse.py:861 skip_html_report_data_handler() - Skip report because it is a deduplication of an already processed report!
[DEBUG][2021-11-18 12:44:02] {system} [1111] <139743470372672> - parse.py:863 skip_html_report_data_handler() - Path hash: 4ef14c44533cd366c92c216f675b2aa1

and now they come from reports.py as:

[DEBUG][2021-11-18 11:34:20] {report-converter} [689] <140192054974272> - reports.py:118 skip() - Not showing report because it is a deduplication of an already processed report!
[DEBUG][2021-11-18 11:34:20] {report-converter} [689] <140192054974272> - reports.py:120 skip() - Path hash: b4a42a125db8720a6a2d0b5d4bc096ec

So it seems that the parser detects the duplicates but they are still visible in the HTML view.
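The debug messages above suggest that duplicates are recognised by their path hash. A minimal sketch of that idea follows, assuming a set of already-seen hashes that is shared across all analyzer result files. The Report class and the skip_duplicates function are hypothetical names used for illustration only; they are not CodeChecker's actual API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Report:
    """Hypothetical stand-in for a parsed analyzer report."""
    file_path: str
    checker: str
    path_hash: str


def skip_duplicates(reports: Iterable[Report], seen: Set[str]) -> List[Report]:
    """Return reports whose path hash is new, recording each hash in `seen`."""
    unique = []
    for report in reports:
        if report.path_hash in seen:
            continue  # already emitted from an earlier result file
        seen.add(report.path_hash)
        unique.append(report)
    return unique


# The same header warning reached through two translation units.
seen_hashes: Set[str] = set()
from_main = [Report("header.h", "bugprone-sizeof-expression", "b4a4")]
from_main2 = [Report("header.h", "bugprone-sizeof-expression", "b4a4")]

kept = (skip_duplicates(from_main, seen_hashes)
        + skip_duplicates(from_main2, seen_hashes))
print(len(kept))  # 1
```

The point of the sketch is that the filtered list, not the original one, has to be what the HTML generator receives; detecting a duplicate and logging it is not enough on its own.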

CodeChecker version
v6.18.0

To Reproduce

  1. Build a project under CodeChecker log so that the build is traced. The project should contain bugs that can be detected as duplicates, for example in a header that is included in several .cpp files.
  2. Analyze and parse the report:
CodeChecker analyze codechecker.log --output ./codechecker_report --enable extreme --ctu -i skip.file -q
CodeChecker parse --export html --output ./codechecker_html ./codechecker_report

Expected behaviour
Duplicate reports are removed from the generated HTML pages.

Desktop (please complete the following information)

  • OS: Docker container Linux debian:buster-slim

Additional context
The project where I found the bug targets an MCU and is built with the arm-none-eabi-gcc compiler. As a temporary workaround, I will stick with v6.17.0.

@whisperity whisperity added bug 🐛 plist2html 🌏 tools 🛠️ Meta-tag for all the additional tools supplied with CodeChecker: plist2html, tu_collector, etc. labels Nov 26, 2021
@whisperity
Contributor

I have just run into this issue and can confirm this is happening. We found this in the CI of Contour-Terminal, where the PList HTML compressed ZIP artefact that we uploaded to the CI suddenly jumped from ~20 MiB in size to 400 MiB. (Now it is at 745 MiB) The CI job we use always runs latest master CodeChecker.

Because the CodeChecker build script prints the precise version, I have so far been able to narrow the culprit down to the diff between these commits: 9f894e3...85dd52d

A minimal example

Thanks to @bruntib for helping in nailing this down.

Input "project"

header.h

int s() { return sizeof(42); }

main.cpp

#include "header.h"

int main() { return s(); }

main2.cpp

Same as main.cpp:

#include "header.h"

int main() { return s(); }

Execution

Running the analysis on the two source files results in the warnings:

/tmp/header.hpp:1:18: warning: suspicious usage of 'sizeof(K)'; did you mean 'K'? [bugprone-sizeof-expression]
int s() { return sizeof(42); }
                 ^

If I run this with the old version (9f894e3), it's fine. With the new version (85dd52d or v6.18.0), every report is converted/shown multiple times.

Instead of showing 1 bug (that is located only in the header!), it creates 4 bugs: main-tidy, main2-tidy, main-clangsa, main2-clangsa. For some reason, the _clangsa HTMLs will contain the Tidy report, even though the _clangsa PLists are empty!
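To check how many distinct findings the analyzer actually wrote, independently of the HTML output, the PList files can be inspected directly. The sketch below assumes the Clang-style PList layout (a top-level "files" list and a "diagnostics" list whose entries carry "check_name" and a "location" with "line", "col" and a "file" index). The function name unique_diagnostics is made up for this example.

```python
import plistlib
from pathlib import Path
from typing import Set, Tuple


def unique_diagnostics(report_dir: str) -> Set[Tuple[str, int, int, str]]:
    """Collect (file, line, column, checker) for every diagnostic in *.plist files."""
    found = set()
    for plist_path in sorted(Path(report_dir).glob("*.plist")):
        with plist_path.open("rb") as handle:
            data = plistlib.load(handle)
        files = data.get("files", [])
        for diag in data.get("diagnostics", []):
            loc = diag["location"]
            # "file" is an index into the PList's own "files" list.
            found.add((files[loc["file"]], loc["line"], loc["col"],
                       diag.get("check_name", "")))
    return found
```

Calling unique_diagnostics on the analyzer output directory and taking len() of the result gives the number of distinct locations. For the two-file example above that should be 1 if the assumptions about the layout hold, which makes the 4 generated HTML reports easy to spot as duplicates.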

@whisperity
Contributor

Subsequent testing boiled the culprit down to 85dd52d.

I've re-run the analysis of Contour as mentioned above to rule out the "system under test" changing (as it definitely did in the [23 .. 18] days ago mentioned), on the same content of the analysed project.

@csordasmarton
Contributor

@whisperity I created a patch which I think will solve this problem: #3524. Can you please try it out?

@jimis
Contributor

jimis commented Dec 21, 2021

I hit the same bug in HTML report generation and can confirm that 6.18.1 fixes it.

But could it be that the diff command has the same bug? Check this diff report here that mentions 180 issues, while diffing in the UI shows only 15. It was generated with CodeChecker cmd diff -o html --new -b run1 -n run2 ....

Should we re-open this ticket?

@csordasmarton
Contributor

@jimis Did you use remote runs for both the --basename and --newname options (run1 + run2), or are one or both of them local report directories?

@csordasmarton
Contributor

One more thing: a couple of days ago I found a problem where, in diff mode, the number of bug steps is always 1 for remote reports, because we do not fetch this information and print it to the generated HTML files. I created a patch which should solve this problem: #3555. Can you please try it out too?

@jimis
Contributor

jimis commented Dec 21, 2021

@csordasmarton both baseline and new are local directories. The baseline was generated with 6.18.0, the new with 6.18.1. The diff report in the UI appears normal.

About #3555 we never use plain text diff output, nor diff against "remote" reports. I'll try to see if I can reproduce the issue easily but no guarantees. If so I'll comment on #3555.

@csordasmarton
Contributor

In the source code of the diff command we filter out duplicated reports:

# Skip duplicated reports.
reports = reports_helper.skip(reports, processed_path_hashes)

From the link you sent, it looks like the reports come from different header/source files (see the URL parameters).
These are the first 3 reports:
image

And the links for these reports:

Also if you open a report in the source code section you see the following warning message:
image

Please try to re-analyze your project again.

@jimis
Contributor

jimis commented Dec 21, 2021

> [first 3 reports point to different links]

On the listing, they are listed multiple times. For example the issue in qringbuffer_p.h is the one listed at position 13. It shouldn't be listed at position 2 with "qthread.h" as title. The diff command that generated this HTML listing shows only 15 issues found:

----======== Summary ========----
----------------------------------------------
Number of processed analyzer result files | 0 
Number of analyzer reports                | 15
----------------------------------------------

> Please try to re-analyze your project again.

I'm not sure I follow. Yes it says the source code is missing, I considered this part of the bug. Re-analyze what exactly, the same checkout? And then run the same diff command? Why would that fix the issue?

FWIW the diff command we use has not changed and it used to work in 6.17, and the files (for example src/corelib/thread/qthread.cpp) are found where expected in the file system, under the source directory.

@jimis
Contributor

jimis commented Dec 28, 2021

> Did you use remote runs for both the --basename and --newname options (run1 + run2), or are one or both of them local report directories?

I think I got this wrong. Even though the directories are there and have the same name as the runs, I believe we are using "remote" comparisons, since the CodeChecker cmd diff command contains the --url argument.

I tried to reproduce my issue with both the 6.18.1 version and the CodeChecker version from #3555, but it was not possible: I get a normal listing of 15 issues now. However I still get "File not found" for the source files and I'm not sure why. Maybe because the files are analysed with --trim and the paths not found are relative, not absolute ones.
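If the "File not found" messages really are caused by relative paths, which is only a guess at this point, one way to test the hypothesis is to resolve each reported path against the source directory by hand. The helper below is a hypothetical sketch for that check and is not part of CodeChecker.

```python
from pathlib import Path
from typing import Optional


def resolve_source(reported_path: str, source_root: str) -> Optional[Path]:
    """Return an existing file for `reported_path`, or None if it cannot be found.

    Relative paths are looked up under `source_root`; absolute paths are
    used as they are.
    """
    candidate = Path(reported_path)
    if not candidate.is_absolute():
        candidate = Path(source_root) / candidate
    return candidate if candidate.is_file() else None
```

If a reported path such as src/corelib/thread/qthread.cpp resolves under the source directory but the HTML viewer still cannot find it, the lookup in the report generator is probably done relative to a different working directory.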

The patched version shows one difference though: each issue now has a much greater bug path length.

I will try again a few days later, after new issues start appearing again.
