Avoid plist filenames being the same #3588
@@ -147,7 +147,7 @@ def _write(
 analyzer_info = AnalyzerInfo(name=self.TOOL_NAME)
 for file_path, file_reports in file_to_report.items():
-    source_file = os.path.basename(file_path)
+    source_file = ("-".join(file_path.split("/")[-3:]))
This feels like too much of a symptom fix and a band-aid, which just begs to be broken sometime later by someone whose directories only differ 3 or 4 hops up the chain.
I'm not sure at a glance how hot this path is, but maybe instead we should put a sort of hash of the full file path into the file name?
(At least that is what clangd does with its cache: Something.cpp.AAAAAAAAAA.idx is the file they create, where AAAA... is a hash of the configuration that produced the file, so it includes not only the path, but a hash of the compile command too, as far as I know.)
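To make the concern concrete, here is a minimal sketch comparing the two naming schemes. The paths and both helper names are made up for illustration, and hashing only the file path (rather than a full configuration, as clangd is said to do) is an assumption. Two paths that differ only four levels up collide under the "last three components" scheme, while a hash of the full path keeps them apart.

```python
import hashlib
import os


def last_three(file_path):
    """Naming scheme from the patch: join the last three path components."""
    return "-".join(file_path.split("/")[-3:])


def basename_with_hash(file_path):
    """Hypothetical alternative: base name plus an MD5 of the full path."""
    digest = hashlib.md5(file_path.encode(errors="ignore")).hexdigest()
    return f"{os.path.basename(file_path)}_{digest}"


# Two made-up paths that differ only four levels above the file.
path_a = "/work/v1/project/src/lib/main.cpp"
path_b = "/work/v2/project/src/lib/main.cpp"

# The last-three-components names are identical, so one file would
# overwrite the other.
print(last_three(path_a) == last_three(path_b))

# The hash-based names differ because the full paths differ.
print(basename_with_hash(path_a) == basename_with_hash(path_b))
```

MD5 is used here only as a cheap way to tell paths apart, not for anything security related.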
I agree with @whisperity, this feels like a hacky solution which also does not always work.
Currently the report-converter tool has a --filename option which handles two special values: {source_file} and {analyzer}. My recommendation is to create another special value (e.g. {hash}) which will insert a hash into the file name.
You can create the file hash similarly to this:
hashlib.md5(build_info.encode(errors='ignore')).hexdigest()
And the user will be able to use it like this: report-converter --filename "{source_file}_{analyzer}_{hash}"
It would also be good to add a test case for this use case, to check that report-converter creates multiple plist files and not just one.
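As a rough illustration of the proposed placeholder, the sketch below expands a --filename style template. The helper `expand_filename` is hypothetical and is not report-converter's actual implementation; it also assumes the hash is taken over the source file path, whereas the snippet above hashes a `build_info` string.

```python
import hashlib
import os


def expand_filename(template, file_path, analyzer):
    """Fill the placeholders of a --filename style template.

    Hypothetical helper: `{hash}` is the proposed new placeholder, and
    hashing the file path is an assumption made for this sketch.
    """
    path_hash = hashlib.md5(file_path.encode(errors="ignore")).hexdigest()
    return template.format(
        source_file=os.path.basename(file_path),
        analyzer=analyzer,
        hash=path_hash,
    )


template = "{source_file}_{analyzer}_{hash}"

# Two made-up source files that share a base name.
name_a = expand_filename(template, "/proj/a/main.cpp", "asan")
name_b = expand_filename(template, "/proj/b/main.cpp", "asan")

print(name_a)
print(name_b)
```

With a template that omits {hash}, the two names would be identical again, which is why the placeholder has to be part of whatever default the tool uses.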
Force-pushed from 246dc90 to 29753c3.
Just reworked the commit to use a file hash, hope this does the job? I'm happy to write a test. Does anyone know where I can do this, since the test suite is so big?
Force-pushed from 29753c3 to 999446b.
Just implemented all the comments, let me know if I need to update any more.
Force-pushed from 999446b to 29149b1.
I have two more tiny comments, otherwise LGTM!
@@ -148,10 +149,13 @@ def _write(
 analyzer_info = AnalyzerInfo(name=self.TOOL_NAME)
 for file_path, file_reports in file_to_report.items():
     source_file = os.path.basename(file_path)
+    file_hash = hashlib.md5(file_path.encode(errors='ignore')) \
It would be useful to normalize the file path above when we put it into the dictionary here:
codechecker/tools/report-converter/codechecker_report_converter/analyzers/analyzer_result.py, line 146 in fc3d1fe:
file_to_report[report.file.original_path].append(report)
Change this line to this:
file_path = os.path.normpath(report.file.original_path)
file_to_report[file_path].append(report)
This way if we have a/b/../x.cpp and a/x.cpp it will not treat it as two different file paths and will generate two plist files but only one.
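Read together with the suggested change, the point seems to be that after normalization both spellings become the same dictionary key, so a single plist file is produced for them instead of two. A minimal sketch of that grouping, using made-up paths and plain strings in place of report objects:

```python
import os
from collections import defaultdict

# Made-up report paths: the first two refer to the same file.
original_paths = ["a/b/../x.cpp", "a/x.cpp", "a/y.cpp"]

file_to_report = defaultdict(list)
for original_path in original_paths:
    # Collapse "b/.." so equivalent spellings share one key.
    file_path = os.path.normpath(original_path)
    file_to_report[file_path].append(original_path)

# One entry per distinct file, so one plist file per distinct file.
for file_path, reports in file_to_report.items():
    print(file_path, len(reports))
```

Note that `os.path.normpath` works purely on the string and does not resolve symbolic links, so it only merges paths that differ lexically.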
I'm not clear on this concept; there may be a typo in the last line, but I made the change.
@milanlakhani In which line do you think there is a typo?
'it will not treat it as two different file paths and will generate two plist files but only one.' @csordasmarton
Two plist files but one file path? I don't see the advantage.
Force-pushed from 29149b1 to 09e660f.
@csordasmarton thanks for your help! Makes sense to test equal to 2. Made those changes.
Some GitHub Actions checks have failed. Can you please fix them?
Also thank you very much for your hard work on this patch 😊
This makes the filenames of plist format reports generated by report-converter include the file hash as well as the file name. Previously, if two different plist files had the same name, only one of the reports ended up in the database. Fixes Ericsson#3436.
Force-pushed from 09e660f to d43bdb0.
Add test to check that report-converter creates a plist file for each file that a bug is found in.
Force-pushed from d43bdb0 to 9a194a3.
Hi Marton, looks like the trivial bugs are done. Do you have any idea on how we can fix those 2 tests? :)
@milanlakhani The problem is that now there is a unique hash in the plist file names produced by the [...]. To fix the test cases I recommend changing every [...]. For example, change these lines:
codechecker/tools/report-converter/tests/unit/analyzers/test_asan_parser.py, lines 54 to 55 in 3246665
to this:
self.analyzer_result.transform(
    'asan.out', self.cc_result_dir, plist.EXTENSION,
    file_name="{source_file}_{analyzer}")
And please try to run the test cases locally before you push again, so you can fix every test case in one shot. You can run the test cases locally with the following commands: [...]
The tests are made to expect filenames with no hashes, so this removes the hashes of the filepaths from the file name and makes the file name just consist of {source_file}_{analyzer}.
Force-pushed from 716620b to 1c36d2f.
I tried testing locally and I got a fail in 1/1 test (0.03 seconds) with a syntax error in [...]
Thanks for all the help @csordasmarton
@milanlakhani It works perfectly 😊 Thank you very much for your patch and your hard work on it 😊
This makes the filenames of plist format reports generated by
report-converter include the names of the parent and grandparent
directories as well as the filename. Previously, if two different
plist files had the same name, only one of the reports ended up in
the database. Fixes #3436.