Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid plist filenames being the same #3588

Merged

Conversation

milanlakhani
Copy link
Contributor

This makes the filenames of plist format reports generated by
report-converter include the names of the parent and grandparent
directories as well as the filename, since before if two different
plist files had the same name there was only one of the reports in
the database. Fixes #3436 .

@@ -147,7 +147,7 @@ def _write(

analyzer_info = AnalyzerInfo(name=self.TOOL_NAME)
for file_path, file_reports in file_to_report.items():
source_file = os.path.basename(file_path)
source_file = ("-".join(file_path.split("/")[-3:]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like too much of a symptom fix and a band-aid, which just begs to be broken sometime later by someone whose directories only differ 3 or 4 hops up the chain.

I'm not sure at a glance how hot this path is, but maybe instead we should put a sort of hash of the full file path into the file name?

(At least that is what clangd does with its cache: Something.cpp.AAAAAAAAAA.idx is the file they create, where AAAA... is a hash of the configuration that produced the file (so it includes not only path, but a hash of the compile command too, as far as I know.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @whisperity, this feels like a hacky solution which also not always works.

Now the report-converter tool has a --filename option which handles two special values: {source_file} and {analyzer}. My recommendation is to create another special value (e.g.: {hash}) which will insert a hash into the file name.
You can create the file hash similarly to this:

hashlib.md5(build_info.encode(errors='ignore')).hexdigest()

And the user will be able to use it like this: report-converter --filename "{source_file}_{analyzer}_{hash}"

Also it would be good to add a test case for this use case too to test that report-converter will create multiple plist files not just one.

@@ -147,7 +147,7 @@ def _write(

analyzer_info = AnalyzerInfo(name=self.TOOL_NAME)
for file_path, file_reports in file_to_report.items():
source_file = os.path.basename(file_path)
source_file = ("-".join(file_path.split("/")[-3:]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @whisperity, this feels like a hacky solution which also not always works.

Now the report-converter tool has a --filename option which handles two special values: {source_file} and {analyzer}. My recommendation is to create another special value (e.g.: {hash}) which will insert a hash into the file name.
You can create the file hash similarly to this:

hashlib.md5(build_info.encode(errors='ignore')).hexdigest()

And the user will be able to use it like this: report-converter --filename "{source_file}_{analyzer}_{hash}"

Also it would be good to add a test case for this use case too to test that report-converter will create multiple plist files not just one.

@milanlakhani milanlakhani force-pushed the lano/reportconverter-files branch from 246dc90 to 29753c3 Compare February 1, 2022 19:57
@milanlakhani
Copy link
Contributor Author

Just reworked the commit to use a file hash, hope this does the job? I'm happy to write a test, does anyone know where I can do this since the test suite is so big?
Thanks,
Milan

@milanlakhani milanlakhani force-pushed the lano/reportconverter-files branch from 29753c3 to 999446b Compare February 2, 2022 11:50
@milanlakhani
Copy link
Contributor Author

milanlakhani commented Feb 2, 2022

Just implemented all the comments, let me know if I need to update any more.

@milanlakhani milanlakhani force-pushed the lano/reportconverter-files branch from 999446b to 29149b1 Compare February 2, 2022 12:02
Copy link
Contributor

@csordasmarton csordasmarton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two more tiny comments, otherwise LGTM!

@@ -148,10 +149,13 @@ def _write(
analyzer_info = AnalyzerInfo(name=self.TOOL_NAME)
for file_path, file_reports in file_to_report.items():
source_file = os.path.basename(file_path)
file_hash = hashlib.md5(file_path.encode(errors='ignore')) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to normalize the file path above when we put it into the dictionary here:

file_to_report[report.file.original_path].append(report)

Change this line to this:

file_path = os.path.normpath(report.file.original_path)
file_to_report[file_path].append(report)

This way if we have a/b/../x.cpp and a/x.cpp it will not treat it as two different file paths and will generate two plist files but only one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on this concept there may be a typo in the last line but I made the change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milanlakhani In which line do you think there is a typo?

Copy link
Contributor Author

@milanlakhani milanlakhani Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'it will not treat it as two different file paths and will generate two plist files but only one.' @csordasmarton

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two plist files but one filepath? i don't see the advantage

@csordasmarton csordasmarton added this to the release 6.19.0 milestone Feb 3, 2022
@csordasmarton csordasmarton added bugfix 🔨 tools 🛠️ Meta-tag for all the additional tools supplied with CodeChecker: plist2html, tu_collector, etc. labels Feb 3, 2022
@milanlakhani milanlakhani force-pushed the lano/reportconverter-files branch from 29149b1 to 09e660f Compare February 3, 2022 15:01
@milanlakhani
Copy link
Contributor Author

@csordasmarton thanks for your help! Makes sense to test equal to 2. Made those changes

Copy link
Contributor

@csordasmarton csordasmarton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some github actions are failed. Can you please fix it?

Also thank you very much for your hard work on this patch 😊

tools/report-converter/codechecker_report_converter/cli.py Outdated Show resolved Hide resolved
This makes the filenames of plist format reports generated by
report-converter include the file hash as well as the file name,
since before if two different plist files had the same name there
was only one of the reports in the database. Fixes Ericsson#3436 .
@milanlakhani milanlakhani force-pushed the lano/reportconverter-files branch from 09e660f to d43bdb0 Compare February 3, 2022 15:28
Add test to check that report-converter creates a pfile for each file
that a bug is found in.
@milanlakhani milanlakhani force-pushed the lano/reportconverter-files branch from d43bdb0 to 9a194a3 Compare February 3, 2022 18:48
@milanlakhani
Copy link
Contributor Author

Hi Marton, looks like the trivial bugs are done. Do you have any idea on how we can fix those 2 tests? :)

@csordasmarton
Copy link
Contributor

@milanlakhani The problem is that now there is a unique hash in the plist file names produced by the report_converter but the test cases are expect no hashes in the file names. This is the reason why the test cases are failing.

To fix the test cases I recommend to change every .transform function call in the test cases and override the default file_name parameter.

For example change these lines:

self.analyzer_result.transform(
'asan.out', self.cc_result_dir, plist.EXTENSION)

to this:

        self.analyzer_result.transform(
            'asan.out', self.cc_result_dir, plist.EXTENSION,
            file_name="{source_file}_{analyzer}")

And please try to run the test cases locally before you push it again so you can fix every test cases in one shot.

You can run test cases locally with the following commands:

  • Only the unit test cases: make -C tools/report-converter/ test_unit
  • Only the functional test cases: make -C tools/report-converter/ test_functional
  • All the test cases: make -C tools/report-converter/ test

The tests are made to expect filenames with no hashes, so this removes
the hashes of the filepaths from the file name and makes the file name
just consist of {source_file}_{analyzer}.
@milanlakhani milanlakhani force-pushed the lano/reportconverter-files branch from 716620b to 1c36d2f Compare February 5, 2022 10:07
@milanlakhani
Copy link
Contributor Author

I tried testing locally and I got a fail in 1/1 test 0.03 seconds with a syntax error in def get_reports(self, file_path: str) -> List[Report]: in tools/report-converter/codechecker_report_converter/analyzers/sanitizers/address/analyzer_result.py

@milanlakhani
Copy link
Contributor Author

milanlakhani commented Feb 6, 2022

Thanks for all the help @csordasmarton

Copy link
Contributor

@csordasmarton csordasmarton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milanlakhani It works perfectly 😊 Thank you very much for your patch and your hard work on it 😊

@csordasmarton csordasmarton merged commit 9004c10 into Ericsson:master Feb 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix 🔨 tools 🛠️ Meta-tag for all the additional tools supplied with CodeChecker: plist2html, tu_collector, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing results in database
3 participants