[analyzer] Collect CTU-involved files in the report directory #3029

bruntib · 2020-11-17T22:16:23Z

When debugging analysis failures it is important to have all involved source
files. In case of CTU analysis the tu_collector tool is not informed about what
other TUs were used, so CodeChecker now collects this information under the
report directory.

csordasmarton · 2020-11-18T13:36:38Z

analyzer/codechecker_analyzer/analysis_manager.py

+
+    if involved_files:
+        out = os.path.join(output_dir, result_handler.analyzer_action_str)
+        with open(out, 'w') as f:


Suggested change

-        with open(out, 'w') as f:
+        with open(out, 'w', encoding='utf-8', errors='ignore') as f:

csordasmarton · 2020-11-18T13:40:14Z

analyzer/codechecker_analyzer/analysis_manager.py

+    involved_files.update(source_analyzer.get_analyzer_mentioned_files(
+        result_handler.analyzer_stderr))
+
+    if involved_files:


It is possible that this set was not empty in a previous run, so we created an involved-files file, but the next time it is empty, so we do nothing. Don't we need to remove the previous file? Something similar to this:

    out = os.path.join(output_dir, result_handler.analyzer_action_str)
    if involved_files:
        ...
    elif os.path.exists(out):
        os.remove(out)

I don't understand this comment. This directory contains generated files for those TUs which involve some other source files during CTU analysis. It doesn't matter if there were files here with the same name because those will be overwritten. The content of this directory behaves the same way as the failed directory.

@bruntib Okay, I'll try to explain my problem again. Let's assume that you analyze the same TU twice. The first time the involved_files variable contains some file (lib.cpp), so you create a file (result_handler.analyzer_action_str) which lists the involved files (lib.cpp). If you then change something in your code and analyze the TU again, and this time the involved_files set is empty, you do nothing. But in the output_dir there will still be a file for this TU (result_handler.analyzer_action_str) which lists lib.cpp as an involved file from the previous analysis.
My question was: in this case, don't we need to remove this file if it exists and the involved_files set is empty?

Well, it doesn't cause any problem if those files are not removed. This is analogous to failed ZIPs and .plist files: those are also not removed when you have fewer analyzed TUs. We could remove these and the failed ZIPs to free up some space, though I don't think the gain is significant. But I'll check it and do this removal for failed ZIPs accordingly in a next commit.

Ok, I did it.
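
The stale-file scenario discussed in this thread can be sketched as follows. This is only an illustration, not the code merged in this PR: `write_involved_files`, its parameters, the `main.c_hash` file name and the one-path-per-line file format are all assumptions.

```python
import os
import tempfile


def write_involved_files(output_dir, action_id, involved_files):
    """Write the involved files of one analysis action, or drop a stale list.

    Hypothetical helper: the names and the one-path-per-line format are
    assumptions made for this illustration only.
    """
    out = os.path.join(output_dir, action_id)

    if involved_files:
        # Overwrites whatever a previous analysis of the same TU left here.
        with open(out, 'w', encoding='utf-8', errors='ignore') as f:
            f.write('\n'.join(sorted(involved_files)))
    elif os.path.exists(out):
        # The TU no longer involves other files: remove the outdated list.
        os.remove(out)

    return out


with tempfile.TemporaryDirectory() as report_dir:
    # First analysis: main.c involves lib.cpp, so a list file is created.
    path = write_involved_files(report_dir, 'main.c_hash', {'lib.cpp'})
    first_run_exists = os.path.exists(path)

    # Second analysis: nothing is involved, so the old list is removed.
    write_involved_files(report_dir, 'main.c_hash', set())
    second_run_exists = os.path.exists(path)
```

Without the `elif` branch, the second run would leave the list from the first run in place, which is exactly the case raised in the question.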

gyorb · 2020-11-20T12:46:11Z

analyzer/tests/functional/ctu/test_ctu.py

+        # We assume that only main.c has been analyzed with CTU and it involves
+        # lib.c during its analysis.
+        connections_dir = os.path.join(self.report_dir, 'ctu_connections')
+        connections_file = os.listdir(connections_dir)[0]


Do we expect more files in this directory? Will the first file always be main.c? Should lib.c be checked here too?

This folder contains files only for sources that involve other sources during CTU analysis. For analyzing lib.c no other sources are needed. Actually we can assert that the length of this is 1, I'll do that.
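
The assertion mentioned in the reply could look roughly like this. Only the `ctu_connections` directory name comes from the test excerpt; the file name `main.c_hash` and the `/project/lib.c` path are made up so that the sketch is self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as report_dir:
    # Simulate the report directory after a CTU analysis of main.c and lib.c.
    connections_dir = os.path.join(report_dir, 'ctu_connections')
    os.makedirs(connections_dir)
    with open(os.path.join(connections_dir, 'main.c_hash'), 'w',
              encoding='utf-8') as f:
        f.write('/project/lib.c')

    connections_files = os.listdir(connections_dir)
    # Only main.c pulls in another TU, so exactly one file is expected.
    assert len(connections_files) == 1

    with open(os.path.join(connections_dir, connections_files[0]),
              encoding='utf-8') as f:
        involved = f.read()
```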

gyorb · 2020-11-20T12:48:25Z

analyzer/codechecker_analyzer/analysis_manager.py

@@ -627,6 +648,9 @@ def __create_timeout(analyzer_process):
                    handle_failure(source_analyzer, rh, zip_file,
                                   result_base, actions_map)

+        collect_ctu_involved_files(rh, source_analyzer,


Do we want this collection even if the analysis was successful?

Yes, we do. These files have to be used for replicating false-positive reports.

Ok, please add a comment here why the collection is done even if the analysis was successful.

martong · 2020-12-02T14:07:55Z

Hi Tibi,

I like the overall approach. Could you write an example usage?
I am thinking about something like this:
CodeChecker analyze --ctu x.json -o reports // produces the dir we need
CodeChecker tu_collect reports/what_to_write_here/dir // produces a zip file

Well, actually, maybe I am missing some user docs :)

bruntib · 2021-01-18T19:50:01Z

@martong Thanks for the reminder, I forgot to document the --ctu-deps-dir flag of the tu_collector script.

csordasmarton

Just some tiny comments otherwise LGTM!

csordasmarton · 2021-01-19T12:25:56Z

tools/tu_collector/tests/tu_collector_test.py

@@ -60,3 +60,36 @@ def test_file_existence(self):
        self.assertTrue(
            any([path.endswith(os.path.join('/', 'hello.c')) for path in files]))
        self.assertIn('compilation_database.json', files)
+
+    def test_ctu_collection(self):
+        ctu_deps_dir = tempfile.mkdtemp()


I prefer to create a temporary directory in the following way:

codechecker/web/tests/functional/storage_of_analysis_statistics/test_storage_of_analysis_statistics.py

Line 163 in ed84de2

with TemporaryDirectory() as zip_dir:

If something goes wrong between creation and removal, it will not litter my system with unnecessary directories.
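
A minimal sketch of the suggested pattern, assuming the test only needs a directory with one dependency file in it. The `action_hash` file name and the `/project/zero.cpp` path are placeholders, not values from the actual test.

```python
import os
from tempfile import TemporaryDirectory

with TemporaryDirectory() as ctu_deps_dir:
    # 'action_hash' stands in for the real hashed action name.
    deps_file = os.path.join(ctu_deps_dir, 'action_hash')
    with open(deps_file, 'w', encoding='utf-8') as f:
        f.write('/project/zero.cpp')
    existed_inside = os.path.isdir(ctu_deps_dir)

# The directory and its contents are removed when the block exits,
# even if an exception was raised inside it.
existed_after = os.path.exists(ctu_deps_dir)
```

Compared with `tempfile.mkdtemp()`, no explicit cleanup call is needed, so a failing assertion cannot leave the directory behind.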

csordasmarton · 2021-01-19T12:27:12Z

tools/tu_collector/tests/tu_collector_test.py

+        with open(os.path.join(ctu_deps_dir, hash_fun(ctu_action)), 'w') as f:
+            f.write(os.path.join(self._test_proj_dir, 'zero.cpp'))
+
+        zip_file_name = tempfile.mkstemp(suffix='.zip')[1]


Similar to my above comment. Use the following approach to create a temp file:

codechecker/web/tests/functional/cmdline/test_cmdline.py

Line 268 in ed84de2

with tempfile.NamedTemporaryFile() as component_f:
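
Applied to the ZIP file in this test, the same idea might look like the sketch below. The archive content is invented for illustration, and the open file object is passed to `zipfile.ZipFile` instead of the file name, which is a choice made here rather than something taken from the PR.

```python
import tempfile
import zipfile

with tempfile.NamedTemporaryFile(suffix='.zip') as zip_file:
    # Passing the open file object avoids reopening the file by name.
    with zipfile.ZipFile(zip_file, 'w') as archive:
        archive.writestr('zero.cpp', 'int zero() { return 0; }\n')

    # Rewind and read the archive back to check what was stored.
    zip_file.seek(0)
    with zipfile.ZipFile(zip_file) as archive:
        names = archive.namelist()
# The temporary file is deleted automatically when the outer block exits.
```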

csordasmarton · 2021-01-19T12:28:16Z

tools/tu_collector/tu_collector/tu_collector.py

@@ -299,9 +353,20 @@ def zip_tu_files(zip_file, compilation_database, write_mode='w'):
    zip_file -- A file name or a file object.
    compilation_database -- Either a path of the compilation database JSON file
                            or a list of the parsed JSON.
+    file_filter -- TODO


The documentation is missing (TODO) for this parameter.

CodeChecker generates files under the report directory that list the other source files involved in a CTU analysis. tu_collector needs to collect these files too, so it has been extended with a --ctu-deps-dir flag that takes this generated directory.
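
How a collector could read such a directory is sketched below. The tu_collector implementation is not shown in this conversation, so `read_ctu_deps` is a hypothetical helper, and the one-path-per-line file format and the example paths are assumptions.

```python
import os
import tempfile


def read_ctu_deps(ctu_deps_dir):
    """Return every source path listed in the files of ctu_deps_dir.

    Hypothetical helper: assumes each file holds one path per line.
    """
    involved = set()
    for name in os.listdir(ctu_deps_dir):
        path = os.path.join(ctu_deps_dir, name)
        with open(path, encoding='utf-8', errors='ignore') as f:
            # Skip empty lines and strip trailing newlines.
            involved.update(line.strip() for line in f if line.strip())
    return involved


with tempfile.TemporaryDirectory() as deps_dir:
    deps_path = os.path.join(deps_dir, 'main.c_hash')
    with open(deps_path, 'w', encoding='utf-8') as f:
        f.write('/project/lib.c\n/project/zero.cpp\n')
    deps = read_ctu_deps(deps_dir)
```

The resulting set could then be added to the files that go into the ZIP alongside the sources found through the compilation database.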

csordasmarton

LGTM!

Your comments are fixed.

bruntib added the enhancement 🌟, WIP 💣, CLI 💻 and analyzer 📈 labels Nov 17, 2020

bruntib added this to the release 6.16.0 milestone Nov 17, 2020

bruntib requested review from csordasmarton and gyorb November 17, 2020 22:16

csordasmarton suggested changes Nov 18, 2020

gyorb previously requested changes Nov 20, 2020

bruntib force-pushed the collect_ctu_involved_files branch from 398e352 to debe0d3 Compare November 23, 2020 15:52

bruntib requested review from csordasmarton, gyorb and martong November 23, 2020 15:55

bruntib removed the WIP 💣 Work In Progress label Nov 23, 2020

martong reviewed Dec 2, 2020

bruntib force-pushed the collect_ctu_involved_files branch from debe0d3 to b1aec31 Compare January 18, 2021 19:49

bruntib force-pushed the collect_ctu_involved_files branch from b1aec31 to f8a80b2 Compare January 19, 2021 08:53

csordasmarton suggested changes Jan 19, 2021

bruntib requested a review from csordasmarton January 19, 2021 13:08

bruntib force-pushed the collect_ctu_involved_files branch from f8a80b2 to ee98547 Compare January 19, 2021 13:09

csordasmarton approved these changes Jan 19, 2021

csordasmarton merged commit 3496ecf into Ericsson:master Jan 19, 2021

bruntib deleted the collect_ctu_involved_files branch January 20, 2021 11:04

csordasmarton modified the milestones: release 6.16.0, release 6.15.1 Jan 25, 2021
