[analyzer] Collect CTU-involved files in the report directory #3029

Merged on Jan 19, 2021 (2 commits)

@bruntib bruntib commented Nov 17, 2020

When debugging analysis failures it is important to have all involved source
files. In case of CTU analysis the tu_collector tool is not informed about what
other TUs were used, so CodeChecker now collects this information under the
report directory.

@bruntib bruntib added the enhancement 🌟, WIP 💣 (Work In Progress), CLI 💻 and analyzer 📈 labels Nov 17, 2020
@bruntib bruntib added this to the release 6.16.0 milestone Nov 17, 2020

if involved_files:
    out = os.path.join(output_dir, result_handler.analyzer_action_str)
    with open(out, 'w') as f:
Contributor

Suggested change
-    with open(out, 'w') as f:
+    with open(out, 'w', encoding='utf-8', errors='ignore') as f:
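
A minimal sketch of what the suggested arguments change, assuming a path that carries an undecodable byte as a lone surrogate (which is how Python can represent such bytes in file names). The helper name `write_paths` and the sample path are hypothetical, not taken from the PR:

```python
import os
import tempfile


def write_paths(out, paths):
    """Write one path per line; characters UTF-8 cannot encode are dropped."""
    with open(out, 'w', encoding='utf-8', errors='ignore') as f:
        for path in paths:
            f.write(path + '\n')


if __name__ == '__main__':
    # '\udce9' is a lone surrogate, as produced by the surrogateescape handler.
    odd_path = 'src/caf\udce9.c'
    with tempfile.TemporaryDirectory() as tmp_dir:
        out = os.path.join(tmp_dir, 'involved_files')
        # With the default strict error handling this write would raise
        # UnicodeEncodeError instead of silently dropping the character.
        write_paths(out, [odd_path])
        with open(out, 'rb') as f:
            print(f.read())
```

The trade-off is that a dropped character leaves a recorded path that no longer matches the file on disk, so this favours a file that can always be written over a fully faithful one.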

involved_files.update(source_analyzer.get_analyzer_mentioned_files(
    result_handler.analyzer_stderr))

if involved_files:
Contributor

It is possible that previously this set wasn't empty, so we created an involved file, but the next time it was empty, so we do nothing. Don't we need to remove the previous file? Something similar to this:

out = os.path.join(output_dir, result_handler.analyzer_action_str)
if involved_files:
    ...
elif os.path.exists(out):
    os.remove(out)

Contributor Author

I don't understand this comment. This directory contains generated files for those TUs which involve some other source files during CTU analysis. It doesn't matter if there were files here with the same name because those will be rewritten. The content of this directory behaves the same way as the failed directory.

Contributor

@bruntib Okay, I'll try to explain my problem again. Let's assume that you analyze the same TU twice. The first time, the involved_files variable contains some file (lib.cpp), so you create a file (result_handler.analyzer_action_str) which contains the involved files (lib.cpp). If you then change something in your code and analyze your TU again, and the involved_files set is empty, you do nothing. But in the output_dir there will still be a file for this TU (result_handler.analyzer_action_str) which contains the lib.cpp involved file from the previous analysis.
My question was: in this case, don't we need to remove this file if it exists and the involved_files set is empty?

Contributor Author

Well, it doesn't cause any problem if those files are not removed. This is analogous to failed ZIPs and .plist files: those are also not removed when you have fewer analyzed TUs. We can remove these and the failed ZIPs if we want to gain some free space, though I don't think it's that significant. But I'll check it and do this removal for failed ZIPs accordingly in a next commit.

Contributor Author

Ok, I did it.
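
The behaviour agreed on in this thread can be sketched as follows. This is an illustration rather than the merged code: the function name `write_or_remove_involved_files`, its parameters, and the one-path-per-line format are assumptions.

```python
import os


def write_or_remove_involved_files(output_dir, action_name, involved_files):
    """Record the involved files of one TU, or drop a stale record.

    Returns True if a file was written, False otherwise.
    """
    out = os.path.join(output_dir, action_name)

    if involved_files:
        os.makedirs(output_dir, exist_ok=True)
        with open(out, 'w', encoding='utf-8', errors='ignore') as f:
            for path in sorted(involved_files):
                f.write(path + '\n')
        return True

    # Nothing was involved this time: remove the record left behind by a
    # previous analysis of the same TU, if there is one.
    if os.path.exists(out):
        os.remove(out)
    return False
```

Re-analysing a TU whose CTU dependencies have disappeared then leaves no outdated list behind.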

gyorb previously requested changes Nov 20, 2020
# We assume that only main.c has been analyzed with CTU and it involves
# lib.c during its analysis.
connections_dir = os.path.join(self.report_dir, 'ctu_connections')
connections_file = os.listdir(connections_dir)[0]
Contributor

Do we expect more files in this directory? Will the first file always be main.c? Should lib.c be checked here too?

Contributor Author

This folder contains files only for sources that involve other sources during CTU analysis. For analyzing lib.c no other sources are needed. Actually, we can assert that the length of this list is 1; I'll do that.
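
A sketch of the assertion mentioned here, assuming a report directory laid out as in the quoted test (a `ctu_connections` subdirectory holding one file per TU that pulled in other sources). The helper name `get_single_connections_file` is hypothetical:

```python
import os


def get_single_connections_file(report_dir):
    """Return the path of the only file under <report_dir>/ctu_connections."""
    connections_dir = os.path.join(report_dir, 'ctu_connections')
    connections_files = os.listdir(connections_dir)

    # Only main.c involves another source (lib.c), so exactly one file is
    # expected; lib.c itself needs no other TU and gets no entry.
    if len(connections_files) != 1:
        raise AssertionError(
            'expected exactly one connections file, found %d'
            % len(connections_files))

    return os.path.join(connections_dir, connections_files[0])
```

Inside a `unittest.TestCase` the same check would more naturally be written as `self.assertEqual(len(connections_files), 1)`.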

@@ -627,6 +648,9 @@ def __create_timeout(analyzer_process):
handle_failure(source_analyzer, rh, zip_file,
result_base, actions_map)

collect_ctu_involved_files(rh, source_analyzer,
Contributor

Do we want this collection even if the analysis was successful?

Contributor Author

Yes, we do. These files have to be used for replicating false-positive reports.

Contributor

Ok, please add a comment here explaining why the collection is done even if the analysis was successful.

@bruntib bruntib force-pushed the collect_ctu_involved_files branch from 398e352 to debe0d3 November 23, 2020 15:52
@bruntib bruntib removed the WIP 💣 Work In Progress label Nov 23, 2020

martong commented Dec 2, 2020

Hi Tibi,

I like the overall approach. Could you write an example usage?
I am thinking about something like this:

CodeChecker analyze --ctu x.json -o reports // produces the dir we need
CodeChecker tu_collect reports/what_to_write_here/dir // produces a zip file

Well, actually, maybe I am missing some user docs :)

@bruntib bruntib force-pushed the collect_ctu_involved_files branch from debe0d3 to b1aec31 January 18, 2021 19:49

bruntib commented Jan 18, 2021

@martong Thanks for the warning, I forgot the documentation of the --ctu-deps-dir flag of the tu_collector script.

@bruntib bruntib force-pushed the collect_ctu_involved_files branch from b1aec31 to f8a80b2 January 19, 2021 08:53
@csordasmarton csordasmarton left a comment

Just some tiny comments, otherwise LGTM!

@@ -60,3 +60,36 @@ def test_file_existence(self):
self.assertTrue(
any([path.endswith(os.path.join('/', 'hello.c')) for path in files]))
self.assertIn('compilation_database.json', files)

def test_ctu_collection(self):
ctu_deps_dir = tempfile.mkdtemp()
Contributor

I prefer to create temporary directories in the following way:


If something happens between the creation and removal phases, it will not mess up my system with unnecessary directories.
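
A minimal sketch of a self-cleaning temporary directory, assuming the standard-library `tempfile.TemporaryDirectory` context manager is the kind of approach meant here. The function name, file name and file content are hypothetical:

```python
import os
import tempfile


def write_and_read_deps_file(file_name, involved_path):
    """Create a deps file in a temporary directory and read it back.

    Returns the (already removed) directory path and the file content.
    """
    with tempfile.TemporaryDirectory() as ctu_deps_dir:
        deps_file = os.path.join(ctu_deps_dir, file_name)
        with open(deps_file, 'w', encoding='utf-8') as f:
            f.write(involved_path)
        with open(deps_file, encoding='utf-8') as f:
            content = f.read()

    # Leaving the 'with' block removes the directory and its content, even
    # if an exception was raised inside the block.
    return ctu_deps_dir, content
```

Compared with `tempfile.mkdtemp()`, no explicit cleanup call is needed, so a failing assertion in the middle of a test cannot leave the directory behind.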

with open(os.path.join(ctu_deps_dir, hash_fun(ctu_action)), 'w') as f:
    f.write(os.path.join(self._test_proj_dir, 'zero.cpp'))

zip_file_name = tempfile.mkstemp(suffix='.zip')[1]
Contributor

Similar to my above comment. Use the following approach to create a temp file:

with tempfile.NamedTemporaryFile() as component_f:
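
A sketch of how this could look for the ZIP file in the test, assuming the archive is written and read through the temporary file object itself. The function name and the member names are hypothetical, and `suffix='.zip'` mirrors the `mkstemp` call above:

```python
import tempfile
import zipfile


def zip_to_temp_file(members):
    """Write members (name -> text) into a temporary ZIP and list them.

    Returns the (already removed) file name and the sorted member names.
    """
    with tempfile.NamedTemporaryFile(suffix='.zip') as zip_file:
        # ZipFile accepts a file object, so the archive can be written
        # and read back without reopening the file by name.
        with zipfile.ZipFile(zip_file, 'w') as archive:
            for name, content in members.items():
                archive.writestr(name, content)

        with zipfile.ZipFile(zip_file) as archive:
            names = sorted(archive.namelist())

        temp_name = zip_file.name

    # The temporary file is deleted when the 'with' block is left.
    return temp_name, names
```

Code that needs a path rather than a file object can be given `zip_file.name`, although reopening a still-open temporary file by name is not portable to every platform.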

@@ -299,9 +353,20 @@ def zip_tu_files(zip_file, compilation_database, write_mode='w'):
zip_file -- A file name or a file object.
compilation_database -- Either a path of the compilation database JSON file
or a list of the parsed JSON.
file_filter -- TODO
Contributor

The documentation is missing (TODO) for this parameter.

CodeChecker generates files under the report directory that list which other
source files were involved in a CTU analysis. tu_collector needs to collect
these files too, so it has been extended with a --ctu-deps-dir flag that can be
given this generated folder.
@bruntib bruntib requested a review from csordasmarton January 19, 2021 13:08
@bruntib bruntib force-pushed the collect_ctu_involved_files branch from f8a80b2 to ee98547 January 19, 2021 13:09
@csordasmarton csordasmarton left a comment

LGTM!

@csordasmarton csordasmarton dismissed gyorb’s stale review January 19, 2021 13:49

Your comments are fixed.

@csordasmarton csordasmarton merged commit 3496ecf into Ericsson:master Jan 19, 2021
@bruntib bruntib deleted the collect_ctu_involved_files branch January 20, 2021 11:04