-
Notifications
You must be signed in to change notification settings - Fork 383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[analyzer] Collect CTU-involved files in the report directory #3029
[analyzer] Collect CTU-involved files in the report directory #3029
Conversation
|
||
if involved_files: | ||
out = os.path.join(output_dir, result_handler.analyzer_action_str) | ||
with open(out, 'w') as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with open(out, 'w') as f: | |
with open(out, 'w', encoding='utf-8', errors='ignore') as f: |
involved_files.update(source_analyzer.get_analyzer_mentioned_files( | ||
result_handler.analyzer_stderr)) | ||
|
||
if involved_files: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible that previously this set wasn't empty, so we created an involved file, but the next time it was empty, so we do nothing. Do not we need to remove the previous file? So something similar to this:
out = os.path.join(output_dir, result_handler.analyzer_action_str)
if involved_files:
...
else if os.path.exists(out):
os.remove(out)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment. This directory contains generated files for for those TUs which involve some other source files during CTU analysis. It doesn't matter if there were files here with the same name because those will be rewritten. The content of this directory behaves the same way as failed
directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bruntib Okay, I try to explain my problem again. Lets assume that you analyzed the same TU two times. The first time the involved_files
variable contains some file (lib.cpp
) so you will create a file (result_handler.analyzer_action_str
) which will contain the involved files (lib.cpp
). If you change something in your code, you analyze your TU again and the involved_files
set is empty you will do nothing. But in the output_dir
there will be a file for this TU (result_handler.analyzer_action_str
) which will contain the lib.cpp
involved file from the previous analysis.
My question was that in this case don't we need to remove this file if it's exist and the involved_files
set is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it doesn't cause any problem if those files are not removed. This is in analogy with failed ZIPs and .plist files: those are also not removed when you have lass analyzed TUs. Though we can remove these and the failed ZIPs if we want to gain some free space, though it's not that significant I think. But I'll check it and do this removal for failed ZIPs accordingly in a next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I did it.
# We assume that only main.c has been analyzed with CTU and it involves | ||
# lib.c during its analysis. | ||
connections_dir = os.path.join(self.report_dir, 'ctu_connections') | ||
connections_file = os.listdir(connections_dir)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect more files in this directory? Will the first file always be main.c
? Should lib.c
be checked here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This folder contains files only for sources that involve other sources during CTU analysis. For analyzing lib.c
no other sources are needed. Actually we can assert that the length of this is 1, I'll do that.
@@ -627,6 +648,9 @@ def __create_timeout(analyzer_process): | |||
handle_failure(source_analyzer, rh, zip_file, | |||
result_base, actions_map) | |||
|
|||
collect_ctu_involved_files(rh, source_analyzer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this collection even if the analysis was successful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do. These files have to be used for replicating false-positive reports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, please add a comment here why the collection is done even if the analysis was successful.
398e352
to
debe0d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Tibi,
I like the overall approach. Could you write an example usage?
I am thinking about something like this:
CodeChecker analyze --ctu x.json -o reports // produces the dir we need
CodeChecker tu_collect reports/what_to_write_here/dir // produces a zip file
Well, actually, maybe I am missing some user docs :) |
When debugging analysis failures it is important to have all involved source files. In case of CTU analysis the tu_collector tool is not informed about what other TUs were used, so CodeChecker now collects this information under the report directory.
debe0d3
to
b1aec31
Compare
@martong Thanks for the warning, I forgot the documentation from the |
b1aec31
to
f8a80b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some tiny comments otherwise LGTM!
@@ -60,3 +60,36 @@ def test_file_existence(self): | |||
self.assertTrue( | |||
any([path.endswith(os.path.join('/', 'hello.c')) for path in files])) | |||
self.assertIn('compilation_database.json', files) | |||
|
|||
def test_ctu_collection(self): | |||
ctu_deps_dir = tempfile.mkdtemp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to create temporary directory on the following way:
Line 163 in ed84de2
with TemporaryDirectory() as zip_dir: |
If something happens between the creation and remove phase it will not mess up my system with unecessary directories.
with open(os.path.join(ctu_deps_dir, hash_fun(ctu_action)), 'w') as f: | ||
f.write(os.path.join(self._test_proj_dir, 'zero.cpp')) | ||
|
||
zip_file_name = tempfile.mkstemp(suffix='.zip')[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to my above comment. Use the following approach to create a temp file:
with tempfile.NamedTemporaryFile() as component_f: |
@@ -299,9 +353,20 @@ def zip_tu_files(zip_file, compilation_database, write_mode='w'): | |||
zip_file -- A file name or a file object. | |||
compilation_database -- Either a path of the compilation database JSON file | |||
or a list of the parsed JSON. | |||
file_filter -- TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation is missing (TODO) for this parameter.
CodeChecker generates files under report directory which list which other source files were involved in a CTU analysis. tu_collector needs to collect these files too, so it has been extended with --ctu-deps-dir flag that can be given this generated folder.
f8a80b2
to
ee98547
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
When debugging analysis failures it is important to have all involved source
files. In case of CTU analysis the tu_collector tool is not informed about what
other TUs were used, so CodeChecker now collects this information under the
report directory.