Crash when analyzing large file with binary ninja backend because the IL function is not available #2249

Closed
xusheng6 opened this issue Jul 31, 2024 · 9 comments

Comments

@xusheng6
Contributor

Stack trace:

 Traceback (most recent call last):
  File "/home/[REDACTED]/.local/bin/capa", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/[REDACTED]/App/capa/capa/main.py", line 860, in main
    capabilities, counts = find_capabilities(rules, extractor, disable_progress=args.quiet)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/[REDACTED]/App/capa/capa/capabilities/common.py", line 75, in find_capabilities
    return find_static_capabilities(ruleset, extractor, disable_progress=disable_progress, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/[REDACTED]/App/capa/capa/capabilities/static.py", line 183, in find_static_capabilities
    function_matches, bb_matches, insn_matches, feature_count = find_code_capabilities(
                                                                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/[REDACTED]/App/capa/capa/capabilities/static.py", line 128, in find_code_capabilities
    for feature, va in itertools.chain(extractor.extract_function_features(fh), extractor.extract_global_features()):
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/[REDACTED]/App/capa/capa/features/extractors/binja/extractor.py", line 52, in extract_function_features
    yield from capa.features.extractors.binja.function.extract_features(fh)
  File "/home/[REDACTED]/App/capa/capa/features/extractors/binja/function.py", line 100, in extract_features
    for feature, addr in func_handler(fh):
                         ^^^^^^^^^^^^^^^^
  File "/home/[REDACTED]/App/capa/capa/features/extractors/binja/function.py", line 27, in extract_function_calls_to
    llil = caller.llil
           ^^^^^^^^^^^
  File "/home/[REDACTED]/App/BinaryNinja/binaryninja/python/binaryninja/binaryview.py", line 125, in llil
    return self.function.get_low_level_il_at(self.address, self.arch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/[REDACTED]/App/BinaryNinja/binaryninja/python/binaryninja/function.py", line 1726, in get_low_level_il_at
    llil = self.llil
           ^^^^^^^^^
  File "/home/[REDACTED]/App/BinaryNinja/binaryninja/python/binaryninja/function.py", line 946, in llil
    raise ILException(f"Low level IL was not loaded for {self!r}")
binaryninja.exceptions.ILException: Low level IL was not loaded for <func: x86_64@0x23e750>

This happens because, when analyzing large files, Binary Ninja does not always generate the IL for every function. The code should account for this and only access the IL when it is available. Furthermore, there should be an option to force Binary Ninja to generate the IL for all functions, at the cost of longer analysis time and higher RAM usage.
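A minimal sketch of the "only access the IL if it is available" idea, assuming the failure always surfaces as the `ILException` shown in the stack trace. To keep the snippet runnable without Binary Ninja, `ILException` and `FakeCaller` below are local stand-ins, and `get_llil_or_none` is a hypothetical helper name, not capa's actual code:

```python
class ILException(Exception):
    """Stand-in for binaryninja.exceptions.ILException (assumption: same role)."""


class FakeCaller:
    """Hypothetical stand-in for an object exposing a `.llil` property."""

    def __init__(self, llil=None):
        self._llil = llil

    @property
    def llil(self):
        # Mimic the behavior in the traceback: raise when the IL was not loaded.
        if self._llil is None:
            raise ILException("Low level IL was not loaded")
        return self._llil


def get_llil_or_none(caller):
    """Return caller.llil, or None when the IL is unavailable."""
    try:
        return caller.llil
    except ILException:
        return None


# Callers without IL are skipped instead of crashing the whole run.
callers = [FakeCaller("call 0x401000"), FakeCaller(), FakeCaller("jmp rax")]
usable = [il for il in map(get_llil_or_none, callers) if il is not None]
```

Skipping a function this way avoids the crash, but any features that would have come from the missing IL are silently lost.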

@xusheng6
Contributor Author

A fix for this is coming soon.

@xusheng6
Contributor Author

xusheng6 commented Nov 21, 2024

@mr-tz please add the "binary-ninja" tag on this issue and also #2489, #2499, #2496

@xusheng6
Contributor Author

xusheng6 commented Nov 21, 2024

We are unlikely to add a way to force analysis to complete once it exceeds the thresholds, at least in headless mode. That could too easily lead to runaway analysis that eats all the RAM. For obfuscated or complex code, one should first use the Binary Ninja GUI to fix the issue, save the database, and then run capa on it. See #2496.

@xusheng6 changed the title from "Crash when analyzing large file with binary ninja backend" to "Crash when analyzing large file with binary ninja backend because the IL function is not available" Nov 21, 2024
@xusheng6
Contributor Author

This is actually caused by Vector35/binaryninja-api#6020

@mr-tz
Collaborator

mr-tz commented Nov 21, 2024

Thanks for looking into all these issues, @xusheng6! I love how capa helps to improve other analysis tools.

I've added the labels and will keep an eye out for future related issues.

@xusheng6
Contributor Author

> Thanks for looking into all these issues, @xusheng6! I love how capa helps to improve other analysis tools.
>
> I've added the labels and will keep an eye out for future related issues.

Capa and binja are helping each other to become better!

@xusheng6
Contributor Author

xusheng6 commented Nov 25, 2024

Status update on this:

  1. The crash happens because of an oversight: the IL of a function can be unavailable in an unexpected way. The crash itself is fixed in "Various binja backend fixes" (#2500). However, the fix is more of a band-aid, since we now simply skip the analysis of functions whose IL cannot be retrieved. This can lead to false negatives in detection.
  2. We triaged the underlying issue in binja and filed "On-Demand Function Analysis is Triggering Time and Update Count Limits" (Vector35/binaryninja-api#6171), which has more details on why the IL can be unavailable when it definitely should be available. We have already fixed it in dev 4.3.6482. That said, since capa tests against the stable build of binja, the fix will not be available there until we release the next version, in a few months.

How to validate that the binja fix is in effect: run capa in debug mode on the sample b5f0524e69b3a3cf636c7ac366ca57bf5e3a8fdc8a9f01caf196c611a7918a87.elf_, and verify that the function 0x8082d40 has roughly 1373 features rather than just a handful.

@williballenthin
Collaborator

How high does memory usage grow if we cache all of the IL for a program?

Within capa (specifically, the capa Binja backend/integration), we could do an initial pass that fetches the IL for all the functions in the program. Then we could use this later rather than computing the IL on demand. I understand this trades memory usage for performance - is this possible/reasonable?
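A rough sketch of what such an initial pass could look like, assuming each function object exposes a `.start` address and a `.llil` property that raises `ILException` when the IL is unavailable (as in the stack trace). `ILException`, `FakeFunction`, and `prefetch_llil` are stand-ins invented here so the snippet runs without Binary Ninja. It says nothing about the real memory cost, which is the open question:

```python
class ILException(Exception):
    """Stand-in for binaryninja.exceptions.ILException (assumption: same role)."""


class FakeFunction:
    """Hypothetical stand-in for a function with `.start` and `.llil`."""

    def __init__(self, start, llil=None):
        self.start = start
        self._llil = llil

    @property
    def llil(self):
        if self._llil is None:
            raise ILException(f"Low level IL was not loaded for {self.start:#x}")
        return self._llil


def prefetch_llil(functions):
    """Fetch the IL of every function once; return (cache, skipped addresses)."""
    cache = {}
    skipped = []
    for func in functions:
        try:
            # Holding the reference keeps the IL alive for later lookups.
            cache[func.start] = func.llil
        except ILException:
            skipped.append(func.start)
    return cache, skipped


functions = [FakeFunction(0x1000, "il-a"), FakeFunction(0x2000), FakeFunction(0x3000, "il-c")]
cache, skipped = prefetch_llil(functions)
```

Later feature extraction could then look up `cache.get(address)` instead of requesting the IL on demand, and report the `skipped` addresses so that potential false negatives are visible.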

@xusheng6
Contributor Author

> How high does memory usage grow if we cache all of the IL for a program?
>
> Within capa (specifically, the capa Binja backend/integration), we could do an initial pass that fetches the IL for all the functions in the program. Then we could use this later rather than computing the IL on demand. I understand this trades memory usage for performance - is this possible/reasonable?

I am not sure. I will think about it.

I am also thinking about other ways to avoid the "random" access to function ILs, so that the access pattern is more cache-friendly.
