Releases: joxeankoret/diaphora
Diaphora 3.2.1
A really minor release containing only the following 2 changes:
PYTHON: Replace `imp` module (removed in Python 3.12)
GUI: Added a basic IDA plugin wrapper for Diaphora
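For scripts that still rely on the removed `imp` module, its most common use, `imp.load_source()`, maps onto `importlib`. The following is a minimal sketch assuming the old code loaded a module from a file path; the helper name `load_source` is hypothetical and this is not Diaphora's actual code:

```python
import importlib.util
import sys


def load_source(module_name, path):
    """Load a module from a file path, replacing the removed imp.load_source()."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {module_name!r} from {path!r}")
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as a regular import would.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

For modules importable from `sys.path`, `imp.reload()` maps directly to `importlib.reload()`.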
Diaphora 3.2.0
A new release that mostly fixes bugs and adds workarounds for IDA bugs, but also includes a new (absolutely experimental) feature: a Ridge classifier that tries to learn what good matches look like for the current set of binaries being diffed. It's experimental and subject to heavy changes (or even removal), but I'm starting to ship it (disabled by default) so reverse engineers can start testing it.
The following is the whole list of changes:
MISC: Increase version number to 3.2.0.
CORE: Train a Ridge classifier using known good and bad results after `find_partial_matches()` in order to try to better determine what looks like a good match and what does not.
CONFIG: Added parameter `COMMIT_AFTER_EACH_GUI_UPDATE` to force a commit after each GUI update.
CONFIG: Added parameter `EXPORTING_COMPILATION_UNITS` to enable/disable exporting them (with some huge databases it can take hours!).
CONFIG: Added parameter `SHOW_IMPORT_WARNINGS` to enable/disable showing warnings when some important but optional Python packages aren't found.
CONFIG: Added parameters `SQLITE_JOURNAL_MODE` and `SQLITE_PRAGMA_SYNCHRONOUS` to configure the corresponding SQLite pragmas.
CORE: Increase the added similarity score in `deep_ratio` when constants (like strings or cryptographic constants) match.
CORE: Try to use `cdifflib` instead of Python's standard `difflib` when possible to get some performance gains.
EXTRAS: Added independent IDA plugin `extras/diaphora_local.py` to be able to diff functions inside the current binary.
GUI: When exporting a large number of compilation units, display the progress every 1% of CUs exported.
HEUR: Increase the threads join time (`THREADS_WAIT_TIME`) to 1 second.
HEUR: Mark heuristic "Same compilation unit" as slow.
HEUR: Remove the unreliable flag from heuristics "Pseudo-code fuzzy AST hash" and "Loop Count".
ML: Try to use the Ridge classifier as just another method to get a similarity ratio in `check_ratio`.
ML: Simplifications of the supervised learning based experimental engine.
ML: The current local model causes false positives with small functions and with functions that have a huge difference in their number of basic blocks, so Diaphora will ignore such matches.
VULN: Add pattern "UNC" to potentially detect vulnerabilities fixed in Windows components involving UNC paths.
VULN: Do not use difflib.unified_diff as it's terribly slow; instead use difflib.ndiff.
BUG: Be sure to delete orphaned comments when importing pseudo-code comments.
BUG: Do a commit after all functions are exported so that, if IDA crashes for whatever reason, Diaphora can properly recover and have all the functions already exported.
BUG: High addresses in operands could cause the Python's sqlite3 module to crash when inserting into the database.
BUG: Inserting the link between functions and compilation units was terribly-utterly-horribly wrong.
BUG: Messages from threads were being ignored by `log()`.
BUG: The workaround for "max non-trivial tinfo_t count has been reached" was wrong. Now, the Hex-Rays functions cache is cleared every 10,000 rows.
BUG: Timeouts in heuristics were not properly handled.
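The new `SQLITE_JOURNAL_MODE` and `SQLITE_PRAGMA_SYNCHRONOUS` parameters control standard SQLite pragmas. A minimal sketch of a connection helper that applies them, assuming string values like those accepted by SQLite; the default values and the `connect_db` name are hypothetical, not Diaphora's configuration:

```python
import sqlite3

# Hypothetical stand-ins for the configuration parameters; defaults are assumptions.
SQLITE_JOURNAL_MODE = "WAL"
SQLITE_PRAGMA_SYNCHRONOUS = "NORMAL"

_JOURNAL_MODES = {"DELETE", "TRUNCATE", "PERSIST", "MEMORY", "WAL", "OFF"}
_SYNC_MODES = {"OFF", "NORMAL", "FULL", "EXTRA"}


def connect_db(path, journal_mode=SQLITE_JOURNAL_MODE,
               synchronous=SQLITE_PRAGMA_SYNCHRONOUS):
    """Open a SQLite database and apply the configured pragmas."""
    journal_mode = journal_mode.upper()
    synchronous = synchronous.upper()
    # PRAGMA values cannot be bound as query parameters, so validate them
    # against SQLite's documented values before formatting them into SQL.
    if journal_mode not in _JOURNAL_MODES:
        raise ValueError(f"invalid journal mode: {journal_mode}")
    if synchronous not in _SYNC_MODES:
        raise ValueError(f"invalid synchronous mode: {synchronous}")
    db = sqlite3.connect(path)
    db.execute(f"PRAGMA journal_mode = {journal_mode}")
    db.execute(f"PRAGMA synchronous = {synchronous}")
    return db
```

Relaxing durability (for example `synchronous = NORMAL` with WAL) usually speeds up large exports at the cost of crash safety, which is why making it configurable is useful.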
Diaphora 3.1.2
Yet another (mainly) bug fixes release that includes some interesting new little "features" to find potentially fixed vulnerabilities in Windows Kernel drivers, and significantly improves the code internally. Here is the whole change log:
VULN: Mark as interesting differences with the functions `ProbeForRead` and `ProbeForWrite` (useful to detect fixed vulnerabilities in Windows Kernel drivers).
CORE: Added a better parallelization engine to run heuristics.
CORE: Out of caution, use the lock in `cleanup_matches()`.
DIFF: Better parallelization support for diffing.
HEUR: Do not calculate a similarity ratio for 2 functions if their bytes hash is the same.
HEUR: Do not check for potential previous better matches when the given input ratio is 1.0.
HEUR: Enable by default slow heuristics in databases with up to 4000 functions.
HEUR: Increase MIN_FUNCTIONS_TO_CONSIDER_MEDIUM to 8001.
HOOKS: Added hook `on_special_heuristic()` for non-SQL heuristics.
KFUZZY: Updated old fuzzy hashing code not touched since 2019.
TESTER: Added checks for call graph callers and callees (fix for #287).
BUG: Do not complain if no microcode mnemonic is found (I cannot reproduce this behaviour at all, but some Mac users are complaining about this).
BUG: Do not directly copy-paste the list of fields for a SELECT command; use `get_query_fields()` instead.
BUG: For some reason, the function `extract_function_assembly_features()` might receive a list of basic blocks not containing the function's entry point, and Diaphora didn't consider this possibility (fix for #288).
BUG: When checking if functions are 100% equal, check more fields than just the id, address, mangled_function and nodes.
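Flagging diffs that touch `ProbeForRead`, `ProbeForWrite` or UNC paths boils down to scanning the added and removed lines of a line diff for known patterns. A minimal sketch using `difflib.ndiff`; the pattern tuple only contains the patterns named in these notes, and `interesting_changes` is a hypothetical helper, not Diaphora's implementation:

```python
import difflib

# Patterns named in these release notes; Diaphora's real list is longer.
INTERESTING_PATTERNS = ("ProbeForRead", "ProbeForWrite", "UNC")


def interesting_changes(old_code, new_code, patterns=INTERESTING_PATTERNS):
    """Return (pattern, diff_line) pairs for added or removed lines matching a pattern."""
    hits = []
    for line in difflib.ndiff(old_code.splitlines(), new_code.splitlines()):
        # ndiff prefixes lines with "+ " (added), "- " (removed), "  " (common)
        # and "? " (intraline hints); only real additions/removals matter here.
        if not line.startswith(("+ ", "- ")):
            continue
        text = line[2:]
        for pattern in patterns:
            if pattern in text:
                hits.append((pattern, line))
    return hits
```

Plain substring matching is deliberately crude: "UNC" also matches identifiers such as `FUNC`, so a real implementation would want word boundaries or context.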
Diaphora 3.1.1
This is mainly a bug fixes release that, however, includes 2 new heuristics and some experimental enhancements to try to find patched vulnerabilities when doing patch diffing. Here is the whole change log:
DIFF: Added a ratios cache to speed up comparison operations.
EXPORT: Added a column to save how long it took to export a single function.
EXPORT: Use `cur.executemany()` instead of `cur.execute()` whenever possible.
GUI: Added menu item "Show assembly patch".
HEUR: Added heuristic "Related compilation unit" to find functions by matching potential compilation units.
HEUR: Added heuristic "Same constants related matches" to find functions using the same constants in different places.
MISC: Refactored the code for finding potentially fixed vulnerabilities.
MISC: Replace multiple "SELECT *" appearances with just the required fields, where appropriate.
VULN: Added a few new patterns to try to find potentially fixed vulnerabilities.
VULN: Added heuristic to try to find fixed signedness issues for x86 and ARM.
BUG: Diaphora was calling ida_lines.get_srcline() for every assembly line. Fixed by doing it once per basic block.
BUG: The code for calculating the primes assigned to a compilation unit was terribly slow.
BUG: The microcode instructions list was built a lot of times instead of being done only once.
BUG: When importing pseudo-code comments, do not set the `treeloc_t.item_preciser_t` member `itp` when the stored value is None.
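A ratios cache avoids recomputing the similarity of the same pair of functions when several heuristics propose it. A minimal sketch keyed by function IDs, assuming function contents do not change during a diffing session; `RatioCache` is hypothetical, and `difflib.SequenceMatcher` is a stand-in for Diaphora's own `check_ratio()`, which combines several inputs:

```python
import difflib


class RatioCache:
    """Memoize similarity ratios per (main function id, diff function id) pair."""

    def __init__(self):
        self._cache = {}
        self.misses = 0  # number of ratios actually computed

    def ratio(self, id1, text1, id2, text2):
        key = (id1, id2)
        if key not in self._cache:
            self.misses += 1
            # Stand-in similarity measure; only computed once per pair.
            self._cache[key] = difflib.SequenceMatcher(None, text1, text2).ratio()
        return self._cache[key]
```

Keying on IDs instead of on the texts keeps lookups cheap, since hashing large assembly or pseudo-code strings on every lookup would eat into the savings.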
Diaphora 3.1
This is a bug fix release. It fixes the following issues since the previous version:
BUG: It seems that the directory name "database" might conflict with an IDA-supplied module also called database.py.
BUG: The WHERE clause for the Microcode SPP heuristic was wrong.
BUG: Do not crash in patch diffing sessions if there is no pseudo-code available.
GUI: BUG: The main UI dialog might not be 100% visible with some screen resolutions.
GUI: BUG: Adding manual matches was partially wrong.
GUI: BUG: After closing the "Interesting matches" tab there was no way to show it again (like F3 does with other choosers).
WORKAROUND: Fix for issue #263. It seems IDA might crash if the decompiler is not available and Diaphora tries to use it.
Diaphora 3.0
MISC: Increased Diaphora version to 3.0.0.
API: Add support for the first 'to be exposed' function, used to get the callgraph percent difference.
CONFIG: Added multiple configuration options in diaphora_config.py.
CORE: Added CodeCut support to find anonymous compilation units.
CORE: Added IDAMagicStrings to try to find compilation unit names.
CORE: Added support to enable/disable exporting microcode.
CORE: Added support to find and export Compilation Units.
CORE: By default, run a helper script that tries to find potentially fixed vulnerabilities when a patch diffing session with function names is detected.
CORE: Coalesce contiguous named compilation units using the minimum and maximum address.
CORE: Do not directly add matches to choosers, instead, work with internal Python dict objects and process them when the diffing session is done.
CORE: More refactorings for properly supporting multimatches and finding the best matches.
CORE: Set a name to compilation units when enough matches indicate the name of the compilation unit using IDAMagicStrings.
DATABASE: Added proper indices and fine tuning of SQL heuristics and queries.
DATABASE: Moved tables and indices definitions to a different file.
DIFF: Support for handling multiple matches by showing them in a different chooser.
DOC: Documented all functions and members in diaphora.py.
EXPORT: Add the `func_id` field to the `instructions` table.
EXPORT: Also consider data references from functions to other functions as code references.
GUI: Add support for Python logging facilities.
GUI: Add support for diffing microcode assembly.
GUI: Add support to "Diff microcode in a graph".
GUI: Added environment variable DIAPHORA_LOG_PRINT to print to stdout instead of using Python logging facilities.
GUI: Added initial support to view callers and callees of functions matches.
GUI: Added microcode-related configuration options and slightly updated the main dialog.
GUI: Added support to configure colors for diffing and call graph viewing.
GUI: Enable, by default, slow heuristics only for databases of ~1,000 functions at most.
GUI: Renamed 'Experimental' to 'Enable Speed Ups', as the old 'experimental' heuristics are either upgraded to 'normal' or removed.
HEUR: Add a filter for a minimum of 3 instructions for heuristic 'Same address, nodes, edges and mnemonics'.
HEUR: Add support for speed ups (internally called 'dirty heuristics') for detected symbols stripped matching and patch diffing.
HEUR: Added 'on_match' events for 'Local Affinity' and 'Callee found diffing matches' heuristics.
HEUR: Added a default ORDER BY clause to order by compilation unit when there is a named compilation unit.
HEUR: Added a minimum ratio of 0.35 for heuristic 'Pseudo-code fuzzy AST hash'.
HEUR: Added a minimum ratio of 0.5 for heuristic 'Pseudo-code fuzzy (normal)'.
HEUR: Added heuristic 'Local affinity' to find matches in functions gaps.
HEUR: Added heuristic 'Same anonymous compilation unit function match'.
HEUR: Added heuristic 'Same compilation unit'.
HEUR: Added heuristic 'Same named compilation unit function match'.
HEUR: Added heuristic 'Same rare basic block mnemonics list'.
HEUR: Added heuristic type HEUR_TYPE_RATIO_MAX_TRUSTED. Results with a bad similarity ratio are assigned to the 'Partial' tab regardless of the calculated ratio.
HEUR: Added heuristics 'Same cleaned microcode' and 'Microcode mnemonics small primes product'.
HEUR: Added self-explanatory new heuristic 'Same rare assembly instruction'.
HEUR: Added support to find matches diffing assembly and pseudo-codes of previous known good matches.
HEUR: All heuristics now select the same fields by calling `diaphora_heuristics.get_query_fields()` to retrieve them.
HEUR: Allow the heuristic 'Same rare constant' to match functions with at least 3 basic blocks.
HEUR: Always consider functions matching by name the best match, regardless of the ratio that another match might produce.
HEUR: Changed heuristic 'Same nodes, edges and strongly connected components' to 'Same nodes, edges, loops and strongly connected components'. Now loops are also considered for matching.
HEUR: Changed heuristic 'Similar pseudo-code and names' to only consider results with a similarity ratio higher than 0.579.
HEUR: Consider matches only for symbol names that have at least 4 characters for heuristic 'Callee found finding matches'.
HEUR: For heuristic 'Local affinity', consider the first matching function the best match.
HEUR: Do not run assembly based heuristics when diffing different CPU architectures.
HEUR: First proper working version (hopefully) of the support for multimatches.
HEUR: Increased the number of decimal digits (7) used for comparison ratios.
HEUR: Increased the queries timeout to 5 minutes.
HEUR: Marked heuristic 'Same rare constant' as slow.
HEUR: Moved heuristic 'Brute force' to the unreliable category.
HEUR: Moved heuristic 'Nodes, edges, complexity and mnemonics with small differences' to the slow ones.
HEUR: Moved heuristic 'Same graph' to the unreliable category.
HEUR: Moved the 'Experimental' heuristics to the 'Partial' category.
HEUR: Order functions by address for heuristic 'Local affinity', as compilers usually put functions in the same order in binaries.
HEUR: Raised the minimum number of functions to consider running slow heuristics, by default, to 2000.
HEUR: Relax the heuristic 'Same rare constant' to allow good matches with a bad similarity ratio to appear in the 'Partial' tab.
HEUR: Removed heuristic 'Bytes hash and names'.
HEUR: Removed heuristic 'Strongly connected components SPP and names'.
HEUR: Removed heuristics that were not finding anything, namely, 'All or most attributes', 'Same address, nodes, edges and primes (re-ordered instructions)', 'Strongly connected components small-primes-product' and 'Callgraph match'.
HEUR: Removed old wrong and buggy heuristic 'Call address sequence'.
HEUR: Removed the slow flag from heuristics 'Switch structures', 'Pseudo-code fuzzy XXX' and 'Same graph'.
HEUR: Removed unreliable heuristic 'Bytes sum'.
HEUR: Rewrite heuristics 'Same rare KOKA hash' and 'Same rare MD-Index' to use the WITH clause that makes queries much more readable and maintainable.
HEUR: Run slow heuristics at the very end of the diffing process, after the other heuristics.
HEUR: Run the only 2 remaining 'unreliable' heuristics at the very end of the diffing process.
HEUR: The DISTINCT and/or the ORDER BY clauses have been removed in some SQL heuristics because they were causing some queries to never finish, triggering SQLite memory errors.
HEUR: The function `check_ratio()` now also takes microcode into account to calculate the similarity ratio.
HEUR: Use `difflib.unified_diff` instead of `ndiff` because the latter is way too slow to call hundreds of thousands of times.
HEUR: When diffing matches to find callees, ignore those matches whose number of basic blocks differs by more than 75%.
HEUR: When diffing matches, ignore functions with less than 3 basic blocks to remove potential false positives.
MISC: Added a 'Diaphora:' prefix for log messages.
MISC: Change the text for the 'Call Address sequence' heuristic to show which initial results the matches are based on.
MISC: Fixed some minor typos in the sources.
MISC: Make heuristics flags more pythonic.
MISC: Multiple little refactorings here and there.
MISC: Renamed heuristic 'Same cleaned up assembly' to 'Same cleaned assembly'
BUG: Add the %POSTFIX% pseudo-field to all SQL heuristics.
BUG: Added the n-th fix to try not to leak cursors at all ever.
BUG: All parallel calls to add_matches_from_query_ratio_max() were wrong.
BUG: Always use the internal dicts for handling matches, never use the choosers except for adding the results at the end.
BUG: Commit every transaction that must be committed.
BUG: Do not analyze the databases each time a diff is started.
BUG: Do not consider IDA's auto-generated names for the 'Same RVA' heuristic.
BUG: Do not crash when there is no chooser (it's None) given for a specific category.
BUG: Do not directly call 'sqlite3.connect()', instead call a wrapper that does whatever initialization is required.
BUG: For every single assembly instruction saved in the SQLite exported database a SELECT statement was being executed, making it terribly slower.
BUG: Handling timeouts in threads was horribly wrong because there was no code to handle the timeout inside the thread...
BUG: Hopefully final fix for issue #5.
BUG: If a reverser selected File -> Save As from the menu, Diaphora would fail to find the .til file and it would crash.
BUG: Instruction level import support was very wrong, even with typos.
BUG: Multiple instances of functions leaking cursors were fixed.
BUG: Regular expression pattern in `get_cmp_asm` wasn't properly escaped.
BUG: Removing items from choosers in IDA was broken.
BUG: Some SQL queries were not able to properly execute due to huge B-TREEs being created by SQLite when diffing huge databases.
BUG: Some comparisons (pseudocode and graph) were being shown, wrongly, in a different order than the others.
BUG: Some heuristics were using a wrong SQL expression to filter functions starting with the 'sub_' prefix.
BUG: The check to determine if Diaphora should continue finding more callees diffing previous results was wrong.
BUG: The environment variables for multiple items were not being properly handled.
BUG: The logic to handle unmatched choosers was being handled the wrong way (the other way around), which was pretty confusing.
BUG: The members `get_cmp_asm` and `get_cmp_pseudo` were being called hundreds of thousands of times for no reason when diffing.
BUG: There were still many places where Diaphora could leak cursors in diaphora_ida.py.
BUG: When calling the function `check_ratio()`, always convert the values of the MD-Indices to float internally.
BUG: Workaround implemented for the IDA bug 'max non-trivial tinfo...
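Heuristics such as 'Microcode mnemonics small primes product' rely on an order-independent fingerprint: each distinct mnemonic gets its own prime and a function's value is the product over all its instructions. A minimal sketch of that general idea; the prime assignment (sorted vocabulary order) and all helper names are hypothetical, not Diaphora's actual scheme:

```python
from itertools import count


def primes():
    """Yield primes in increasing order (trial division; fine for small vocabularies)."""
    found = []
    for n in count(2):
        if all(n % p for p in found if p * p <= n):
            found.append(n)
            yield n


def build_prime_table(vocabulary):
    """Assign a distinct prime to each mnemonic (hypothetical ordering: sorted)."""
    gen = primes()
    return {mnem: next(gen) for mnem in sorted(set(vocabulary))}


def primes_product(mnemonics, table):
    """Multiply the prime of every instruction's mnemonic.

    Multiplication is commutative, so reordered but otherwise identical
    instruction sequences produce the same product.
    """
    product = 1
    for mnem in mnemonics:
        product *= table[mnem]
    return product
```

Both databases must share the same table for the products to be comparable; an unknown mnemonic raises `KeyError` here instead of silently producing a misleading value.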
Diaphora 2.1.0
Diaphora version 2.1.0: lots of bug fixes, new version of pygments, new core functionality added as well as some little helper scripts. The full list of changes is the following:
MISC: Updated copyright notices.
INFO: Changed version number to 2.1.0
CORE: Allow quick module reloading when developing, so developers don't have to restart IDA every time a diaphora module changes.
CORE: Port instruction operand names.
CORE: Refactored the add_matches_from_xxx() functions. Now we only have a single function with logic.
CORE: Added scripting event 'on_export_crash'.
GUI: Updated pygments to version 2.13.0.
GUI: Added support for diffing assembler and pseudo-code with an external diffing tool.
TOOL: Added a Diaphora script that can be used as a debugging helper.
HEUR: Added heuristic "Same KOKA hash and MD-Index".
HEUR: Removed unreliable heuristic "Loop count".
HEUR: Fixed diaphora_heuristics.py, the testing data to validate heuristics was wrong.
BUG: Use `idaapi.require()` instead of `import` for Diaphora modules.
BUG: Fix for issue #227 ("Diaphora only imports symbols inside instructions only 50% of the time").
BUG: When calling hooks.on_match both addresses were always the main database address.
BUG: The function find_callgraph_matches_from() was leaking cursors.
BUG: The function find_brute_force() was leaking cursors.
BUG: Circumvent int/str conversion issues
BUG: Fix deleting items from Unmatched tabs
BUG: Fix re-loaded and re-saved diffs omitting the 2nd DB path
BUG: Make the "fallback" code for missing DB paths work as it was originally intended.
BUG: Workaround for issue #223
BUG: `idc.GetDisasm()` might fail with a UnicodeDecodeError with, for example, some Chinese characters. The workaround is to build the disassembly line myself.
BUG: Sometimes, the pseudocode diff wasn't showing code properly tabulated.
Diaphora 2.0.6
BUG: Do not crash when we cannot analyse one Diaphora SQLite database.
BUG: Diaphora was incorrectly searching the pattern '{}' instead of '[]' for empty list field values. Fix for #219.
GUI: When a reverser uses the "Diff pseudo-code" option and both codes are equal, show a warning message, but also show the diffing.
HEUR: In heuristic "Call Address Sequence", also use the "Partial results" when the function name is the same.
HEUR: Added heuristic "Same RVA". Only matches with a minimum ratio of 0.7 will be considered.
HEUR: Removed the "Slow" flag from the heuristic "Same Rare Constant".
HEUR: Use the 3 calculated fuzzy hashes in heuristic "Pseudo-code Fuzzy Hash".
HEUR: Moved heuristic "Similar Pseudo-code and Names" from the probably unreliable category to normal.
HEUR: Removed wrong heuristics "Similar Small Pseudo-codes" and "Equal Small Pseudo-codes" because they caused a lot of false positives (heuristics for finding matches tend to fail with small functions, and these were no exception).
Also, applied suggestion for issue #220.
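The '{}' versus '[]' fix matters because an empty list serialized as JSON is the string `'[]'`, while `'{}'` is an empty object and never matches. A minimal sketch, assuming list-valued fields are stored as JSON text; the `functions` table and both helpers are hypothetical, not Diaphora's schema:

```python
import json
import sqlite3


def create_demo_db():
    """Build an in-memory database with a hypothetical `functions` table."""
    db = sqlite3.connect(":memory:")
    # Assumption: list-valued fields are stored as JSON text.
    db.execute("CREATE TABLE functions (name TEXT, constants TEXT)")
    db.executemany(
        "INSERT INTO functions VALUES (?, ?)",
        [("f1", json.dumps([])), ("f2", json.dumps(["0x5A827999"]))],
    )
    return db


def functions_with_constants(db):
    """Return names of functions whose JSON-encoded constants list is not empty."""
    # json.dumps([]) produces '[]', so that is the empty value to exclude;
    # filtering on '{}' (an empty JSON object) would exclude nothing.
    cur = db.execute("SELECT name FROM functions WHERE constants != '[]' ORDER BY name")
    try:
        return [name for (name,) in cur.fetchall()]
    finally:
        cur.close()  # do not leak cursors
```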
Diaphora 2.0.3
This is mainly a bug fixes release. The following is the whole list of things that were fixed, modified or updated:
HEUR: Added heuristic "Same address and mnemonics"
BUG: Fixed many mistakes in the heuristics files
BUG: Added required import when diaphora_ida.py is imported by another Python script
BUG: Change "python" to "python3" in the header of diaphora.py
BUG: Multiple code clean-ups
MISC: Updated copyright notices
BUG: Fix for the bug "Importability for global variables in certain cases is significantly diminished".
BUG: When importing at instruction level, some elements were being missed because only one way was considered (wrongfully).
MISC: Warn users if they are using Python2 instead of Python3 in Diaphora master version
BUG: Always show demangled function names in the choosers.
BUG: Ignore small functions in `find_matches_in_hole()`.
BUG: In `find_from_matches()`, do not consider anything with a ratio smaller than 0.5 a good match.
BUG: GUI: Pseudo-code diffs were being done the other way around.
BUG: Do not show the message "special segments cannot be decompiled" multiple times.
BUG: The function `get_function_from_dictionary()` was broken, crashing all project-specific scripts.
HEUR: Make the previously experimental 'Call address sequence' heuristic a default one when using best matches.
HEUR: The heuristic "Same rare constant" is now considered a slow one.
HEUR: The heuristic "Equal small pseudo-code" is now considered unreliable.
BUG: IDA Home does not seem to export "ctree_visitor_t"; thus, Diaphora fails with this version.
BUG: When parsing the "preds" for a basic block during export time, the "succs" were being used. This bug was ~5 years old.
BUG: When launching Diaphora from the command line, self.hooks was not initialized.
BUG: If a constant started with an invalid UTF-8 character, the Python error UnicodeDecodeError would be triggered.
BUG: Sometimes, IDA might fail getting the definition of a local type with the usual "UnicodeDecodeError" error.
BUG: Functions with the same name and the same MD-Index with a value > 10.0 were incorrectly always considered to be absolutely equal, even when there were small changes (i.e., only one line).
EXPORT: Added field 'userdata' to the exported data.
BUG: Run the diff even if project_script is None
DIFF: Added scripting support for the diffing process.
HEUR: Select all the fields that are commonly used by most SQL based heuristics.
HOOKS: Added a 2nd example to script the diffing process.
BUG: set_func_cmt doesn't accept an ea_t object any more
BUG: set_func_cmt doesn't accept an address (ea_t); it now needs a function object.
GUI: Remove HTML characters from messages, they aren't supported any more.
GUI: Set a fixed column width for line wrapping in the pseudo-code diffing view.
GUI: Add menu separators.
GUI: Only show choosers and register the menus if there is anything to show.
CORE: Do not fail when diffing if database versions are different, just show a warning.
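The "preds" versus "succs" bug is easy to picture: a block's predecessors are the inverse of the successor relation, not a copy of it. A minimal sketch of deriving one from the other, assuming basic blocks are identified by comparable, hashable IDs; `predecessors` is a hypothetical helper (IDA's flow chart basic blocks expose both `preds()` and `succs()` directly):

```python
def predecessors(succs):
    """Invert a {block: [successor, ...]} map into {block: [predecessor, ...]}."""
    # Every known block gets an entry, including the entry block, which
    # normally has no predecessors.
    preds = {block: [] for block in succs}
    for block, targets in succs.items():
        for target in targets:
            preds.setdefault(target, []).append(block)
    # Sort the sources for a deterministic result.
    return {block: sorted(sources) for block, sources in preds.items()}
```

For a diamond-shaped graph `{0: [1, 2], 1: [3], 2: [3], 3: []}` this yields `{0: [], 1: [0], 2: [0], 3: [1, 2]}`; exporting the successor lists as predecessors would instead claim that block 0 is reached from blocks 1 and 2.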
Diaphora 2.0.2
Yet another mini bug fix release with a small new feature added:
- GUI: Add a menu item to let reverse engineers relaunch the diffing process again.
Bugs fixed:
- BUG: TIL names can be retrieved as bytes instead of as str.
- BUG: Often we were printing the message 'Exception: number' because the Tarjan sort implementation was failing if some node wasn't found.
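A Tarjan-based strongly connected components pass fails in that way when an edge points to a node that is not in the node list. A minimal sketch of the classic recursive algorithm that skips such dangling edges instead of raising, assuming nodes are hashable and `edges` maps each node to its successors; it is not Diaphora's `tarjan_sort` code:

```python
def tarjan_scc(nodes, edges):
    """Return the strongly connected components of a directed graph.

    Successors that are not in `nodes` (dangling edges) are skipped
    instead of raising an exception.
    """
    node_set = set(nodes)
    index, lowlink = {}, {}
    on_stack, stack = set(), []
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in node_set:
                continue  # dangling edge: ignore it instead of failing
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        # v is the root of a component: pop it and everything above it.
        if lowlink[v] == index[v]:
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in nodes:
        if v not in index:
            visit(v)
    return components
```

The recursive form is the easiest to read, but very large call graphs may exceed Python's default recursion limit, in which case an iterative variant is the safer choice.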