30 Years of Decompilation and the Unsolved Structuring Problem: Part 2 #7
Replies: 2 comments 5 replies
-
Many thanks for the inspiring write-up, and thanks for your constructive work! May I ask about your comment on the new research into recovering variable names and types, i.e., the S&P '24 paper "Len or index or count, anything but v1": Predicting Variable Names in Decompilation Output with Transfer Learning, and the DIRTY and DIRE work? These works use language models to recover stripped information, which is quite a different path to improving decompilation from your work. I feel this path is quite limited by the quality of the dataset, not to mention the other inherent challenges of the decompilation task. I'm also trying these tools, and I'm curious what you think of this trend of using AI, or even SOTA LLMs, to enhance decompilation. Some papers have already focused on this topic: Xu, Xiangzhe, et al. "LmPa: Improving Decompilation by Synergy of Large Language Model and Program Analysis." arXiv preprint arXiv:2306.02546 (2023). Jin, Xin, et al. "Binary code summarization: Benchmarking chatgpt/gpt-4 and other large language models." arXiv preprint arXiv:2312.09601 (2023). Many thanks! : )
-
"A perfect decompiler should produce a 0 CFGED, meaning 0 graph edit distance, and the same gotos as the source." This seems to be an incorrect statement, based on the fallacy that two different source programs cannot produce the same compiled output, especially in the context of compiler optimizations. In fact there is ambiguity here, and I think it can be proven: a source littered with gotos, for example, could produce the same output as structured code. So I think the claim should be revised: a perfect decompiler should produce the most structured source that would compile to the code represented by that CFG.
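As a minimal sketch of that ambiguity, consider the two hypothetical C functions below (the names `sum_goto` and `sum_structured` are made up for illustration). Both have the same control flow graph: an entry block, a loop header with two successors, a loop body that jumps back to the header, and an exit block. I would expect a typical compiler to emit very similar, possibly identical, code for both, though that depends on the compiler and flags and is not verified here.

```c
/* Hypothetical example: both functions sum the integers 0..n-1. */

/* Variant written with explicit gotos. */
int sum_goto(int n) {
    int i = 0, total = 0;
loop_head:
    if (i >= n) goto done;   /* loop header: exit edge */
    total += i;              /* loop body */
    i++;
    goto loop_head;          /* back edge to the header */
done:
    return total;
}

/* Variant written with a structured loop; same blocks, same edges. */
int sum_structured(int n) {
    int i = 0, total = 0;
    while (i < n) {          /* loop header: exit edge when false */
        total += i;          /* loop body */
        i++;
    }                        /* implicit back edge to the header */
    return total;
}
```

If both compile to the same binary, a decompiler cannot tell from the binary alone which of the two the author wrote, so "the same gotos as the source" is not always recoverable.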
-
30 Years of Decompilation and the Unsolved Structuring Problem: Part 2
A two-part series on the history of decompiler research and the fight against the unsolved control flow structuring problem. In part 1, we revisit the history of foundational decompilers and techniques, concluding with a look at modern works. In part 2, we deep-dive into the fundamentals of modern control flow structuring techniques and their limitations, and look to the future.
https://mahaloz.re/dec-history-pt2