Extreme delay in soot.jimple.toolkits.typing.fast.Typing when loading a method body #1053
I encountered a second app that triggers this problem: ch.financefox.app v1.9.36 (154). In this app at least three methods are affected, each of which takes a very long time in the |
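I found out more details on the app ch.financefox.app. The interesting part is that, in contrast to the app I mentioned in my first post (org.telegram.plus), in ch.financefox.app the delay problem only occurs when the option include_all == true (I currently have to use this option because of #1060). However, when the problem does occur in this case, it is even worse than with org.telegram.plus. Affected are 3 methods, 2 of which I was able to identify:
- <io.realm.a: java.lang.String toString()> - processing time ~ 2 hours
- <io.realm.al: java.lang.String toString()> - processing time ~ 6 hours
- third method (I wasn't able to identify it) - processing time more than 12 hours!
IMHO this indicates that a serious problem is hidden in the Typing minimization method, especially because from my point of view it should make no difference whether the library classes are loaded fully or not. As the code of Typing.java is almost 10 years old, I assume that something else has changed that violates certain assumptions of the Typing class. Unfortunately the whole Typing code is nearly free of comments. The only clue I found is the commit comment by @ericbodden on commit 97b20b5: "new (faster and more precise) Type Assigner by Ben Bellamy (see OOPSLA '08)".
Digging deeper I found the title of the paper: Efficient Local Type Inference. It seems to be this paper: https://dl.acm.org/citation.cfm?id=1449802 I have to see how to get this paper; hopefully it contains some details that shed some light on the Typing class. |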
Always good when one puts one's papers not behind a paywall... There is the
free version of Bellamy's work as an undergrad thesis that you can find on
the internet. It probably contains most of the information in the OOPSLA
version, or more.
…On Sat, Nov 24, 2018, 9:06 AM Jan S. ***@***.*** wrote:
I found out more details on the app ch.financefox.app:
The interesting part is that, in contrast to the app I mentioned in my
first post (org.telegram.plus), in ch.financefox.app the delay problem
only occurs when the option include_all == true (I currently have to use
this option because of #1060).
However, when the problem does occur in this case, it is even worse than
with the app org.telegram.plus I mentioned in my first post:
Affected are 3 methods - 2 of them I was able to identify:
- <io.realm.a: java.lang.String toString()> - processing time ~ 2 hours
- <io.realm.al: java.lang.String toString()> - processing time ~ 6 hours
- third method (I wasn't able to identify it) - processing time *more than 12 hours*!
IMHO this indicates that a serious problem is hidden in the Typing
minimization method, especially because from my point of view it should
make no difference whether the library classes are loaded fully or not.
As the code of Typing.java is almost 10 years old, I assume that something
else has changed that violates certain assumptions of the Typing class.
Unfortunately the whole Typing code is nearly free of comments. The only
clue I found is the commit comment by @ericbodden
<https://github.com/ericbodden> on commit 97b20b5:
new (faster and more precise) Type Assigner by Ben Bellamy (see OOPSLA '08)
Digging deeper I found the title of the paper: *Efficient Local Type
Inference*. It seems to be this paper:
https://dl.acm.org/citation.cfm?id=1449802 I have to see how to get this
paper; hopefully it contains some details that shed some light on the
Typing class.
|
Hi all. I know that I have encountered similar problems with individual (usually generated) methods in the past. I think the problem is intrinsic to the algorithms that Soot uses. IIRC they have cubic worst-case complexity, which can become very expensive for certain methods. It would be great to have an implementation with lower complexity! |
I am still trying to understand how the type minimization works in practice and why this performance problem arises. Therefore I was trying to find a simple example that demonstrates the problem. I found a code snippet and simplified it a bit, which helped me to understand what the problem is. Note: If you want to try this example on your own you have to set the option
Now what is the problem here? The problem is caused by the Elvis operator using different types for the true and false case: However this means that the number of potential Now we take into account that the minimize algorithm has cubic complexity, which makes it obvious where the problem arises... |
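To make the growth concrete, here is a minimal sketch. It is a hypothetical model and not Soot's code: it assumes that every local with an undecided type contributes a list of candidate types and that every combination of candidates counts as one candidate typing. The class name `CandidateTypings`, its methods, and the placeholder type names `TypeA` and `TypeB` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model, not Soot code: enumerates every combination of candidate types. */
final class CandidateTypings {

    /** Returns all typings, i.e. the cartesian product of the per-local candidate lists. */
    static List<List<String>> enumerate(List<List<String>> candidatesPerLocal) {
        List<List<String>> typings = new ArrayList<>();
        typings.add(new ArrayList<>()); // one empty typing to start from
        for (List<String> candidates : candidatesPerLocal) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : typings) {
                for (String type : candidates) {
                    List<String> extended = new ArrayList<>(partial);
                    extended.add(type);
                    next.add(extended);
                }
            }
            typings = next; // every additional local multiplies the number of typings
        }
        return typings;
    }

    /** Builds n locals that each have the same two placeholder candidate types. */
    static List<List<String>> ambiguousLocals(int n) {
        List<List<String>> locals = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            locals.add(List.of("TypeA", "TypeB"));
        }
        return locals;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 10, 14}) {
            int count = enumerate(ambiguousLocals(n)).size();
            System.out.println(n + " ambiguous locals -> " + count + " typings");
        }
    }
}
```

In this model, 14 locals with two candidates each already yield 2^14 = 16384 typings. That happens to equal the size of the tgs list reported for org.telegram.plus, but whether that method really contains 14 such two-way choices is only a guess.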
Hi Jan. Thanks for further investigating and illustrating the underlying problem! Do you have any ideas on improving the situation at hand? |
In the very short term there is the possibility to parallelize the Other concrete options are difficult because I still don't understand how the typings are applied in the end. However from an intuitive perspective the most problematic aspect is that the In general, IMHO one fundamental problem of the implemented algorithm is that it is based on set theory. Set theory is intuitive but difficult to implement efficiently. |
Thanks for letting us know your thoughts on the problem! Using a chess-like algorithm sounds interesting, although I currently can't quite figure out off the top of my head how that would solve the problem. Let us know if you investigate the issue further! For now, unfortunately, this seems to be a bit out of scope for us. |
Hi Manuel, I think I expressed it in a rather complicated way, and a chess game tree was only an example for the structure. Think of the typing problem as a game with only one player. At the beginning we have a method with a set of variables, and each variable has a type. This is our game board and the variable typings are the pawns. Changing the type of one variable is a move in our game. If we have multiple possible types for one variable, we have multiple possible moves for that pawn. Now Soot comes into play. However, it does not build up a tree; instead it builds every combination of every possible type of every variable. In terms of our game tree, Soot builds a set for each possible path through the tree. Of course this explodes with exponential complexity. Now we use a game tree instead. If we make a decision on one of the first levels, it only has to be made once, whereas Soot has to make this type decision (and therefore the check which one to use) up to n^2 times. In the end we have to test several paths within our tree; however, even in the worst case we don't have to perform as many operations as Soot currently does. |
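The sharing argument above can be illustrated with a small counting sketch. It is not the proposed TypingTree and not Soot's data structure. It only assumes a complete tree in which each of `locals` variables has `choices` candidate types, and it compares how many individual type decisions are stored when every path is a separate set versus when common prefixes are shared as tree edges. All names are hypothetical.

```java
/** Hypothetical counting model: separate sets per path versus one shared decision tree. */
final class DecisionSharing {

    /** Decisions stored when each of the choices^locals combinations is its own set. */
    static long flatDecisions(int locals, int choices) {
        long combinations = 1;
        for (int i = 0; i < locals; i++) {
            combinations = Math.multiplyExact(combinations, (long) choices);
        }
        return Math.multiplyExact(combinations, (long) locals);
    }

    /** Decisions stored in a complete tree: one edge per decision, shared by all paths below it. */
    static long treeDecisions(int locals, int choices) {
        long edges = 0;
        long nodesOnLevel = 1;
        for (int depth = 0; depth < locals; depth++) {
            nodesOnLevel = Math.multiplyExact(nodesOnLevel, (long) choices);
            edges = Math.addExact(edges, nodesOnLevel);
        }
        return edges;
    }

    public static void main(String[] args) {
        int locals = 14;
        int choices = 2;
        System.out.println("separate sets: " + flatDecisions(locals, choices) + " decisions");
        System.out.println("shared tree:   " + treeDecisions(locals, choices) + " decisions");
    }
}
```

Both counts still grow exponentially with the number of ambiguous variables. In this model the tree only avoids repeating decisions that several paths have in common; any further saving would have to come from pruning whole subtrees, which the sketch does not model.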
Wow, thanks for the explanation! |
Today I made significant progress on this issue. I added code for automatic conversion of the On the other hand the saved time is really extreme - building the tree runs in But implementing the |
We still have some test classes and cases that are not used (mainly because I don't know how they work and have not yet found the time to incorporate them into Maven's test phase): https://github.com/Sable/soot/tree/master/src/it These are pretty small and I don't know if they would be of much help for testing your typing algorithm. You could use the test framework I've set up for testing Soot as a whole with provided source class files: https://github.com/Sable/soot/blob/master/src/systemTest/java/soot/testing/framework/AbstractTestingFramework.java Generally, it might make sense to take some open-source Java corpus, compile and run the code, and then compare its output to that of a version that has been read in and written out again by Soot. |
@mbenz89 Thanks for the links. The AbstractTestingFramework.java looks interesting. However, I am not sure whether I really need code that reads source code as test input. My current test cases directly test the Typing minimization, therefore as input I only need a list of At the moment a student is working on implementing some more test cases. Based on a run on 2000 apps I have identified those that contain methods where the minimization is used (methods that have more than one Regarding my TypingTree algorithm, I had to realize that there are more cases than I was initially aware of. Hopefully I will find some time to proceed with my TypingTree implementation. |
Great! Let us know when you make progress :) |
Over the last few days I was working again on improving the typing minimization performance. Unfortunately I am not capable of designing a new algorithm that is faster than O(n^2), but I noticed some room for improvement in the existing minimize algorithm: In recent versions of Soot the affected method is Effectively it uses Based on my understanding the following is true for the compare method:
That means our matrix is mirrored along its main diagonal (considering the results 0, 1, -1, other of compare). I am currently changing the minimize implementation to exploit this. As this is still not enough for certain programs to be analyzed, I am working on a parallelized version of |
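A minimal sketch of how such a mirrored comparison matrix can be exploited. This is not Soot's `Typing.minimize`. It uses a stand-in partial order on int vectors and assumes that `compare(a, b) == 1` holds exactly when `compare(b, a) == -1`, so that only pairs with i < j need to be compared. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not Soot's Typing class. */
final class TriangularMinimize {

    /** Counts calls to compare so the two variants can be contrasted. */
    static long comparisons = 0;

    /**
     * Stand-in partial order on equal-length int vectors:
     * 0 = equal, -1 = a below b, 1 = a above b, 2 = incomparable.
     */
    static int compare(int[] a, int[] b) {
        comparisons++;
        boolean below = false;
        boolean above = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] < b[i]) {
                below = true;
            } else if (a[i] > b[i]) {
                above = true;
            }
        }
        if (below && above) return 2;
        if (below) return -1;
        if (above) return 1;
        return 0;
    }

    /** Baseline: compares every ordered pair (i, j) with i != j, without early exit. */
    static List<int[]> minimizeAllPairs(List<int[]> in) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            boolean dominated = false;
            for (int j = 0; j < in.size(); j++) {
                if (i != j && compare(in.get(i), in.get(j)) == 1) {
                    dominated = true;
                }
            }
            if (!dominated) out.add(in.get(i));
        }
        return out;
    }

    /** Compares only pairs with i < j and reads both directions off a single result. */
    static List<int[]> minimizeTriangular(List<int[]> in) {
        boolean[] dominated = new boolean[in.size()];
        for (int i = 0; i < in.size(); i++) {
            for (int j = i + 1; j < in.size(); j++) {
                int c = compare(in.get(i), in.get(j));
                if (c == 1) {
                    dominated[i] = true; // element i lies above element j
                } else if (c == -1) {
                    dominated[j] = true; // mirrored case: element j lies above element i
                }
            }
        }
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            if (!dominated[i]) out.add(in.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> in = List.of(new int[] {1, 1}, new int[] {1, 2}, new int[] {2, 1},
                new int[] {2, 2}, new int[] {0, 3});
        comparisons = 0;
        int keptAllPairs = minimizeAllPairs(in).size();
        long allPairs = comparisons;
        comparisons = 0;
        int keptTriangular = minimizeTriangular(in).size();
        System.out.println("all pairs:  kept " + keptAllPairs + " after " + allPairs + " comparisons");
        System.out.println("triangular: kept " + keptTriangular + " after " + comparisons + " comparisons");
    }
}
```

The triangular variant performs n(n-1)/2 comparisons instead of n(n-1), so it halves the work but remains quadratic. A real implementation might exit early once an element is known to be dominated, which this sketch leaves out to keep the counts easy to check.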
Thanks a lot for the effort! It's been too long since I last looked at the implementation, which is why I cannot confirm right now that what you are proposing will be correct. But it certainly might be. I think we'd be happy to test it out! |
I have merged the improved algorithm and there have been no issues with the FlowDroid test cases (which are more than 1,000 now). We have not seen any issues with our commercial code scanner and its massive test suite either, despite this test suite containing rather complex real-world applications. That makes me rather confident that this change didn't break something obvious. |
Great, thanks a lot @StevenArzt ! |
When loading the app org.telegram.plus 4.9.1.4 (13760), Soot ends up in an endless loop (it runs for several hours without returning) in the method soot.jimple.toolkits.typing.fast.Typing.minimize(List<Typing> tgs, IHierarchy h). This method is pretty special as it contains ~1700 local variables in Jimple.
In detail, this happens if I try to retrieve the body of the method <org.telegram.ui.ChatActivity: org.telegram.ui.ActionBar.ThemeDescription[] getThemeDescriptions()>. Processing this method results in the minimize method being called with a tgs of 16384 elements. It looks like Soot can't handle this corner case.
Or is it simply because the implementation of minimize has a runtime of O(n^2), which means it has to perform 16384^2 = 268435456 compare operations?
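The arithmetic behind that estimate can be checked with a few lines. This is only a counting sketch with hypothetical names; it says nothing about how expensive a single compare call is, or about whether Soot's loop exits early.

```java
/** Counting sketch: how many pairwise comparisons a list of n typings can require. */
final class ComparisonCount {

    /** Every element against every element, including itself: n^2. */
    static long allPairsWithSelf(long n) {
        return Math.multiplyExact(n, n);
    }

    /** Every ordered pair of two different elements: n * (n - 1). */
    static long orderedPairs(long n) {
        return Math.multiplyExact(n, n - 1);
    }

    /** Every unordered pair, if one comparison answers both directions: n * (n - 1) / 2. */
    static long unorderedPairs(long n) {
        return orderedPairs(n) / 2;
    }

    public static void main(String[] args) {
        long n = 16384;
        System.out.println("n^2          = " + allPairsWithSelf(n));
        System.out.println("n * (n-1)    = " + orderedPairs(n));
        System.out.println("n * (n-1)/2  = " + unorderedPairs(n));
    }
}
```

For n = 16384 the first figure is the 268435456 quoted above. Since each comparison presumably has to look at many of the method's locals, the total work would be larger still, although that per-comparison cost is an assumption here.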
As the whole class is not very well commented I don't have a clue of the purpose of this class and of the minimize method in detail. Therefore I can't fix this bug myself.
Any help would be appreciated.