
Java-Profiling

Code with some major performance issues for the profiling/performance lab

Background

In this lab we'll use VisualVM - http://visualvm.github.io/index.html, a Java profiling tool, to analyze performance issues in a Java program that creates and processes large numbers of randomly generated trees.

This lab will be all "in-class"; there won't be anything to turn in, but these concepts and tools are important. The lab write-up leads you to the answer to a significant degree, so you could just race through, "figure out whodunnit", and not really learn a lot. I'd encourage you instead to stop at each of the questions below and put some significant thought into them. You'll learn a lot more and get a lot more out of the experience if you try to solve these problems yourself instead of just jumping ahead to my "answer". You might also discover some cool different way to think about the problem that I hadn't thought of.

Get the code

You will use the code in your repository. You don't need to commit anything back since we're not actually turning stuff in, but you might find it helpful if you want to be able to track the changes you make during the day, or come back to this later. After you clone the repository be sure to compile the code.

An overview of the code

The ideas behind this code and the lab come from a real-world performance problem some students and Nic ran into while doing some genetic programming research a few years ago, but he stripped things down to the bits of the code that matter for the lab. Look around a little, probably starting with Main.java. The key thing that happens here is that we construct a large number of randomly generated trees that represent simple arithmetic expressions. Leaves are small integers, and non-leaves (internal nodes) are operators (one of +, -, *, or /). After we construct those trees, we compute the sizes of them all, printing timing and memory usage information at important points in the process.
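To make that concrete, here is a rough sketch of the kind of expression trees involved. The class and field names below are mine, not the lab's; the real Leaf and Node classes will differ in detail.

```java
// A rough sketch of the expression trees the lab builds (names are
// hypothetical, not the lab's actual classes). Leaves hold small
// integers; internal nodes hold one of the four arithmetic operators.
interface ExprSketch {}

class LeafSketch implements ExprSketch {
    final int value;                          // a small integer constant
    LeafSketch(int value) { this.value = value; }
}

class OpNodeSketch implements ExprSketch {
    final char operator;                      // '+', '-', '*', or '/'
    final ExprSketch left, right;
    OpNodeSketch(char operator, ExprSketch left, ExprSketch right) {
        this.operator = operator;
        this.left = left;
        this.right = right;
    }
}
```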

Some semi-sophisticated techniques have been employed in an effort to increase flexibility (not really the issue here) and improve performance (which is what we care about). There are two places, for example, where caching has been introduced to try to avoid duplication of computation and memory. One is in RandomTreeFactory.doBuildRandomTree(), which is the core of the algorithm that generates random trees. Here, after we've constructed a random (sub)tree, we look in the cache to see if we've already generated that tree. If we have, then we return the old copy (the one we found in the cache) instead of the newly generated one. This ensures that if we end up generating a fragment like "5 + 3" several dozen times, there will actually only be one copy of it in memory, and every use of it will point to that copy.
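Here is a minimal sketch of that caching idea, assuming a HashMap keyed by the trees themselves; the class and method names are hypothetical, not the lab's actual RandomTreeFactory code.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of the caching idea described above (not the lab's
// actual RandomTreeFactory code). Structurally equal trees, as defined
// by equals()/hashCode(), end up sharing a single instance in memory.
class InternCacheSketch<T> {
    private final Map<T, T> cache = new HashMap<>();

    T intern(T freshlyBuilt) {
        T existing = cache.get(freshlyBuilt);   // lookup uses hashCode(), then equals()
        if (existing != null) {
            return existing;                    // reuse the copy we already have
        }
        cache.put(freshlyBuilt, freshlyBuilt);  // remember this tree for next time
        return freshlyBuilt;
    }
}
```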

The second use of caching is in Tree.acceptVisitor(). Parts of this are fairly tricky because we're using the visitor pattern - http://en.wikipedia.org/wiki/Visitor_pattern here. The visitor pattern is a really beautiful design pattern, but also one of the more complex ones when you first encounter it. All you really need to understand for this lab, though, is that when we call a method like Tree.size(), we'll check to see if we've ever called size() on this tree before, and if we have we'll just return that result rather than recursively traverse the tree again.
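Stripped of the visitor machinery, the caching in Tree amounts to something like the memoized size() below. The names are mine, and the real code routes this through acceptVisitor(), but the idea is the same.

```java
// A simplified sketch of the memoization described above, with the
// visitor machinery left out (the real code routes this through
// Tree.acceptVisitor()). The first call to size() walks the tree;
// later calls return the cached count.
class SizedTreeSketch {
    private final SizedTreeSketch[] children;
    private Integer cachedSize = null;          // null means "not computed yet"

    SizedTreeSketch(SizedTreeSketch... children) {
        this.children = children;
    }

    int size() {
        if (cachedSize == null) {
            int total = 1;                      // count this node
            for (SizedTreeSketch child : children) {
                total += child.size();          // recursively count each subtree
            }
            cachedSize = total;                 // remember the answer
        }
        return cachedSize;
    }
}
```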

Does time to construct the trees seem reasonable?

Run the program (just the program), remembering to hit enter so the program actually keeps going. This may take quite some time (depending on the machine you are using). Look at the timing and memory results, and probably scribble them down on some scratch paper. Now cut the number of trees N that get generated in half (this is controlled in Main, or you can use a command-line argument) and run it again. Play with a few different values of N and see what you get.

‼️ You'd typically expect the time to construct N trees to be roughly linear in N. Is that true in this case?

Why does construction take so long?

Our preliminary data suggests that construction is indeed taking longer than makes sense. We could just bang around in the code, hoping to stumble across something that helps, or we can take a data-driven approach.

‼️ Before you collect data, look over the code in RandomTreeFactory (and things it calls elsewhere) and sketch out a list of two or three possible places you think might be bottlenecks.

Start up VisualVM and connect it to our process

We have a nice program in the lab called VisualVM (http://visualvm.github.io/index.html).

Begin by having two tabs open in the terminal. Use the first tab to launch VisualVM. This will pop up a window for VisualVM 1.3.9. You can't do much from the initial GUI until you've connected to a running Java process. Once you've done that, though, you can do all kinds of cool things.

In the other tab start the Java program: I went to the src directory of my cloned repository and (after compiling the Java code) typed java treeshuffle.asgraph.Main. The program prompts you to hit enter; DO NOT HIT ENTER YET. First ensure that in the VisualVM application there is now an entry for treeshuffle.asgraph.Main. Double-click on this. You should see tabs for:

  • Overview
  • Monitor
  • Threads
  • Sampler
  • Profiler

Once those appear, you may bring up the terminal tab with the program and hit enter. The Profiler tab does not play nicely with CPU profiling here, but its Memory option works fine, and you can use it to see which method(s) are using the most memory. We are concerned with the CPU, however, so go to the Sampler tab and click on CPU; you can use it to see where the program is spending most of its time. Now watch as the profiling information rolls in...

You actually have to do all this every time you (re)start your program. VisualVM connects to a specific process, and you get a new process whenever you restart the program.

‼️ What method or methods seem to be taking up all the CPU time? Do they make sense given that you're primarily constructing a large number of new trees? Does the relative importance of different methods change over time? Do those changes make sense?

The program will probably run much more slowly with profiling activated, so feel free to kill it if you're getting bored. Make a point of jotting down the results for the top few methods, though, so you can compare them to the results you'll get after you make some changes to the code.

Why all these calls to equals()?

Lots of calls to doBuildRandomTree() make sense, but what's up with all these equals() calls? The one odd bit in RandomTreeFactory is the caching; could that be related to the problem?

‼️ Turn off caching in doBuildRandomTree() and run the program again (either with or without the profiler). Does that seem to help?

One might be tempted to just decide that the caching sucks here and remove it, but that's actually not a good choice.

‼️ Spend a few moments trying to figure out why caching leads to a ton of equals() calls in particular. How are lookups done in HashMaps?

doBuildRandomTree() doesn't seem to directly call equals(), which suggests that maybe there are indirect calls coming from somewhere. It turns out that when we look something up in a HashMap we first compute the hashCode() to identify the bucket to look in, and then do a linear search in that bucket, comparing keys with equals() as we go. This means that if we have a poor hashCode() definition (one with lots of collisions) then we end up with long (and increasingly long) linear searches in the buckets as we add more and more trees to the cache in RandomTreeFactory. That's not good.
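To see that mechanism in isolation, here is a tiny self-contained example (nothing to do with the lab's classes) where every key collides, so each HashMap operation degenerates into a linear scan that calls equals() on entry after entry.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a key type with a deliberately terrible hashCode().
// Every instance lands in the same bucket, so HashMap has to fall back
// on a linear scan that calls equals() on the keys it finds there.
class BadKeyDemo {
    private final int id;
    BadKeyDemo(int id) { this.id = id; }

    @Override
    public int hashCode() { return 42; }              // all keys collide

    @Override
    public boolean equals(Object o) {
        System.out.println("equals() called");        // watch how often this fires
        return (o instanceof BadKeyDemo) && ((BadKeyDemo) o).id == id;
    }

    public static void main(String[] args) {
        Map<BadKeyDemo, String> cache = new HashMap<>();
        for (int i = 0; i < 5; i++) {
            cache.put(new BadKeyDemo(i), "tree-" + i); // each put scans the one bucket
        }
        cache.get(new BadKeyDemo(4));                  // so does each get
    }
}
```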

Try to improve Node.hashCode()

The hashCode() in Leaf is pretty reasonable, so the problem is likely in Node.hashCode().

‼️ Look at Node.hashCode(). How might we end up with collisions from this? (Can you come up with two simple trees with the same hash code?)

‼️ Improve Node.hashCode(). I'd use the algorithm in Bloch's Effective Java, which is also summarized in this Wikipedia entry - http://en.wikipedia.org/wiki/Java_hashCode(). If you think you have something that improves things, first just run it normally, and if that looks promising run it through VisualVM. Be sure to jot down some notes on the VisualVM results. How do they compare to your earlier results?
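For reference, here is a minimal sketch of the Bloch-style recipe, assuming (hypothetically) that a node stores an operator and a list of children. The lab's real Node will have its own fields, and whatever you write must stay consistent with its equals().

```java
import java.util.List;

// A minimal sketch of a Bloch-style hashCode(), not the lab's actual
// Node class; the fields here (operator, children) are assumptions.
// Start from a non-zero constant and fold in each significant field
// with a prime multiplier so different tree shapes hash differently.
class NodeHashSketch {
    private final char operator;                 // '+', '-', '*', or '/'
    private final List<NodeHashSketch> children;

    NodeHashSketch(char operator, List<NodeHashSketch> children) {
        this.operator = operator;
        this.children = children;
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 31 * result + operator;         // the char is folded in as an int
        for (NodeHashSketch child : children) {
            result = 31 * result + child.hashCode();
        }
        return result;
    }
}
```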

Now we have a lot fewer calls to equals() and things run much better. It looks like computing our more complex hashCode(), however, is now a significant consumer of resources. The problem is that we end up re-computing hash codes over and over. Maybe we could cache this as well?

‼️ Implement a simple cache for Node.hashCode(). Create a field (I called mine hash) and initialize it to some known value (I used 0). In the hashCode() method, check and see if your field still has that initial value. If it does, then compute the hash code, assign the result to your field, and return it. If it doesn't have the initial value, it's presumably because we've already computed and cached this hash code, so just return the value of your field.
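Here is a minimal sketch of that caching, continuing the hypothetical node class from above. There is one corner case: if a tree's hash genuinely is 0, it just gets recomputed on every call, which is harmless.

```java
// A minimal sketch of the lazy hashCode() cache described above (names
// are hypothetical, not the lab's). 0 serves as the "not yet computed"
// sentinel; a tree whose hash really is 0 is simply recomputed each time.
class CachedHashNodeSketch {
    private final char operator;
    private final CachedHashNodeSketch[] children;
    private int hash = 0;                        // 0 means "not computed yet"

    CachedHashNodeSketch(char operator, CachedHashNodeSketch... children) {
        this.operator = operator;
        this.children = children;
    }

    @Override
    public int hashCode() {
        if (hash == 0) {
            int result = 17;
            result = 31 * result + operator;
            for (CachedHashNodeSketch child : children) {
                result = 31 * result + child.hashCode();
            }
            hash = result;                       // cache for subsequent calls
        }
        return hash;
    }
}
```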

‼️ Run VisualVM on the system with the new Node.hashCode() and record your results. Have we improved things?

Does the cache in RandomTreeFactory help?

Now that we've resolved a major bottleneck caused by the hash codes, we'll turn our attention to the two big caching mechanisms and see if they actually help much. We go to a fair amount of trouble with all the caching in RandomTreeFactory, for example, and it would be nice to know whether it was worth it.

‼️ Turn off the caching in RandomTreeFactory and run things without VisualVM. Will it even run?

Let's try to figure out what's going on here. Turn the caching back on, re-run your program, and connect it to VisualVM. This time, instead of choosing the "Profiler" tab, choose the "Monitor" tab. One of the plots there shows the heap usage over time.

‼️ How does the heap usage change over time? We normally expect the memory usage to grow roughly linearly as we create more and more trees. What's happening here? Why?

‼️ Try turning the caching off and watch what the monitor in VisualVM tells you. You might increase the maximum amount of heap allocated to your Java process (e.g., with the -Xmx flag) to see what effect that has.

Does the cache in Tree help?

In a similar fashion, we should see if the caching in Tree helps.

‼️ Turn it off and collect data both from the command line output and from the VisualVM profiler. Does this cache help?

What did we learn from all this?

All in all, what did we get from this? What was really important to the performance of the system?

-- Main.NicMcPhee - 29 Nov 2011
-- Main.VincentBorchardt - 05 Nov 2012
-- Main.JohnMcCall - 06 Nov 2013
