LXR Crash on HBase Launch #69
Hi @kk2049, thanks for reporting the error. I can confirm that the error exists even with the latest MMTk. Our OpenJDK binding does not yet fully support some interfaces, e.g., inspecting heap usage.

Another issue I noticed is that you were probably running a build with extra cargo features enabled. If you are building the JDK with MMTk/LXR on your own, please remove any additional cargo features.

Also, for precise performance measurement: MMTk by default counts all the malloc-ed metadata as part of the Java heap size, which differs from the OpenJDK GCs. Please enable the corresponding Rust feature so the accounting matches.

Finally, the runtime GC parameters TRACE_THRESHOLD2, LOCK_FREE_BLOCKS, MAX_SURVIVAL_MB, and SURVIVAL_PREDICTOR_WEIGHTED should be removed. These parameters have changed significantly over the past year, and their current default values are relatively well optimized and should just work.
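To illustrate, a minimal launch along those lines might look like the following sketch. The paths, heap size, and benchmark name are placeholders; the point is that none of the deprecated GC-tuning environment variables appear, so the defaults are used:

```shell
# Hypothetical sketch: launching MMTk/LXR with default GC parameters.
# $LXR_JDK and $DACAPO_PATH are placeholder paths; the heap size is an
# example. Note the absence of TRACE_THRESHOLD2, LOCK_FREE_BLOCKS,
# MAX_SURVIVAL_MB, and SURVIVAL_PREDICTOR_WEIGHTED in the environment.
MMTK_PLAN=LXR $LXR_JDK/jdk/bin/java -XX:+UseThirdPartyHeap \
    -Xms2382m -Xmx2382m -jar $DACAPO_PATH lusearch
```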
Another issue related to the JVM args: based on what I found in your AE repo, an older LXR build may have ignored some of the numeric values you passed. However, the latest LXR in the LXR branch has fixed this issue and should now respect the numbers specified in your JVM arguments.
Re-submitted as an upstream issue: mmtk/mmtk-openjdk#270
Thank you for providing valuable information. We are currently re-evaluating the performance of LXR based on your feedback. However, it appears that the performance we are observing still differs from what is documented in the performance history you referenced. Additionally, are there any recommended stable versions of LXR that you believe would be more suitable for our testing environment? We greatly appreciate your guidance!
We use an automation tool called running-ng for precise and reproducible performance evaluation. I also use this tool to periodically monitor LXR performance changes over time, to check whether they still match the claims we made in the paper. For example, the config for the latest result is stored in these two files. Some noticeable details:
Here are two lusearch logs produced by this evaluation. Each file contains 40 invocations of LXR or G1. If you manually expand the modifiers in runbms.yml, they should match the first line of these two log files. The numbers in the log files should also match the graphs here: http://squirrel.anu.edu.au/plotty-public/wenyuz/v8/p/CaGxpz

The whole evaluation is a relatively automated process. You should be able to reproduce similar results if you use the same parameters, the same commits, and roughly the same machine specs. You can also try using running-ng directly to run the benchmark. You need to update runbms.yml at the following places:
With all these fixed, use the following command to run the benchmark:
LXR is not yet fully stable, but the above evaluation run produced the best performance results, so probably just use those commits for your evaluation. You can also use the latest LXR commits; the performance should be slightly better than in this evaluation (I haven't uploaded those results yet).
Sorry, one thing I forgot to mention: for a fair comparison with ZGC, you may want to disable compressed pointers. However, I'm dealing with some production issues right now, and I believe one of my workarounds only works when compressed oops are enabled; otherwise it will crash. Recent versions of LXR probably can't run without compressed pointers (but lxr-2023-07-09 probably can).
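As a sketch, compressed pointers can be toggled with the standard HotSpot flags shown below; whether a given LXR commit tolerates them is exactly the caveat above. Paths and the heap size are placeholders:

```shell
# Hypothetical sketch: disabling compressed pointers so the LXR (or G1)
# runs match ZGC, which does not use compressed oops.
# Caveat from above: recent LXR commits may crash with these flags;
# lxr-2023-07-09 probably works.
MMTK_PLAN=LXR $LXR_JDK/jdk/bin/java -XX:+UseThirdPartyHeap \
    -XX:-UseCompressedOops -XX:-UseCompressedClassPointers \
    -Xms2382m -Xmx2382m -jar $DACAPO_PATH h2
```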
Hi @kk2049, I've recently taken another look at the AE repo, particularly the script used for running the DaCapo benchmarks. I noticed a few potential areas for refinement in your methodology.

Firstly, a single benchmark invocation cannot lead to a stable result, and hence a strong conclusion; there can be noise across different runs. It would be better to run more invocations, perhaps ~40 or more. Please note that invocations (separate JVM launches) are different from the iterations within a single invocation.

Secondly, it's not accurate to compare two JDK configs with more than one variable changed. For instance:
Lastly, it would be better to use the latest stable release, dacapo-23.11-chopin. The version you were using, dacapo-evaluation-git-b00bfa9, is a very early snapshot (~2 years old), and many issues have been resolved since then in the stable release.

We follow the above methodology in all our evaluations. I hope this is helpful both for your evaluation and for reproducing our results.
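To make the first point concrete, here is a minimal sketch of running many independent invocations and keeping one log per invocation. This is not the running-ng setup; `JAVA_CMD`, `DACAPO_PATH`, the heap size, and the invocation count are placeholders:

```shell
# Hypothetical sketch: run N independent invocations of a benchmark,
# one fresh JVM per invocation, and keep one log file per run.
# JAVA_CMD, DACAPO_PATH, and the heap size are placeholders.
N=${N:-40}
JAVA_CMD=${JAVA_CMD:-$LXR_JDK/jdk/bin/java}
mkdir -p logs
for i in $(seq 1 "$N"); do
  # A failed invocation should not stop the batch; inspect logs later.
  MMTK_PLAN=LXR "$JAVA_CMD" -XX:+UseThirdPartyHeap \
    -Xms2382m -Xmx2382m -jar "$DACAPO_PATH" lusearch \
    > "logs/lusearch.lxr.$i.log" 2>&1 || true
done
echo "wrote $N logs to ./logs"
```

The per-invocation logs can then be aggregated (e.g., taking means and confidence intervals across invocations) rather than drawing conclusions from a single run.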
Thank you for your previous guidance, @wenyuzhao! We are working on addressing some of the shortcomings you mentioned. As you may know, we have a modified version of h2 which includes an additional parameter to limit the instantaneous request emission rate to a certain value. We are currently measuring the response-time curves of both LXR and G1. The launch command for LXR has been adjusted based on your instructions, and the LXR version has been updated according to your recommendation. Could you please assess whether the data we have obtained appears reasonable, and whether we are using LXR correctly? Your guidance is highly valued and appreciated!

Launch script:

```shell
# LXR
MMTK_PLAN=LXR numactl -C 0-7 -m0 $LXR_JDK/jdk/bin/java -XX:+UseThirdPartyHeap -XX:MaxMetaspaceSize=1024m -XX:MetaspaceSize=1024m -Xmx2382m -Xms2382m -Xlog:gc=info -jar $DACAPO_PATH h2 --throttle $Throughput

# G1
numactl -C 0-7 -m0 $JDK11/java -XX:MaxMetaspaceSize=1024m -XX:MetaspaceSize=1024m -Xmx2382m -Xms2382m -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2 -XX:InitiatingHeapOccupancyPercent=65 -XX:+UseG1GC -Xlog:gc=info -jar $DACAPO_PATH h2 --throttle $Throughput
```

RT Curve

LXR version
Looks like LXR is enabled correctly. FYI, this commit (4ab99bb) requires a particular Rust feature to be enabled at build time.

I'm not sure how h2 throttling is enabled. H2 in DaCapo is a fixed-workload benchmark, so I'm not sure whether scaling throughput like this can work well. It may affect the metered-latency calculation(?)
Hello,

I am currently conducting performance testing on LXR with HBase and have encountered some issues. The commit ID of mmtk-core I am using is 333ffb8ad9f9089fcad4338eae0ca9bf81d4955e, which is the latest commit of branch lxr-2023-07-09. Below is the reproduction process for the identified bug.

After the script execution is complete, HBase did not start correctly. By examining the logs at ./hbase-2.4.14/log, the following error messages can be observed.

Thank you for your attention to this matter. Let me know if you need any further information.