Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GR-33602] [RFC] More User-friendly Output for Native Image #3955

Closed
fniephaus opened this issue Oct 27, 2021 · 24 comments · Fixed by #4066
Closed

[GR-33602] [RFC] More User-friendly Output for Native Image #3955

fniephaus opened this issue Oct 27, 2021 · 24 comments · Fixed by #4066
Assignees
Milestone

Comments

@fniephaus
Copy link
Member

fniephaus commented Oct 27, 2021

Update: This feature has landed on master and will ship with the GraalVM 22.0 release.
You can try it out using a GraalVM nightly build.

Description

Currently, the output produced during native image generation is limited to a list of stages, how long each of them took to run, and GC footprint. We want to make this output more user-friendly by providing relevant information for end-users. Overall, the goal is to help users better understand what happens during image generation and how a change, whether it is on their end or ours, influences that process in terms of time to run and memory usage.

We'd like to use this issue to discuss possible features and use cases for this new output mode with the community.

TLDR: Screencast

ni-output-demo.mp4

Full Example

GraalVM Native Image 22.0.0: Generating HelloWorld...
--------------------------------------------------------------------------------
User-provided arguments:
  -H:+JNI -H:+AllowFoldMethods -H:FallbackThreshold=0 \
  --initialize-at-build-time=... (add -H:+ShowAllArguments to show all)
--------------------------------------------------------------------------------
[1/8] Initializing...                                          (5.872s @ 0.96GB)
      1 native library included: SDL2
      12 user-provided features:
        - org.mypackage.MyFeature
        - org.myotherpackage.MyOtherFeature
        - org.myotherpackage.MyOtherFeature
        - org.myotherpackage.MyOtherFeature
        - ... 8 more (add -H:+ShowAllFeatures to show all)
      18 classes registered for reflection
[2/8] Performing analysis...      [~**~****~**~*]            (159.540s @ 2.81GB)
      18,153 reachable methods 
       1,418 reachable classes of 3,123 loaded classes
[3/8] Building universe...                                     (3.090s @ 2.81GB)
[4/8] Parsing methods...          [**********]                (29.150s @ 3.18GB)
[5/8] Inlining trivial methods... [**********]                (21.721s @ 3.48GB)
[6/8] Compiling methods...        [**********]               (100.633s @ 4.18GB)
[7/8] Creating image with 255,854 objects...                  (12.594s @ 4.07GB)
      31.56MB in code size
      12.34MB in heap size from 512 build-time-initialized classes
[8/8] Writing 44.15MB to disk...                               (1,897s @ 4.07GB)
      Image located at '/path/to/image'.
      All generated files listed in '/path/to/build_artifacts.txt'.
--------------------------------------------------------------------------------
Top 5 packages in code area:            Top 5 objects in heap:
   2MB *.com.oracle                      835KB *.[B
 277KB *.java.util                       417KB *.[C
 217KB *.java.lang                       121KB *.Lcom...SystemPropertiesSupplier
 142KB *.java.{security,net,text,math}   120KB *.Ljava/lang/Class
  77KB *.sun.nio                         100KB *.[I
               (add -H:+ShowAllFullBreakdowns to show all)
--------------------------------------------------------------------------------
GC Stats: 16 collection(s) in 1.687s (max committed: 4.18GB, max RSS: 4.02GB).
--------------------------------------------------------------------------------
Finished generating HelloWorld in 345.542s (1204.02s of CPU time).

(Unable to show colors and links.)

Feature List

Features are grouped by priority. Please feel free to propose changes and additional features in the comments.

Must-Have

Name Purpose Complexity
Current stage Users want to know which stage is currently running.
Example: [3/7] Performing analysis...
low
Image location Users need to know where to find the image.
Note: Native image already reports *.build_artifacts.txt
Example: Image located at: '/path/to/image'
trivial
Image size Users care about the size of the image.
Example: [7/7] Writing 44.15 MiB to disk...
trivial
Timing info Users want to know how much time each stage takes to run.
Example: [3/7] Performing analysis... ...done in 159.540s.
medium/low
Reachability info Users want to know how many methods and classes (and packages?) are reachable.
Example: 18,153 methods and 1,418 classes are reachable.
low
Version info Users need this info, for example, to file issues.
Example: Generating HelloWorld with GraalVM Native Image 22.0.0...
trivial
Analysis progress Report analysis progress (e.g., based on iterations during analysis) to give users some feedback of how changes influence the time to analyze their application, without having to know about implementation details.
Example see full example.
medium
High-level statistics Provide some high-level statistics of what's included in the native image (e.g., top 5 largest packages included in the binary)
Example see full example.
medium
Prefix Line prefix such as [my-image:14475] helps to distinguish parallel builds (uncommon use case, but there should be an option). trivial

Should Have

Name Purpose Complexity
Show progress Indicate progress with . (similar to JUnit) or other chars.
Example: Performing Analysis... toooto with t=typeflow, o=object graph.
low
List user-provided Features Users can implement org.graalvm.nativeimage.hosted.Features and having the image builder use them during the build process. When the classpath/modulepath that the user provides, contains features that are active during build it constitutes a major potential to affect every step of the image build in various ways.
Example: Found 5 user-provided Features: MyFeature, ...
low
GC summary Provide a short GC summary (e.g., number of full GCs, time in GC). An excessive number of GCs can cause slowdowns.
Example: GC Stats: 16 collection(s) in 1.687s (used: 0.57; committed: 3.22; max: 11.38)
low
Stage info Provide info (or docs?) that explains what each stage is doing.
Example see "Hyperlinks in console".
medium
Loaded classes Show how many classes were loaded/initialized at startup. Possibly in context with the number of reachable classes.
Example: 1024 classes loaded.
medium
Reflection overhead Inform users about how many classes are registered for reflection. low
Space info Show how large the boot heap is and how much space is needed for methods and constants.
Example: 12.34MiB in code size. 31.56MiB in heap size.
unknown
Report RSS sizes "Provide a more accurate picture of consumed memory. (#3879)"
Question: Can this be supported across all platforms? Probably yes.
Example: 1.11 GB (heap) / 1.87 GB (rss)
medium

Nice to Have

Name Purpose Complexity
Colorful mode Colorizes the output. Common in other tools (e.g., maven, gradle, ...). Turned on by default if a console is detected (System.console() != null && System.getenv("TERM") != null). Can be disabled with a flag. low
Hyperlinks in console "Technically this is of the same kind as colors and test attributes (escape sequence based). The *.build_artifacts.txt file is a good example where this could be useful. Also for Stage info (links to our documentation for each stage) this would be neat. In the terminal output below e.g. /usr/lib/systemd/system/gdm.service is a hyperlink. More info."
Example: [3/7] Performing analysis...
low
GC usage Users may want to know how much memory is consumed after each phase.
Question: Is this info really needed per stage or is max heap size more interesting?
Example: ...done in 159.540s (memory usage at 0.96GB).
medium
List native libraries (and paths) Allow users to understand which native libraries are included. This is something users can control and may want to optimize.
Question: Is this really useful?
Example: 1 native library included: SDL2
low
CPU usage Users want to know how much CPU time is consumed during image generation.
Example: Process CPU time: 120.02s
medium

Won't Have (this time)

Name Purpose Reason
Progress bars using \r Visualize progress over time. \r incompatible with log files and some terminals.
Interactive dashboard Provide overview of entire building process. Terminal cursors incompatible with log files and some terminals.
Info pulled from jvmstat/JFR Provide Java performance metrics. Too low-level for end-users.
User interactions Allow users to interact with image generation process.
Example: 18,153 methods and 1,418 classes are reachable. Press C to continue or S to search for reachable elements (C/S):
Let not use cmd line tools for something like that.
GraalVM banner It is quite common for CLI tools to print a banner these days. Wastes valuable screen estate.
Analysis iterations Report number of iterations during analysis to give users a better understanding of how changes influence the analysis. Knowing how many analysis iterations happen gives valuable information on what happened/went wrong during analysis. E.g. a user might have a Feature implementation that implements duringAnalysis. If this is used in the wrong way it can mess up the analysis phase.
Example: Analysis required 4 rounds of typeflow operations and 9 rounds of object graph checks.
Analysis iterations are an implementation detail that could change at some point and should not be exposed to users.
Print user arguments Show how native image was invoked. Might be redundant (e.g., Quarkus already shows CLI command) and bloat log files.
Example: User-provided arguments: -H:+JNI -H:+AllowFoldMethods -H:FallbackThreshold=0 -H:+ReportExceptionStackTraces ...
Too much output for little benefit
Max heap size Track heap size over time and report max usage/size so that users know how much RAM they need to allocate for their jobs.
Example: Peak memory consumption to generate the image: 5.42GiB
Reporting peak RSS instead
Machine-readable output option Provide an option to produce output readable by automation tools.
Question: What would be a good output format for this? Suggestion: JSON + JSON schema
Left for future work

Other Things to Consider

  • The output should be of reasonable size
    • Native image might run as part of CI and should not bloat log files too much
      • GraalVM team runs builds in parallel so logs may overlay (uncommon for end-users)
    • Maven and Gradle already produce a lot of output
    • Try to stick to the common terminal width of 80 characters
@chrisseaton
Copy link
Contributor

Great idea - never sure why we didn't attempt to show some kind of progress and time estimate given how long it takes to run.

@dougxc
Copy link
Member

dougxc commented Oct 27, 2021

Drop prefix

There needs to be a flag for still having the prefix. This is very useful in CI logs where parallel native image builds can be running.

@fniephaus
Copy link
Member Author

fniephaus commented Oct 27, 2021

There needs to be a flag for still having the prefix. This is very useful in CI logs where parallel native image builds can be running.

Running native image builds in parallel is not something users normally do so this shouldn't be the default. We might keep the current output around for CI or add an option for the prefix.

@chumer
Copy link
Member

chumer commented Oct 27, 2021

I really like the stats part:

--------------------------------------------------------------------------------
Top 5 packages in code area:            Top 5 objects in heap:
   2MB *.com.oracle                      835KB *.[B
 277KB *.java.util                       417KB *.[C
 217KB *.java.lang                       121KB *.Lcom...SystemPropertiesSupplier
 142KB *.java.{security,net,text,math}   120KB *.Ljava/lang/Class
  77KB *.sun.nio                         100KB *.[I
--------------------------------------------------------------------------------

Could be useful to print the option name for showing all packages and objects. Similar to -H:+ShowAllFeatures.

Could we get a preview of the colourful mode? (when its ready)

@fniephaus
Copy link
Member Author

Could be useful to print the option name for showing all packages and objects. Similar to -H:+ShowAllFeatures.

Good idea! Will extend that...

Could we get a preview of the colourful mode? (when its ready)

Of course, I'll share a first implementation when it's ready so that everyone can give this a spin :)

@fniephaus fniephaus changed the title [RFC] More User-friendly Output for Native Image [GR-33602] [RFC] More User-friendly Output for Native Image Oct 27, 2021
@spockz
Copy link

spockz commented Oct 27, 2021

I like all the improvements that are listed here. What I would love is for native-image to print more information on what it was doing when it crashed. My main experience (and significantly improved already) over the last 2 years has been as follows: try compiling with configuration generated by assisted configuration; compiler crashes with some exception minutes into the process or stops with a linking issue or the generated image has issues with symbols. Then look at generated config, intelligently guess what really needs to be in and repeat until it works. This process then tends to repeat on any dependency upgrade or native-image change.

It would save me a lot of time if checks could be moved forward (fail fast(er)) and if on exception a context of the jar/module/class/method that the compiler was currently working on could be printed.

@fniephaus
Copy link
Member Author

What I would love is for native-image to print more information on what it was doing when it crashed.

I agree, there's still room for improvement. While good crash reports are important, I hope you don't mind if we focus on the output of a successful build in this issue and discuss your ideas in a separate one. Could you open another issue for this please?

@fniephaus
Copy link
Member Author

I've updated #3955 (comment) with a screencast of a prototype. Please feel free to leave any feedback.

@zakkak
Copy link
Collaborator

zakkak commented Nov 3, 2021

Great work @fniephaus! This is a huge improvement. Please see below for some comments

GC usage

Question: Is this info really needed per stage or is max heap size more interesting?

I think having a GC usage next to each phase can be valuable as well, e.g.

Parsing methods...          [**********]                (29.150s using 3.18GB -- 10GCs in 1.687s)

Or we could just have the time spent in GCs, since the number of GCs is not necessarily that important (especially for young collections), e.g.

Parsing methods...          [**********]                (29.150s -- 1.687s in GC @ 3.18GB)

To give a usage example of this, with that in place we can easily quantify the impact of -H:+ParseOnce on memory usage (as in graalvm#304 (comment))

Machine-readable output option

What would be a good output format for this?

If we want all data in a single file we need a format that can support hierarchical structures, e.g. json, alternatively we could produce multiple csv files, e.g. one for the data from the different phases, one for the top packages/objects, etc.

I personally like the tidiness of a single file but I also like the simplicity of CSV files and the ability to import data to spreadsheets without any hustle. :) No strong opinion on my side.

Reflection overhead

Unclear what/how to measure; may be useful for Native Image devs but not end-users.

In Quarkus we have observed that the more registrations for reflection we do the higher the overhead.
That said a single number reporting the number of class members registered for reflection could be a good start.
This way users will be able to:

  1. See how registering more members for reflection affects compilation times (and potentially runtime overhead)
  2. Easily spot changes that cause more registrations for reflection, e.g., when updating a library or enabling some Feature.

@fniephaus
Copy link
Member Author

Thanks for your feedback, @zakkak! Glad you like it so far. :)

Parsing methods... [**********] (29.150s -- 1.687s in GC @ 3.18GB)

I'm not sure how much info we really want to display there to be honest. Your suggestion is quite GC-focused and since GCs tend to behave in non-deterministic ways, the output can be quite noisy. Maybe we could add some warnings instead that only show up if we detect an excessive number of GCs? Would that work for your use case and if yes, how would you identify excessive GCs?

If we want all data in a single file we need a format that can support hierarchical structures, e.g. json, alternatively we could produce multiple csv files, e.g. one for the data from the different phases, one for the top packages/objects, etc.

As always, it depends on the use cases we want to support. A single file can only reasonably be produced at the end of a run and provide a summary while multiple smaller files could potentially be produced during the build process. I'm leaning toward a summary file that can be processed by a script during CI, for example, to detect if the image size exceeds a certain threshold. jq is quite popular and easy to install, so maybe JSON is a good choice. We can, of course, always add support for more file formats if there's a need.

A single number reporting the number of class members registered for reflection could be a good start.

That sounds reasonable... will look into this, thanks!

@spockz
Copy link

spockz commented Nov 4, 2021

@fniephaus I really like the demoed output. Reading both your remarks and responding, I would say that any output produced by default on the shell should somehow be actionable for me. It triggers me to think: If I see long GCs that should maybe be a trigger for me to tweak some parameter? If I see the top 5 classes, should I do something about that?

Regarding the machine readable output, json is very easy to deal with so that should be fine. Please do provide something like a json schema to describe the contents of the file as well.

@fniephaus
Copy link
Member Author

any output produced by default on the shell should somehow be actionable for me.

I agree. While I think we should keep the amount of output text to a minimum, one way we can address this is through good documentation that explains the output items and possible actions. We can then add appropriate links from the output to the docs.

Regarding the machine readable output, json is very easy to deal with so that should be fine.

Good!

Please do provide something like a json schema to describe the contents of the file as well.

Noted.

@zakkak
Copy link
Collaborator

zakkak commented Nov 5, 2021

Maybe we could add some warnings instead that only show up if we detect an excessive number of GCs? Would that work for your use case and if yes, how would you identify excessive GCs?

I guess it would, after all no matter the output we will end up using GC logs to better understand what's going on, so a warning would be enough.

What comes to mind is observing the past X GCs and what percentage of the heap they were able to reclaim. If for instance the past 10 GCs where each able to reclaim only 1% that could trigger a warning. But that again seems GC specific, e.g. 10 full GCs in parallelGC reclaiming 1% is terrible, but I guess it's OKish for 10 concurrent minor GCs.

Another alternative would be to estimate the ratio of time spent doing work over time spent in GC for the past Y secs (e.g. 30) if that comes anywhere near 1 or below then we are probably in trouble. Again I am not sure how that would work for concurrent GCs where it's hard to separate the time spent in GC from the time spent in doing "actual" work.

@fniephaus
Copy link
Member Author

Thanks, @zakkak! Those are all good ideas but I have the feeling that we may need more data points (which the new output can provide) before we can come up with a good way to detect GC issues in native image. On my local machine, for example, I can observe larger numbers of GCs when memory pressure is high, but the time spent in GC isn't anywhere near the time "doing work". What do you see in memory-constrained environments where the OOM killer likes to kick in?

Also, I was under the impression that users typically +UseParallelGC. Do Mandrel/Quarkus recommend other GCs or even use a different GC by default?

@zakkak
Copy link
Collaborator

zakkak commented Nov 8, 2021

Hi @fniephaus

Thanks, @zakkak! Those are all good ideas but I have the feeling that we may need more data points (which the new output can provide) before we can come up with a good way to detect GC issues in native image. On my local machine, for example, I can observe larger numbers of GCs when memory pressure is high, but the time spent in GC isn't anywhere near the time "doing work". What do you see in memory-constrained environments where the OOM killer likes to kick in?

In graalvm#304 (comment) (which is quite extreme I admit, since the live data set happens to be almost identical to the available memory) I observed the following:

GraalVM CE 21.3
Total time ~16.5m
Time spend in Full GCs ~11.5m
# of GCs 275
# of Full GCs 125

This is like 70% of the total time spent in GC. If I was to compare the time spend in GC over the time spend in "doing work" per phase I would expect this percentage to go even higher. For instance doing such a comparison for the "universe" phase gives me:

GraalVM CE 21.3
Universe phase 434645.38 ms
Time spend in Full GCs during universe phase 426924.08 ms
# of GCs 55
# of Full GCs 55

This is about 98% of the time being spent in GC, and a 100% rate of Full GCs (indicating that there is not enough space to reclaim in the young generation) reclaiming only a couple hundreds of megabytes (indicating that there is not enough space in the old generation either).

Also, I was under the impression that users typically +UseParallelGC.

Correct.

Do Mandrel/Quarkus recommend other GCs or even use a different GC by default?

The answer is no to both questions. The reason I mention other GCs is because of your earlier comment which I might miss-understood:

Your suggestion is quite GC-focused and since GCs tend to behave in non-deterministic ways, the output can be quite noisy.

My understanding was that by "GC-focused" you were worried that the proposed measurements would only make sense for Parallel GC and not other GCs, but it looks like you didn't mean this.

@fniephaus
Copy link
Member Author

I've implemented a test after the end of each stage that checks whether the time spent in GC dominates the overall time to run the stage and is above 15s. If the test fails, the following warning is displayed:

image

(Here, I decreased both threshold to trigger the warning earlier.)
The GB value displayed in the second line of the warning is peak RSS.

@zakkak
Copy link
Collaborator

zakkak commented Nov 16, 2021

Nice :)

I've implemented a test after the end of each stage that checks whether the time spent in GC dominates the overall time to run the stage and is above 15s.

What's the threshold to determine whether the time spend in GC dominates the overall time? Is this set to 50%?

When you say "overall time" you refer to the overall time spent for the last phase, 66.3s in this case, right?

Do you think it would make sense to be able to tune those thresholds (or disable the warning) to avoid warnings in memory constraint configurations that are known to be stressing the memory but for some reason need to run like this (e.g. lack of resources)?

@fniephaus
Copy link
Member Author

What's the threshold to determine whether the time spend in GC dominates the overall time? Is this set to 50%?

Yes, it's currently set to a 0.5 ratio.

When you say "overall time" you refer to the overall time spent for the last phase, 66.3s in this case, right?

Technically, it checks the delta since the last check. But that's very close to the time spent for the last phase.

Do you think it would make sense to be able to tune those thresholds (or disable the warning) to avoid warnings in memory constraint configurations that are known to be stressing the memory but for some reason need to run like this (e.g. lack of resources)?

Yes, we could/should do that. An option for suppressing such warnings sounds reasonable. Noted!

zakkak added a commit to zakkak/quarkus that referenced this issue Nov 29, 2021
Starting with GraalVM 22.0 `native-image` will produce different
output (see oracle/graal#3955)

This patch makes the gradle integration test whether the GraalVM being
used is < 22.0 and perform the corresponding assertions.
@jerboaa
Copy link
Collaborator

jerboaa commented Nov 29, 2021

@fniephaus Shouldn't this be closed as it's been implemented with e04a9cd ?

@fniephaus
Copy link
Member Author

As mentioned by @jerboaa, we have implemented a new user-friendly build output mode with e04a9cd. The new mode is enabled by default and the old output can be restored with -H:-BuildOutputUseNewStyle. Run native-image --expert-options-all | grep "BuildOutput" to see all options for the new output.

I will keep this issue open for the next few weeks in case there is more feedback from the community. Feel free to report any issues with the new output here or in a new issue.

@fniephaus fniephaus added this to the 22.0 milestone Nov 29, 2021
@eregon
Copy link
Member

eregon commented Dec 4, 2021

Suggestion: add the percentage of time spent in GC, not just the seconds, so

                            21.6s spent in 154 GCs | Peak RSS: 11.48GB | CPU load: ~739.85%
becomes
                            21.6s (7.2%) spent in 154 GCs | Peak RSS: 11.48GB | CPU load: ~739.85%

graalvmbot pushed a commit that referenced this issue Dec 5, 2021
@fniephaus
Copy link
Member Author

Keeping this open for collecting more feedback.

21.6s (7.2%) spent in 154 GCs | Peak RSS: 11.48GB | CPU load: ~739.85%

@eregon 7.2% doesn't make it clear to what it relates to. Maybe (7.2% of total time) or something like that?

@fniephaus fniephaus reopened this Dec 5, 2021
@eregon
Copy link
Member

eregon commented Dec 5, 2021

Yeah, that sounds good.

@fniephaus
Copy link
Member Author

Thanks again for all the feedback on this! The new build output has shipped with the GraalVM 22.0 release.

If you have any further ideas or experience any issues, please file another issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants