Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NullPointerException from aquery with aspects and protobuf output #15716

Closed
cmcgee1024 opened this issue Jun 21, 2022 · 4 comments
Closed

NullPointerException from aquery with aspects and protobuf output #15716

cmcgee1024 opened this issue Jun 21, 2022 · 4 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Performance Issues for Performance teams type: bug

Comments

@cmcgee1024
Copy link

Description of the bug:

When running an aquery on a target that was previously built with an aspect ("—aspects" command-line) and the same daemon process the following NullPointerException (NPE) occurs.

java.lang.NullPointerException
	at com.google.devtools.build.lib.skyframe.actiongraph.v2.ActionGraphDump.dumpAspect(ActionGraphDump.java:247)
	at com.google.devtools.build.lib.query2.aquery.ActionGraphProtoOutputFormatterCallback.processOutput(ActionGraphProtoOutputFormatterCallback.java:104)
	at com.google.devtools.build.lib.query2.engine.OutputFormatterCallback.process(OutputFormatterCallback.java:54)
	at com.google.devtools.build.lib.buildtool.PostAnalysisQueryBuildTool.doPostAnalysisQuery(PostAnalysisQueryBuildTool.java:187)
	at com.google.devtools.build.lib.buildtool.PostAnalysisQueryBuildTool.postProcessAnalysisResult(PostAnalysisQueryBuildTool.java:78)
	at com.google.devtools.build.lib.buildtool.BuildTool.buildTargets(BuildTool.java:227)
	at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:443)
	at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:411)
	at com.google.devtools.build.lib.runtime.commands.AqueryCommand.exec(AqueryCommand.java:137)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:585)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:231)
	at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:550)
	at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:614)
	at io.grpc.Context$1.run(Context.java:579)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

This bug is difficult to reproduce since it is intermittent and happens with a specific complex target in our environment. Here's a generalized template of the command that exhibits the problem in our environment.

bazel shutdown && bazel clean && bazel build --aspects=//tools:foo.bzl%foo_aspect //:help && bazel aquery --aspects=//tools:bar.bzl%bar_aspect —include_aspects=true 'deps(//:help)' --output=proto > /tmp/foo.proto

Note: the foo_aspect is added for the bazel build, but not for the bazel aquery, which uses the bar_aspect.

The NPE does not occur if the aspect is not added to the original build. The output of the aquery must be proto, and not the default text output. Also, the NPE does not occur if the daemon is shut down in between the two commands like this.

bazel shutdown && bazel clean && bazel build --aspects=//tools:foo.bzl%foo_aspect //:help && bazel shutdown && bazel aquery --aspects=//tools:bar.bzl%bar_aspect —include_aspects=true 'deps(//:help)' --output=proto > /tmp/foo.proto

Which operating system are you running Bazel on?

macOS 12.3

What is the output of bazel info release?

5.1.1-8-g30432e50c5

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

This particular NPE does not appear to show up in the typical places.

Any other information, logs, or outputs that you want to share?

WARNING: Targets were missing from graph: [ConfiguredTargetKey{label=@ext//:abc, config=BuildConfigurationValue.Key[deadbeef]}]
WARNING: Targets were missing from graph: [ConfiguredTargetKey{label=@ext//:abc, config=BuildConfigurationValue.Key[deadbeef]}, ConfiguredTargetKey{label=//bazel/tools/toolchain/foo:bar, config=BuildConfigurationValue.Key[deadbeef]}, ConfiguredTargetKey{label=//bazel/tools/toolchain/foo:bar, config=BuildConfigurationValue.Key[deadbeef]}]
WARNING: Targets were missing from graph: [ConfiguredTargetKey{label=@ext2//:def, config=BuildConfigurationValue.Key[deadbeef]}, ConfiguredTargetKey{label=@bazel_tools//tools/jdk:toolchain_java8, config=BuildConfigurationValue.Key[deadbeef]}]
@cmcgee1024
Copy link
Author

cmcgee1024 commented Jun 21, 2022

After further investigation here are some observations.

ActionGraphProtoOutputFormatterCallback.processOutput() is calling accessor.getAspectValues(configuredTarget) but not checking whether the values are null, which causes the NPE around line 104. A similar class ActionGraphTextOutputFormatterCallback has a simple null check in its processOutput() that guards against this. There should probably be a null check in all OutputFormatterCallback classes, including ActionGraphSummaryOutputFormatterCallback. They all call ConfiguredTargetValueAccessor.getAspectValues(), which gets aspect values from walkableGraph.getValue() and this method is declared @nullable, so we can expect nulls to show up in the Collection in some cases.

More than just the NPE, there appears to be some kind of a caching problem as well. Of the two commands above, the first one exhibits this null pointer problem, but only with this specific target. The revDeps looks something like this in ConfiguredTargetValueAccessor:

[0] foo_aspect -> value = null
[1] TargetCompletionValue (not quite sure what this is)
[2] bar_aspect -> value not null

Since the foo_aspect's value is null, I checked walkableGraph.getException() and walkableGraph.isCycle(). Exception is null and cycle is false.

If I run the command above on a different, but similar target I sometimes get this with no NPE:

[0] foo_aspect -> value not null
[1] TargetCompletionValue
[2] bar_aspect -> value not null

However, if I put a shutdown command in between the build and aquery I get this.

[0] bar_aspect -> value not null

From my perspective, I find the last case to match better what I expect to happen here. Something about the build command remains in the daemon in memory and affects the aquery. These aspect values, as long as they are not null, get dumped to the output. So, it appears that depending on the state of the bazel daemon you get different output. I suspect that there's another bug here since the output can sometimes contain this extra aspect, and other times not. I'm not sure if this could affect tools that are interpreting the protocol buffer output.

@sgowroji sgowroji added type: bug team-Performance Issues for Performance teams untriaged labels Jun 22, 2022
@meisterT meisterT added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Jul 28, 2022
@joeleba
Copy link
Member

joeleba commented Apr 19, 2023

Thanks for filing the issue. The NPE shouldn't be allowed to happen, there's a fix coming out today. The inconsistency you described is actually a known limitation of aquery (#14156).

@brentleyjones
Copy link
Contributor

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Apr 19, 2023
@keertk
Copy link
Member

keertk commented Apr 20, 2023

@bazel-io fork 6.2.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Apr 20, 2023
keertk added a commit that referenced this issue Apr 24, 2023
This should have been included in unknown commit (which added the same check for text output).

FIXES #15716.

PiperOrigin-RevId: 525434548
Change-Id: I5fc80fa1f81ccf5f7b0d8b5d826d8418e2239306

Co-authored-by: Googler <leba@google.com>
fweikert pushed a commit to fweikert/bazel that referenced this issue May 25, 2023
This should have been included in unknown commit (which added the same check for text output).

FIXES bazelbuild#15716.

PiperOrigin-RevId: 525434548
Change-Id: I5fc80fa1f81ccf5f7b0d8b5d826d8418e2239306
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Performance Issues for Performance teams type: bug
Projects
None yet
Development

No branches or pull requests

7 participants