Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bzlmod: module extension-invoked repository rules from .bzl files that (transitively) load from external repos yield cycle errors w/usage in WORKSPACE.bzlmod #21289

Closed
rrbutani opened this issue Feb 11, 2024 · 5 comments
Assignees
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug

Comments

@rrbutani
Copy link
Contributor

rrbutani commented Feb 11, 2024

issue

This is another issue like #20942 but with a twist: rather than being triggered by usages of external repos within the repository rule that's invoked by the module extension, this time the cycle error is triggered by loads (including transitive loads) from an external repo 1 by the .bzl file defining the repository rule.

To recap, here are the conditions necessary to trigger this error:

  • a repository rule that, in it's defining .bzl file, (transitively) loads from an external repo
  • a module extension that invokes this repository rule, producing a repo that's use_repo'd
  • a WORKSPACE.bzlmod file that then loads this repo

Like in #20942 the actual cycle is not displayed (even with #20958 applied):

bazel build ...
INFO: Invocation ID: 38cfc65c-9d9a-4f4e-9731-20c53a5acbc1
ERROR: Failed to load .bzl file '@@_main~ext~some_repo//:info.bzl': possible dependency cycle detected.
ERROR: Error computing the main repository mapping: cycles detected during computation of main repo mapping
Computing main repo mapping:

This issue is not fixed by #20982.

Further, bazel info (on alternate runs) causes Bazel to crash:

bazel info
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 328afa3f-1859-4f67-b4f1-32b7d2f5add8
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.NullPointerException: Cannot invoke "java.lang.Throwable.getMessage()" because "cause" is null
        at com.google.devtools.build.lib.analysis.config.InvalidConfigurationException.<init>(InvalidConfigurationException.java:53)
        at com.google.devtools.build.lib.skyframe.SkyframeExecutor.createBuildConfigurationKey(SkyframeExecutor.java:1854)
        at com.google.devtools.build.lib.skyframe.SkyframeExecutor.getConfiguration(SkyframeExecutor.java:1774)
        at com.google.devtools.build.lib.runtime.commands.InfoCommand.lambda$exec$0(InfoCommand.java:158)
        at com.google.common.base.Suppliers$NonSerializableMemoizingSupplier.get(Suppliers.java:181)
        at com.google.devtools.build.lib.runtime.commands.InfoCommand.exec(InfoCommand.java:215)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:679)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:250)
        at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:604)
        at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:676)
        at io.grpc.Context$1.run(Context.java:566)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)

minimal repro

# MODULE.bazel
bazel_dep(name = "bazel_skylib", version = "1.4.2")

use_repo(use_extension("//:mod_ext.bzl", "ext"), "some_repo")
# BUILD.bazel
# mod_ext.bzl

# i.e. a load from an external (from bzlmod) repo
load("@bazel_skylib//rules:common_settings.bzl", "BuildSettingInfo")

repo = repository_rule(lambda rctx: [
    rctx.file("BUILD.bazel"),
    rctx.file("info.bzl", content = "TEST = 1"),
][0])

ext = module_extension(lambda _: repo(name = "some_repo"))
# WORKSPACE.bzlmod
load("@some_repo//:info.bzl", "TEST") # This load is necessary to induce the error.
print(TEST)

Same behavior as in #20942; we see errors after the first evaluation of .bzl files records the deps.

To reproduce, run:

  • bazel build ...: loads .bzl files, writes marker files
  • bazel clean
  • bazel build ...: recompute repo mapping, using the marker files written

Errors with:

ERROR: Failed to load .bzl file '@@_main~ext~some_repo//:info.bzl': possible dependency cycle detected.
ERROR: Error computing the main repository mapping: cycles detected during computation of main repo mapping
Computing main repo mapping:

Alternatively:

  • bazel info
  • bazel shutdown
  • bazel info

Errors with:

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.NullPointerException: Cannot invoke "java.lang.Throwable.getMessage()" because "cause" is null
        at com.google.devtools.build.lib.analysis.config.InvalidConfigurationException.<init>(InvalidConfigurationException.java:53)
        at com.google.devtools.build.lib.skyframe.SkyframeExecutor.createBuildConfigurationKey(SkyframeExecutor.java:1854)
        at com.google.devtools.build.lib.skyframe.SkyframeExecutor.getConfiguration(SkyframeExecutor.java:1774)
        at com.google.devtools.build.lib.runtime.commands.InfoCommand.lambda$exec$0(InfoCommand.java:158)
        at com.google.common.base.Suppliers$NonSerializableMemoizingSupplier.get(Suppliers.java:181)
        at com.google.devtools.build.lib.runtime.commands.InfoCommand.exec(InfoCommand.java:215)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:679)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:250)
        at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:604)
        at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:676)
        at io.grpc.Context$1.run(Context.java:566)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)

A bazel clean --expunge is necessary to get bazel build, bazel info, etc. to work again.

python bzlmod test

Alternatively, here's a minimal repro as a bzlmod python test, in the style of @Wyverald's test in #20982:

def testExtensionRepoMappingChange_mainRepoEvalCycleWithWorkspaceCausedByLoad(self):
    # Regression test for #21289
    self.ScratchDir('bar')
    self.ScratchFile('bar/MODULE.bazel', ['module(name = "bar")'])
    self.ScratchFile('bar/BUILD.bazel')
    self.ScratchFile('bar/defs.bzl', ["TEST = 1"])
    self.ScratchFile(
        'MODULE.bazel',
        [
            'ext = use_extension(":ext.bzl", "ext")',
            'use_repo(ext, "repo")',
            'bazel_dep(name="bar",version="0.0")',
            'local_path_override(module_name = "bar", path = "bar")'
        ],
    )
    self.ScratchFile(
        'BUILD.bazel',
        [
            'load("@repo//:defs.bzl", "STR")',
            'print("STR={}".format(STR))',
            'filegroup(name="lol")',
        ],
    )
    self.ScratchFile(
        'ext.bzl',
        [
            'load("@bar//:defs.bzl", "TEST")',
            'def _repo_impl(rctx):',
            '  rctx.file("BUILD")',
            '  rctx.file("defs.bzl", "STR = 1")',
            'repo = repository_rule(_repo_impl)',
            'def _ext_impl(mctx):',
            '  print("ran the extension!")',
            '  repo(name = "repo")',
            'ext = module_extension(_ext_impl)',
        ],
    )
    self.ScratchFile('WORKSPACE.bzlmod', ['load("@repo//:defs.bzl","STR")'])

    _, _, stderr = self.RunBazel(['build', '--enable_workspace', ':lol'])
    self.assertIn('STR=1', '\n'.join(stderr))

    # Shutdown bazel so we use the .marker files:
    self.RunBazel(['shutdown'])
    # Build again. This (currently) fails with a cycle error :(
    _, _, stderr = self.RunBazel(['build', '--enable_workspace', ':lol'])
    self.assertNotIn('ran the extension!', '\n'.join(stderr))

repository rule vs module extension

Important

It's a dependency (transitively) on an external repo from the .bzl file where the repository rule is defined that causes a problem.

If we split mod_ext.bzl in the above so that the repository rule is defined in a separate repo.bzl, this has the issue (external load is present in repo.bzl):

# mod_ext.bzl

load("//:repo.bzl", "repo")

ext = module_extension(lambda mctx: repo(name = "some_repo"))
# repo.bzl

load("@bazel_skylib//rules:common_settings.bzl", "BuildSettingInfo") # !!!

repo = repository_rule(lambda rctx: [
    rctx.file("BUILD.bazel"),
    rctx.file("info.bzl", content = "TEST = 1"),
][0])

And this (external load is only in mod_ext.bzl) does not:

# mod_ext.bzl

load("@bazel_skylib//rules:common_settings.bzl", "BuildSettingInfo") # !!!
load("//:repo.bzl", "repo")

ext = module_extension(lambda mctx: repo(name = "some_repo"))
# repo.bzl

repo = repository_rule(lambda rctx: [
    rctx.file("BUILD.bazel"),
    rctx.file("info.bzl", content = "TEST = 1"),
][0])

details

cycle

As mentioned, Bazel does not report the actual cycle error, even with @fmeum's patch in #20958. This is because these particular cycles do not match this predicate:

&& Iterables.any(cycle, Predicates.or(IS_REPO_RULE, IS_EXTENSION_IMPL))) {

We can see the cycle in the java log though:

Found cycle : [
    Key{repoName=@@, rootModuleShouldSeeWorkspaceRepos=true},
    external,
    com.google.devtools.build.lib.skyframe.ExternalPackageFunction$$Lambda/0x00000008002ce788@fe5a7a7,
    [/dev/bazel/bzlmod-workspace-file-repo-mapping-cycle-issue-repro]/[WORKSPACE], 0,
    KeyForWorkspace{label=@@_main~ext~some_repo//:info.bzl, isBuildPrelude=false},
    CONTAINING_PACKAGE_LOOKUP:@@_main~ext~some_repo//,
    PACKAGE_LOOKUP:@@_main~ext~some_repo//,
    REPOSITORY_DIRECTORY:@@_main~ext~some_repo
] from [
    Key{repoName=@@, rootModuleShouldSeeWorkspaceRepos=true},
    external,
    com.google.devtools.build.lib.skyframe.ExternalPackageFunction$$Lambda/0x00000008002ce788@fe5a7a7,
    [/dev/bazel/bzlmod-workspace-file-repo-mapping-cycle-issue-repro]/[WORKSPACE], 0,
    KeyForWorkspace{label=@@_main~ext~some_repo//:info.bzl, isBuildPrelude=false},
    CONTAINING_PACKAGE_LOOKUP:@@_main~ext~some_repo//,
    PACKAGE_LOOKUP:@@_main~ext~some_repo//,
    REPOSITORY_DIRECTORY:@@_main~ext~some_repo
]

marker file

<output_base>/external/@_main~ext~some_repo.marker has:

149b4849965392c8eb420a2271f82c6a77a5a20850d07ef3ff8e49f6e6b137a7
REPO_MAPPING:,bazel_skylib bazel_skylib~1.4.2
STARLARK_SEMANTICS 0

Removing the REPO_MAPPING for skylib "fixes" the issue (at least until the marker is updated...) as we'd expect.

misc

Running with --noenable_workspace also sidesteps this error, as we'd expect.

I think it's possible to abuse the unfortunate behavior described by @Wyverald here as a "workaround" (i.e. factoring out the literal repository_rule invocation into a separate .bzl file where there are no transitive external loads).

I cannot reproduce this issue on 7.0.2.

root cause

With the above repro and this script git bisect is able to find where this first started failing:

#!/usr/bin/env bash

# See: https://git-scm.com/docs/git-bisect#_bisect_run
#  - 0: good
#  - 125: indeterminate
#  - _: bad

quiet=(
    --ui_event_filters=-info
)

bazel build --incompatible_sandbox_hermetic_tmp=false "${quiet[@]}" //src:bazel || exit 125
readonly bazel_bin=$(realpath $(bazel cquery "${quiet[@]}" //src:bazel --output=files))

git checkout MODULE.bazel.lock || :

readonly repro_workspace=$(realpath bzlmod-workspace-file-repo-mapping-cycle-issue-repro)
readonly install_base="${repro_workspace}/bazel_install_base"
readonly output_base="${repro_workspace}/bazel_output_base"

rm -rf "${install_base}" "${output_base}"

bzl() {
    "${bazel_bin}" \
        --output_base="${output_base}" --install_base="${install_base}" \
        "${1}" "${quiet[@]}" "${@:2}"
}

{
    cd "${repro_workspace}" || exit 125
    bzl info >/dev/null # load `.bzl` files, make .marker files
    bzl shutdown # clear in-memory
    bzl info || exit 1 # construct repo mapping again, using .marker files
}

bzl shutdown # if all succeeded
exit 0
git bisect startgit bisect bad           # a54a393d209ab9c8cf5e80b2a0ef092196c17df3, HEAD as of this writinggit bisect good HEAD~200 # 59ac9ce297eb1bc0c47d4dc5de788908208735e0git bisect run ./bisect.sh
9edaddd6273de776a3f60245f5c56f82f9622caf is the first bad commit

9edaddd comes from #21131 which indeed did not make it into 7.0.2 (but has been applied to the 7.1.0 branch: #21172)


cc: @fmeum @Wyverald @meteorcloudy

I'm not familiar with this part of the Bazel codebase but, IIUC, the issue is that when resolving deps for the mod-ext produced repo (on the path: root mapping -> WORKSPACE.bzlmod -> mod-ext provided repo (some_repo) -> [repo from bzlmod (skylib)]), the root mapping is requested again forming a cycle (even though in reality only bzlmod-produced repos should be accessible to the repo rule since it's instantiated from bzlmod).

It's not clear to me what the best way to fix this would be but please let me know if there's anything I can do to help.

Footnotes

  1. Not sure if it's all external repo loads or just loads from bzlmod produced repos; since repository rules invoked by module extensions don't see WORKSPACE.bzlmod.defined repos it's maybe not a meaningful distinction.

@github-actions github-actions bot added the team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. label Feb 11, 2024
@meteorcloudy meteorcloudy self-assigned this Feb 13, 2024
@Wyverald
Copy link
Member

@meteorcloudy I think this is actually a bug (the same one that @fmeum mentioned about rules_jni on Slack). I'll see if I can get to it this week, but either way, we should fix it for 7.1.0.

@Wyverald
Copy link
Member

@bazel-io fork 7.1.0

Wyverald added a commit that referenced this issue Feb 16, 2024
…SPACE

Mappings for repos defined in WORKSPACE are extremely annoying to verify given the chunked loading (we'd have to record which chunk the repo mapping was used in, and then load that chunk while verifying). This is extremely not worth the effort (especially since nobody really uses repo mappings in WORKSPACE) so we just don't record it.

This also means, during verification, we can safely use the main repo mapping without WORKSPACE (since repos defined in Bzlmod can't see stuff from WORKSPACE anyway).

Fixes #21289.
@Wyverald
Copy link
Member

Thank you so much for the extremely detailed bug report, @rrbutani! Your report made understanding and reproducing the issue much easier.

I've sent #21393 to fix this.

Wyverald added a commit that referenced this issue Feb 20, 2024
…SPACE

Mappings for repos defined in WORKSPACE are extremely annoying to verify given the chunked loading (we'd have to record which chunk the repo mapping was used in, and then load that chunk while verifying). This is extremely not worth the effort (especially since nobody really uses repo mappings in WORKSPACE) so we just don't record it.

This also means, during verification, we can safely use the main repo mapping without WORKSPACE (since repos defined in Bzlmod can't see stuff from WORKSPACE anyway).

Fixes #21289.

Closes #21393.

PiperOrigin-RevId: 608258493
Change-Id: Ife6a01221e6f5e1685d859eaa5acc8b370ab8483
Wyverald added a commit that referenced this issue Feb 20, 2024
…SPACE

Mappings for repos defined in WORKSPACE are extremely annoying to verify given the chunked loading (we'd have to record which chunk the repo mapping was used in, and then load that chunk while verifying). This is extremely not worth the effort (especially since nobody really uses repo mappings in WORKSPACE) so we just don't record it.

This also means, during verification, we can safely use the main repo mapping without WORKSPACE (since repos defined in Bzlmod can't see stuff from WORKSPACE anyway).

Fixes #21289.

Closes #21393.

PiperOrigin-RevId: 608258493
Change-Id: Ife6a01221e6f5e1685d859eaa5acc8b370ab8483
@iancha1992
Copy link
Member

A fix for this issue has been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!

rrbutani pushed a commit to rrbutani/bazel that referenced this issue Feb 28, 2024
…SPACE

Mappings for repos defined in WORKSPACE are extremely annoying to verify given the chunked loading (we'd have to record which chunk the repo mapping was used in, and then load that chunk while verifying). This is extremely not worth the effort (especially since nobody really uses repo mappings in WORKSPACE) so we just don't record it.

This also means, during verification, we can safely use the main repo mapping without WORKSPACE (since repos defined in Bzlmod can't see stuff from WORKSPACE anyway).

Fixes bazelbuild#21289.

Closes bazelbuild#21393.

PiperOrigin-RevId: 608258493
Change-Id: Ife6a01221e6f5e1685d859eaa5acc8b370ab8483
@Sineaggi
Copy link

Was the bazel stacktrace meant to be fixed in 7.1.x? I see the exact same stacktrace when using https://github.com/GoogleContainerTools/rules_distroless in GoogleContainerTools/rules_distroless#37 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants
@Wyverald @Sineaggi @meteorcloudy @rrbutani @iancha1992 @rakshitasingh05 and others