Bootstrapping Bazel in Alpine+ARM hangs forever #17220

Open
seanmor5 opened this issue Jan 15, 2023 · 13 comments
Labels: help wanted, P3, team-OSS, type: bug

Comments

@seanmor5

Description of the bug:

Hi there, a while ago I opened #16484.

I was able to work around that issue by running the container on an x86 Linux machine. I am now trying to do the same thing on aarch64. I assumed the problem was specific to Docker on Mac; however, I am running into the same issue on EC2. I've tried various versions of Alpine and Bazel (5.3, 6.0) with no success.

I've tried this on these EC2 AMIs, as well as on a Raspberry Pi with Alpine 3.16 installed:

  • alpine-ami-3.14.2-aarch64-r0 ami-00604621aea32b1f5
  • alpine-3.16.0-x86_64-bios-cloudinit-r0 ami-0c9f21a3f1772d2d8

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

This is what I've been using to bootstrap:

export BAZEL_VERSION="5.3.0" \
    BAZEL_SHA256SUM="ee801491ff0ec3a562422322a033c9afe8809b64199e4a94c7433d4e14e6b921  bazel-5.3.0-dist.zip" \
    JAVA_HOME="/usr/lib/jvm/default-jvm"

apk update && apk upgrade && \
    apk add --no-cache python3 py3-pip python3-dev && \
    apk add --no-cache libstdc++ openjdk11 && \
    apk add --no-cache bash curl git wget && \
    apk add --no-cache musl-dev make libexecinfo libexecinfo-dev && \
    apk add --no-cache coreutils gcc g++ linux-headers unzip zip && \
    apk add --no-cache automake gcc subversion && \
    DIR="$(mktemp -d)" && cd "${DIR}" && \
    curl -sLO "https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-dist.zip" && \
    echo "${BAZEL_SHA256SUM}" | sha256sum --check && \
    unzip "bazel-${BAZEL_VERSION}-dist.zip" && \
    EXTRA_BAZEL_ARGS="--tool_java_runtime_version=local_jdk" bash ./compile.sh && \
    cp "${DIR}/output/bazel" /usr/local/bin/ && \
    rm -rf "${DIR}"
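The checksum line in the bootstrap script above uses the sha256sum --check format: the hash, two spaces, then the file name. Left unquoted, echo ${BAZEL_SHA256SUM} collapses those two spaces into one; GNU sha256sum tolerates that, but stricter implementations such as BusyBox's may reject the line. A minimal sketch of a quoting-safe check; verify_sha256 is a hypothetical helper, not part of Bazel's tooling:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check FILE against an expected SHA-256 digest.
# The check line is built with printf, so the "hash  filename" spacing
# (two spaces) survives regardless of word splitting.
verify_sha256() {
	local expected="$1" file="$2"
	# With no FILE argument, sha256sum -c reads the check list from stdin.
	printf '%s  %s\n' "$expected" "$file" | sha256sum -c >/dev/null
}
```

For the script above, the call would be verify_sha256 ee801491ff0ec3a562422322a033c9afe8809b64199e4a94c7433d4e14e6b921 bazel-5.3.0-dist.zip; a non-zero exit status means a mismatch.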

Which operating system are you running Bazel on?

Alpine

What is the output of bazel info release?

n/a

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

See above

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

Most of the time it just hangs at:

🍃  Building Bazel from scratch......
🍃  Building Bazel with Bazel.
.

or at something like "Patching repository" for one of the first few packages.

@sgowroji added the team-OSS, untriaged, and type: bug labels Jan 16, 2023
@yesudeep
Contributor

yesudeep commented Jan 17, 2023

Could you please paste the output of df -h?

I've seen similar behavior when VMs report running out of space (even when technically they haven't).
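Since a VM that reports running out of space can stall in a similar way, it may be worth checking the free space under the build directory (including /tmp, which may be a small tmpfs) before bootstrapping. A minimal sketch; the helper names are hypothetical, and so is any threshold you pass, since the thread doesn't say how much space the bootstrap needs:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a pre-flight disk space check.

# Print the space available under a directory, in KiB.
# -P forces one line per filesystem, so column 4 is always "Available".
free_kib() {
	df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Return 1 if fewer than NEED_GIB gibibytes are available under DIR.
require_free_gib() {
	local dir="$1" need_gib="$2" have_kib
	have_kib=$(free_kib "$dir")
	if (( have_kib < need_gib * 1024 * 1024 )); then
		echo "only $(( have_kib / 1024 / 1024 )) GiB free under $dir, need $need_gib GiB" >&2
		return 1
	fi
}
```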

@yesudeep
Contributor

BTW, I'd recommend using /usr/lib/jvm/java-11-openjdk as opposed to /usr/lib/jvm/default-jvm. Bazel has trouble building with OpenJDK 17+ on some operating systems.
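Following that recommendation, a bootstrap script could prefer the versioned JDK 11 directory and only fall back to default-jvm. A minimal sketch, assuming Alpine's /usr/lib/jvm layout mentioned above; pick_java_home is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a JAVA_HOME under a JVM root directory,
# preferring java-11-openjdk over the default-jvm symlink.
pick_java_home() {
	local root="${1:-/usr/lib/jvm}" candidate
	for candidate in "$root/java-11-openjdk" "$root/default-jvm"; do
		if [[ -x "$candidate/bin/java" ]]; then
			echo "$candidate"
			return 0
		fi
	done
	echo "no JDK with an executable bin/java found under $root" >&2
	return 1
}
```

Usage would be along the lines of JAVA_HOME=$(pick_java_home) || exit 1, followed by export JAVA_HOME before running compile.sh.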

@seanmor5
Author

@yesudeep I re-ran in a Docker container on my Mac with the Java path updated to /usr/lib/jvm/java-11-openjdk and with more swap, memory, and disk space. Here's the output of df -h:


#5 [2/5] RUN df -h
#5 sha256:b69533de15570400c2b99b68d306cb1123a95f090c0badae699f1b9a2302e60a
#5 0.197 Filesystem                Size      Used Available Use% Mounted on
#5 0.198 overlay                 495.8G     49.2G    421.4G  10% /
#5 0.198 tmpfs                    64.0M         0     64.0M   0% /dev
#5 0.198 shm                      64.0M         0     64.0M   0% /dev/shm
#5 0.198 /dev/vda1               495.8G     49.2G    421.4G  10% /etc/resolv.conf
#5 0.198 /dev/vda1               495.8G     49.2G    421.4G  10% /etc/hosts
#5 0.198 tmpfs                    64.0M         0     64.0M   0% /proc/kcore
#5 0.198 tmpfs                    64.0M         0     64.0M   0% /proc/keys
#5 0.198 tmpfs                    64.0M         0     64.0M   0% /proc/timer_list
#5 0.198 tmpfs                    64.0M         0     64.0M   0% /proc/sched_debug
#5 0.198 tmpfs                    15.7G         0     15.7G   0% /sys/firmware
#5 DONE 0.2s

Still gets stuck on "Patching repository....". I will check one of the EC2 instances as well

@meteorcloudy added the P3 and help wanted labels and removed the untriaged label Jan 17, 2023
@yesudeep
Contributor

yesudeep commented Jan 18, 2023

Namaste @seanmor5

Thank you for responding. I don't have an aarch64 instance or machine handy, but I have built a QEMU VM image using the quick-and-dirty script below, in case someone else wants to test. I was able to reproduce the problem while building Bazel in the VM in emulation mode on Linux. I'll look at it in more detail soon.

#!/usr/bin/env bash

set -o errexit
set -o nounset
set -o pipefail

BINARY=$(basename $0)
BINARY_VERSION="0.0.0"
OS_TYPE="$(echo $OSTYPE | tr '[:upper:]' '[:lower:]')"

OS_DISTRO="alpine"

#OS_CHANNEL="edge"
OS_CHANNEL="latest-stable"

# Version
OS_VERSION="3.17.1"

#------------------------------------------------------------------------------
# VM configuration
#------------------------------------------------------------------------------
# VM directory.
VM_DIR="$HOME/.vm"

# Number of CPU cores and threads.
DEFAULT_VM_CORES=2
DEFAULT_VM_THREADS=4

# Storage and memory
DEFAULT_VM_RAM_SIZE=4G
DEFAULT_VM_DISK_SIZE=54G

#------------------------------------------------------------------------------
# Host system based configuration
#------------------------------------------------------------------------------

HOST_CPU_ARCH="$(uname -m)"

# SSH port forwarding.
SSHD_HOST_ADDR=0.0.0.0
SSHD_HOST_PORT=2222
SSHD_GUEST_PORT=22
SSHD_GUEST_ADDR=10.0.2.15
SSHD_HOST_FWD="tcp:${SSHD_HOST_ADDR}:${SSHD_HOST_PORT}-${SSHD_GUEST_ADDR}:${SSHD_GUEST_PORT}"

#------------------------------------------------------------------------------
# Begin setup
#------------------------------------------------------------------------------

ARGS="$*"

function extract_kernel() {
	TMP_DIR="$1"
	VM_DISK_DIR="$2"

}

function sha_256_digest() {
	SHA256SUM=sha256sum
	case "$OS_TYPE" in
		freebsd*)
			SHA256SUM=gsha256sum
			;;
		*dragonfly*)
			SHA256SUM=gsha256sum
			;;
		*) ;;
	esac
	echo -n "$1" | $SHA256SUM | head -c 10
}
export -f sha_256_digest

function setup_vm() {
	local os_cpu_arch="$HOST_CPU_ARCH"
	local os_version="$OS_VERSION"
	local threads=$DEFAULT_VM_THREADS
	local cores=$DEFAULT_VM_CORES
	local ram_size="$DEFAULT_VM_RAM_SIZE"
	local disk_size="$DEFAULT_VM_DISK_SIZE"
	local extract_only=0

	while (("$#")); do
		case "$1" in
			-a | --arch)
				echo "arch: $2"
				os_cpu_arch="$2"
				shift 2
				;;
			-r | --ram-size | --ram_size)
				echo "ram_size: $2"
				ram_size="$2"
				shift 2
				;;
			-d | --disk-size | --disk_size)
				echo "disk_size: $2"
				disk_size="$2"
				shift 2
				;;
			-t | --threads)
				echo "threads: $2"
				threads="$2"
				shift 2
				;;
			-c | --cores)
				echo "cores: $2"
				cores="$2"
				shift 2
				;;
			-v | --version)
				echo "version: $2"
				os_version="$2"
				shift 2
				;;
			-e | --extract-only)
				extract_only=1
				shift
				;;
			-h | --help)
				echo "help"
				shift
				exit 1
				;;
			*)
				echo "error: unknown arguments"
				echo "help"
				shift
				exit 1
				;;
		esac
	done

	# Example: https://dl-cdn.alpinelinux.org/alpine/latest-stable/releases/x86_64/
	OS_ISO_URL="https://dl-cdn.alpinelinux.org/alpine/${OS_CHANNEL}/releases/${os_cpu_arch}/alpine-standard-${os_version}-${os_cpu_arch}.iso"
	OS_ISO_NAME="alpine-standard-${os_version}-${os_cpu_arch}.iso"
	OS_REPO="http://dl-cdn.alpinelinux.org/alpine/${OS_CHANNEL}/main/"
	OS_MODLOOP_URL="http://dl-cdn.alpinelinux.org/alpine/${OS_CHANNEL}/releases/${os_cpu_arch}/netboot/modloop-lts"
	VMLINUZ_URL="https://dl-cdn.alpinelinux.org/alpine/${OS_CHANNEL}/releases/${os_cpu_arch}/netboot/vmlinuz-lts"
	INITRAMFS_URL="https://dl-cdn.alpinelinux.org/alpine/${OS_CHANNEL}/releases/${os_cpu_arch}/netboot/initramfs-lts"

	VM_DIR_NAME="${OS_DISTRO}-${OS_CHANNEL}-${os_version}-${os_cpu_arch}"
	VM_DIR_NAME_HASH="$(sha_256_digest $OS_ISO_URL)"
	VM_DISK_DIR="$VM_DIR/$VM_DIR_NAME_HASH-$VM_DIR_NAME"
	VM_DISK=$VM_DISK_DIR/disk.qcow2

	# Prepare disk.
	mkdir -p $VM_DISK_DIR
	if [ ! -f $VM_DISK ]; then
		qemu-img create -f qcow2 $VM_DISK $disk_size
	fi

	#TMP_DIR=$(mktemp -d 2>/dev/null || mktemp -d -t '${OS_DISTRO}-${OS_CHANNEL}-${os_cpu_arch}')
	TMP_DIR="/tmp/$VM_DIR_NAME_HASH"
	mkdir -p $TMP_DIR

	# Now build the command line to run qemu.
	QEMU=qemu-system-${os_cpu_arch}
	QEMU_ARGS=(
		-smp "cores=$cores,threads=$threads"
		-m $ram_size
		-hda $VM_DISK
	)
	if [ "y$HOST_CPU_ARCH" == "y$os_cpu_arch" ]; then
		QEMU_ARGS+=(
			#-cpu host
			#-accel hvf
			-enable-kvm
			-nic user
			-boot d
			-cdrom $TMP_DIR/$OS_ISO_NAME
		)
		wget -P $TMP_DIR/ -c $OS_ISO_URL
		$QEMU "${QEMU_ARGS[@]}"
	elif [ "yaarch64" == "y$os_cpu_arch" ]; then
		QEMU_ARGS+=(
			-M virt
			-cpu cortex-a72
			-initrd $TMP_DIR/initramfs-lts
			-kernel $TMP_DIR/vmlinuz-lts
			--append "console=ttyAMA0 ip=dhcp alpine_repo=$OS_REPO modloop=$OS_MODLOOP_URL"
			-netdev user,id=unet
			-device virtio-net-device,netdev=unet
			-net user
			-nographic
		)
		if [ $extract_only == 0 ]; then
			wget -P $TMP_DIR/ -c $VMLINUZ_URL
			wget -P $TMP_DIR/ -c $INITRAMFS_URL

			$QEMU "${QEMU_ARGS[@]}"
		fi

		sudo modprobe nbd max_part=8
		sudo qemu-nbd --connect=/dev/nbd0 $VM_DISK
		mkdir -p $TMP_DIR/mnt/
		sudo mount /dev/nbd0p1 $TMP_DIR/mnt/
		sudo chmod a+r $TMP_DIR/mnt/initramfs-lts
		sudo chmod a+r $TMP_DIR/mnt/vmlinuz-lts
		cp $TMP_DIR/mnt/vmlinuz-lts $VM_DISK_DIR/vmlinuz-lts.img
		cp $TMP_DIR/mnt/initramfs-lts $VM_DISK_DIR/initramfs-lts.img
		sudo umount /dev/nbd0p1
		sudo nbd-client -d /dev/nbd0
		sudo modprobe -r nbd
	fi

	# Clean up.
	# Disable this while testing to prevent getting throttled by the CDN.
	#rm -rf $TMP_DIR
}

function start_vm() {
	local os_cpu_arch="$HOST_CPU_ARCH"
	local os_version="$OS_VERSION"
	local threads=$DEFAULT_VM_THREADS
	local cores=$DEFAULT_VM_CORES
	local ram_size="$DEFAULT_VM_RAM_SIZE"

	while (("$#")); do
		case "$1" in
			-a | --arch)
				echo "arch: $2"
				os_cpu_arch="$2"
				shift 2
				;;
			-r | --ram-size | --ram_size)
				echo "ram_size: $2"
				ram_size="$2"
				shift 2
				;;
			-t | --threads)
				echo "threads: $2"
				threads="$2"
				shift 2
				;;
			-c | --cores)
				echo "cores: $2"
				cores="$2"
				shift 2
				;;
			-h | --help)
				echo "help"
				shift
				exit 1
				;;
			*)
				echo "error: unknown arguments"
				echo "help"
				shift
				exit 1
				;;
		esac
	done

	# Example: https://dl-cdn.alpinelinux.org/alpine/latest-stable/releases/x86_64/
	OS_ISO_URL="https://dl-cdn.alpinelinux.org/alpine/${OS_CHANNEL}/releases/${os_cpu_arch}/alpine-standard-${os_version}-${os_cpu_arch}.iso"
	OS_ISO_NAME="alpine-standard-${os_version}-${os_cpu_arch}.iso"
	OS_REPO="http://dl-cdn.alpinelinux.org/alpine/${OS_CHANNEL}/main/"
	OS_MODLOOP_URL="http://dl-cdn.alpinelinux.org/alpine/${OS_CHANNEL}/releases/${os_cpu_arch}/netboot/modloop-lts"
	VMLINUZ_URL="https://dl-cdn.alpinelinux.org/alpine/${OS_CHANNEL}/releases/${os_cpu_arch}/netboot/vmlinuz-lts"
	INITRAMFS_URL="https://dl-cdn.alpinelinux.org/alpine/${OS_CHANNEL}/releases/${os_cpu_arch}/netboot/initramfs-lts"

	VM_DIR_NAME="${OS_DISTRO}-${OS_CHANNEL}-${os_version}-${os_cpu_arch}"
	VM_DIR_NAME_HASH="$(sha_256_digest $OS_ISO_URL)"
	VM_DISK_DIR="$VM_DIR/$VM_DIR_NAME_HASH-$VM_DIR_NAME"
	VM_DISK=$VM_DISK_DIR/disk.qcow2

	# Now build the command line to run qemu.
	QEMU=qemu-system-${os_cpu_arch}
	QEMU_ARGS=(
		-smp "cores=$cores,threads=$threads"
		-m $ram_size
		-hda $VM_DISK
		-nic user
		-boot c
		-nic user,hostfwd=$SSHD_HOST_FWD
	)
	if [ "y$HOST_CPU_ARCH" == "y$os_cpu_arch" ]; then
		QEMU_ARGS+=(
			-enable-kvm
		)
	elif [ "yaarch64" == "y$os_cpu_arch" ]; then
		# See: https://qemu-project.gitlab.io/qemu/system/linuxboot.html
		QEMU_ARGS+=(
			-M virt
			-cpu cortex-a72
			-initrd $VM_DISK_DIR/initramfs-lts.img
			-kernel $VM_DISK_DIR/vmlinuz-lts.img
			--append "console=ttyAMA0 root=/dev/vda3 rw rootfstype=ext4"
			-nographic
		)
	fi

	$QEMU "${QEMU_ARGS[@]}"
}

show_usage() {
	echo "usage"
}

if [ $# -eq 0 ]; then
	show_usage
	exit 1
fi

while (("$#")); do
	case "$1" in
		setup)
			shift
			setup_vm "$@"
			exit
			;;
		start)
			shift
			start_vm "$@"
			exit
			;;
		-h | --help | help)
			shift
			show_usage
			exit 0
			;;
		*)
			echo
			show_usage
			exit 1
			;;
	esac
done

exit 0
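The script names each VM directory after the first 10 hex characters of the SHA-256 of the ISO URL, followed by the distro, channel, version, and architecture. To find (or delete and recreate) the disk image it created, a small sketch that mirrors that naming. vm_disk_path is a hypothetical helper; it assumes the script's defaults of VM_DIR=$HOME/.vm and OS_CHANNEL=latest-stable, and a Linux host where the tool is called sha256sum rather than gsha256sum:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the disk-path naming in the QEMU script above.
vm_disk_path() {
	local version="$1" arch="$2"
	local vm_dir="${VM_DIR:-$HOME/.vm}" channel="${OS_CHANNEL:-latest-stable}"
	local iso_url="https://dl-cdn.alpinelinux.org/alpine/${channel}/releases/${arch}/alpine-standard-${version}-${arch}.iso"
	local hash
	# Same digest as the script's sha_256_digest: first 10 hex characters.
	hash=$(printf '%s' "$iso_url" | sha256sum | cut -c1-10)
	echo "${vm_dir}/${hash}-alpine-${channel}-${version}-${arch}/disk.qcow2"
}
```

For example, vm_disk_path 3.17.1 aarch64 should print the disk path that the script's setup and start subcommands use for the default version on aarch64.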

@yesudeep
Contributor

yesudeep commented Jan 22, 2023

This appears to be reproducible on Clear Linux running on an x86_64 machine (a Framework laptop) as well.

Earlier, on the same system running Fedora 37, Bazel built fine. The file system was btrfs then; now it's ext4. That's probably the main difference in configuration.

❯ uname -a
Linux ghostname 6.1.7-1247.native #1 SMP Wed Jan 18 08:32:41 PST 2023 x86_64 GNU/Linux

❯ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       916G   42G  828G   5% /
devtmpfs         32G     0   32G   0% /dev
tmpfs            32G  106M   32G   1% /dev/shm
tmpfs            13G  2.7M   13G   1% /run
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
tmpfs            32G  957M   31G   3% /tmp
clr_debug_fuse  916G   42G  828G   5% /usr/lib/debug
clr_debug_fuse  916G   42G  828G   5% /usr/src/debug
tmpfs           6.3G  6.9M  6.3G   1% /run/user/1000

❯ fdisk -l
Disk /dev/nvme0n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 1048576 bytes
Disklabel type: gpt
Disk identifier: 49CB04F8-1A89-48DB-8811-40A543D4865A

Device          Start        End    Sectors   Size Type
/dev/nvme0n1p1   2048     307199     305152   149M EFI System
/dev/nvme0n1p2 307200 1953523711 1953216512 931.4G Linux root (x86-64)

@yesudeep
Contributor

yesudeep commented Jan 22, 2023

The build process is blocked waiting to read from something that is perhaps unavailable or stuck in an infinite loop. I'm building this in tmpfs; would that affect the process?

❯ env JAVA_HOME="$JAVA_HOME" \
	EXTRA_BAZEL_ARGS="--tool_java_runtime_version=local_jdk" \
	strace bash ./compile.sh
pipe2([3, 4], 0)                        = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, [INT TERM CHLD], [], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x55ebf1e45a10) = 165935
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=0x55ebf20e7a02, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x55ebf1e82a60}, {sa_handler=0x55ebf20e7a02, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x55ebf1e82a60}, 8) = 0
close(4)                                = 0
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
read(3, 

Update:

On my machine, compile.sh was blocked as shown in the strace log above. As soon as I set JAVA_HOME to the following (Clear Linux-specific) path:

/usr/lib/jvm/java-1.11.0-openjdk

the build started and worked. The /usr/bin/java binary on this OS appears to hang when invoked from the command line, so I'm not sure whether you have the same problem.

❯ java -version
(not responding)

❯ strace java -version
... looks like an indefinite loop ...
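Because a JVM that hangs on java -version stalls compile.sh in the same way, it may help to probe the chosen JAVA_HOME with a deadline before starting the build. A minimal sketch, assuming a timeout utility that takes the duration as its first argument, as the coreutils one does; java_responds and the 30-second default are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if "$1/bin/java -version" exits
# successfully within the given number of seconds.
java_responds() {
	local java_home="$1" secs="${2:-30}"
	# timeout stops the command and returns 124 if it is still running after $secs.
	if ! timeout "$secs" "$java_home/bin/java" -version >/dev/null 2>&1; then
		echo "java in $java_home failed or did not finish within ${secs}s" >&2
		return 1
	fi
}
```

For example, java_responds /usr/lib/jvm/java-11-openjdk || exit 1 would fail fast on a JDK that never answers, instead of letting compile.sh hang.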

On an aarch64 VM, the strace log shows:

munmap(0xffff077e8000, 24576)           = 0
mmap(NULL, 45056, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff077d7000
munmap(0xffff077e2000, 24576)           = 0
madvise(0xffff077d8000, 16384, MADV_FREE) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, [INT TERM CHLD], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [INT TERM CHLD], 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
clone(child_stack=NULL, flags=SIGCHLD)  = 9780
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT TERM CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
madvise(0xffff077dd000, 16384, MADV_FREE) = 0
munmap(0xffff077d7000, 45056)           = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0xaaace15d5b50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffff079a5a90}, {sa_handler=0xaaace15f4110, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffff079a5a90}, 8) = 0
wait4(-1,
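In both traces, bash is blocked either in read() on a pipe it just created or in wait4(), that is, waiting on a child process. Plain strace without -f traces only the top-level bash process, so the stuck child does not appear in these logs. One way to see which child is stuck is to walk the process tree from the compile.sh shell while it hangs. A minimal, Linux-only sketch that reads /proc directly; show_tree and children_of are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print a process and all of its descendants with their
# state (R, S, D, Z, ...) and command line, by reading /proc (Linux only).

# Print the PIDs whose parent PID is $1.
children_of() {
	local parent="$1" dir stat rest ppid
	for dir in /proc/[0-9]*; do
		# A process can exit between the glob and the read; skip it if so.
		read -r stat 2>/dev/null < "$dir/stat" || continue
		# Field 2 (comm) may contain spaces, so split after the last ") ".
		rest=${stat##*") "}
		read -r _ ppid _ <<< "$rest"
		if [[ "$ppid" == "$parent" ]]; then
			echo "${dir#/proc/}"
		fi
	done
}

# Print PID $1 and its descendants, indented by depth.
show_tree() {
	local pid="$1" depth="${2:-0}" stat rest state cmd child
	read -r stat 2>/dev/null < "/proc/$pid/stat" || return 0
	rest=${stat##*") "}
	state=${rest%% *}
	# Arguments in /proc/PID/cmdline are NUL-separated.
	cmd=$(tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline")
	printf '%*s%s [%s] %s\n' "$(( depth * 2 ))" '' "$pid" "$state" "$cmd"
	for child in $(children_of "$pid"); do
		show_tree "$child" "$(( depth + 1 ))"
	done
}
```

While compile.sh hangs, run show_tree from another shell in the same machine or container with the PID of its bash process (for example one found with pgrep -f compile.sh). The deepest descendants, and any that are not in state R, are reasonable first candidates for what the build is waiting on.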

@yesudeep
Copy link
Contributor

yesudeep commented Jan 22, 2023

> Still gets stuck on "Patching repository....". I will check one of the EC2 instances as well

  1. Namaste @seanmor5, could you please try prefixing the bash command with strace (example below) and paste the tail of the log here, so we can see where the compile process is blocked? (You may have to install strace.)
❯ doas apk add strace

❯ env JAVA_HOME="<your java home>" \
	EXTRA_BAZEL_ARGS="--tool_java_runtime_version=local_jdk" \
	strace bash ./compile.sh
  2. Could you please also paste the tail of the output of the following?
❯ strace java -version

@seanmor5
Author

seanmor5 commented Jan 29, 2023

@yesudeep Hello, I just got around to this:

strace java -version

output:

mprotect(0xffffb63ce000, 634880, PROT_READ) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [], 8) = 0
membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) = -1 EPERM (Operation not permitted)
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
rt_sigaction(SIGRT_2, {sa_handler=0xffffb657a0e4, sa_mask=~[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0xffffb65a2e64}, NULL, 8) = 0
rt_sigaction(SIGRT_2, {sa_handler=SIG_IGN, sa_mask=~[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0xffffb65a2e64}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
getpid()                                = 3275
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffffb65fc000
mmap(NULL, 73728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffffb555b000
getpid()                                = 3275
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) = 0
mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffffb535a000
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [], 8) = 0
clone(child_stack=0xffffb555aab0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|0x400000, parent_tid=[3276], tls=0xffffb555aba8, child_tidptr=0xffffb6603230) = 3276
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0xffffb555ab08, FUTEX_WAIT_PRIVATE, 2, NULLopenjdk version "11.0.18" 2023-01-17
OpenJDK Runtime Environment (build 11.0.18+10-alpine-r0)
OpenJDK 64-Bit Server VM (build 11.0.18+10-alpine-r0, mixed mode)
) = 0
munmap(0xffffb535a000, 2101248)         = 0
exit_group(0)                           = ?
+++ exited with 0 +++

Here is the tail of the output of running the compilation where it gets stuck:

munmap(0xffffb12e6000, 4096)            = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, [INT TERM CHLD], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [INT TERM CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT TERM CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [INT TERM CHLD], 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
clone(child_stack=NULL, flags=SIGCHLD)  = 4325
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT TERM CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
munmap(0xffffb12e7000, 8192)            = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0xaaaad1d94b1c, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffffb1406e64}, {sa_handler=0xaaaad1db0698, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffffb1406e64}, 8) = 0

@strophy

strophy commented Jan 31, 2023

I'm also experiencing this issue sporadically while trying to build Bazel 6.0.0 for arm64 in an arm64 Alpine image under QEMU, although it strangely succeeded once in CI. Locally it seems to get stuck at a slightly different place each time. The following Dockerfile hangs for me:

FROM alpine:3.17
RUN apk update && \
    apk add --no-cache \
    bash \
    build-base \
    curl \
    linux-headers \
    openjdk11-jdk \
    python3 \
    strace \
    unzip \
    zip

# Build Bazel
# TODO: Remove when Bazel 5.2.0+ is available in Alpine
# https://github.com/bazelbuild/bazel/pull/14391
ARG BAZEL_VERSION=6.0.0
RUN mkdir -p /tmp/bazel-release
WORKDIR /tmp/bazel-release
RUN curl -sSLO https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-dist.zip && unzip -q bazel-${BAZEL_VERSION}-dist.zip
RUN env JAVA_HOME="/usr/lib/jvm/java-11-openjdk" EXTRA_BAZEL_ARGS="--tool_java_runtime_version=local_jdk --curses=no" bash ./compile.sh
RUN install -D output/bazel /usr/local/bin/bazel

After some time, CPU and network usage drop to zero, but the command never exits. I can reproduce this in QEMU on x86_64 and on native arm64 AWS Graviton2 CPUs. I tried starting the Alpine image from scratch and adding strace to a few commands (this doesn't work under QEMU; it only works with native arm64 Docker):

/tmp/bazel-release # strace java --version
execve("/usr/bin/java", ["java", "--version"], 0xffffc069d098 /* 7 vars */) = 0
set_tid_address(0xffff9b754230)         = 31
brk(NULL)                               = 0xaaaae81f2000
brk(0xaaaae81f4000)                     = 0xaaaae81f4000
mmap(0xaaaae81f2000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xaaaae81f2000
readlinkat(AT_FDCWD, "/proc/self/exe", "/usr/lib/jvm/java-11-openjdk/bin"..., 512) = 37
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/bin/../lib/jli/libjli.so", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=67264, ...}) = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960) = 960
mmap(NULL, 135168, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xffff9b681000
mmap(0xffff9b6a0000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xf000) = 0xffff9b6a0000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/bin/../lib/jli/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/bin/../lib/jli/../libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/bin/../lib/jli/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/bin/../lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld-musl-aarch64.path", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fstat(3, {st_mode=S_IFREG|0755, st_size=132880, ...}) = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960) = 960
mmap(NULL, 200704, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xffff9b650000
mmap(0xffff9b67f000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x1f000) = 0xffff9b67f000
close(3)                                = 0
mprotect(0xffff9b6a0000, 4096, PROT_READ) = 0
mprotect(0xffff9b67f000, 4096, PROT_READ) = 0
mprotect(0xaaaad9f4f000, 4096, PROT_READ) = 0
readlinkat(AT_FDCWD, "/proc/self/exe", "/usr/lib/jvm/java-11-openjdk/bin"..., 4096) = 37
faccessat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/lib/libjava.so", F_OK) = 0
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/lib/jvm.cfg", O_RDONLY|O_LARGEFILE) = 3
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff9b74d000
read(3, "-server KNOWN\n-client IGNORE\n", 1024) = 29
read(3, "", 1024)                       = 0
close(3)                                = 0
munmap(0xffff9b74d000, 4096)            = 0
newfstatat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/lib/server/libjvm.so", {st_mode=S_IFREG|0644, st_size=15872104, ...}, 0) = 0
execve("/usr/lib/jvm/java-11-openjdk/bin/java", ["java", "--version"], 0xaaaad9f50a80 /* 8 vars */) = 0
set_tid_address(0xffff9ccb5230)         = 31
brk(NULL)                               = 0xaaaad7a36000
brk(0xaaaad7a38000)                     = 0xaaaad7a38000
mmap(0xaaaad7a36000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xaaaad7a36000
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/lib/server/libjli.so", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/lib/libjli.so", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/../lib/libjli.so", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
readlinkat(AT_FDCWD, "/proc/self/exe", "/usr/lib/jvm/java-11-openjdk/bin"..., 512) = 37
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/bin/../lib/jli/libjli.so", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=67264, ...}) = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960) = 960
mmap(NULL, 135168, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xffff9cbe2000
mmap(0xffff9cc01000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xf000) = 0xffff9cc01000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/lib/server/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/../lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/bin/../lib/jli/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/bin/../lib/jli/../libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/bin/../lib/jli/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/bin/../lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld-musl-aarch64.path", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fstat(3, {st_mode=S_IFREG|0755, st_size=132880, ...}) = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960) = 960
mmap(NULL, 200704, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xffff9cbb1000
mmap(0xffff9cbe0000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x1f000) = 0xffff9cbe0000
close(3)                                = 0
mprotect(0xffff9cc01000, 4096, PROT_READ) = 0
mprotect(0xffff9cbe0000, 4096, PROT_READ) = 0
mprotect(0xaaaace7df000, 4096, PROT_READ) = 0
readlinkat(AT_FDCWD, "/proc/self/exe", "/usr/lib/jvm/java-11-openjdk/bin"..., 4096) = 37
faccessat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/lib/libjava.so", F_OK) = 0
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/lib/jvm.cfg", O_RDONLY|O_LARGEFILE) = 3
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff9ccae000
read(3, "-server KNOWN\n-client IGNORE\n", 1024) = 29
read(3, "", 1024)                       = 0
close(3)                                = 0
munmap(0xffff9ccae000, 4096)            = 0
newfstatat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/lib/server/libjvm.so", {st_mode=S_IFREG|0644, st_size=15872104, ...}, 0) = 0
openat(AT_FDCWD, "/usr/lib/jvm/java-11-openjdk/lib/server/libjvm.so", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=15872104, ...}) = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960) = 960
mmap(NULL, 16343040, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xffff9bc1b000
mmap(0xffff9ca70000, 1314816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xe55000) = 0xffff9ca70000
mmap(0xffff9cb3e000, 471040, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xffff9cb3e000
close(3)                                = 0
mprotect(0xffff9ca70000, 634880, PROT_READ) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [], 8) = 0
membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) = -1 EPERM (Operation not permitted)
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
rt_sigaction(SIGRT_2, {sa_handler=0xffff9cc273c4, sa_mask=~[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0xffff9cc52a90}, NULL, 8) = 0
rt_sigaction(SIGRT_2, {sa_handler=SIG_IGN, sa_mask=~[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0xffff9cc52a90}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
getpid()                                = 31
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff9ccae000
mmap(NULL, 73728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff9bc09000
getpid()                                = 31
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) = 0
mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff9ba08000
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [], 8) = 0
clone(child_stack=0xffff9bc08ab0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|0x400000, parent_tid=[32], tls=0xffff9bc08ba8, child_tidptr=0xffff9ccb5230) = 32
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0xffff9bc08b08, FUTEX_WAIT_PRIVATE, 2, NULLopenjdk 11.0.18 2023-01-17
OpenJDK Runtime Environment (build 11.0.18+10-alpine-r0)
OpenJDK 64-Bit Server VM (build 11.0.18+10-alpine-r0, mixed mode)
) = 0
munmap(0xffff9ba08000, 2101248)         = 0
exit_group(0)                           = ?
+++ exited with 0 +++

Tail of the output of env JAVA_HOME="/usr/lib/jvm/java-11-openjdk" EXTRA_BAZEL_ARGS="--tool_java_runtime_version=local_jdk --curses=no" strace bash ./compile.sh:

--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2212, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 2212
wait4(-1, 0xffffcd31a0d0, WNOHANG, NULL) = -1 ECHILD (No child process)
rt_sigreturn({mask=[INT]})              = 19
read(3, "", 4096)                       = 0
close(3)                                = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0xaaaaacc75b50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffffa9c4ca90}, {sa_handler=0xaaaaacc94110, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffffa9c4ca90}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=0xaaaaacc94110, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffffa9c4ca90}, {sa_handler=0xaaaaacc75b50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffffa9c4ca90}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
pipe2([3, 4], 0)                        = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, [INT TERM CHLD], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [INT TERM CHLD], 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
clone(child_stack=NULL, flags=SIGCHLD)  = 2213
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT TERM CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=0xaaaaacc78d34, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0xffffa9c4ca90}, {sa_handler=0xaaaaacc78d34, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0xffffa9c4ca90}, 8) = 0
close(4)                                = 0
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2213, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 2213
wait4(-1, 0xffffcd31b1f0, WNOHANG, NULL) = -1 ECHILD (No child process)
rt_sigreturn({mask=[INT]})              = 0
read(3, "/tmp/bazel-release\n", 4096)   = 19
read(3, "", 4096)                       = 0
close(3)                                = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0xaaaaacc75b50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffffa9c4ca90}, {sa_handler=0xaaaaacc94110, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffffa9c4ca90}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=0xaaaaacc94110, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffffa9c4ca90}, {sa_handler=0xaaaaacc75b50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffffa9c4ca90}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, [INT TERM CHLD], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [INT TERM CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT TERM CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [INT TERM CHLD], 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
clone(child_stack=NULL, flags=SIGCHLD)  = 2214
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT TERM CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0xaaaaacc75b50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffffa9c4ca90}, {sa_handler=0xaaaaacc94110, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xffffa9c4ca90}, 8) = 0
wait4(-1, 

I don't really understand any of this, but I noticed that child processes exit in sequence (2212, 2213, and so on). When this output was displayed, the container was running the following processes, including 2214 (which I think is the one that hangs):

/ # ps aux | grep jvm
  998 root      0:18 /usr/lib/jvm/java-11-openjdk/bin/java -XX:+HeapDumpOnOutOfMemoryError -Xverify:none -Dfile.encoding=ISO-8859-1 -XX:HeapDumpPath=/tmp/bazel_XXgEemBN -Djava.util.logging.config.file=/tmp/bazel_XXgEemBN/javalog.properties -jar /tmp/bazel_XXgEemBN/archive/libblaze.jar --batch --install_base=/tmp/bazel_XXgEemBN/archive --output_base=/tmp/bazel_XXgEemBN/out --failure_detail_out=/tmp/bazel_XXgEemBN/failure_detail.rawproto --output_user_root=/tmp/bazel_XXgEemBN/user_root --install_md5= --default_system_javabase=/usr/lib/jvm/java-11-openjdk --workspace_directory=/tmp/bazel-release --nofatal_event_bus_exceptions build --ignore_unsupported_sandboxing --startup_time=329 --extract_data_time=523 --rc_source=/dev/null --isatty=1 --build_python_zip --client_env=PWD=/tmp/bazel-release --client_env=JAVA_HOME=/usr/lib/jvm/java-11-openjdk --client_env=SHLVL=2 --client_env=HOME=/root --client_env=HOSTNAME=418b04cc98ce --client_env=TERM=xterm --client_env=OLDPWD=/tmp/bazel-release --client_env=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin --client_env=EXTRA_BAZEL_ARGS=--tool_java_runtime_version=local_jdk --curses=no --client_cwd=/tmp/bazel-release --spawn_strategy=standalone --nojava_header_compilation --strategy=Javac=worker --worker_quit_after_build --ignore_unsupported_sandboxing --compilation_mode=opt --distdir=derived/distdir --extra_toolchains=//scripts/bootstrap:bootstrap_toolchain_definition --tool_java_runtime_version=local_jdk --curses=no --verbose_failures --javacopt=-g -source 11 -target 11 --stamp --embed_label 6.0.0- (@non-git) src:bazel_nojdk --action_env=PATH --host_platform=@local_config_platform//:host --platforms=@local_config_platform//:host
 1257 root      0:00 {skyframe-evalua} /usr/lib/jvm/java-11-openjdk/bin/java -XX:+HeapDumpOnOutOfMemoryError -Xverify:none -Dfile.encoding=ISO-8859-1 -XX:HeapDumpPath=/tmp/bazel_XXgEemBN -Djava.util.logging.config.file=/tmp/bazel_XXgEemBN/javalog.properties -jar /tmp/bazel_XXgEemBN/archive/libblaze.jar --batch --install_base=/tmp/bazel_XXgEemBN/archive --output_base=/tmp/bazel_XXgEemBN/out --failure_detail_out=/tmp/bazel_XXgEemBN/failure_detail.rawproto --output_user_root=/tmp/bazel_XXgEemBN/user_root --install_md5= --default_system_javabase=/usr/lib/jvm/java-11-openjdk --workspace_directory=/tmp/bazel-release --nofatal_event_bus_exceptions build --ignore_unsupported_sandboxing --startup_time=329 --extract_data_time=523 --rc_source=/dev/null --isatty=1 --build_python_zip --client_env=PWD=/tmp/bazel-release --client_env=JAVA_HOME=/usr/lib/jvm/java-11-openjdk --client_env=SHLVL=2 --client_env=HOME=/root --client_env=HOSTNAME=418b04cc98ce --client_env=TERM=xterm --client_env=OLDPWD=/tmp/bazel-release --client_env=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin --client_env=EXTRA_BAZEL_ARGS=--tool_java_runtime_version=local_jdk --curses=no --client_cwd=/tmp/bazel-release --spawn_strategy=standalone --nojava_header_compilation --strategy=Javac=worker --worker_quit_after_build --ignore_unsupported_sandboxing --compilation_mode=opt --distdir=derived/distdir --extra_toolchains=//scripts/bootstrap:bootstrap_toolchain_definition --tool_java_runtime_version=local_jdk --curses=no --verbose_failures --javacopt=-g -source 11 -target 11 --stamp --embed_label 6.0.0- (@non-git) src:bazel_nojdk --action_env=PATH --host_platform=@local_config_platform//:host --platforms=@local_config_platform//:host
 2214 root      0:08 /usr/lib/jvm/java-11-openjdk/bin/java -XX:+HeapDumpOnOutOfMemoryError -Xverify:none -Dfile.encoding=ISO-8859-1 -XX:HeapDumpPath=/tmp/bazel_XXDODcEg -Djava.util.logging.config.file=/tmp/bazel_XXDODcEg/javalog.properties -jar /tmp/bazel_XXDODcEg/archive/libblaze.jar --batch --install_base=/tmp/bazel_XXDODcEg/archive --output_base=/tmp/bazel_XXDODcEg/out --failure_detail_out=/tmp/bazel_XXDODcEg/failure_detail.rawproto --output_user_root=/tmp/bazel_XXDODcEg/user_root --install_md5= --default_system_javabase=/usr/lib/jvm/java-11-openjdk --workspace_directory=/tmp/bazel-release --nofatal_event_bus_exceptions build --ignore_unsupported_sandboxing --startup_time=329 --extract_data_time=523 --rc_source=/dev/null --isatty=1 --build_python_zip --client_env=PWD=/tmp/bazel-release --client_env=JAVA_HOME=/usr/lib/jvm/java-11-openjdk --client_env=SHLVL=2 --client_env=HOME=/root --client_env=HOSTNAME=418b04cc98ce --client_env=TERM=xterm --client_env=OLDPWD=/tmp/bazel-release --client_env=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin --client_env=EXTRA_BAZEL_ARGS=--tool_java_runtime_version=local_jdk --curses=no --client_cwd=/tmp/bazel-release --spawn_strategy=standalone --nojava_header_compilation --strategy=Javac=worker --worker_quit_after_build --ignore_unsupported_sandboxing --compilation_mode=opt --distdir=derived/distdir --extra_toolchains=//scripts/bootstrap:bootstrap_toolchain_definition --tool_java_runtime_version=local_jdk --curses=no --verbose_failures --javacopt=-g -source 11 -target 11 --stamp --embed_label 6.0.0- (@non-git) src:bazel_nojdk --action_env=PATH --host_platform=@local_config_platform//:host --platforms=@local_config_platform//:host
 2461 root      0:00 {skyframe-evalua} /usr/lib/jvm/java-11-openjdk/bin/java -XX:+HeapDumpOnOutOfMemoryError -Xverify:none -Dfile.encoding=ISO-8859-1 -XX:HeapDumpPath=/tmp/bazel_XXDODcEg -Djava.util.logging.config.file=/tmp/bazel_XXDODcEg/javalog.properties -jar /tmp/bazel_XXDODcEg/archive/libblaze.jar --batch --install_base=/tmp/bazel_XXDODcEg/archive --output_base=/tmp/bazel_XXDODcEg/out --failure_detail_out=/tmp/bazel_XXDODcEg/failure_detail.rawproto --output_user_root=/tmp/bazel_XXDODcEg/user_root --install_md5= --default_system_javabase=/usr/lib/jvm/java-11-openjdk --workspace_directory=/tmp/bazel-release --nofatal_event_bus_exceptions build --ignore_unsupported_sandboxing --startup_time=329 --extract_data_time=523 --rc_source=/dev/null --isatty=1 --build_python_zip --client_env=PWD=/tmp/bazel-release --client_env=JAVA_HOME=/usr/lib/jvm/java-11-openjdk --client_env=SHLVL=2 --client_env=HOME=/root --client_env=HOSTNAME=418b04cc98ce --client_env=TERM=xterm --client_env=OLDPWD=/tmp/bazel-release --client_env=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin --client_env=EXTRA_BAZEL_ARGS=--tool_java_runtime_version=local_jdk --curses=no --client_cwd=/tmp/bazel-release --spawn_strategy=standalone --nojava_header_compilation --strategy=Javac=worker --worker_quit_after_build --ignore_unsupported_sandboxing --compilation_mode=opt --distdir=derived/distdir --extra_toolchains=//scripts/bootstrap:bootstrap_toolchain_definition --tool_java_runtime_version=local_jdk --curses=no --verbose_failures --javacopt=-g -source 11 -target 11 --stamp --embed_label 6.0.0- (@non-git) src:bazel_nojdk --action_env=PATH --host_platform=@local_config_platform//:host --platforms=@local_config_platform//:host

Is the space in --embed_label 6.0.0- (@non-git) valid syntax?
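One way to check this: ps prints a process's arguments joined by spaces, so the listing above cannot show whether "6.0.0- (@non-git)" is one argument or two. /proc/<pid>/cmdline stores the arguments separated by NUL bytes, so printing one argument per line shows the real boundaries. A minimal sketch, assuming a Linux /proc filesystem; show_argv is a hypothetical helper name:

```shell
#!/bin/sh
# Print each argument of a running process on its own line.
# /proc/<pid>/cmdline separates arguments with NUL bytes, so an argument
# that itself contains spaces (such as "6.0.0- (@non-git)") stays on one line.
# show_argv is a hypothetical helper name used for illustration.
show_argv() {
    tr '\0' '\n' < "/proc/$1/cmdline"
}

# Example: print --embed_label of PID 2214 and the argument that follows it.
# show_argv 2214 | grep -A1 -x -- '--embed_label'
```

If "6.0.0- (@non-git)" comes out on a single line, it was passed as one argument and the space is simply part of the label value.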

@strophy

strophy commented Mar 2, 2023

@yesudeep I can reliably reproduce this hang while building Bazel 6.0.0 under Alpine in QEMU on my local Fedora x86_64 machine (btrfs filesystem), but the exact same build succeeds (albeit very slowly) under GitHub Actions. Is there anything I can check to help narrow this down?

@strophy

strophy commented Mar 10, 2023

I can reproduce this with Bazel 6.1.0 on native ARM as well. I've uploaded the last 4000 lines of the strace output here: https://pastebin.com/abN6i5qa

Inside the hanging container, /tmp/bazel_08JYbuZU/phase contains "Building output/bazel". The console output shows "Fetching repository @bazelci_rules; Patching repository 95s", and the timer keeps increasing forever.

The full strace output, from "Building Bazel from scratch" up to the hang immediately after "Building Bazel with Bazel" (1.7 MB text file), can be downloaded here: https://drive.google.com/file/d/1m_dPN3xYRvNT8_f-k6KtKgt7K8swCHBx

@seaurching

@yesudeep @meteorcloudy @seanmor5 @strophy @sgowroji I built it on mips64le and got the following error:

root@ed7d09768525:~# env EXTRA_BAZEL_ARGS="--tool_java_runtime_version=local_jdk" BAZEL_JAVAC_OPTS="-J-Xms1g -J-Xmx64g"  bash ./compile.sh --jobs=10
🍃  Building Bazel from scratch.. ....
🍃  Building Bazel with Bazel.
.OpenJDK 64-Bit Zero VM warning: Options -Xverify:none and -noverify were deprecated in JDK 13 and will likely be removed in a future release.
Loading:
    Fetching repository @bazelci_rules; Patching repository
INFO: Repository bazelci_rules instantiated at:
  /root/WORKSPACE:258:18: in <toplevel>
  /root/distdir.bzl:94:17: in dist_http_archive
Repository rule http_archive defined at:
  /root/tools/build_defs/repo/http.bzl:372:31: in <toplevel>
ERROR: An error occurred during the fetch of repository 'bazelci_rules':
   Traceback (most recent call last):
	File "/root/tools/build_defs/repo/http.bzl", line 143, column 10, in _http_archive_impl
		patch(ctx, auth = auth)
	File "/root/tools/build_defs/repo/utils.bzl", line 193, column 21, in patch
		fail("Error applying patch command %s:\n%s%s" %
Error in fail: Error applying patch command test -f BUILD && chmod u+w BUILD || true:
java.io.IOException: Cannot run program "bash" (in directory "/tmp/bazel_3r6NM4mZ/out/external/bazelci_rules"): error=0, Failed to exec spawn helper: pid: 6613, exit value: 1
ERROR: /root/WORKSPACE:258:18: fetching http_archive rule //external:bazelci_rules: Traceback (most recent call last):
	File "/root/tools/build_defs/repo/http.bzl", line 143, column 10, in _http_archive_impl
		patch(ctx, auth = auth)
	File "/root/tools/build_defs/repo/utils.bzl", line 193, column 21, in patch
		fail("Error applying patch command %s:\n%s%s" %
Error in fail: Error applying patch command test -f BUILD && chmod u+w BUILD || true:
java.io.IOException: Cannot run program "bash" (in directory "/tmp/bazel_3r6NM4mZ/out/external/bazelci_rules"): error=0, Failed to exec spawn helper: pid: 6613, exit value: 1
ERROR: Error computing the main repository mapping: no such package '@bazelci_rules//': Error applying patch command test -f BUILD && chmod u+w BUILD || true:
java.io.IOException: Cannot run program "bash" (in directory "/tmp/bazel_3r6NM4mZ/out/external/bazelci_rules"): error=0, Failed to exec spawn helper: pid: 6613, exit value: 1
Loading:

ERROR: Could not build Bazel

@strophy

strophy commented Jul 7, 2024

I tried building again with Bazel 7.2.1 and no longer encountered this error, which allowed me to package Bazel for Alpine here: https://pkgs.alpinelinux.org/package/edge/testing/aarch64/bazel7

Can anyone else confirm this is no longer an issue?
