Build error with golang:1.20-alpine3.17 platform=linux/arm64 using confluent-kafka-go v2.1.0 #981

Closed
1 task done
everesio opened this issue Apr 10, 2023 · 17 comments

@everesio

Description

With confluent-kafka-go v2.1.0, the ARM64 build using golang:1.20-alpine3.17 fails, while the AMD64 build succeeds.
With v2.0.2, both the ARM64 and AMD64 builds succeed.

go mod tidy && go mod vendor
docker buildx build --build-arg TARGETARCH=arm64 .
[+] Building 164.6s (11/11) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                   0.1s
 => => transferring dockerfile: 352B                                                                                                                                                                                                                   0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                                                                                                        0.0s
 => [internal] load metadata for docker.io/library/golang:1.20-alpine3.17                                                                                                                                                                              0.9s
 => [auth] library/golang:pull token for registry-1.docker.io                                                                                                                                                                                          0.0s
 => [1/6] FROM docker.io/library/golang:1.20-alpine3.17@sha256:08e9c086194875334d606765bd60aa064abd3c215abfbcf5737619110d48d114                                                                                                                        0.0s
 => [internal] load build context                                                                                                                                                                                                                      0.4s
 => => transferring context: 104.94MB                                                                                                                                                                                                                  0.3s
 => CACHED [2/6] RUN echo arm64                                                                                                                                                                                                                        0.0s
 => [3/6] RUN apk add alpine-sdk ca-certificates                                                                                                                                                                                                      27.5s
 => [4/6] WORKDIR /code                                                                                                                                                                                                                                0.1s
 => [5/6] ADD . /code                                                                                                                                                                                                                                  0.3s
 => ERROR [6/6] RUN CGO_ENABLED=1 GO111MODULE=on GOOS=linux GOARCH=arm64 go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .                                                                                                      135.7s
------
 > [6/6] RUN CGO_ENABLED=1 GO111MODULE=on GOOS=linux GOARCH=arm64 go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .:
#0 135.6 # main
#0 135.6 /usr/local/go/pkg/tool/linux_arm64/link: running gcc failed: exit status 1
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /code/vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a(rdkafka_sasl_cyrus.o): in function `rd_kafka_sasl_cyrus_close':
#0 135.6 (.text+0xb4): undefined reference to `sasl_dispose'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /code/vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a(rdkafka_sasl_cyrus.o): in function `rd_kafka_sasl_cyrus_recv':
#0 135.6 (.text+0x1a0): undefined reference to `sasl_client_step'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x1c8): undefined reference to `sasl_errdetail'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x35c): undefined reference to `sasl_getprop'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x38c): undefined reference to `sasl_getprop'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x3ac): undefined reference to `sasl_getprop'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /code/vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a(rdkafka_sasl_cyrus.o): in function `rd_kafka_sasl_cyrus_client_new':
#0 135.6 (.text+0xf74): undefined reference to `sasl_client_new'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0xfd4): undefined reference to `sasl_client_start'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0xff4): undefined reference to `sasl_errdetail'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x110c): undefined reference to `sasl_listmech'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x1180): undefined reference to `sasl_errstring'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /code/vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a(rdkafka_sasl_cyrus.o): in function `rd_kafka_sasl_cyrus_global_init':
#0 135.6 (.text+0x16dc): undefined reference to `sasl_client_init'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x170c): undefined reference to `sasl_errstring'
#0 135.6 collect2: error: ld returned 1 exit status
#0 135.6
------
Dockerfile:12
--------------------
  10 |     ADD . "/code"
  11 |
  12 | >>> RUN CGO_ENABLED=1 GO111MODULE=on GOOS=linux GOARCH=$TARGETARCH go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .
  13 |
--------------------
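
To see at a glance which symbols the linker could not resolve, a saved copy of the log can be reduced to the unique symbol names. This is only a minimal sketch: the function name `undefined_symbols` and the file name `build.log` are hypothetical, and it assumes the GNU ld message format shown above.

```shell
# Print the unique symbol names from "undefined reference to `sym'" linker errors.
# Reads the build log on stdin; lines without such an error are ignored.
undefined_symbols() {
    sed -n "s/.*undefined reference to \`\([A-Za-z_][A-Za-z0-9_]*\)'.*/\1/p" | sort -u
}

# Usage (build.log is a hypothetical file holding the output above):
#   undefined_symbols < build.log
```

For the log above, every name it prints starts with `sasl_`, which points at a missing Cyrus SASL (libsasl2) link dependency rather than at a problem in the example code.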

How to reproduce

  1. Use consumer example https://github.com/confluentinc/confluent-kafka-go/tree/master/examples/consumer_example
  2. go.mod
module main

go 1.20

require github.com/confluentinc/confluent-kafka-go/v2 v2.1.0
  3. Dockerfile
FROM --platform=linux/$TARGETARCH  golang:1.20-alpine3.17 as builder

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk add alpine-sdk ca-certificates

WORKDIR "/code"
ADD . "/code"

RUN CGO_ENABLED=1 GO111MODULE=on GOOS=linux GOARCH=$TARGETARCH go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .
  4. Failed build
go mod tidy && go mod vendor
docker buildx build --build-arg TARGETARCH=arm64 .
  5. Successful build
go mod tidy && go mod vendor
docker buildx build --build-arg TARGETARCH=amd64 .
  6. Both arm64 and amd64 builds succeed after the go.mod dependency is downgraded
require github.com/confluentinc/confluent-kafka-go/v2 v2.0.2

Checklist

Please provide the following information:

  • confluent-kafka-go and librdkafka version (LibraryVersion()):
    confluent-kafka-go v2.1.0
@milindl self-assigned this Apr 11, 2023
@flaxinger

flaxinger commented May 31, 2023

this needs a bit more attention. wasted too much time on this. 🥲

@saranonearth

Just try making the following changes


FROM --platform=linux/$TARGETARCH  golang:1.20-alpine3.17 as builder

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk update && apk add bash ca-certificates git gcc g++ libc-dev librdkafka-dev pkgconf

WORKDIR "/code"
ADD . "/code"

RUN go build -tags musl -o main .

@AndriyKalashnykov

This approach doesn't work with librdkafka-dev v2.3.0, but was working with v2.2.0

@kimgr

kimgr commented Nov 21, 2023

The root cause appears to be that librdkafka now requires Cyrus SASL, but the confluent-kafka-go wrappers don't spell out a link dependency to it.

All the workarounds above seem to avoid solving this problem by instead installing a system librdkafka-dev which requires -tags dynamic per https://github.com/confluentinc/confluent-kafka-go/#librdkafka (not sure why earlier posted workaround examples work without it; we saw linker errors still).

To fix what I understand to be the root cause, we can:

  • Ensure cyrus-sasl-dev (for Alpine, see librdkafka sasl docs for other platforms) is installed in the build and run environment
  • Tell cgo to explicitly link libsasl2.so

I adapted the repro case from the original report for go1.21 + alpine3.18 with the requisite flags:

FROM --platform=linux/$TARGETARCH golang:1.21.4-alpine3.18

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk update
RUN apk add \
    gcc \
    musl-dev \
    # explicitly install SASL package
    cyrus-sasl-dev

WORKDIR "/code"
ADD . "/code"

RUN CGO_ENABLED=1 \
    GO111MODULE=on \
    GOOS=linux \
    GOARCH=$TARGETARCH \
    # explicitly link to libsasl2 installed as part of cyrus-sasl-dev
    CGO_LDFLAGS="-lsasl2" \
    go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .

This works on my arm64/M1 Mac for TARGETARCH of both arm64 and amd64.

@kimgr

kimgr commented Nov 21, 2023

As for fixing the root-cause bug: I'm not sure why there's now a hard link dependency on libsasl2.so. But I see that the Darwin cgo LDFLAGS have -lsasl2 as part of the distribution: https://github.com/confluentinc/confluent-kafka-go/blob/master/kafka/build_darwin_arm64.go#L9. There are probably reasons why this can't work on Linux in general, but it might be a thread to start pulling on.

@AndriyKalashnykov

Kim, this is very helpful! Thanks for the research.
One may wonder why my minimal example isn't part of Confluent's CI/CD pipeline, since it could catch breaking changes like this one.

@kimgr

kimgr commented Nov 22, 2023

It turns out the docs at https://github.com/confluentinc/librdkafka/wiki/Using-SASL-with-librdkafka#4-install-sasl-modules-on-client-host say:

Note: librdkafka must be built with SASL support (which is enabled by default if libsasl2-dev is installed at buildtime)

So I think what happened is that @emasab who built librdkafka for 2.3.0 happens to have Cyrus SASL/libsasl2 installed in their environment, and thereby confluent-kafka-go got an indirect dependency on the Cyrus SASL distribution.

I don't know anything about SASL, but it looks like librdkafka has minimal built-in support, so presumably earlier releases happened to build without the Cyrus dependency and only got the base support.

@kimgr

kimgr commented Nov 23, 2023

Followup: we actually ran into a problem with the proposed workaround -- CGO_LDFLAGS are injected before the cgo LDFLAGS, and gcc -l switches are sensitive to order (beautifully described here: https://eli.thegreenplace.net/2013/07/09/library-order-in-static-linking).

There's a supremely hacky way to work around this too, using a dangling -Wl,--start-group before -lsasl2:

CGO_LDFLAGS="-Wl,--start-group -lsasl2"

GCC complains with

bin/ld: missing --end-group; added as last command line option

but essentially fixes the unclosed group for you.
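
The ordering problem described above can be checked mechanically once you have the argument list of the final gcc link command, however you capture it. The helper below is a hypothetical sketch, not part of any tool: `sasl_before_rdkafka` is an invented name, and it assumes the vendored library shows up as an argument matching `*librdkafka*.a`, as in the paths from the original error log.

```shell
# Succeeds (exit 0) if -lsasl2 appears before the first librdkafka static
# archive in the given linker arguments. Per the comment above, that is the
# ordering an order-sensitive linker fails to resolve.
# Fails (exit 1) if the archive comes first or -lsasl2 is absent.
sasl_before_rdkafka() {
    for arg in "$@"; do
        case "$arg" in
            -lsasl2) return 0 ;;
            *librdkafka*.a) return 1 ;;
        esac
    done
    return 1
}
```

For example, `sasl_before_rdkafka gcc -lsasl2 /code/vendor/.../librdkafka_musl_linux_arm64.a -lm` succeeds, flagging the problematic order, while the same call with `-lsasl2` moved after the archive fails.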

@kimgr

kimgr commented Nov 24, 2023

And as a final workaround tip: you can use a more modern linker which doesn't have the input order requirements: lld or mold.

Here's a Dockerfile to use mold

FROM --platform=linux/$TARGETARCH golang:1.21.4-alpine3.18

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk update
RUN apk add \
    gcc \
    # use mold for convenient extra linker inputs
    mold \
    musl-dev \
    # explicitly install SASL package
    cyrus-sasl-dev

WORKDIR "/code"
ADD . "/code"

RUN CGO_ENABLED=1 \
    GO111MODULE=on \
    GOOS=linux \
    GOARCH=$TARGETARCH \
    # explicitly link to libsasl2 installed as part of cyrus-sasl-dev
    CGO_LDFLAGS="-fuse-ld=mold -lsasl2" \
    go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .

This gets rid of the warning from gcc/ld about the unclosed group.

@sagikazarmark

@kimgr appreciate the detailed workarounds.

Unfortunately, the last one does not work for me.

It fails with the following error:

10.59 /usr/local/go/pkg/tool/linux_arm64/link: running aarch64-alpine-linux-musl-clang failed: exit status 1
10.59 mold: fatal: library not found: sasl2

I have cyrus-sasl-dev installed.

(An extra piece of information: I use xx to cross-compile which may be an issue here)

#151 might also be related

Based on your earlier comment, however, this might be an issue with the bundled libs, so I'm thinking about building them myself, making sure cyrus-sasl-dev is not present.

If that is the problem, then I believe there should be a patch release fixing the libraries.

@kimgr

kimgr commented Jan 13, 2024

@sagikazarmark

> I have cyrus-sasl-dev installed.

You mentioned xx. I'm not familiar with it, but I'm assuming you've installed cyrus-sasl-dev using xx-apk in the build context?

https://github.com/tonistiigi/xx?tab=readme-ov-file#xx-apk-xx-apt-xx-apt-get---installing-packages-for-target-architecture

I wonder if a cross linker needs to be used too, or if you can somehow tell mold where to look for libraries for the target architecture.

Sorry, I don't have any clue, really.

@emasab
Contributor

emasab commented Apr 1, 2024

Thank you all for raising awareness on this issue.

> So I think what happened is that @emasab who built librdkafka for 2.3.0 happens to have Cyrus SASL/libsasl2 installed in their environment, and thereby confluent-kafka-go got an indirect dependency on the Cyrus SASL distribution.

That didn't happen because we configure and build these static binaries in a Semaphore pipeline, not on our laptops. Then we import those binaries locally to push them to confluent-kafka-go.

I believe the issue is here in the release pipeline:

As it should be

                        if attr in a.info and \
                           a.info[attr] == m.attributes[origattr]:

because the intent is to exclude the files that have the attribute extra=gssapi.
Since they are currently not excluded, either the version with libsasl2 or the one without it can be picked, depending on the order.

That explains why the issue is present in 2.1.0 and 2.3.0 but not in 2.2.0 and 2.0.2.
Going to create a PR to fix it before our upcoming 2.4.0 release.

@emasab
Contributor

emasab commented Apr 1, 2024

> Then we import those binaries locally to push them to confluent-kafka-go.

There's room for security improvements here. We have to make this step run on CI too.

@emasab
Contributor

emasab commented Apr 1, 2024

v2.1.1-linux-arm64-musl isn't affected either, but for now it's better to use the workaround and take the latest fixes in 2.3.0.

@emasab
Contributor

emasab commented Apr 1, 2024

Confirmed, by looking for rdkafka_sasl_cyrus.o in the archive files, that these are the only affected builds:

  • v2.1.0-linux-arm64-musl
  • v2.3.0-linux-arm64-musl
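
The same check can be reproduced against a vendored archive. This is a rough sketch rather than the exact procedure used above: the function name `lists_cyrus_sasl` is hypothetical, and it assumes binutils' `ar` is available to list the archive members.

```shell
# Succeeds (exit 0) if the archive member listing on stdin contains the
# Cyrus SASL object file, i.e. the archive was built against libsasl2.
lists_cyrus_sasl() {
    grep -qx 'rdkafka_sasl_cyrus.o'
}

# Usage, with the vendored path taken from the error log in the report:
#   ar t vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a \
#       | lists_cyrus_sasl && echo "archive needs libsasl2"
```

Matching the whole line with `-x` avoids false positives from similarly named members.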

@emasab
Contributor

emasab commented Apr 1, 2024

Raised this PR. And confirmed that the produced binaries don't include rdkafka_sasl_cyrus.o, except for darwin where it's expected to have it.

@milindl
Contributor

milindl commented May 24, 2024

Closing this as it's fixed in 2.4.0

@milindl closed this as completed May 24, 2024