Jaeger agent arm support #1656

Closed
jishminor opened this issue Jul 5, 2019 · 12 comments · Fixed by #2176
Labels
help wanted Features that maintainers are willing to accept but do not have cycles to implement

Comments

@jishminor

Requirement - what kind of business use case are you trying to solve?

Run the Jaeger agent on the armv7l architecture.

Problem - what in Jaeger blocks you from solving the requirement?

Running the Jaeger agent on arm causes a segfault in the Go atomics library.
See line 99 here: https://golang.org/src/runtime/internal/atomic/atomic_arm.go

In Go, on 32-bit platforms it is the developer's responsibility to lay out structs so that 64-bit values accessed atomically are 64-bit aligned. Issue referenced here: golang/go#11891

Proposal - what do you suggest to solve the problem or improve the existing situation?

I have not identified the exact struct that is misaligned, but once it is found, reordering its fields should solve the problem.

Stack Trace


panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x120c0]

goroutine 35 [running]:
runtime/internal/atomic.goXadd64(0x1d34ef4, 0x1, 0x0, 0x1d83440, 0x88870)
	/usr/local/go/src/runtime/internal/atomic/atomic_arm.go:99 +0x1c
github.com/jaegertracing/jaeger/vendor/github.com/uber-go/atomic.(*Uint64).Add(...)
	/go/src/github.com/jaegertracing/jaeger/vendor/github.com/uber-go/atomic/atomic.go:190
github.com/jaegertracing/jaeger/vendor/github.com/uber-go/atomic.(*Uint64).Inc(...)
	/go/src/github.com/jaegertracing/jaeger/vendor/github.com/uber-go/atomic/atomic.go:200
github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go.(*PeerList).choosePeer(0x1d65400, 0x0, 0x1d65401, 0x0)
	/go/src/github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go/peer.go:213 +0x238
github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go.(*PeerList).GetNew(0x1d65400, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go/peer.go:133 +0x98
github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go.(*PeerList).Get(0x1d65400, 0x0, 0x1ed200b, 0x895353, 0xd)
	/go/src/github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go/peer.go:146 +0x24
github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go.(*SubChannel).BeginCall(0x1d3b680, 0x9c6d18, 0x1df2450, 0x1ed2000, 0x18, 0x1eccbfc, 0x45892c, 0x0, 0x1eccbb8)
	/go/src/github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go/subchannel.go:86 +0x48
github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go/thrift.(*client).startCall(0x1d83460, 0x9c6d18, 0x1df2450, 0x1ed2000, 0x18, 0x1eccbfc, 0xd, 0x1ed2000, 0x18)
	/go/src/github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go/thrift/client.go:62 +0xc0
github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go/thrift.(*client).Call.func1(0x9c6d18, 0x1df2450, 0x1def100, 0x82d220, 0x1def100)
	/go/src/github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go/thrift/client.go:139 +0x10c
github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go.(*Channel).RunWithRetry(0x1dd4000, 0x9c6d18, 0x1df2450, 0x1dd02c0, 0x0, 0x0)
	/go/src/github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go/retry.go:223 +0x15c
github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go/thrift.(*client).Call(0x1d83460, 0xb6d890d0, 0x1df2450, 0x891a71, 0x9, 0x895353, 0xd, 0x9c1218, 0x1e78070, 0x9c1230, ...)
	/go/src/github.com/jaegertracing/jaeger/vendor/github.com/uber/tchannel-go/thrift/client.go:136 +0x168
github.com/jaegertracing/jaeger/thrift-gen/jaeger.(*tchanCollectorClient).SubmitBatches(0x1c0dcc0, 0xb6d890d0, 0x1df2450, 0x1df2458, 0x1, 0x1, 0xb6d890d0, 0x9c6818, 0x681f34, 0x82bc28, ...)
	/go/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/tchan-jaeger.go:44 +0xc0
github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel.(*Reporter).EmitBatch.func1(0xb6d890d0, 0x1df2450, 0x1df2450, 0xb6d890d0)
	/go/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel/reporter.go:83 +0x80
github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel.(*Reporter).submitAndReport(0x1c25830, 0x1e78050, 0x8a31be, 0x1d, 0x1, 0x0, 0x0, 0x0)
	/go/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel/reporter.go:97 +0xcc
github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel.(*Reporter).EmitBatch(0x1c25830, 0x1e78020, 0x0, 0x1ec601c)
	/go/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel/reporter.go:86 +0x74
github.com/jaegertracing/jaeger/cmd/agent/app/reporter.(*MetricsReporter).EmitBatch(0x1c8cd40, 0x1e78020, 0x0, 0x0)
	/go/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/metrics.go:77 +0x4c
github.com/jaegertracing/jaeger/thrift-gen/jaeger.(*agentProcessorEmitBatch).Process(0x1e1e160, 0x2, 0x9cebc0, 0x1eb36c0, 0x9cebc0, 0x1eb36c0, 0x0, 0x40fdb0, 0x20)
	/go/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:138 +0xa0
github.com/jaegertracing/jaeger/thrift-gen/jaeger.(*AgentProcessor).Process(0x1c8d490, 0x9cebc0, 0x1eb36c0, 0x9cebc0, 0x1eb36c0, 0x1, 0x0, 0x0)
	/go/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:112 +0x250
github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).processBuffer(0x1d2cfc0)
	/go/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:115 +0x220
created by github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).Serve
	/go/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:86 +0x4c

@jishminor
Author

jishminor commented Jul 5, 2019

I will additionally note that I have successfully run the agent on arm64.

Only 32-bit arm has the problem.

@yurishkuro
Member

I believe there is a linter somewhere that verifies word alignment of struct fields used for synchronization. Having said that, uber-go/atomic looks fine as far as alignment goes: https://github.com/uber-go/atomic/blob/master/atomic.go

@jishminor
Author

jishminor commented Jul 5, 2019

Yeah you are right. It's only one value. Alignment should not be an issue.

// Uint64 is an atomic wrapper around a uint64.
type Uint64 struct{ v uint64 }

And the segfault is triggered in this function:

//go:nosplit
func goXadd64(addr *uint64, delta int64) uint64 {
	if uintptr(unsafe.Pointer(addr))&7 != 0 {
		*(*int)(nil) = 0 // crash on unaligned uint64
	}
	_ = *addr // if nil, fault before taking the lock
	var r uint64
	addrLock(addr).lock()
	r = *addr + uint64(delta)
	*addr = r
	addrLock(addr).unlock()
	return r
}

More information:
In Go, the alignment guarantee for int64 is 4 bytes on 32-bit architectures and 8 bytes on 64-bit architectures. Article reference here: https://go101.org/article/memory-layout.html

the size of the built-in type int64 is 8 bytes, the alignment guarantee of type int64 is 4 bytes on 32-bit architectures and 8 bytes on 64-bit architectures.

@arruda

arruda commented Feb 5, 2020

Hi there, just a question about the status:
I've read that Ubuntu 19.10 now targets ARM devices such as the Raspberry Pi 4 (arm64),
and that MicroK8s will be available on it by default and appears to support the Jaeger operator.
Does this imply that Jaeger can now run on arm64?
If not, what is the current status of this issue? (I couldn't get a clear idea of its state from the history here.)

@jpkrohling
Contributor

@arruda do you have any sources? We do not provide arm64 binaries for the operator (yet?) but I'd be interested in knowing what their plans are.

@arruda

arruda commented Feb 5, 2020

@jpkrohling These are the places where I read this (I'm not sure whether this implementation actually has a Jaeger operator addon, but it sounds like it may):
https://ubuntu.com/blog/ubuntu-19-10-delivers-kubernetes-at-the-edge-multi-cloud-infrastructure-economics-and-an-integrated-ai-ml-developer-experience

And another site goes into a bit more detail about MicroK8s having the Jaeger addon (I'm just not sure how reliable its information is):
http://linuxgizmos.com/ubuntu-19-10-on-the-edge-raspberry-pi-4-support-and-microk8s/

@jpkrohling
Contributor

Thanks for the reference! I wasn't aware of it, and it looks like it's indeed using the Jaeger Operator:

https://microk8s.io/docs/addons

As for the status, there is a PR that adds this to our CI, but it hasn't been merged yet: #1973.

@arruda

arruda commented Feb 6, 2020

@jpkrohling I just tested building the all-in-one image for arm64 locally using docker buildx, and it does work.
I built from the release tag 1.16.0 and tested running it on my Raspberry Pi 4 as well.
That's some great news =D 👍
I'll keep using my own image until the official Travis CI build is available. Basically, I went through the whole process in the make build-etc-etc.., but using docker buildx targeting arm64.

@jishminor
Author

I had intentions of contributing the arm64 support to travis CI, but still have yet to get around to #1973.

@MrXinWang
Contributor

Hi @jishminor! I am wondering whether you will propose a PR to add support for arm64 binaries and containers. If not, since I am working on another project that uses Jaeger tracing, I have a PR that supports it :) Can I propose it, or should I wait for yours?

@jishminor
Author

@MrXinWang If you have a PR ready to go, go ahead and propose it!

@MrXinWang
Contributor

@jishminor Thanks! PR #2176 created.
