Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data race between Agent.Run() and Agent.Stop() #1624

Closed
tigrannajaryan opened this issue Jun 21, 2019 · 1 comment
Closed

Data race between Agent.Run() and Agent.Stop() #1624

tigrannajaryan opened this issue Jun 21, 2019 · 1 comment

Comments

@tigrannajaryan
Copy link
Contributor

tigrannajaryan commented Jun 21, 2019

Requirement - what kind of business use case are you trying to solve?

Agent.Stop() must not fail if called after Agent.Run() returns.

Problem - what in Jaeger blocks you from solving the requirement?

Agent.Stop() panics or shows a data race during tests.

Proposal - what do you suggest to solve the problem or improve the existing situation?

Fix the data race. Make sure Stop() waits until effects of Agent.Run() are finished or interrupts them gracefully. Alternatively make Agent.Run() synchronous so that it waits until all starting effects are complete.

The root cause of the race is that Stop() calls WaitGroup.Wait() before Run() calls WaitGroup.Add() for the first time which is prohibited: https://golang.org/pkg/sync/#WaitGroup.Add

Reproduction

Add the following test to /github.com/jaegertracing/jaeger/cmd/agent/app/agent_test.go:

func TestStartStopRace(t *testing.T) {
	resetDefaultPrometheusRegistry()
	cfg := Builder{
		Processors: []ProcessorConfiguration{
			{
				Model:    jaegerModel,
				Protocol: compactProtocol,
				Server: ServerConfiguration{
					HostPort: "127.0.0.1:0",
				},
			},
		},
	}
	logger, logBuf := testutils.NewLogger()
	mBldr := &jmetrics.Builder{HTTPRoute: "/metrics", Backend: "prometheus"}
	mFactory, err := mBldr.CreateMetricsFactory("jaeger")
	require.NoError(t, err)
	agent, err := cfg.CreateAgent(fakeCollectorProxy{}, logger, mFactory)
	require.NoError(t, err)
	if err := agent.Run(); err != nil {
		t.Errorf("error from agent.Run(): %s", err)
	}

	agent.Stop()

	for i := 0; i < 1000; i++ {
		if strings.Contains(logBuf.String(), "agent's http server exiting") {
			return
		}
		time.Sleep(time.Millisecond)
	}
	t.Fatal("Expecting server exit log")
}

and run go test -v -race github.com/jaegertracing/jaeger/cmd/agent/app

The output shows a data race:

==================
WARNING: DATA RACE
Write at 0x00c00018a8f8 by goroutine 19:
  sync.(*WaitGroup).Wait()
      /usr/local/go/src/internal/race/race.go:41 +0xef
  github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).Stop()
      /home/tigran/go/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:102 +0x11d

Previous read at 0x00c00018a8f8 by goroutine 18:
  sync.(*WaitGroup).Add()
      /usr/local/go/src/internal/race/race.go:37 +0x169
  github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).Serve()
      /home/tigran/go/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:84 +0x62

Goroutine 19 (running) created at:
  github.com/jaegertracing/jaeger/cmd/agent/app.(*Agent).Stop()
      /home/tigran/go/src/github.com/jaegertracing/jaeger/cmd/agent/app/agent.go:88 +0xa1
  github.com/jaegertracing/jaeger/cmd/agent/app.TestStartStopRace()
      /home/tigran/go/src/github.com/jaegertracing/jaeger/cmd/agent/app/agent_test.go:176 +0x4ca
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:865 +0x163

Goroutine 18 (running) created at:
  github.com/jaegertracing/jaeger/cmd/agent/app.(*Agent).Run()
      /home/tigran/go/src/github.com/jaegertracing/jaeger/cmd/agent/app/agent.go:75 +0x2bf
  github.com/jaegertracing/jaeger/cmd/agent/app.TestStartStopRace()
      /home/tigran/go/src/github.com/jaegertracing/jaeger/cmd/agent/app/agent_test.go:172 +0x442
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:865 +0x163
==================
--- FAIL: TestStartStopRace (0.01s)
    testing.go:809: race detected during execution of test
pavolloffay pushed a commit that referenced this issue Jun 26, 2019
* Fix data race between Agent.Run() and Agent.Stop()

We had a data race with Agent.Stop() failing if called immediately
after Agent.Run() returns.

The root cause of the race was that Stop() called WaitGroup.Wait()
before Run() called WaitGroup.Add() for the first time which is
prohibited: https://golang.org/pkg/sync/#WaitGroup.Add

This change moves WaitGroup.Add() to earlier stage which guarantees
that WaitGroup.Wait() will be called after that.

Github issue: #1624

Testing done:

Added automated tests which was failing before the bug was fixed
and does not fail any more after the fix.

Also verified that `make test` passes.

Signed-off-by: Tigran Najaryan <tigran@najaryan.net>

* Fix based on PR comments

Signed-off-by: Tigran Najaryan <tigran@najaryan.net>
@pavolloffay
Copy link
Member

done in #1625

outdoorSpirit pushed a commit to outdoorSpirit/Go-Jag that referenced this issue May 3, 2024
* Fix data race between Agent.Run() and Agent.Stop()

We had a data race with Agent.Stop() failing if called immediately
after Agent.Run() returns.

The root cause of the race was that Stop() called WaitGroup.Wait()
before Run() called WaitGroup.Add() for the first time which is
prohibited: https://golang.org/pkg/sync/#WaitGroup.Add

This change moves WaitGroup.Add() to earlier stage which guarantees
that WaitGroup.Wait() will be called after that.

Github issue: jaegertracing/jaeger#1624

Testing done:

Added automated tests which was failing before the bug was fixed
and does not fail any more after the fix.

Also verified that `make test` passes.

Signed-off-by: Tigran Najaryan <tigran@najaryan.net>

* Fix based on PR comments

Signed-off-by: Tigran Najaryan <tigran@najaryan.net>
laursisask pushed a commit to dropbox-code/repo-3 that referenced this issue Oct 15, 2024
* Fix data race between Agent.Run() and Agent.Stop()

We had a data race with Agent.Stop() failing if called immediately
after Agent.Run() returns.

The root cause of the race was that Stop() called WaitGroup.Wait()
before Run() called WaitGroup.Add() for the first time which is
prohibited: https://golang.org/pkg/sync/#WaitGroup.Add

This change moves WaitGroup.Add() to earlier stage which guarantees
that WaitGroup.Wait() will be called after that.

Github issue: jaegertracing/jaeger#1624

Testing done:

Added automated tests which was failing before the bug was fixed
and does not fail any more after the fix.

Also verified that `make test` passes.

Signed-off-by: Tigran Najaryan <tigran@najaryan.net>

* Fix based on PR comments

Signed-off-by: Tigran Najaryan <tigran@najaryan.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants