Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several race conditions in CaptureManager #117

Closed
5 tasks done
fako1024 opened this issue May 9, 2023 · 6 comments · Fixed by #119
Closed
5 tasks done

Several race conditions in CaptureManager #117

fako1024 opened this issue May 9, 2023 · 6 comments · Fixed by #119
Assignees
Labels
bug Something isn't working critical Issues that should be looked at with priority
Milestone

Comments

@fako1024
Copy link
Collaborator

fako1024 commented May 9, 2023

After straightening out a couple of issues found in #88 via fako1024/slimcap#33 running the E2E test on a basic flow (single interface, data being piped through a mock ring buffer source from a pcap file) revealed several data races, all of which seem to be related to the CaptureManager (respectively the state machine of the individual captures). The following ones I have found:

  • Capture process is still running when Free() is called
    Way back I think we had a quick chat about whether it's ensured that the capture is actually stopped when closing() is run. According to the test this doesn't seem to be the case:
WARNING: DATA RACE
Write at 0x00c0000d4940 by goroutine 13:
  github.com/fako1024/slimcap/capture/afpacket/afring.(*MockSource).Free()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/capture/afpacket/afring/afring_mock.go:329 +0x52
  github.com/els0r/goProbe/pkg/capture.closing()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:409 +0xa3
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:324 +0xc9

Previous read at 0x00c0000d4940 by goroutine 23:
  github.com/fako1024/slimcap/capture/afpacket/afring.(*Source).nextPacket()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/capture/afpacket/afring/afring.go:331 +0x5e
  github.com/fako1024/slimcap/capture/afpacket/afring.(*Source).NextPacket()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/capture/afpacket/afring/afring.go:136 +0x57
  github.com/fako1024/slimcap/capture/afpacket/afring.(*MockSource).NextPacket()
      <autogenerated>:1 +0x6e
  github.com/els0r/goProbe/pkg/capture.(*Capture).capturePacket()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:446 +0x99
  github.com/els0r/goProbe/pkg/capture.(*Capture).process()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:514 +0x125
  github.com/els0r/goProbe/pkg/capture.capturing.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:352 +0x39

Goroutine 13 (running) created at:
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:315 +0x8e
  github.com/els0r/goProbe/pkg/capture.(*Manager).enable()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:108 +0x464
  github.com/els0r/goProbe/pkg/capture.(*Manager).Update.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:210 +0x3e
  github.com/els0r/goProbe/pkg/capture.(*RunGroup).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/rungroup.go:26 +0x73

Goroutine 23 (running) created at:
  github.com/els0r/goProbe/pkg/capture.capturing()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:352 +0x15b
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:324 +0xc9
  • Access to flow map length while still running
    It seems that process() is still adding packets to the initial map while rotate() checks if there are any flows present at all (this is probably the easiest thing to fix because it seems to me that this pre-check is simply not guarded by the current rotation copy-on-write magic):
WARNING: DATA RACE
Read at 0x00c000200db0 by goroutine 13:
  github.com/els0r/goProbe/pkg/capture.(*FlowLog).Len()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/flow_log.go:55 +0xc4
  github.com/els0r/goProbe/pkg/capture.(*Capture).rotate()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:534 +0x74
  github.com/els0r/goProbe/pkg/capture.captureCommandRotate.execute()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:203 +0x31
  github.com/els0r/goProbe/pkg/capture.capturing()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:365 +0x2e2
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:324 +0xc9

Previous write at 0x00c000200db0 by goroutine 23:
  runtime.mapassign_faststr()
      /usr/local/go/src/runtime/map_faststr.go:203 +0x0
  github.com/els0r/goProbe/pkg/capture.(*FlowLog).Add()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/flow_log.go:115 +0x16b
  github.com/els0r/goProbe/pkg/capture.(*Capture).capturePacket()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:461 +0x40e
  github.com/els0r/goProbe/pkg/capture.(*Capture).process()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:514 +0x125
  github.com/els0r/goProbe/pkg/capture.capturing.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:352 +0x39

Goroutine 13 (running) created at:
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:315 +0x8e
  github.com/els0r/goProbe/pkg/capture.(*Manager).enable()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:108 +0x464
  github.com/els0r/goProbe/pkg/capture.(*Manager).Update.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:210 +0x3e
  github.com/els0r/goProbe/pkg/capture.(*RunGroup).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/rungroup.go:26 +0x73

Goroutine 23 (running) created at:
  github.com/els0r/goProbe/pkg/capture.capturing()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:352 +0x15b
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:324 +0xc9
  • Writeout handler still processing after / during Close()
    The writeout handler seems to still be attempting to perform its routines while / after the channel is being closed. Probably also not too tricky (I think I saw some random errors regarding sending on closed channel, maybe that's related):
WARNING: DATA RACE
Write at 0x00c000100070 by goroutine 8:
  runtime.closechan()
      /usr/local/go/src/runtime/chan.go:357 +0x0
  github.com/els0r/goProbe/pkg/goprobe/writeout.(*Handler).Close()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/goprobe/writeout/handler.go:57 +0x6e4
  github.com/els0r/goProbe/pkg/e2etest.runGoProbe()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/e2e_test.go:231 +0x6c5
  github.com/els0r/goProbe/pkg/e2etest.testE2E()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/e2e_test.go:162 +0x429
  github.com/els0r/goProbe/pkg/e2etest.TestE2EBasic()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/e2e_test.go:57 +0x165
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x216
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1629 +0x47

Previous read at 0x00c000100070 by goroutine 16:
  runtime.chansend()
      /usr/local/go/src/runtime/chan.go:160 +0x0
  github.com/els0r/goProbe/pkg/goprobe/writeout.(*Handler).FullWriteout()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/goprobe/writeout/handler.go:97 +0x2e5
  github.com/els0r/goProbe/pkg/goprobe/writeout.(*Handler).HandleRotations.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/goprobe/writeout/handler.go:160 +0x393

Goroutine 8 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1629 +0x805
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:2036 +0x8d
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x216
  testing.runTests()
      /usr/local/go/src/testing/testing.go:2034 +0x87c
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1906 +0xb44
  github.com/els0r/goProbe/pkg/e2etest.TestMain()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/e2e_test.go:522 +0x12e
  main.main()
      _testmain.go:59 +0x324

Goroutine 16 (running) created at:
  github.com/els0r/goProbe/pkg/goprobe/writeout.(*Handler).HandleRotations()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/goprobe/writeout/handler.go:133 +0xfa
  github.com/els0r/goProbe/pkg/e2etest.runGoProbe()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/e2e_test.go:225 +0x6a7
  github.com/els0r/goProbe/pkg/e2etest.testE2E()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/e2e_test.go:162 +0x429
  github.com/els0r/goProbe/pkg/e2etest.TestE2EBasic()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/e2e_test.go:57 +0x165
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x216
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1629 +0x47
  • Access to packet stats while capture is running
    There is a race when accessing the (mock) stats in rotate() because packets are being processed (and hence the counter is accessed) at the same time. This is arguably more of a problem in the slimcap mock source (I'll address that one in a separate issue, I guess an atomic counter won't add too much overhead) but I'm still raising it here because it might hint at an issue with the state machine:
WARNING: DATA RACE
Read at 0x00c0000962d8 by goroutine 21:
  github.com/fako1024/slimcap/capture/afpacket/socket.(*MockFileDescriptor).IncrementPacketCount()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/capture/afpacket/socket/socket_mock.go:48 +0xa9b
  github.com/fako1024/slimcap/capture/afpacket/afring.(*MockSource).addPacket()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/capture/afpacket/afring/afring_mock.go:159 +0xa69
  github.com/fako1024/slimcap/capture/afpacket/afring.(*MockSource).addPacket-fm()
      <autogenerated>:1 +0x84
  github.com/fako1024/slimcap/capture/pcap.(*Source).NextPacketFn()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/capture/pcap/pcap.go:150 +0xd4
  github.com/fako1024/slimcap/capture/afpacket/afring.(*MockSource).AddPacketFromSource()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/capture/afpacket/afring/afring_mock.go:110 +0xc5
  github.com/fako1024/slimcap/capture/afpacket/afring.(*MockSource).Pipe.func1()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/capture/afpacket/afring/afring_mock.go:193 +0xd6
  github.com/fako1024/slimcap/capture/afpacket/afring.(*MockSource).Pipe.func2()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/capture/afpacket/afring/afring_mock.go:204 +0x47

Previous write at 0x00c0000962d8 by goroutine 13:
  github.com/fako1024/slimcap/capture/afpacket/socket.(*MockFileDescriptor).GetSocketStats()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/capture/afpacket/socket/socket_mock.go:59 +0x104
  github.com/fako1024/slimcap/event.(*Handler).GetSocketStats()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/event/poll.go:81 +0x4f
  github.com/fako1024/slimcap/capture/afpacket/afring.(*Source).Stats()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/capture/afpacket/afring/afring.go:268 +0xce
  github.com/fako1024/slimcap/capture/afpacket/afring.(*MockSource).Stats()
      <autogenerated>:1 +0x51
  github.com/els0r/goProbe/pkg/capture.(*Capture).tryGetCaptureStats()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:590 +0xc4
  github.com/els0r/goProbe/pkg/capture.(*Capture).rotate()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:556 +0x351
  github.com/els0r/goProbe/pkg/capture.captureCommandRotate.execute()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:203 +0x31
  github.com/els0r/goProbe/pkg/capture.capturing()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:365 +0x2e2
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:324 +0xc9

Goroutine 21 (running) created at:
  github.com/fako1024/slimcap/capture/afpacket/afring.(*MockSource).Pipe()
      /home/fako/Develop/go/src/github.com/fako1024/slimcap/capture/afpacket/afring/afring_mock.go:191 +0x19e
  github.com/els0r/goProbe/pkg/e2etest.newPcapSource.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/e2e_test.go:393 +0x754
  github.com/els0r/goProbe/pkg/e2etest.setupSources.func1.1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/e2e_test.go:290 +0x93
  github.com/els0r/goProbe/pkg/capture.initializing()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:337 +0xcd
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:324 +0xc9

Goroutine 13 (running) created at:
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:315 +0x8e
  github.com/els0r/goProbe/pkg/capture.(*Manager).enable()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:108 +0x464
  github.com/els0r/goProbe/pkg/capture.(*Manager).Update.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:210 +0x3e
  github.com/els0r/goProbe/pkg/capture.(*RunGroup).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/rungroup.go:26 +0x73
  • Concurrent access to logger
    It seems that there is at least one race condition when interacting with the logger still (although I recall that the new logger should(tm) be race-free and safe for concurrent use):
==================
WARNING: DATA RACE
Read at 0x00c000022228 by goroutine 176:
  github.com/els0r/goProbe/pkg/capture.(*Capture).Status()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:595 +0x64
  github.com/els0r/goProbe/pkg/capture.(*Manager).Status.func2()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:292 +0x67
  github.com/els0r/goProbe/pkg/capture.(*RunGroup).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/rungroup.go:26 +0x73

Previous write at 0x00c000022228 by goroutine 173:
  github.com/els0r/goProbe/pkg/capture.(*Capture).setState()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:273 +0x14f
  github.com/els0r/goProbe/pkg/capture.initializing()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:297 +0x35
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:291 +0xc9

Goroutine 176 (running) created at:
  github.com/els0r/goProbe/pkg/capture.(*RunGroup).Run()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/rungroup.go:24 +0xe4
  github.com/els0r/goProbe/pkg/capture.(*Manager).Status()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:291 +0x138
  github.com/els0r/goProbe/pkg/e2etest.mockIfaces.KillGoProbeOnceDone()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/types.go:156 +0x6c
  github.com/els0r/goProbe/pkg/e2etest.runGoProbe.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/e2e_test.go:216 +0x64

Goroutine 173 (running) created at:
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:282 +0x8e
  github.com/els0r/goProbe/pkg/capture.(*Manager).enable()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:110 +0x464
  github.com/els0r/goProbe/pkg/capture.(*Manager).Update.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:212 +0x3e
  github.com/els0r/goProbe/pkg/capture.(*RunGroup).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/rungroup.go:26 +0x73
==================
==================
WARNING: DATA RACE
Read at 0x00c0000f4490 by goroutine 176:
  context.(*valueCtx).Value()
      /usr/local/go/src/context/context.go:623 +0x48
  github.com/els0r/goProbe/pkg/logging.getFields()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/logging/init.go:213 +0x7b
  github.com/els0r/goProbe/pkg/logging.fromContext()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/logging/init.go:263 +0x5f
  github.com/els0r/goProbe/pkg/logging.FromContext()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/logging/init.go:289 +0x3e
  github.com/els0r/goProbe/pkg/capture.(*Capture).Status()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:595 +0x84
  github.com/els0r/goProbe/pkg/capture.(*Manager).Status.func2()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:292 +0x67
  github.com/els0r/goProbe/pkg/capture.(*RunGroup).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/rungroup.go:26 +0x73

Previous write at 0x00c0000f4490 by goroutine 173:
  context.WithValue()
      /usr/local/go/src/context/context.go:593 +0x109
  github.com/els0r/goProbe/pkg/logging.WithFields()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/logging/init.go:256 +0x508
  github.com/els0r/goProbe/pkg/capture.(*Capture).setState()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:273 +0x13b
  github.com/els0r/goProbe/pkg/capture.initializing()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:297 +0x35
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:291 +0xc9

Goroutine 176 (running) created at:
  github.com/els0r/goProbe/pkg/capture.(*RunGroup).Run()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/rungroup.go:24 +0xe4
  github.com/els0r/goProbe/pkg/capture.(*Manager).Status()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:291 +0x138
  github.com/els0r/goProbe/pkg/e2etest.mockIfaces.KillGoProbeOnceDone()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/types.go:156 +0x6c
  github.com/els0r/goProbe/pkg/e2etest.runGoProbe.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/e2etest/e2e_test.go:216 +0x64

Goroutine 173 (running) created at:
  github.com/els0r/goProbe/pkg/capture.(*Capture).Run()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture.go:282 +0x8e
  github.com/els0r/goProbe/pkg/capture.(*Manager).enable()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:110 +0x464
  github.com/els0r/goProbe/pkg/capture.(*Manager).Update.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/capture_manager.go:212 +0x3e
  github.com/els0r/goProbe/pkg/capture.(*RunGroup).Run.func1()
      /home/fako/Develop/go/src/github.com/els0r/goProbe/pkg/capture/rungroup.go:26 +0x73
==================

@els0r TBH I'm not sure if that's all of them, but I think we'll have to urgently take a look at them (they are not cosmetic, the E2E test randomly fails even without race detection turned on, and since they are not solely related to the mock sources I'm pretty sure we might see weird issues in the wild at some point)... Maybe something we can discuss in a call as well?

@fako1024 fako1024 added bug Something isn't working critical Issues that should be looked at with priority labels May 9, 2023
@fako1024 fako1024 added this to the v4 Release milestone May 9, 2023
@fako1024
Copy link
Collaborator Author

fako1024 commented May 9, 2023

Note: This is on my local working branch of #88 and uses the changes in this PR: fako1024/slimcap#34

@fako1024
Copy link
Collaborator Author

fako1024 commented May 9, 2023

Raised fako1024/slimcap#35 to cover packet counter issue.

@fako1024
Copy link
Collaborator Author

@els0r I was playing around a little: The first two are quite easily solvable without introducing any performance penalty (albeit with a bit of a cringe because the require additional synchronization outside of the existing state machine / run group stuff). The slimcap counter issue I can easily solve as well. The writeout channel stuff is a bit more contrived from what I see. I'll keep you posted...

@fako1024
Copy link
Collaborator Author

fako1024 commented May 10, 2023

More issues (or rather details) found when assessing:

  • Current state logic / counters: There is apparently no way to extract the number of processed packets (except for calling StatusAll() on the CaptureManager, which unfortunately also resets the counters, and if a rotation happens some time in between they get reset as well, making it impossible to actually track what has been processed)...
  • State changes are not atomic / guarded by the existing mutex (at least not everywhere):
func (c *Capture) setState(s State) {
	c.state = s
	c.ctx = logging.WithFields(c.ctx, "state", s.String())

	// log state transition
	logger := logging.FromContext(c.ctx)
	logger.Debugf("interface state transition")
}

I encountered this when trying to extract the state of each capture (not Status, since that also resets the counters) using a RunGroup similar to all the other ***All() methods of the CaptureManager. Also, I'm afraid the concept of capturesCopy does not prevent race conditions.

@els0r
Copy link
Owner

els0r commented May 11, 2023

First off: thanks for looking into this and embracing the ugly. As discussed, let's try to rework this part and keep only what we need.

The thing with the logger is concerning. I wonder if it has to do with using c.ctx vs. a regular context. Definitely something we should look out for when restructuring the code base.

I want to get rid of storing the context inside the structs capture and captureManager and make it part of the function calls whenever possible.

@fako1024
Copy link
Collaborator Author

First off: thanks for looking into this and embracing the ugly. As discussed, let's try to rework this part and keep only what we need.

The thing with the logger is concerning. I wonder if it has to do with using c.ctx vs. a regular context. Definitely something we should look out for when restructuring the code base.

I want to get rid of storing the context inside the structs capture and captureManager and make it part of the function calls whenever possible.

Context makes sense given the stack trace from the race, so maybe that's the culprit indeed. Nice digging!! 💪

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working critical Issues that should be looked at with priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants