journald_receiver does not work with the filestorage extension #31476

Closed
lionas32 opened this issue Feb 28, 2024 · 7 comments · Fixed by #31550
Labels
bug, extension/storage/filestorage, needs triage, receiver/journald

Comments

@lionas32

lionas32 commented Feb 28, 2024

Component(s)

extension/storage/filestorage, receiver/journald

What happened?

Description

journald_receiver does not work as specified with the filestorage extension

Steps to Reproduce

  1. Create an otel collector with the journaldreceiver and filestorageextension
  2. In a config set the journaldreceiver storage attribute to the ID of a filestorage component
  3. Run the collector with the config

Expected Result

The collector runs and stores the cursor

Actual Result

The collector crashes

Collector version

v0.95.0

Environment information

Environment

OS: Ubuntu 22.04
Compiler: go 1.22

OpenTelemetry Collector configuration

extensions:
  file_storage:
    directory: /var/lib/otelcol/file_storage

receivers:
  journald:
    storage: file_storage
    matches:
      - SYSLOG_FACILITY: "1" # user
    priority: debug

processors:
  batch:
    send_batch_size: 5
    send_batch_max_size: 10
    timeout: 5s

exporters:
  debug:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 5


service:
  extensions: [file_storage]
  pipelines:
    logs/global:
      receivers: [journald]
      processors: [batch]
      exporters: [debug]

Log output

2024-02-27T12:53:55.896Z        info    adapter/receiver.go:45  Starting stanza receiver        {"kind": "receiver", "name": "journald", "data_type": "logs"}
2024-02-27T12:53:55.896Z        info    service@v0.95.0/service.go:206  Starting shutdown...
2024-02-27T12:53:55.896Z        info    adapter/receiver.go:140 Stopping stanza receiver        {"kind": "receiver", "name": "journald", "data_type": "logs"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x26e4aba]

goroutine 1 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/journald.(*Input).Stop(0xc000640c60)
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/operator/input/journald/journald.go:351 +0x1a
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/pipeline.(*DirectedPipeline).stop(0x30?)
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/pipeline/directed.go:75 +0x17f
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/pipeline.(*DirectedPipeline).Stop.func1()
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/pipeline/directed.go:48 +0x25
sync.(*Once).doSlow(0x3b5da20?, 0x699cfe0?)
        sync/once.go:74 +0xc2
sync.(*Once).Do(...)
        sync/once.go:65
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/pipeline.(*DirectedPipeline).Stop(0xc000a55d40)
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/pipeline/directed.go:47 +0x85
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter.(*receiver).Shutdown(0xc00095fef0, {0x4800df0, 0x6a5f2c0})
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/adapter/receiver.go:141 +0x6f
go.opentelemetry.io/collector/service/internal/graph.(*Graph).ShutdownAll(0xc000a058c0, {0x4800df0, 0x6a5f2c0})
        go.opentelemetry.io/collector/service@v0.95.0/internal/graph/graph.go:435 +0x1a8
go.opentelemetry.io/collector/service.(*Service).Shutdown(0xc00095f4d0, {0x4800df0, 0x6a5f2c0})
        go.opentelemetry.io/collector/service@v0.95.0/service.go:212 +0xcf
go.opentelemetry.io/collector/otelcol.(*Collector).setupConfigurationComponents(0xc000979c20, {0x4800df0, 0x6a5f2c0})
        go.opentelemetry.io/collector/otelcol@v0.95.0/collector.go:191 +0x705
go.opentelemetry.io/collector/otelcol.(*Collector).Run(0xc000979c20, {0x4800df0, 0x6a5f2c0})
        go.opentelemetry.io/collector/otelcol@v0.95.0/collector.go:229 +0x52
go.opentelemetry.io/collector/otelcol.NewCommand.func1(0xc000232c08, {0x3ebbfb5?, 0x7?, 0x3eb7179?})
        go.opentelemetry.io/collector/otelcol@v0.95.0/command.go:27 +0x6c
github.com/spf13/cobra.(*Command).execute(0xc000232c08, {0xc000052550, 0x2, 0x2})
        github.com/spf13/cobra@v1.8.0/command.go:983 +0xaca
github.com/spf13/cobra.(*Command).ExecuteC(0xc000232c08)
        github.com/spf13/cobra@v1.8.0/command.go:1115 +0x3ff
github.com/spf13/cobra.(*Command).Execute(0x4066380?)
        github.com/spf13/cobra@v1.8.0/command.go:1039 +0x13
main.runInteractive({0x4066380, {{0x3ee0907, 0x13}, {0x3f27603, 0x24}, {0x3eb80d9, 0x5}}, 0x0, {0x0, 0x0}, ...})
        go.opentelemetry.io/collector/cmd/builder/main.go:27 +0x3d
main.run(...)
        go.opentelemetry.io/collector/cmd/builder/main_others.go:10
main.main()
        go.opentelemetry.io/collector/cmd/builder/main.go:20 +0x118

Additional context

Using the ocb builder to build the collector with the mentioned components.

The folder /var/lib/otelcol/file_storage is created with the user running the otel collector having access to it.

@lionas32 lionas32 added the bug and needs triage labels Feb 28, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@lionas32 lionas32 changed the title from "journald_receiver does not work with the filestorage extension to store cursor" to "journald_receiver does not work with the filestorage extension" Feb 28, 2024
@djaglowski
Member

Based on the stack trace, it seems the receiver is trying to call a non-existent cancel method here. It's not clear to me how this is possible though because we save the cancel function immediately upon start and do nothing else with it until stopping.

Still looking into it but I don't see any obvious connection to the storage extension. @lionas32, can you confirm this only happens when using the storage extension?

@lionas32
Author

lionas32 commented Feb 29, 2024

If I remove storage: file_storage from the journald_receiver, the collector runs fine. I have a more complex config that tracks multiple facilities and some extra units, along with some extra receivers, processors and exporters. This runs fine as well, as long as I don't specify the storage attribute for the journald_receiver.

@lionas32
Author

lionas32 commented Mar 3, 2024

Printing the error produced here gives me: cannot start pipelines: storage client: open /var/lib/otelcol/file_storage/receiver_journald_: permission denied. Turns out I was missing the execute bit on the /var/lib/otelcol/file_storage folder... 🤦‍♂️ I guess this is something that should be printed to the user before crashing, @djaglowski.

@lionas32
Author

lionas32 commented Mar 3, 2024

I see that the storage client is being set here, which is also where the above error is generated. Could it be that it never reaches the Start function of the journald receiver because of that (which is also where the cancel function is set)?

@djaglowski
Member

Thanks for digging into it further to expose that error message.

Of course the error should be shown and no panic should occur. The surprising behavior here is that Shutdown is being called on the journald receiver despite the fact that Start was never called on it. The error from file_storage's Start is resulting in a call to every component's Shutdown, which seems unintended to me. I'll open a ticket to track this, but #31550 should protect against the panic in the meantime and allow the error to propagate correctly.

@lionas32
Author

lionas32 commented Mar 4, 2024

Thank you for looking into it! Much appreciated.

DougManton pushed a commit to DougManton/opentelemetry-collector-contrib that referenced this issue Mar 13, 2024
XinRanZhAWS pushed a commit to XinRanZhAWS/opentelemetry-collector-contrib that referenced this issue Mar 13, 2024
ghost pushed a commit to opsramp/opentelemetry-collector-contrib that referenced this issue May 5, 2024