Skip to content

Commit

Permalink
Fix detection if the collector is running as service on Windows (open…
Browse files Browse the repository at this point in the history
…-telemetry#9042)

**Description:**
Fix run as a service detection on Windows. Instead of trying to detect
if it is a service or not, for which both `svc.IsAnInteractiveSession()`
and `svc.IsWindowsService()` are somehow broken, try to run as a Windows
service, if that fails fallback to run as an interactive process. This
follows a recommendation of the Windows [service API
documentation](https://learn.microsoft.com/en-us/windows/win32/api/winsvc/nf-winsvc-startservicectrldispatchera#return-value).
The new code calls `svc.Run` and in case of error checks for
`windows.ERROR_FAILED_SERVICE_CONTROLLER_CONNECT`. If this is the error
the application can safely assume that it is not running as a service.

The duration of a call to `svc.Run` failing with this error was below 3
microseconds in the current GH runner and below 5 microseconds on my
box. While this value seems fine for startup I'm keeping the
`NO_WINDOWS_SERVICE` option instead of deprecating it (it doesn't seem
worth the trouble of deprecating it).

**Link to tracking Issue:**
Fix open-telemetry#7350

**Testing:**
Fix tested on the Splunk fork that deploys the collector as a service
and as an interactive process on Windows containers.

**Documentation:**
Added changelog.
  • Loading branch information
pjanotti authored and sokoide committed Dec 18, 2023
1 parent 1c6340e commit 241ee20
Show file tree
Hide file tree
Showing 4 changed files with 46 additions and 57 deletions.
25 changes: 25 additions & 0 deletions .chloggen/fix-running-as-service-detection.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: bug_fix

# The name of the component, or a single word describing the area of concern, (e.g. otlpreceiver)
component: cmd/otelcorecol

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Fix the code detecting if the collector is running as a service on Windows.

# One or more tracking issues or pull requests related to the change
issues: [7350]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: Removed the `NO_WINDOWS_SERVICE` environment variable given it is not needed anymore.

# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
38 changes: 10 additions & 28 deletions cmd/builder/internal/builder/templates/main_windows.go.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -6,41 +6,23 @@
package main

import (
"errors"
"fmt"
"os"
"golang.org/x/sys/windows"
"golang.org/x/sys/windows/svc"
"go.opentelemetry.io/collector/otelcol"
)

func run(params otelcol.CollectorSettings) error {
if useInteractiveMode, err := checkUseInteractiveMode(); err != nil {
return err
} else if useInteractiveMode {
return runInteractive(params)
} else {
return runService(params)
}
}

func checkUseInteractiveMode() (bool, error) {
// If environment variable NO_WINDOWS_SERVICE is set with any value other
// than 0, use interactive mode instead of running as a service. This should
// be set in case running as a service is not possible or desired even
// though the current session is not detected to be interactive
if value, present := os.LookupEnv("NO_WINDOWS_SERVICE"); present && value != "0" {
return true, nil
}

isInteractiveSession, err := svc.IsAnInteractiveSession()
if err != nil {
return false, fmt.Errorf("failed to determine if we are running in an interactive session: %w", err)
}
return isInteractiveSession, nil
}

func runService(params otelcol.CollectorSettings) error {
// do not need to supply service name when startup is invoked through Service Control Manager directly
// No need to supply service name when startup is invoked through
// the Service Control Manager directly.
if err := svc.Run("", otelcol.NewSvcHandler(params)); err != nil {
if errors.Is(err, windows.ERROR_FAILED_SERVICE_CONTROLLER_CONNECT) {
// Per https://learn.microsoft.com/en-us/windows/win32/api/winsvc/nf-winsvc-startservicectrldispatchera#return-value
// this means that the process is not running as a service, so run interactively.
return runInteractive(params)
}

return fmt.Errorf("failed to start collector server: %w", err)
}

Expand Down
38 changes: 10 additions & 28 deletions cmd/otelcorecol/main_windows.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion docs/troubleshooting.md
Original file line number Diff line number Diff line change
Expand Up @@ -283,7 +283,7 @@ configuration issue. This could be due to a firewall, DNS, or proxy
issue. Note that the Collector does have
[proxy support](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter#proxy-support).

### Startup failing in Windows Docker containers
### Startup failing in Windows Docker containers (v0.90.1 and earlier)

The process may fail to start in a Windows Docker container with the following
error: `The service process could not connect to the service controller`. In
Expand Down

0 comments on commit 241ee20

Please sign in to comment.