Skip to content

Commit

Permalink
fix(overlord): allow ensure when state is available (#482)
Browse files Browse the repository at this point in the history
Managers can explicitly call ```state.EnsureBefore()``` if they wish to
apply changes sooner than the scheduled 5min ensure overlord timer.

Although managers get a state instance at creation time, during
```overlord.New()```, it is currently forbidden to call ensure early on
before the ```Daemon``` started the ```overlord.Loop()```. The current
restriction stems from the fact that ```state.EnsureBefore()``` adjusts
the ensure timer expiry time, and the timer is only created when
```overlord.Loop()``` is called.

**The problem:**

Since the overlord ensure loop is only started later on during
```daemon.Start()```, a manager has no direct way to know when it's safe
to call Ensure. Managers currently have to use the first call to their
```Ensure()``` method as an indication the overlord loop was entered,
which is unnecessary manager boilerplate code. See
```overlord/checkstate/manager.go```.

Managers (and managers with dependencies on other managers), may want to
implement state changes during callbacks. These events can come from
outside the overlord framework (e.g. from the Linux kernel or early on
during manager ```StartUp```), meaning that handlers using state may be
triggered asynchronously from the Daemon startup sequence, and arrive
before the overlord Loop is started.

**Solution:**

Allow ```state.EnsureBefore()``` to be called before the timer exists.
Since an implicit ensure is performed on overlord loop entry, simply
returning while the timer does not exist is the same as saying an ensure
is already scheduled.
  • Loading branch information
flotter authored Aug 19, 2024
1 parent 069cb81 commit b0d5e05
Show file tree
Hide file tree
Showing 2 changed files with 31 additions and 12 deletions.
10 changes: 1 addition & 9 deletions internals/overlord/checkstate/manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,6 @@ import (
"reflect"
"sort"
"sync"
"sync/atomic"

"gopkg.in/tomb.v2"

Expand All @@ -38,8 +37,7 @@ const (

// CheckManager starts and manages the health checks.
type CheckManager struct {
state *state.State
ensureDone atomic.Bool
state *state.State

failureHandlers []FailureFunc

Expand Down Expand Up @@ -87,7 +85,6 @@ func NewManager(s *state.State, runner *state.TaskRunner) *CheckManager {
}

func (m *CheckManager) Ensure() error {
m.ensureDone.Store(true)
return nil
}

Expand Down Expand Up @@ -164,11 +161,6 @@ func (m *CheckManager) PlanChanged(newPlan *plan.Plan) {
shouldEnsure = true
}
}
if !m.ensureDone.Load() {
// Can't call EnsureBefore before Overlord.Loop is running (which will
// call m.Ensure for the first time).
return
}
if shouldEnsure {
m.state.EnsureBefore(0) // start new tasks right away
}
Expand Down
33 changes: 30 additions & 3 deletions internals/overlord/overlord.go
Original file line number Diff line number Diff line change
Expand Up @@ -337,22 +337,49 @@ func (o *Overlord) ensureBefore(d time.Duration) {
o.ensureLock.Lock()
defer o.ensureLock.Unlock()
if o.ensureTimer == nil {
panic("cannot use EnsureBefore before Overlord.Loop")
// While the timer is not setup we have not yet entered the overlord loop.
// Since the overlord loop will unconditionally perform an ensure on entry,
// the ensure is already scheduled.
return
}
now := time.Now()
next := now.Add(d)

// If this requested ensure must take place before the currently scheduled
// ensure time, let's reschedule the pending ensure.
if next.Before(o.ensureNext) {
o.ensureTimer.Reset(d)
o.ensureNext = next
return
}

// Go timers do not take sleep/suspend time into account (CLOCK_MONOTONIC,
// not CLOCK_BOOTTIME). This means that following a wakeup, the timer will
// only then continue to countdown, while the o.ensureNext wallclock time
// could point to a time that already expired.
// https://github.com/golang/go/issues/24595
// 1. https://github.com/canonical/snapd/pull/1150
// 2. https://github.com/canonical/snapd/pull/6472
//
// If we detect a wake-up condition where the scheduled expiry time is in
// the past, let's reschedule the ensure to happen right now.
if o.ensureNext.Before(now) {
// timer already expired, it will be reset in Loop() and
// next Ensure() will be called shortly.
// We have to know if the timer already expired. If this is true then
// it means a channel write has already taken place, and no further
// action is required. The overlord loop will ensure soon after this:
//
// https://go.dev/wiki/Go123Timer:
// < Go 1.23: buffered channel (overlord loop may not have observed yet)
// >= Go 1.23: unbuffered channel (overlord loop already observed)
//
// In both these cases, the overlord loop ensure will still take place.
if !o.ensureTimer.Stop() {
return
}
// Since the scheduled time was reached, and the timer did not expire
// before we called stop, we know that due to a sleep/suspend activity
// the real timer will expire at some arbitrary point in the future.
// Instead, let's get that ensure completed right now.
o.ensureTimer.Reset(0)
o.ensureNext = now
}
Expand Down

0 comments on commit b0d5e05

Please sign in to comment.