Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customize shutdown signal to send SIGINT to Postgres #152

Merged
merged 2 commits into from
Oct 7, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
.blobs/
blobs/
.dev_builds/
/.vscode/
dev_releases/
config/dev.yml
config/private.yml
Expand Down
27 changes: 23 additions & 4 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,23 +33,23 @@ repositories:

3. All contributions must be sent using GitHub pull requests as they create a
nice audit trail and structured approach.

The originating github user has to either have a github id on-file with the
list of approved users that have signed the CLA or they can be a public
"member" of a GitHub organization for a group that has signed the corporate
CLA. This enables the corporations to manage their users themselves instead of
having to tell us when someone joins/leaves an organization. By removing a user
from an organization's GitHub account, their new contributions are no longer
approved because they are no longer covered under a CLA.

If a contribution is deemed to be covered by an existing CLA, then it is
analyzed for engineering quality and product fit before merging it.

If a contribution is not covered by the CLA, then the automated CLA system
notifies the submitter politely that we cannot identify their CLA and ask them
to sign either an individual or corporate CLA. This happens automatically as a
comment on pull requests.

When the project receives a new CLA, it is recorded in the project records, the
CLA is added to the database for the automated system uses, then we manually
make the Pull Request as having a CLA on-file.
Expand Down Expand Up @@ -94,3 +94,22 @@ The other dependencies are updated manually using their distributables (be that
source code or, in the case of libseccomp, the amended source code from their
GitHub release). We have an RSS bot in Slack watching these releases so that we
can respond to them quickly.

## Re-generating mocks

1. Install the `mockgen` utility
```bash
go install github.com/golang/mock/mockgen@v1.6.0
```

2. For example, if you need to re-generate the runC lifecycle mocks, the
result is a concatenation of some commented license header with the result of
some `mockgen` invocation. This can be done as follows:
```bash
cd src
cat \
<(sed -e 's|^|// |' -e 's/ $//' bpm/runc/lifecycle/mock_lifecycle/header.txt) \
<(echo) \
<(mockgen -source=bpm/runc/lifecycle/lifecycle.go) \
> bpm/runc/lifecycle/mock_lifecycle/mocks.go
```
1 change: 1 addition & 0 deletions docs/config.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ directory of your job.
| `persistent_disk` | boolean | No | Whether or not an persistent disk should be mounted into the container at `/var/vcap/store/JOB`. |
| `additional_volumes` | volume[] | No | A list of additional volumes to mount inside this process. The paths which can be used are restricted (see volume note below). |
| `unsafe` | unsafe | No | The unsafe configuration for this process (see below). |
| `shutdown_signal` | string | No | The first signal to send to the process when trying to shut it down. Can be either `TERM` or `INT`. Defaults to `TERM`. |

[capabilities]: http://man7.org/linux/man-pages/man7/capabilities.7.html

Expand Down
14 changes: 13 additions & 1 deletion src/bpm/commands/stop.go
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,18 @@ func stop(cmd *cobra.Command, _ []string) error {
logger.Info("starting")
defer logger.Info("complete")

jobCfg, err := bpmCfg.ParseJobConfig()
if err != nil {
logger.Error("failed-to-parse-config", err)
return fmt.Errorf("failed to parse job configuration: %s", err)
}

procCfg, err := processByNameFromJobConfig(jobCfg, procName)
if err != nil {
logger.Error("process-not-defined", err)
return fmt.Errorf("process %q not present in job configuration (%s)", procName, bpmCfg.JobConfig())
}

runcLifecycle, err := newRuncLifecycle()
if err != nil {
return err
Expand All @@ -74,7 +86,7 @@ func stop(cmd *cobra.Command, _ []string) error {
return fmt.Errorf("failed to get job-process status: %s", err)
}

if err := runcLifecycle.StopProcess(logger, bpmCfg, DefaultStopTimeout); err != nil {
if err := runcLifecycle.StopProcess(logger, bpmCfg, procCfg, DefaultStopTimeout); err != nil {
logger.Error("failed-to-stop", err)
}

Expand Down
17 changes: 17 additions & 0 deletions src/bpm/config/job_config.go
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ import (
yaml "gopkg.in/yaml.v2"

"bpm/bosh"
"bpm/runc/client"
)

type JobConfig struct {
Expand All @@ -44,6 +45,7 @@ type ProcessConfig struct {
PersistentDisk bool `yaml:"persistent_disk"`
WorkDir string `yaml:"workdir"`
Unsafe *Unsafe `yaml:"unsafe"`
ShutdownSignal string `yaml:"shutdown_signal"`
}

type Limits struct {
Expand Down Expand Up @@ -127,9 +129,24 @@ func (c *ProcessConfig) Validate(boshEnv *bosh.Env, defaultVolumes []string) err
}
}

if c.ShutdownSignal != "" && c.ShutdownSignal != "TERM" && c.ShutdownSignal != "INT" {
return fmt.Errorf(
"shutdown signal should either be 'TERM' or 'INT' (or left unspecified), but got '%s'",
c.ShutdownSignal)
}

return nil
}

func (c *ProcessConfig) ParseShutdownSignal() client.Signal {
switch c.ShutdownSignal {
case "INT":
return client.Int
default:
return client.Term
}
}

func (c *ProcessConfig) AddVolumes(
volumes []string,
boshEnv *bosh.Env,
Expand Down
3 changes: 2 additions & 1 deletion src/bpm/integration/bash_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,8 @@ import (
)

func defaultBash(path string) string {
return fmt.Sprintf(`trap "echo 'Received a Signal' && kill -9 $child" SIGTERM;
return fmt.Sprintf(`trap "echo 'Received a TERM signal' && kill -9 $child" SIGTERM;
trap "echo 'Received an INT signal' && kill -9 $child" SIGINT;
echo $LANG;
echo "Logging to STDOUT";
echo "Logging to STDERR" 1>&2;
Expand Down
27 changes: 22 additions & 5 deletions src/bpm/integration/stop_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@ var _ = Describe("stop", func() {
<-session.Exited
Expect(session).To(gexec.Exit(0))

Eventually(fileContents(stdout)).Should(ContainSubstring("Received a Signal"))
Eventually(fileContents(stdout)).Should(ContainSubstring("Received a TERM signal"))
})

It("removes the pid file", func() {
Expand All @@ -111,7 +111,8 @@ var _ = Describe("stop", func() {
<-session.Exited
Expect(session).To(gexec.Exit(0))

Expect(runcCommand(runcRoot, "state", containerID).Run()).To(HaveOccurred())
nonexistentContainerErr := runcCommand(runcRoot, "state", containerID).Run()
Expect(nonexistentContainerErr).To(HaveOccurred())
})

It("removes the bundle directory", func() {
Expand Down Expand Up @@ -173,21 +174,37 @@ var _ = Describe("stop", func() {
})
})

Context("when the job-process doesn't not exist", func() {
Context("when the job-process doesn't not exist and has no config", func() {
BeforeEach(func() {
bpmLog = filepath.Join(boshRoot, "sys", "log", "non-existent", "bpm.log")
})

It("ignores that and is successful", func() {
It("complaints that the config cannot be found", func() {
command := exec.Command(bpmPath, "stop", "non-existent")
command.Env = append(command.Env, fmt.Sprintf("BPM_BOSH_ROOT=%s", boshRoot))

session, err := gexec.Start(command, GinkgoWriter, GinkgoWriter)
Expect(err).ShouldNot(HaveOccurred())
<-session.Exited

Expect(session).To(gexec.Exit(1))
Expect(fileContents(bpmLog)()).To(ContainSubstring("failed-to-parse-config"))
Expect(fileContents(bpmLog)()).To(ContainSubstring("no such file or directory"))
})
})

Context("when the shutdown signal is SIGINT", func() {
BeforeEach(func() {
cfg.Processes[0].ShutdownSignal = "INT"
})

It("signals the container with a SIGINT", func() {
session, err := gexec.Start(command, GinkgoWriter, GinkgoWriter)
Expect(err).ToNot(HaveOccurred())
<-session.Exited
Expect(session).To(gexec.Exit(0))
Expect(fileContents(bpmLog)()).To(ContainSubstring("job-already-stopped"))

Eventually(fileContents(stdout)).Should(ContainSubstring("Received an INT signal"))
})
})
})
3 changes: 3 additions & 0 deletions src/bpm/runc/client/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ type Signal int
const (
Term Signal = iota
Quit
Int
)

func (s Signal) String() string {
Expand All @@ -41,6 +42,8 @@ func (s Signal) String() string {
return "TERM"
case Quit:
return "QUIT"
case Int:
return "INT"
default:
return "unknown"
}
Expand Down
4 changes: 2 additions & 2 deletions src/bpm/runc/lifecycle/lifecycle.go
Original file line number Diff line number Diff line change
Expand Up @@ -231,8 +231,8 @@ func (j *RuncLifecycle) ListProcesses() ([]*models.Process, error) {
return processes, nil
}

func (j *RuncLifecycle) StopProcess(logger lager.Logger, cfg *config.BPMConfig, exitTimeout time.Duration) error {
err := j.runcClient.SignalContainer(cfg.ContainerID(), client.Term)
func (j *RuncLifecycle) StopProcess(logger lager.Logger, cfg *config.BPMConfig, procCfg *config.ProcessConfig, exitTimeout time.Duration) error {
err := j.runcClient.SignalContainer(cfg.ContainerID(), procCfg.ParseShutdownSignal())
if err != nil {
return err
}
Expand Down
34 changes: 29 additions & 5 deletions src/bpm/runc/lifecycle/lifecycle_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -462,10 +462,34 @@ var _ = Describe("RuncJobLifecycle", func() {
Times(1)

setupMockDefaults()
err := runcLifecycle.StopProcess(logger, bpmCfg, exitTimeout)
err := runcLifecycle.StopProcess(logger, bpmCfg, procCfg, exitTimeout)
Expect(err).ToNot(HaveOccurred())
})

Context("when the shutdown signal is SIGINT", func() {
BeforeEach(func() {
procCfg.ShutdownSignal = "INT"
})

It("stops the container with an interrupt signal", func() {
fakeRuncClient.
EXPECT().
ContainerState(expectedContainerID).
Return(&specs.State{
Status: "stopped",
}, nil)

fakeRuncClient.
EXPECT().
SignalContainer(expectedContainerID, client.Int).
Times(1)

setupMockDefaults()
err := runcLifecycle.StopProcess(logger, bpmCfg, procCfg, exitTimeout)
Expect(err).ToNot(HaveOccurred())
})
})

Context("when the container does not stop immediately", func() {
It("polls the container state every second until it stops", func() {
gomock.InOrder(
Expand Down Expand Up @@ -499,7 +523,7 @@ var _ = Describe("RuncJobLifecycle", func() {
)

setupMockDefaults()
err := runcLifecycle.StopProcess(logger, bpmCfg, exitTimeout)
err := runcLifecycle.StopProcess(logger, bpmCfg, procCfg, exitTimeout)
Expect(err).ToNot(HaveOccurred())
})

Expand Down Expand Up @@ -536,7 +560,7 @@ var _ = Describe("RuncJobLifecycle", func() {
)

setupMockDefaults()
err := runcLifecycle.StopProcess(logger, bpmCfg, exitTimeout)
err := runcLifecycle.StopProcess(logger, bpmCfg, procCfg, exitTimeout)
Expect(err).To(MatchError("failed to stop job within timeout"))
})
})
Expand Down Expand Up @@ -575,7 +599,7 @@ var _ = Describe("RuncJobLifecycle", func() {
)

setupMockDefaults()
err := runcLifecycle.StopProcess(logger, bpmCfg, exitTimeout)
err := runcLifecycle.StopProcess(logger, bpmCfg, procCfg, exitTimeout)
Expect(err).To(MatchError("failed to stop job within timeout"))
})
})
Expand All @@ -593,7 +617,7 @@ var _ = Describe("RuncJobLifecycle", func() {

It("returns an error", func() {
setupMockDefaults()
err := runcLifecycle.StopProcess(logger, bpmCfg, exitTimeout)
err := runcLifecycle.StopProcess(logger, bpmCfg, procCfg, exitTimeout)
Expect(err).To(Equal(expectedErr))
})
})
Expand Down
Loading