Better messaging for missing volume binaries on host #36280

Merged: 1 commit, Nov 10, 2016

@rkouj rkouj commented Nov 5, 2016

What this PR does / why we need it:
When mount binaries are not present on a host, the error returned is a generic one.
This change is to check the mount binaries before the mount and return a user-friendly error message.

This change is specific to GCI and the flag is experimental now.

#36098

Release note:
Introduces a flag check-node-capabilities-before-mount which if set, enables a check (CanMount()) prior to mount operations to verify that the required components (binaries, etc.) to mount the volume are available on the underlying node. If the check is enabled and CanMount() returns an error, the mount operation fails. Implements the CanMount() check for NFS.



Sample output post change :

rkouj@rkouj0:~/go/src/k8s.io/kubernetes$ kubectl describe pods
Name:           sleepyrc-fzhyl
Namespace:      default
Node:           e2e-test-rkouj-minion-group-oxxa/10.240.0.3
Start Time:     Mon, 07 Nov 2016 21:28:36 -0800
Labels:         name=sleepy
Status:         Pending
IP:
Controllers:    ReplicationController/sleepyrc
Containers:
  sleepycontainer1:
    Container ID:
    Image:              gcr.io/google_containers/busybox
    Image ID:
    Port:
    Command:
      sleep
      6000
    QoS Tier:
      cpu:      Burstable
      memory:   BestEffort
    Requests:
      cpu:              100m
    State:              Waiting
      Reason:           ContainerCreating
    Ready:              False
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  data:
    Type:       NFS (an NFS mount that lasts the lifetime of a pod)
    Server:     127.0.0.1
    Path:       /export
    ReadOnly:   false
  default-token-d13tj:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-d13tj
Events:
  FirstSeen  LastSeen  Count  From                                        SubobjectPath  Type     Reason       Message
  ---------  --------  -----  ----                                        -------------  ----     ------       -------
  7s         7s        1      {default-scheduler }                                       Normal   Scheduled    Successfully assigned sleepyrc-fzhyl to e2e-test-rkouj-minion-group-oxxa
  6s         3s        4      {kubelet e2e-test-rkouj-minion-group-oxxa}                 Warning  FailedMount  Unable to mount volume kubernetes.io/nfs/32c7ef16-a574-11e6-813d-42010af00002-data (spec.Name: data) on pod sleepyrc-fzhyl (UID: 32c7ef16-a574-11e6-813d-42010af00002). Verify that your node machine has the required components before attempting to mount this volume type. Required binary /sbin/mount.nfs is missing

@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Nov 5, 2016
@saad-ali saad-ali assigned saad-ali and unassigned thockin Nov 5, 2016
@@ -77,6 +77,10 @@ type Attributes struct {
type Mounter interface {
// Uses Interface to provide the path for Docker binds.
Volume

//CanMount returns if the volume can be mounted
Member

Space at the start of comments.

Also, I recommend being more descriptive on interface comments so that those trying to implement the interface quickly understand everything they need to know to implement the method:
CanMount is called immediately prior to SetUp to check if the required components (binaries, etc.) are available on the underlying node to complete the subsequent SetUp (mount) operation. If CanMount returns false, the mount operation is aborted and an event is generated indicating that the node does not have the required binaries to complete mount. If CanMount returns true, the mount operation continues normally. The CanMount check can be enabled or disabled using the experimental-check-mount-binaries binary flag.

nit: Personal preference, I like to maintain a column width of 80.

Contributor Author

Done.

Member

Looks good

@@ -177,6 +177,10 @@ func (b *vsphereVolumeMounter) SetUp(fsGroup *int64) error {
return b.SetUpAt(b.GetPath(), fsGroup)
}

func (b *vsphereVolumeMounter) CanMount() bool {
Member

Add comments to public methods. Even if the existing code isn't great, make the new code you add better.

Contributor Author

Done

@@ -119,14 +118,17 @@ type OperationExecutor interface {
 func NewOperationExecutor(
     kubeClient internalclientset.Interface,
     volumePluginMgr *volume.VolumePluginMgr,
-    recorder record.EventRecorder) OperationExecutor {
+    recorder record.EventRecorder,
+    checkBinariesBeforeMount bool) OperationExecutor {
Member

Renaming sucks but I'm going to recommend checkNodeCapabilitiesBeforeMount instead of checkBinariesBeforeMount in all the files. It is more verbose, but it is less likely to become inaccurate: the checks you have at the moment only look for binaries, but other plugins may want to check for kernel modules or something else in the future.

Contributor Author

Done

@@ -877,6 +882,12 @@ func (oe *operationExecutor) generateMountVolumeFunc(
}
}

if oe.checkBinariesBeforeMount && !volumeMounter.CanMount() {
oe.recorder.Eventf(volumeToMount.Pod, api.EventTypeWarning, kevents.FailedMountVolume, "Unable to mount volume %v (spec.Name: %v) on pod %v (UID: %v). Binary required for mounting, does not exist on the node machine. Verify that your node machine has the required binaries before attempting to mount this volume type", volumeToMount.VolumeName, volumeToMount.VolumeSpec.Name(), volumeToMount.Pod.Name, volumeToMount.Pod.UID)
Member

Instead of duplicating the string 3 times, create it once, and use the variable, so that it is easy to modify and keep consistent:

 errMsg := fmt.Sprintf("...", ...)
...

Contributor Author

This is an easy one. My bad.
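
As an illustration of the suggestion in this thread, here is a minimal sketch of formatting the message once and reusing it. The names `recordEvent` and `failMount` are hypothetical stand-ins, not functions from the PR, and the message is shortened.

```go
package main

import (
	"errors"
	"fmt"
)

// recordEvent is a hypothetical stand-in for the kubelet's event recorder.
func recordEvent(events *[]string, msg string) {
	*events = append(*events, msg)
}

// failMount formats the failure message once and reuses it for both the
// recorded event and the returned error, so the two cannot drift apart.
func failMount(events *[]string, volumeName, podName string) error {
	errMsg := fmt.Sprintf("Unable to mount volume %v on pod %v", volumeName, podName)
	recordEvent(events, errMsg)
	return errors.New(errMsg)
}

func main() {
	var events []string
	err := failMount(&events, "data", "sleepyrc-fzhyl")
	fmt.Println(events[0] == err.Error())
}
```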

@@ -77,6 +77,10 @@ type Attributes struct {
type Mounter interface {
// Uses Interface to provide the path for Docker binds.
Volume

//CanMount returns if the volume can be mounted
CanMount() bool
Member

Instead of returning a bool, how about returning an error, so that if the check fails, plugins can return an error with a message describing exactly what is missing, e.g. "Required binary /sbin/mount.nfs4 is missing"? The caller (the operation executor) can then use this message in the user-visible error to provide more actionable information.

(Update the comment accordingly).

Contributor Author

Done
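
A rough sketch of what the error-returning variant discussed above could look like. The interface is trimmed to the one method under review, and `alwaysOK`, `missingBinary`, and `describe` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Mounter is a trimmed-down, hypothetical version of the interface under review.
type Mounter interface {
	// CanMount reports whether the node has what the plugin needs to mount.
	// A nil error means the mount may proceed; a non-nil error says what is missing.
	CanMount() error
}

// alwaysOK models a plugin with no host-side requirements.
type alwaysOK struct{}

func (alwaysOK) CanMount() error { return nil }

// missingBinary models a plugin whose helper binary is absent.
type missingBinary struct{ path string }

func (m missingBinary) CanMount() error {
	return errors.New("Required binary " + m.path + " is missing")
}

// describe shows how a caller can surface the plugin's detail to the user.
func describe(m Mounter) string {
	if err := m.CanMount(); err != nil {
		return "cannot mount: " + err.Error()
	}
	return "ok"
}

func main() {
	fmt.Println(describe(alwaysOK{}))
	fmt.Println(describe(missingBinary{path: "/sbin/mount.nfs4"}))
}
```

Compared with a bool, the error carries the reason with it, so the caller does not need to guess which component was missing.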

@@ -207,6 +207,11 @@ func (b *glusterfsMounter) GetAttributes() volume.Attributes {
}
}

func (b *glusterfsMounter) CanMount() bool {
return true
Member

Can you also implement the check for Gluster? Check with @jingxu97 what the required client binaries are.

Contributor Author

Will get this change out in a separate PR.

// This flag if set checks for mount binaries on the host
// If the binaries are not present, the mount is failed with
// a specific error message
ExperimentalCheckMountBinariesType bool `json:"experimentalCheckMountBinariesType,omitempty"`
Member

How about:

This flag, if set, enables a check prior to mount operations to verify that the required components (binaries, etc.) to mount the volume are available on the underlying node. If the check is enabled and fails, the mount operation fails.

Member

Ditto on options.go and other types.go file

Contributor Author

Done

@@ -249,6 +249,7 @@ func (s *KubeletServer) AddFlags(fs *pflag.FlagSet) {
fs.BoolVar(&s.ProtectKernelDefaults, "protect-kernel-defaults", s.ProtectKernelDefaults, "Default kubelet behaviour for kernel tuning. If set, kubelet errors if any of kernel tunables is different than kubelet defaults.")

// Hidden flags for experimental features that are still under development.
fs.BoolVar(&s.ExperimentalCheckMountBinariesType, "experimental-check-mount-binaries", s.ExperimentalCheckMountBinariesType, "[Experimental] if set true, the kubelet will check for binaries on the host before performing the mount")
Member

I believe you have to add the new flag to hack/verify-flags/known-flags.txt or the verify-flags-underscore.py will fail.

Contributor Author

Done
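
For readers unfamiliar with how such a boolean flag is wired up, here is a small sketch. It assumes the standard library `flag` package instead of the `pflag` package used in the diff, and the `config` struct and `parse` function are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// config is a hypothetical stand-in for the kubelet configuration struct.
type config struct {
	CheckMountBinaries bool
}

// parse registers the experimental flag on a fresh FlagSet and parses args.
// The flag defaults to false, so the check is opt-in.
func parse(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("kubelet-sketch", flag.ContinueOnError)
	fs.BoolVar(&c.CheckMountBinaries, "experimental-check-mount-binaries", false,
		"[Experimental] if set, check for required components on the node before mounting")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parse([]string{"--experimental-check-mount-binaries=true"})
	fmt.Println(c.CheckMountBinaries, err)
}
```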

@@ -729,7 +732,8 @@ func NewMainKubelet(kubeCfg *componentconfig.KubeletConfiguration, kubeDeps *Kub
         klet.containerRuntime,
         kubeDeps.Mounter,
         klet.getPodsDir(),
-        kubeDeps.Recorder)
+        kubeDeps.Recorder,
+        kubeCfg.ExperimentalCheckMountBinariesType)
Member

Here is how you can implement the "put the pod in a failed state" functionality:

  • In the volumeManager, introduce a new custom error, NonRecoverableVolumeError.
  • Have operation executor return the new NonRecoverableVolumeError instead of a standard error when the CanMount() check fails.
  • In this file, modify volumeManager.WaitForAttachAndMount to handle the new volumemanager.NonRecoverableVolumeError and when it is detected, in addition to the usual error handling behavior, set the pod status to failed.
    • See rejectPod() method in this file for example.

Contributor Author

Will get this change out in a separate PR
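
The custom-error approach outlined in this thread was deferred to a later PR, so the following is only a sketch of the idea under stated assumptions. The type name comes from the review comment; `mountVolume`, `nextStep`, and the two sample errors are hypothetical, and `errors.As` is a later standard library addition than the code base of the time.

```go
package main

import (
	"errors"
	"fmt"
)

// NonRecoverableVolumeError marks a mount failure that retrying will not fix.
type NonRecoverableVolumeError struct {
	Reason string
}

func (e *NonRecoverableVolumeError) Error() string { return e.Reason }

// Hypothetical sample errors used by the sketch.
var (
	errMissingBinary = errors.New("Required binary /sbin/mount.nfs is missing")
	errTransient     = errors.New("transient failure")
)

// mountVolume is a hypothetical operation: a failed CanMount check is
// reported as a NonRecoverableVolumeError instead of a plain error.
func mountVolume(canMountErr error) error {
	if canMountErr != nil {
		return &NonRecoverableVolumeError{Reason: canMountErr.Error()}
	}
	return nil
}

// nextStep decides what a caller might do after a mount attempt.
func nextStep(err error) string {
	var nonRecoverable *NonRecoverableVolumeError
	switch {
	case err == nil:
		return "proceed"
	case errors.As(err, &nonRecoverable):
		return "fail pod"
	default:
		return "retry" // ordinary errors keep the usual handling
	}
}

func main() {
	fmt.Println(nextStep(mountVolume(errMissingBinary)))
	fmt.Println(nextStep(errTransient))
}
```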

@k8s-ci-robot

Jenkins GCE Node e2e failed for commit 0b844b251315fde32837d683449a3f619da29dfa. Full PR test history.

The magic incantation to run this job again is @k8s-bot node e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

rkouj commented Nov 7, 2016

@saad-ali PTAL

@rkouj rkouj force-pushed the better-mount-error branch 2 times, most recently from 5b05da3 to f35c506 Compare November 7, 2016 21:48
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 7, 2016
@@ -393,6 +394,72 @@ var _ = framework.KubeDescribe("Volumes [Feature:Volumes]", func() {
// Must match content of test/images/volumes-tester/nfs/index.html
testVolumeClient(cs, config, volume, nil, "Hello from NFS!")
})

It("should be mountable after checking mount binaries", func() {
Member

Add a comment about what this test does and what it expects.

Member

Also how is this test different from the standard NFS test above and how did you verify that it is working?

Contributor Author

This test is the same as the previous one and I will remove it.
Unnecessary for the current one.

@@ -371,6 +373,9 @@ type operationExecutor struct {

// recorder is used to record events in the API server
recorder record.EventRecorder

//This flag if set checks the binaries before mount
Member

Space at the start of comment.

Member

checkNodeCapabilitiesBeforeMount, if set, enables the CanMount check, which verifies that the components (binaries, etc.) required to mount the volume are available on the underlying node before attempting mount.

Contributor Author

Done

@@ -877,6 +882,15 @@ func (oe *operationExecutor) generateMountVolumeFunc(
}
}

canMountErr := volumeMounter.CanMount()
Member

nit: Let's not even call CanMount() if checkNodeCapabilitiesBeforeMount == false. It doesn't make much of a difference, but if a bug in a CanMount() implementation causes a panic, for example, checkNodeCapabilitiesBeforeMount can then be used to avoid the check completely (as written, CanMount() is always called).

Contributor Author

I had that in mind previously but didn't change it because it affects the readability of the code: it's not immediately visible that CanMount() is called when the call sits in the second operand of the if condition. Also, for now, with only the NFS implementation, a panic is unlikely.

But in the future some other implementation of CanMount() can cause a panic, so I'll move it.
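
The point of this exchange can be shown with a short sketch: when the flag gates the call itself, a faulty `CanMount()` is never reached while the check is disabled. `buggyMounter` and `checkBeforeMount` are hypothetical names for illustration.

```go
package main

import "fmt"

type mounter interface{ CanMount() error }

// buggyMounter simulates a CanMount implementation that panics.
type buggyMounter struct{}

func (buggyMounter) CanMount() error { panic("bug in CanMount") }

// checkBeforeMount only calls CanMount when the check is enabled, so
// turning the flag off fully bypasses a misbehaving implementation.
func checkBeforeMount(enabled bool, m mounter) error {
	if !enabled {
		return nil
	}
	return m.CanMount()
}

func main() {
	// With the check disabled, the panicking CanMount is never invoked.
	fmt.Println(checkBeforeMount(false, buggyMounter{}))
}
```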

canMountErr := volumeMounter.CanMount()

if oe.checkNodeCapabilitiesBeforeMount && canMountErr != nil {
errMsg := fmt.Sprintf("Unable to mount volume %v (spec.Name: %v) on pod %v (UID: %v). %s", volumeToMount.VolumeName, volumeToMount.VolumeSpec.Name(), volumeToMount.Pod.Name, volumeToMount.Pod.UID, canMountErr.Error())
Member

I would put the bulk of the common message here, and have plugins return only the description of what exactly is missing:

Unable to mount volume %v (spec.Name: %v) on pod %v (UID: %v). Components required for mounting do not exist on the node machine: %v. Verify that your node machine has the required components before attempting to mount this volume type.

Contributor Author

Done
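
A sketch of the split discussed in this thread: the caller owns the shared wording and the plugin supplies only the detail. The wording follows the sample `kubectl describe pods` output in the PR description; the function name `mountErrorMessage` is hypothetical.

```go
package main

import "fmt"

// mountErrorMessage wraps a plugin-specific detail in the shared wording,
// so each plugin only has to say what exactly is missing.
func mountErrorMessage(volumeName, specName, podName, podUID, detail string) string {
	return fmt.Sprintf(
		"Unable to mount volume %v (spec.Name: %v) on pod %v (UID: %v). "+
			"Verify that your node machine has the required components "+
			"before attempting to mount this volume type. %s",
		volumeName, specName, podName, podUID, detail)
}

func main() {
	fmt.Println(mountErrorMessage(
		"kubernetes.io/nfs/32c7ef16-a574-11e6-813d-42010af00002-data",
		"data",
		"sleepyrc-fzhyl",
		"32c7ef16-a574-11e6-813d-42010af00002",
		"Required binary /sbin/mount.nfs is missing"))
}
```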

@@ -326,6 +326,10 @@ func (_ *FakeVolume) GetAttributes() Attributes {
}
}

func (fv *FakeVolume) CanMount() bool {
Member

This needs to be updated for new interface.

Contributor Author

Done

// If not, it returns an error
func (nfsMounter *nfsMounter) CanMount() error {
exe := exec.New()
missingBinaryMsg := "Binary required for mounting, does not exist on the node machine. Verify that your node machine has the required binaries before attempting to mount this volume type"
Member

@saad-ali saad-ali Nov 7, 2016

I would remove the missingBinaryMsg here, and have the plugins only return exactly what is missing. e.g. /sbin/mount_nfs is missing (since it will otherwise be duplicated in each plugin).

Contributor Author

@rkouj rkouj Nov 7, 2016

The intent behind this message is that it would change with each plugin and I wanted to make this nfs specific (since it has the word binaries in it).

As you mentioned in the previous comment, for different plugins the mount capabilities might not necessarily depend on binaries. Each plugin would describe what is missing, and based on the error there would be a specific message as to which binary was missing.

I have changed it now.
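
To make the outcome of this thread concrete, here is a minimal sketch of a plugin-side check that returns only what is missing. It assumes a plain `os.Stat` existence test, which is not necessarily how the PR performs the check, and `requireBinaries` is a hypothetical helper. The paths are the ones mentioned in the discussion and sample output.

```go
package main

import (
	"fmt"
	"os"
)

// requireBinaries returns an error naming the first path that cannot be
// stat'ed. It does not verify that the file is executable.
func requireBinaries(paths ...string) error {
	for _, p := range paths {
		if _, err := os.Stat(p); err != nil {
			return fmt.Errorf("Required binary %s is missing", p)
		}
	}
	return nil
}

func main() {
	// An NFS plugin might check its mount helpers like this.
	if err := requireBinaries("/sbin/mount.nfs", "/sbin/mount.nfs4"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("all required binaries found")
}
```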

@@ -323,7 +323,7 @@ func Test_Run_Positive_VolumeUnmountControllerAttachEnabled(t *testing.T) {
     asw := cache.NewActualStateOfWorld(nodeName, volumePluginMgr)
     kubeClient := createTestClient()
     fakeRecorder := &record.FakeRecorder{}
-    oex := operationexecutor.NewOperationExecutor(kubeClient, volumePluginMgr, fakeRecorder)
+    oex := operationexecutor.NewOperationExecutor(kubeClient, volumePluginMgr, fakeRecorder, false)
Member

nit: when it is not obvious what a variable is that you are passing, stick the parameter name as a comment next to it. e.g.:

NewOperationExecutor(kubeClient, volumePluginMgr, fakeRecorder, false /* checkNodeCapabilitiesBeforeMount */)

Contributor Author

Done

@@ -719,6 +719,9 @@ func NewMainKubelet(kubeCfg *componentconfig.KubeletConfiguration, kubeDeps *Kub
return nil, err
}

if len(kubeCfg.ExperimentalMounterPath) != 0 {
Member

nit: Add a comment explaining what this is doing and why.

Contributor Author

Done.

@rkouj rkouj force-pushed the better-mount-error branch 5 times, most recently from e941460 to a6de94f Compare November 7, 2016 23:53
@k8s-ci-robot

Jenkins GCI GKE smoke e2e failed for commit e94146089fc98d21b1125ac7f59475f031e378bd. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@rkouj rkouj force-pushed the better-mount-error branch 3 times, most recently from 02dce93 to 1a94fdf Compare November 8, 2016 00:58
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 8, 2016
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 8, 2016
@k8s-ci-robot

Jenkins GCE e2e failed for commit ae2598b7370794455a3b8a4b4ab41dd811b2e940. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot

Jenkins Kubemark GCE e2e failed for commit ae2598b7370794455a3b8a4b4ab41dd811b2e940. Full PR test history.

The magic incantation to run this job again is @k8s-bot kubemark e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot

Jenkins GCI GCE e2e failed for commit ae2598b7370794455a3b8a4b4ab41dd811b2e940. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 9, 2016
@k8s-ci-robot

Jenkins unit/integration failed for commit 70f250d42dfcd4a16e9cda597491e388fffe8c8e. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@rkouj rkouj force-pushed the better-mount-error branch 2 times, most recently from 6390868 to ad6bfa4 Compare November 9, 2016 22:56
@k8s-ci-robot

Jenkins GKE smoke e2e failed for commit ad6bfa4d01b8c2fc3b91e50b0b3dbcc39020315e. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot

Jenkins GCE etcd3 e2e failed for commit ad6bfa4d01b8c2fc3b91e50b0b3dbcc39020315e. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce etcd3 e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot

Jenkins verification failed for commit d81e216. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@saad-ali

@k8s-bot verify test this

@saad-ali saad-ali added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 10, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 0f082c6 into kubernetes:master Nov 10, 2016
@jessfraz jessfraz added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Nov 10, 2016
@jsafrane jsafrane assigned jsafrane and unassigned jsafrane Nov 14, 2016
k8s-github-robot pushed a commit that referenced this pull request Nov 14, 2016
Automatic merge from submit-queue

Implement CanMount() for gfsMounter for linux

**What this PR does / why we need it**:
To implement CanMount() check for glusterfs. If mount binaries are not present on the underlying node, the mount will not proceed and return an error message stating so.

Related to issue : #36098


Related to similar change for NFS : 
#36280

**Release note**:
`Check binaries for GlusterFS on the underlying node before doing mount`





Sample output from testing in GCE/GCI:

rkouj@rkouj0:~/go/src/k8s.io/kubernetes$ kubectl describe pods
Name:		glusterfs
Namespace:	default
Node:		e2e-test-rkouj-minion-group-kjq3/10.240.0.3
Start Time:	Fri, 11 Nov 2016 17:22:04 -0800
Labels:		<none>
Status:		Pending
IP:		
Controllers:	<none>
Containers:
  glusterfs:
    Container ID:	
    Image:		gcr.io/google_containers/busybox
    Image ID:		
    Port:		
    QoS Tier:
      cpu:	Burstable
      memory:	BestEffort
    Requests:
      cpu:		100m
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Environment Variables:
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  glusterfs:
    Type:		Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:	glusterfs-cluster
    Path:		kube_vol
    ReadOnly:		true
  default-token-2zcao:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-2zcao
Events:
  FirstSeen	LastSeen	Count	From						SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----						-------------	--------	------		-------
  8s		8s		1	{default-scheduler }						Normal		Scheduled	Successfully assigned glusterfs to e2e-test-rkouj-minion-group-kjq3
  7s		4s		4	{kubelet e2e-test-rkouj-minion-group-kjq3}			Warning		FailedMount	Unable to mount volume kubernetes.io/glusterfs/6bb04587-a876-11e6-a712-42010af00002-glusterfs (spec.Name: glusterfs) on pod glusterfs (UID: 6bb04587-a876-11e6-a712-42010af00002). Verify that your node machine has the required components before attempting to mount this volume type. Required binary /sbin/mount.glusterfs is missing
@jessfraz

@rkouj there are a lot of conflicts with this cherry-pick; would you mind opening it, since you are more familiar with the code?

@saad-ali

> @rkouj there are a lot of conflicts with this cherry-pick; would you mind opening it, since you are more familiar with the code?

@jessfraz, @rkouj plans to prepare a cherry pick as soon as @jingxu97's mounter changes are in 1.4 (this PR depends on that). If @jingxu97's changes look like they may miss 1.4.7 (for any reason), then @rkouj will cherry pick this PR anyway (working around the dependency on @jingxu97's mounter PR).

This PR must be cherry-picked along with #36686

jessfraz commented Dec 2, 2016

cool looks like the mounter stuff is in so ping @rkouj for cherry-pick :)

@k8s-cherrypick-bot

Commit found in the "release-1.4" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

dims pushed a commit to dims/kubernetes that referenced this pull request Feb 8, 2018
Automatic merge from submit-queue

Better messaging for missing volume binaries on host

Labels
cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
9 participants