Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(resize): recover from part-way failed resizes #623

Merged
merged 3 commits into from
Jun 13, 2024

Conversation

apricote
Copy link
Member

If a previous resize attempt failed after calling POST /v1/volumes/{id}/resize (rate limit, network connectivity, ...), the volume might already have the target size.

The PVC (in Kubernetes) would still have the old size in its status, so the external-resizer makes another attempt at the resize. In the Hetzner Cloud API, a resize must always be larger than the current size, so the new resize failed with:

volume size is too small (invalid_input)

This is now fixed, as we compare the size of the Hetzner Cloud volume to the desired size and only proceed if an actual resize is needed.

This was a violation of the CSI Spec[0], which stated for ControllerExpandVolume:

This operation MUST be idempotent. If a volume corresponding to the specified volume ID is already larger than or equal to the target capacity of the expansion request, the plugin SHOULD reply 0 OK.

[0] https://github.com/container-storage-interface/spec/blob/release-1.9/spec.md#controllerexpandvolume

@apricote apricote added the bug Something isn't working label Jun 13, 2024
@apricote apricote requested a review from a team as a code owner June 13, 2024 09:38
If a previous rescale attempt failed after calling `POST /v1/volumes/{id}/resize`
(rate limit, network connectivity, ...), the volume might already have the target size.

The PVC (in Kubernetes) would still have the old size in its status, so
the `external-resizer` makes another attempt at the resize. In the Hetzner Cloud API,
a resize must always be larger than the current size, so the new resize failed with:

    volume size is too small (invalid_input)

This is now fixed, as we compare the size of the Hetzner Cloud volume to
the desired size and only proceed if an actual resize is needed.

This was a violation of the CSI Spec[0], which stated for
`ControllerExpandVolume`:

> This operation MUST be idempotent. If a volume corresponding to the specified volume ID is already larger than or equal to the target capacity of the expansion request, the plugin SHOULD reply 0 OK.

[0] https://github.com/container-storage-interface/spec/blob/release-1.9/spec.md#controllerexpandvolume

Co-authored-by: Jonas Lammler <jonas.lammler@hetzner-cloud.de>
@apricote apricote force-pushed the fix-volume-resize-recovery branch from 6c9ff48 to 273d911 Compare June 13, 2024 09:45
Copy link

codecov bot commented Jun 13, 2024

Codecov Report

Attention: Patch coverage is 55.00000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 45.16%. Comparing base (487fd6a) to head (0739b2f).

Files Patch % Lines
api/volume.go 61.53% 4 Missing and 1 partial ⚠️
volumes/idempotency.go 42.85% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #623      +/-   ##
==========================================
+ Coverage   43.01%   45.16%   +2.15%     
==========================================
  Files          12       12              
  Lines        1174     1169       -5     
==========================================
+ Hits          505      528      +23     
+ Misses        650      618      -32     
- Partials       19       23       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jooola jooola force-pushed the fix-volume-resize-recovery branch from 0eb6e4d to 0739b2f Compare June 13, 2024 09:59
@apricote
Copy link
Member Author

Approved from my side too, as soon as the e2e tests are green.

@apricote apricote merged commit f9016af into main Jun 13, 2024
8 checks passed
@apricote apricote deleted the fix-volume-resize-recovery branch June 13, 2024 11:37
apricote pushed a commit that referenced this pull request Jun 13, 2024
🤖 I have created a release *beep* *boop*
---


##
[2.7.1](v2.7.0...v2.7.1)
(2024-06-13)


### Bug Fixes

* **resize:** recover from part-way failed resizes
([#623](#623))
([f9016af](f9016af))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
lukasmetzner pushed a commit that referenced this pull request Oct 10, 2024
If a previous resize attempt failed after calling `POST
/v1/volumes/{id}/resize` (rate limit, network connectivity, ...), the
volume might already have the target size.

The PVC (in Kubernetes) would still have the old size in its status, so
the `external-resizer` makes another attempt at the resize. In the
Hetzner Cloud API, a resize must always be larger than the current size,
so the new resize failed with:

    volume size is too small (invalid_input)

This is now fixed, as we compare the size of the Hetzner Cloud volume to
the desired size and only proceed if an actual resize is needed.

This was a violation of the CSI Spec[0], which stated for
`ControllerExpandVolume`:

> This operation MUST be idempotent. If a volume corresponding to the
specified volume ID is already larger than or equal to the target
capacity of the expansion request, the plugin SHOULD reply 0 OK.

[0]
https://github.com/container-storage-interface/spec/blob/release-1.9/spec.md#controllerexpandvolume

---------

Co-authored-by: Jonas Lammler <jonas.lammler@hetzner-cloud.de>
lukasmetzner pushed a commit that referenced this pull request Oct 10, 2024
🤖 I have created a release *beep* *boop*
---


##
[2.7.1](v2.7.0...v2.7.1)
(2024-06-13)


### Bug Fixes

* **resize:** recover from part-way failed resizes
([#623](#623))
([f9016af](f9016af))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants