
Fix GID allocator #850

Merged · 4 commits · Aug 24, 2023

Conversation

@RomanBednar (Contributor) commented Dec 5, 2022

Is this a bug fix or adding new feature?
Bug fix.

What is this PR about? / Why do we need it?

Dynamically provisioning an EFS volume creates an access point (AP), for which the driver allocates a GID from the default range or from a range defined in StorageClass (SC) parameters.

The GID allocator did not check which GIDs might already be in use. For example, say we have an EFS file system with an access point for the /mydir path, owned by GID 1000 with permissions 770, which grant rwx access to group 1000. Then we create another access point with the same GID of 1000 on the same path or a parent path (possibly /). Now whoever mounts the second access point has rwx access to /mydir because the GID matches. The old code did not prevent this, so volume creation by the driver could produce an access point with a GID that is already used, and when that volume is mounted to a pod, the pod might gain access to other pods' data.

Additionally, when the user changed the SC parameters, the change was not reflected in the driver (controller pod): the GID allocator kept allocating from whichever range it had first seen for a given FS ID. For GID range changes to take effect, the pod had to be restarted.
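
To illustrate the second fix, here is a minimal Go sketch with hypothetical names (the real driver code differs); the defaults shown are the 50000/7000000 values mentioned later in this thread. The range is re-read from the SC parameters on every CreateVolume call, so edits take effect without a pod restart:

package main

import (
	"fmt"
	"strconv"
)

// parseGidRange is a hypothetical sketch: the GID range is re-read from
// the StorageClass parameters on every CreateVolume call, so edits to
// gidRangeStart/gidRangeEnd take effect without restarting the
// controller pod.
func parseGidRange(params map[string]string) (gidMin, gidMax int64, err error) {
	// Defaults mentioned later in this thread (50000 / 7000000); the
	// real driver's defaults may differ by version.
	gidMin, gidMax = 50000, 7000000
	if v, ok := params["gidRangeStart"]; ok {
		if gidMin, err = strconv.ParseInt(v, 10, 64); err != nil {
			return 0, 0, fmt.Errorf("invalid gidRangeStart %q: %v", v, err)
		}
	}
	if v, ok := params["gidRangeEnd"]; ok {
		if gidMax, err = strconv.ParseInt(v, 10, 64); err != nil {
			return 0, 0, fmt.Errorf("invalid gidRangeEnd %q: %v", v, err)
		}
	}
	if gidMin > gidMax {
		return 0, 0, fmt.Errorf("gidRangeStart %d exceeds gidRangeEnd %d", gidMin, gidMax)
	}
	return gidMin, gidMax, nil
}

func main() {
	gidMin, gidMax, err := parseGidRange(map[string]string{
		"gidRangeStart": "1000",
		"gidRangeEnd":   "2000",
	})
	fmt.Println(gidMin, gidMax, err) // 1000 2000 <nil>
}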

What testing is done?

  • current unit tests
  • new unit test added
  • manual testing

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 5, 2022
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 5, 2022
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 8, 2022
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 14, 2022
@RomanBednar RomanBednar changed the title WIP: Fix GID allocator Fix GID allocator Dec 14, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 14, 2022
@RomanBednar (Contributor, Author) commented:

Here is what I've tested manually to verify the patches:

Test that the GID allocator avoids used GIDs

  1. Create one PVC dynamically with OCP, and create one access point directly in the AWS cloud with a GID of 6999999:
$ aws efs describe-access-points --file-system-id fs-0c1bf51c4cc8a5771
ACCESSPOINTS    arn:aws:elasticfilesystem:us-east-1:269733383066:access-point/fsap-0b38ad3470381476b    fsap-0b38ad3470381476b  pvc-a0e372b1-2f63-4561-a117-56df9df8d693        fs-0c1bf51c4cc8a5771    available                     269733383066
POSIXUSER       7000000 7000000
ROOTDIRECTORY   /dynamic_provisioning/pvc-a0e372b1-2f63-4561-a117-56df9df8d693
CREATIONINFO    7000000 7000000 700
TAGS    efs.csi.aws.com/cluster true
TAGS    kubernetes.io/cluster/rbednar-mycluster-01-l8wjr        owned
ACCESSPOINTS    arn:aws:elasticfilesystem:us-east-1:269733383066:access-point/fsap-051deabd137c8ea32    fsap-051deabd137c8ea32  console-4d9cd57f-8192-4e14-8e2d-eb4144c8d5fb    fs-0c1bf51c4cc8a5771    available             test-ap 269733383066
POSIXUSER       6999999 6999999
SECONDARYGIDS   0
ROOTDIRECTORY   /
TAGS    Name    test-ap
  2. Create a new PVC with dynamic provisioning using the same FS ID.

  3. Observe that the used GID was detected and the next available GID (6999998) was assigned.

$ aws efs describe-access-points --file-system-id fs-0c1bf51c4cc8a5771
ACCESSPOINTS    arn:aws:elasticfilesystem:us-east-1:269733383066:access-point/fsap-0b38ad3470381476b    fsap-0b38ad3470381476b  pvc-a0e372b1-2f63-4561-a117-56df9df8d693        fs-0c1bf51c4cc8a5771    available                     269733383066
POSIXUSER       7000000 7000000
ROOTDIRECTORY   /dynamic_provisioning/pvc-a0e372b1-2f63-4561-a117-56df9df8d693
CREATIONINFO    7000000 7000000 700
TAGS    efs.csi.aws.com/cluster true
TAGS    kubernetes.io/cluster/rbednar-mycluster-01-l8wjr        owned
ACCESSPOINTS    arn:aws:elasticfilesystem:us-east-1:269733383066:access-point/fsap-051deabd137c8ea32    fsap-051deabd137c8ea32  console-4d9cd57f-8192-4e14-8e2d-eb4144c8d5fb    fs-0c1bf51c4cc8a5771    available             test-ap 269733383066
POSIXUSER       6999999 6999999
SECONDARYGIDS   0
ROOTDIRECTORY   /
TAGS    Name    test-ap
ACCESSPOINTS    arn:aws:elasticfilesystem:us-east-1:269733383066:access-point/fsap-05066babbad2648ac    fsap-05066babbad2648ac  pvc-1e332980-9c4f-41d4-9f03-aa47dd3feb37        fs-0c1bf51c4cc8a5771    available                     269733383066
POSIXUSER       6999998 6999998
ROOTDIRECTORY   /dynamic_provisioning/pvc-1e332980-9c4f-41d4-9f03-aa47dd3feb37
CREATIONINFO    6999998 6999998 700
TAGS    efs.csi.aws.com/cluster true
TAGS    kubernetes.io/cluster/rbednar-mycluster-01-l8wjr        owned

Test that the GID allocator can reuse released GIDs

  1. Remove the extra AP (test-ap) that was created manually and create another PVC with dynamic provisioning. Now that GID 6999999 is released, it is assigned to the new dynamically provisioned AP because it is the next available GID.
$ aws efs describe-access-points --file-system-id fs-0c1bf51c4cc8a5771
ACCESSPOINTS    arn:aws:elasticfilesystem:us-east-1:269733383066:access-point/fsap-0b38ad3470381476b    fsap-0b38ad3470381476b  pvc-a0e372b1-2f63-4561-a117-56df9df8d693        fs-0c1bf51c4cc8a5771    available             269733383066
POSIXUSER       7000000 7000000
ROOTDIRECTORY   /dynamic_provisioning/pvc-a0e372b1-2f63-4561-a117-56df9df8d693
CREATIONINFO    7000000 7000000 700
TAGS    efs.csi.aws.com/cluster true
TAGS    kubernetes.io/cluster/rbednar-mycluster-01-l8wjr        owned
ACCESSPOINTS    arn:aws:elasticfilesystem:us-east-1:269733383066:access-point/fsap-05066babbad2648ac    fsap-05066babbad2648ac  pvc-1e332980-9c4f-41d4-9f03-aa47dd3feb37        fs-0c1bf51c4cc8a5771    available             269733383066
POSIXUSER       6999998 6999998
ROOTDIRECTORY   /dynamic_provisioning/pvc-1e332980-9c4f-41d4-9f03-aa47dd3feb37
CREATIONINFO    6999998 6999998 700
TAGS    efs.csi.aws.com/cluster true
TAGS    kubernetes.io/cluster/rbednar-mycluster-01-l8wjr        owned
ACCESSPOINTS    arn:aws:elasticfilesystem:us-east-1:269733383066:access-point/fsap-0d5ffbeba4c86ef2d    fsap-0d5ffbeba4c86ef2d  pvc-a8329ba9-ca9c-4164-b4fb-41fcdaba6014        fs-0c1bf51c4cc8a5771    available             269733383066
POSIXUSER       6999999 6999999
ROOTDIRECTORY   /dynamic_provisioning/pvc-a8329ba9-ca9c-4164-b4fb-41fcdaba6014
CREATIONINFO    6999999 6999999 700
TAGS    efs.csi.aws.com/cluster true
TAGS    kubernetes.io/cluster/rbednar-mycluster-01-l8wjr        owned

Test GID range propagation

  1. Recreate the storage class with gidRangeStart and gidRangeEnd parameters:
$ oc get sc/efs-sc -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2022-12-08T10:48:42Z"
  name: efs-sc
  resourceVersion: "102207"
  uid: 6c66b91c-1aed-4b7e-8d4e-1518748b6e29
mountOptions:
- tls
parameters:
  basePath: /dynamic_provisioning
  directoryPerms: "700"
  fileSystemId: fs-0c1bf51c4cc8a5771
  gidRangeEnd: "2000"
  gidRangeStart: "1000"
  provisioningMode: efs-ap
provisioner: efs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
  2. Without restarting any driver pods, create another PVC and validate that the new range was propagated correctly and the new AP has GID 2000:
$ aws efs describe-access-points --file-system-id fs-0c1bf51c4cc8a5771
ACCESSPOINTS    arn:aws:elasticfilesystem:us-east-1:269733383066:access-point/fsap-0b38ad3470381476b    fsap-0b38ad3470381476b  pvc-a0e372b1-2f63-4561-a117-56df9df8d693        fs-0c1bf51c4cc8a5771    available             269733383066
POSIXUSER       7000000 7000000
ROOTDIRECTORY   /dynamic_provisioning/pvc-a0e372b1-2f63-4561-a117-56df9df8d693
CREATIONINFO    7000000 7000000 700
TAGS    efs.csi.aws.com/cluster true
TAGS    kubernetes.io/cluster/rbednar-mycluster-01-l8wjr        owned
ACCESSPOINTS    arn:aws:elasticfilesystem:us-east-1:269733383066:access-point/fsap-05066babbad2648ac    fsap-05066babbad2648ac  pvc-1e332980-9c4f-41d4-9f03-aa47dd3feb37        fs-0c1bf51c4cc8a5771    available             269733383066
POSIXUSER       6999998 6999998
ROOTDIRECTORY   /dynamic_provisioning/pvc-1e332980-9c4f-41d4-9f03-aa47dd3feb37
CREATIONINFO    6999998 6999998 700
TAGS    efs.csi.aws.com/cluster true
TAGS    kubernetes.io/cluster/rbednar-mycluster-01-l8wjr        owned
ACCESSPOINTS    arn:aws:elasticfilesystem:us-east-1:269733383066:access-point/fsap-0d5ffbeba4c86ef2d    fsap-0d5ffbeba4c86ef2d  pvc-a8329ba9-ca9c-4164-b4fb-41fcdaba6014        fs-0c1bf51c4cc8a5771    available             269733383066
POSIXUSER       6999999 6999999
ROOTDIRECTORY   /dynamic_provisioning/pvc-a8329ba9-ca9c-4164-b4fb-41fcdaba6014
CREATIONINFO    6999999 6999999 700
TAGS    efs.csi.aws.com/cluster true
TAGS    kubernetes.io/cluster/rbednar-mycluster-01-l8wjr        owned
ACCESSPOINTS    arn:aws:elasticfilesystem:us-east-1:269733383066:access-point/fsap-04755a3ae8f8b7701    fsap-04755a3ae8f8b7701  pvc-fe36200c-bdb8-49ec-97bc-f9fb4e570627        fs-0c1bf51c4cc8a5771    available             269733383066
POSIXUSER       2000    2000
ROOTDIRECTORY   /dynamic_provisioning/pvc-fe36200c-bdb8-49ec-97bc-f9fb4e570627
CREATIONINFO    2000    2000    700
TAGS    efs.csi.aws.com/cluster true
TAGS    kubernetes.io/cluster/rbednar-mycluster-01-l8wjr        owned

@RomanBednar (Contributor, Author) commented:

@wongma7 @nckturner what do you think about the change?

Dockerfile (outdated)
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

FROM golang:1.17 as builder

Reviewer (Contributor): Could we move the Go upgrade into a separate PR?

@RomanBednar (Contributor, Author): Sure, opened here: #867

@Ashley-wenyizha (Contributor) commented:

Hi Roman,

Thanks for providing this fix. We are currently going through your code changes internally at EFS.

One thing we want to call out, as you also pointed out in #693 (comment): with this patch the driver checks all possible GIDs in the given range (currently up to 120) each time a volume is created. This might not scale well, as we plan to increase the access point limit to 1000 early next year; development for that increase is complete internally but awaiting deployment and official launch.

We will need to run a performance test against the new limit and get back to you. Sorry about the potential delay, but we want to let you know that we are looking into this and tracking it internally as well. We will provide more updates once we have more performance testing data.

Thank you!

@RomanBednar (Contributor, Author) commented:

/hold until #867 merged

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 3, 2023
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 7, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 9, 2023
@RomanBednar (Contributor, Author) commented:

Hello @Ashley-wenyizha, just checking in - anything new regarding the performance test?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 8, 2023
@Ashley-wenyizha (Contributor) commented:

Hi @RomanBednar

Testing looks good. If we don't hear back, we will fix the merge conflict and merge early next week.

Thanks for your contribution!

This code is needed to allow listing of access points, which the GID
allocator can use to avoid assigning GIDs that might already be in use.
The internal EFS limit is 120 access points per file system ID, so
there is no reason to check the entire user-specified GID range if we
can't allocate that many GIDs anyway.

Given the internal limit, there is no need to track GIDs in a heap
structure; we can always scan the full range for the next free GID,
which takes at most 120 checks.

Because the limit is relatively small, this won't impact performance.
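
The manual test earlier in this thread shows allocation proceeding from the top of the range downward (7000000 first, then 6999998 when 6999999 was taken). A minimal Go sketch of the heap-free scan this commit describes, assuming the set of used GIDs has already been collected from the listed access points (names are hypothetical; the real allocator differs):

package main

import "fmt"

// accessPointLimit mirrors the EFS limit discussed above: at most 120
// access points per file system, so a linear scan stays cheap.
const accessPointLimit = 120

// nextAvailableGid scans the range from the top down and returns the
// first GID not present in usedGids.
func nextAvailableGid(usedGids map[int64]bool, gidMin, gidMax int64) (int64, error) {
	if len(usedGids) >= accessPointLimit {
		return 0, fmt.Errorf("file system already has %d access points", accessPointLimit)
	}
	for gid := gidMax; gid >= gidMin; gid-- {
		if !usedGids[gid] {
			return gid, nil
		}
	}
	return 0, fmt.Errorf("no free GID in range [%d, %d]", gidMin, gidMax)
}

func main() {
	// GIDs already claimed by existing access points, as collected from
	// a listing of the file system's access points.
	used := map[int64]bool{7000000: true, 6999999: true}
	gid, _ := nextAvailableGid(used, 6999000, 7000000)
	fmt.Println(gid) // 6999998, matching the manual test above
}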
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 21, 2023
@RomanBednar (Contributor, Author) commented:

@Ashley-wenyizha Thank you for the update, I've resolved the conflicts now.

@Ashley-wenyizha (Contributor) commented:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 21, 2023
@Ashley-wenyizha (Contributor) commented:

/approve

@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ashley-wenyizha, RomanBednar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 24, 2023
@k8s-ci-robot k8s-ci-robot merged commit 5e44f89 into kubernetes-sigs:master Aug 24, 2023
2 checks passed
@johnpmayer commented:
Is the plan to leave ACCESS_POINT_PER_FS_LIMIT at the current value (120)?

I see this comment marked resolved - #850 (comment)

I'd definitely like to increase the value in my deployment. Is there still a performance concern with listing large numbers of access points? Can we make this value configurable, so that users can make the trade-off themselves?

@mskanth972 (Contributor) commented:

@johnpmayer, no. The comment just notes that the access point limit was recently increased from 120 to 1000.
https://www.amazonaws.cn/en/new/2023/amazon-efs-increases-the-maximum-number-of-access-points-per-file-system/

@johnpmayer commented Sep 8, 2023:

I see this is being addressed in #1119

otorreno added a commit to otorreno/aws-efs-csi-driver that referenced this pull request Dec 29, 2023

With kubernetes-sigs#850, the new way of allocating GIDs introduced a
new call to the ListAccessPoints endpoint of the EFS API, which is
generating problems on systems where the EFS CSI driver is under high
load (lots of PVCs created within a short time period). In this PR, we
are extracting the ListAccessPoints call from gid_allocator, moving it
one level up. In the case of dynamic provisioning of GIDs we can reuse
the ListAccessPoints call to check for the file system's existence
(thus removing the DescribeFileSystem call in that case). In the case
of a fixed UID/GID, we continue calling DescribeFileSystem, with no
calls to ListAccessPoints.

In addition to the change explained above, gidMin and gidMax have been
converted to int64. kubernetes-sigs#850 made both uid and gid int64,
but gidMin and gidMax were not touched. Also changing the default value
for gidMin, as the value of 50000 was spamming the logs with a message
coming from gid_allocator (i.e. range bigger than the max number of
access points). Setting the value to 6999000 (the value that
gid_allocator was setting by subtracting 1000 from gidMax). Removing an
unused function and field from gid_allocator too.
otorreno added a commit to otorreno/aws-efs-csi-driver that referenced this pull request Jan 2, 2024

With kubernetes-sigs#850, the new way of allocating GIDs introduced a
new call to the ListAccessPoints endpoint of the EFS API, which is
generating problems on systems where the EFS CSI driver is under high
load (lots of PVCs created within a short time period). In this PR, we
are extracting the ListAccessPoints call from gid_allocator, moving it
one level up. In the case of dynamic provisioning of GIDs we can reuse
the ListAccessPoints call to check for the file system's existence
(thus removing the DescribeFileSystem call in that case). In the case
of a fixed UID/GID, we continue calling DescribeFileSystem, with no
calls to ListAccessPoints.

In addition to the change explained above, gidMin and gidMax have been
converted to int64. kubernetes-sigs#850 made both uid and gid int64,
but gidMin and gidMax were not touched. Also changing the default value
for gidMax, as the value of 7000000 was spamming the logs with a
message coming from gid_allocator (i.e. range bigger than the max
number of access points). Setting the value to 51000 (the value that
gid_allocator was setting by adding 1000 to gidMin). Removing an unused
function and field from gid_allocator too.
mskanth972 pushed a commit to mskanth972/aws-efs-csi-driver that referenced this pull request Jan 3, 2024

(cherry picked from commit dbbc733)
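
A hedged Go sketch of the restructuring these commits describe. One assumption to flag: the commit message refers to a ListAccessPoints call in the driver's cloud layer, while the public AWS SDK for Go exposes the underlying EFS operation as DescribeAccessPoints, which is what this sketch uses. A single paginated call both verifies that the file system exists and collects the GIDs already in use:

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/efs"
)

// usedGidsForFileSystem makes a paginated DescribeAccessPoints call
// that serves two purposes, as the commits above describe: it verifies
// the file system exists (the call fails with FileSystemNotFound if it
// does not) and it collects the GIDs already claimed by existing
// access points.
func usedGidsForFileSystem(client *efs.EFS, fsId string) (map[int64]bool, error) {
	used := make(map[int64]bool)
	var nextToken *string
	for {
		out, err := client.DescribeAccessPoints(&efs.DescribeAccessPointsInput{
			FileSystemId: aws.String(fsId),
			NextToken:    nextToken,
		})
		if err != nil {
			// This error doubles as the existence check that previously
			// required a separate DescribeFileSystem call.
			return nil, fmt.Errorf("describing access points for %s: %w", fsId, err)
		}
		for _, ap := range out.AccessPoints {
			if ap.PosixUser != nil && ap.PosixUser.Gid != nil {
				used[*ap.PosixUser.Gid] = true
			}
		}
		if out.NextToken == nil {
			return used, nil
		}
		nextToken = out.NextToken
	}
}

func main() {
	sess := session.Must(session.NewSession())
	gids, err := usedGidsForFileSystem(efs.New(sess), "fs-0c1bf51c4cc8a5771")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(gids), "GIDs in use")
}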
Labels: approved · cncf-cla: yes · lgtm · size/XXL