Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: relax identifier limits #3279

Merged
merged 1 commit into from
Aug 17, 2024

Conversation

Shubhranshu153
Copy link
Contributor

@Shubhranshu153 Shubhranshu153 commented Aug 6, 2024

Description

Some of users of finch are reporting max length of 76 characters, which seems to stem from this validation
https://github.com/containerd/containerd/blob/92900bf7300b4872db2a43e82128fda9cde08a79/pkg/identifiers/validate.go#L58

Docker skips this validation via containerd library and does its own regex validation

What changed

This change moves the validation function to nerdctl which would give greater control over it. This change modifies identifier validation for the following:

  • volume create
  • container create
  • network create
  • namestore
  • composer new project
  • buildargs secret

This brings these identifier compatible to docker but doesn't add checks for others. Docker seems to have additional checks, of which not all are required but some might be needed to be ported.

Test

  1. local unit test

pkg/namestore/namestore.go Outdated Show resolved Hide resolved
@apostasie
Copy link
Contributor

Hey @Shubhranshu153

So, identifiers no longer have a length limit, is that correct?

Isn't this going to be problematic for anything that will end-up being used on the filesystem? Particularly Namestore etc.

I am not a windows expert, though some liminary googling suggests that anything over 256 chars might spell trouble.

@Shubhranshu153
Copy link
Contributor Author

Hey @Shubhranshu153

So, identifiers no longer have a length limit, is that correct?

Isn't this going to be problematic for anything that will end-up being used on the filesystem? Particularly Namestore etc.

I am not a windows expert, though some liminary googling suggests that anything over 256 chars might spell trouble.

yes that is correct,
and yes windows has a 256 char limit and linux has 4095 for file paths, may be we can have a selective check based on OS.
File path validation can be limited to mount points though. But for lets say container name same identifier may not be needed

@pendo324
Copy link
Contributor

pendo324 commented Aug 7, 2024

Just noting that the Windows file path length limit can be extended to (2^15) - 1 characters in recent versions (source), but its opt-in for users

@apostasie
Copy link
Contributor

apostasie commented Aug 7, 2024

Hey @Shubhranshu153
So, identifiers no longer have a length limit, is that correct?
Isn't this going to be problematic for anything that will end-up being used on the filesystem? Particularly Namestore etc.
I am not a windows expert, though some liminary googling suggests that anything over 256 chars might spell trouble.

yes that is correct, and yes windows has a 256 char limit and linux has 4095 for file paths, may be we can have a selective check based on OS. File path validation can be limited to mount points though. But for lets say container name same identifier may not be needed

We might want to make the effort and move to using shas on the filesystem so that we can allow random identifiers, but that is likely a lot of work to get there and support existing stuff.

Meanwhile, I think we need to be mindful about what identifiers we are relaxing, depending on whether they are going to end-up on the filesystem...

I think you mentioned some of (finch?) users were facing issues - so, is there a particular type of identifier (network name? image name? container name?) that is most problematic for them?

@Shubhranshu153
Copy link
Contributor Author

@apostasie
have seen for volume names majorly. They tend to be longer, we can go the route of having less stringent rules for volume names and other stuffs we can keep the same and probably use 255 for file names.

We can put it within a dockercompat flag to be more sure about it.

@Shubhranshu153
Copy link
Contributor Author

May be the simplest thing would be to increase the limits of all to 255 as that seems to be able to support most of the use cases

@apostasie
Copy link
Contributor

@apostasie have seen for volume names majorly. They tend to be longer, we can go the route of having less stringent rules for volume names and other stuffs we can keep the same and probably use 255 for file names.

We can put it within a dockercompat flag to be more sure about it.

@Shubhranshu153

I just sent #3281 which suggests to change the VolumeStore to no longer use names as path components - which would remove the need for any limitations on volume names.
This is somewhat tricky to get right (specifically backward compatibility), but IMHO it is the only way to be able to really relax restrictions.
LMK what you think over there.

@Shubhranshu153
Copy link
Contributor Author

@apostasie replied on the thread. Additionally:
Also feel if the system is going to fail it, then let the system handle it. Earlier i think i had proposed the sha in the issue, but that complicates the way we want to recover the original string from it.

@apostasie
Copy link
Contributor

apostasie commented Aug 8, 2024

@apostasie replied on the thread.

Thanks.

Additionally: Also feel if the system is going to fail it, then let the system handle it.

Well, we made the deliberate choice to store user-picked identifiers as path components - this is very much an internal implementation choice, not a feature - and not dealing with failures would just leak that choice into the UX... I don't think it is a good thing.

Earlier i think i had proposed the sha in the issue, but that complicates the way we want to recover the original string from it.

Not really for the VolumeStore.

I have not looked at what it would mean for Networks, or the NameStore (although I think the NameStore is problematic in its own right and we might need to reconsider it).

@Shubhranshu153
Copy link
Contributor Author

not really for volume store -> from user perspective if i would want to list out the volumes available will i be able to get it from the sha?

From the above converstations it feels like we do want to keep a limit so probably 254 is a safe upper limit considering the system limits. its not exactly docker compatible though

@apostasie
Copy link
Contributor

not really for volume store -> from user perspective if i would want to list out the volumes available will i be able to get it from the sha?

Nothing changes from a user perspective (with the VolStore PR).
When you list volumes you see their names. You can inspect a volume by name. And you can see the name and the path of the volume.

As I suggested above, how we construct the path on the filesystem is an implementation detail IMHO that should be kept entirely orthogonal to UX.
This is true as well for the API we expose - nothing changes for the consumer.

From the above converstations it feels like we do want to keep a limit so probably 254 is a safe upper limit considering the system limits. its not exactly docker compatible though

The containerd method Validate is specifically advertised as being safe to use for filesystem path components (although the length limitation is somewhat arbitrary and has more to do with editors max line length).

IMHO the only way to lift these restrictions is to change our implementation(s) where it matters so we no longer use identifiers as path components (like I did for the VolumeStore).

Short of doing that, and if we are to increase that length regardless, we need tests to match, and build confidence that it will work (clearly not only Windows has limitations on these).

@Shubhranshu153
Copy link
Contributor Author

yeah length will be variable which is not ideal but we can have longer names and keeping the same checks.
I feel any route we take which involves morphing identifiers would run into risks of legacy artifacts in general. Probably not worth the effort. Rather may be a good way will be to build standard around identifier length for each OS type and recommendations. If they are larger than 76 then we don't really need to think of backward compatibility.

@apostasie
Copy link
Contributor

yeah length will be variable which is not ideal but we can have longer names and keeping the same checks. I feel any route we take which involves morphing identifiers would run into risks of legacy artifacts in general. Probably not worth the effort. Rather may be a good way will be to build standard around identifier length for each OS type and recommendations. If they are larger than 76 then we don't really need to think of backward compatibility.

Agreed that handling legacy user data is generally a daunting proposition.

That being said, in the context of containers, I feel like things tend to be a lot more ephemeral and easy to just rebuild (especially named volumes) - furthermore, having to deal with legacy is a fact of life no matter what, and right now (with v2 around the corner) is probably the ideal time to do things like that.

On the general question of whether such a migration makes sense, I stand by my gun: using user-controlled-identifiers as-is as path components is a problematic design from the get go (regardless of the implementation imposing restrictions).

Anyhow, just shooting the shit :-).

Hope this is all helpful.

@Shubhranshu153
Copy link
Contributor Author

Shubhranshu153 commented Aug 8, 2024

Thats a fair point but handling the legacy would mean we need test for all the stuffs we want to support backward compatibility for. Or we can add an label to the identifier hash to suggest if it is something newly created, if that label doesnt exist we fallback to new workflow. Need to think for the different use cases what changes are needed

Having said this i still feel the more simpler workflow is via having configurable identifier lengths, i am not saying its a great way, may be this is something we can merge to the current stable version. The reason is this is a request from a few of current users of finch and also needed to support dev containers.

For the 2.0.0 we can discuss how we handle the backward compatibility.

@Shubhranshu153
Copy link
Contributor Author

Shubhranshu153 commented Aug 8, 2024

considering that we closed the other pr, i would make the suggested changes probably with a limit of 254 .

@Shubhranshu153
Copy link
Contributor Author

Shubhranshu153 commented Aug 13, 2024

Added docs and changed the function name.
Didn't add the length constraint though as it will make it not really compatible with docker.
An unintended change was min length, seems docker has a min length of 2 but for containerd it was 1, so that is now changed to 2. Added the description in the doc

docs/validate.md Outdated Show resolved Hide resolved
@Shubhranshu153
Copy link
Contributor Author

There was typo in the previous comment:

An unintended change was min length, seems docker has a min length of 2 but for containerd it was 1, so that is now changed to 2 to match docker. Added the description in the doc

@Shubhranshu153 Shubhranshu153 force-pushed the fix-identifier-length branch 2 times, most recently from 185de34 to 8e3c228 Compare August 13, 2024 12:28
@Shubhranshu153 Shubhranshu153 force-pushed the fix-identifier-length branch 2 times, most recently from de4c12d to f87519a Compare August 13, 2024 12:31
@Shubhranshu153 Shubhranshu153 force-pushed the fix-identifier-length branch 2 times, most recently from 9a5afdc to 933c887 Compare August 13, 2024 12:44
@Shubhranshu153
Copy link
Contributor Author

@AkihiroSuda
Updated the code. Thanks for the suggestions.

@AkihiroSuda AkihiroSuda added this to the v2.0.0 milestone Aug 14, 2024
@Shubhranshu153 Shubhranshu153 changed the title fix: remove identifier limits fix: relax identifier limits Aug 14, 2024
Signed-off-by: Shubharanshu Mahapatra <shubhum@amazon.com>
Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@AkihiroSuda AkihiroSuda merged commit 4bac741 into containerd:main Aug 17, 2024
19 checks passed
@AkihiroSuda AkihiroSuda mentioned this pull request Aug 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants