Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release 1.0.5 is breaking (Chart release 4.0.8), major version should be bumped #7939

Closed
morremeyer opened this issue Nov 17, 2021 · 20 comments · Fixed by #7942
Closed

Release 1.0.5 is breaking (Chart release 4.0.8), major version should be bumped #7939

morremeyer opened this issue Nov 17, 2021 · 20 comments · Fixed by #7942
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@morremeyer
Copy link

morremeyer commented Nov 17, 2021

Notice

Everyone reading this: Please save your “this is unacceptable“ comments. This is an open source project, if you want to berate someone, ask their consent first and pay them for it.

If you want to show that you share my opinion that this did not go well, please +1 or like this post, but keep the comments spam free.

Issue

NGINX Ingress controller version 1.0.5
Kubernetes version (use kubectl version): v1.22.2+k3s1, but this is irrelevant

How was the ingress-nginx-controller installed: helm chart, version 4.0.8

What happened: ingress-nginx Release 1.0.5 contains breaking changes, but no major version bump occurred. Helm chart version 4.0.8 contains breaking changes, but no major version bump occured.

1.0.5 introduces the annotation-value-word-blocklist option to the configmap that will block ingresses that have specific words or characters in them. The default for this is load_module,lua_package,_by_lua,location,root,proxy_pass,serviceaccount,{,},',\, making this i breaking change.

What you expected to happen: Major version bump for breaking changes.

The changelog contains the hint Possible Breaking Change. If a change is breaking to some configurations, it is a breaking change and therefore requires a major version bump.

Please release a 1.0.6 that uses an empty default value for this setting and include the current default of load_module,lua_package,_by_lua,location,root,proxy_pass,serviceaccount,{,},',\ in a 2.0.0 ingress-nginx and 5.0.0 helm chart release.

Please also consider retracting the 1.0.5 ingress-nginx and 4.0.8 chart release as this might break many installations.

How to reproduce it:

Use any install method for ingress-nginx in version 1.0.5 with the Helm chart version 4.0.8, using the default configuration.

Create an ingress (please add any additional annotation required)

echo "
  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: foo-bar
    annotations:
      kubernetes.io/ingress.class: nginx
      nginx.ingress.kubernetes.io/server-snippet: |-
        location = /.well-known/carddav {
          return 301 $scheme://$host/remote.php/dav;
        }
  spec:
    ingressClassName: nginx # omit this if you're on controller version below 1.0.0
    rules:
    - host: foo.bar
      http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: http-svc
              port: 
                number: 80
" | kubectl apply -f -

make a request

POD_NAME=$(k get pods -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx -o NAME)
kubectl exec -it -n ingress-nginx $POD_NAME -- curl -H 'Host: foo.bar' localhost

Anything else we need to know:

More and more systems are automatically deploying patch version upgrades automatically, expecting that a patch version only contains patches. This bug really caught me unaware (and definitely showed a lack of automated tests on my side, even before merging patch versions, this should be standard.

I’m bringing this to your attention as I am of the opinion that if we should not fear bumping major versions. This incident looks like a manifestation of that fear to me as I’ve seen it in the past.

/kind bug

@morremeyer morremeyer added the kind/bug Categorizes issue or PR as related to a bug. label Nov 17, 2021
@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Nov 17, 2021
@strongjz
Copy link
Member

strongjz commented Nov 17, 2021

/triage accepted
/priority critical-urgent

That is a valid point, I will discuss with it @rikatz . We were prioritizing the CVE fix and should be more thoughtful when we release new features like this next.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Nov 17, 2021
@strongjz strongjz self-assigned this Nov 17, 2021
@rikatz
Copy link
Contributor

rikatz commented Nov 17, 2021

Agreed! Sorry for the mess, we were so in a rush to fix this that the "we are breaking users" was not accounted :/

@strongjz you wanna tackle defaulting this to empty? Otherwise I can take care of this tomorrow :)

@rikatz
Copy link
Contributor

rikatz commented Nov 17, 2021

@morremeyer thanks for the nice reporting on this and the consideration :) really appreciate it!

@mzjp2
Copy link

mzjp2 commented Nov 17, 2021

Thanks for reporting this -- I was just about to come and report the same thing. I was using { and } in nginx.ingress.kubernetes.io/auth-snippet which seems like a valid use, but my installation has broken today on the latest ingress-nginx.

I agree with the above discussion (and thank you for the quick replies!) but as a side note, is it worth having whitelisted fields (like auth-snippet) which aren't verified by the blocklist or is this a no-go given the security flaws? (I'm unfamiliar with the relevant CVE)

@morremeyer
Copy link
Author

@strongjz @rikatz Thanks for taking care of it, I really appreciate it.

I tried setting it to empty on my setup, which weirdly failed and was still rejecting ingresses for blocked characters. I didn't debug that much further though as I needed to get things running again.

I therefore set it to thisvaluewillneverappear, which worked.

Just want to mention this in case there is a possibility to add a test for that.

@morremeyer
Copy link
Author

Disregard my previous comment please, I seem to have missed the added test when reading through the PR.

@janosmiko
Copy link

Is it possible somehow to add custom locations blocks to the ingress currently?
Eg: I have a use case where I'd like to serve a static file:

location /robots.txt {return 200 "User-agent: *\nDisallow: /\n";}

Is there a workaround to do this?

@morremeyer
Copy link
Author

@janosmiko Yes, you can set the value of annotation-value-word-blocklist to an empty string "".

@janosmiko
Copy link

Thanks for pointing it out!

@jkabat
Copy link

jkabat commented Nov 22, 2021

@janosmiko Yes, you can set the value of annotation-value-word-blocklist to an empty string "".

I have tried that but unfortunately some FCGI annotations with empty values were blacklisted anyway... so I had to use @morremeyer solution...

@calvinbui
Copy link
Contributor

@janosmiko Yes, you can set the value of annotation-value-word-blocklist to an empty string "".

this didn't work for me

@janosmiko
Copy link

@calvinbui how did you configure it?

@rikatz
Copy link
Contributor

rikatz commented Nov 23, 2021

Hey folks.

We released v1.1.0 where the blocklist is empty by default

Please help us: test, check if the breaking change still persists, and also please please please configure at least a small sane default for your environment.

Thanks once more for the patience and help on this!

@BloodyIron
Copy link

I for one certainly do see how this kind of a default could be a breaking change, and I do agree with the premise of such things being tied to a version bump (1.x?). However, I believe that at some point this should go back to being "blocked by default". People implementing ingress-nginx out of the box can't reasonably be expected to know all the "use this by default to stay secure, but it's not set by default", and having it not block by default generally results in the security issue being generally present. IMO 1.0.6 should have had the default-block removed, and 1.1.0 should have had default-block on, but here we are. It seems that the "reasonable" (possibly most agreeable) thing to do is milestone the "block-by-default" stuff into 1.2.0.

@morremeyer
Copy link
Author

I agree with you on having it back as the default.

However, as that is a breaking change, the release number for that will be 2.0.0.

@strongjz what do you think, should I open a PR to reintroduce it?

@BloodyIron
Copy link

BloodyIron commented Dec 14, 2021

I agree with you on having it back as the default.

However, as that is a breaking change, the release number for that will be 2.0.0.

@strongjz what do you think, should I open a PR to reintroduce it?

And considering how long it took for 1.0 to "come to market" (so to say) how many years will it be before 2.0 will come to market and this "sane by default" security aspect be introduced? It was "deemed acceptable" (so to say) to revert (disabled by default) it in 1.1, why is it "deemed unacceptable" to have it "on by default" in 1.2? This still represents a very real security threat.

@strongjz
Copy link
Member

We are working on a different way to secure the data and control plane rather than blocking known "bad words" Folks were very vocal when we broke this feature. I don't think they would migrate to v2.0.0 if we made that breaking change.

@BloodyIron thank you for your opinion and we welcome them in our community meetings and slack to help further the discussion.

@BloodyIron
Copy link

We are working on a different way to secure the data and control plane rather than blocking known "bad words" Folks were very vocal when we broke this feature. I don't think they would migrate to v2.0.0 if we made that breaking change.

@BloodyIron thank you for your opinion and we welcome them in our community meetings and slack to help further the discussion.

Ahh well if a different way to come at the problem is in-development, then that's cool too. I'm not exactly married to the specific method, but I am responsible for mitigating the CVE :P

I appreciate the invitation, and I may take you up on that at some point, however for now I would prefer discussion to take place in the open and in issue threads like this. I hope that's not going to be problematic for you and your team. It also helps me stay organised and remember what I said, what others said, stuff like that. I have so many details to keep track of @_@ .

So... should I instead then just watch the "data and control plane" #8034 that you linked to and use that as the "pulse" on addressing this issue? (Sounds like yes?)

@morremeyer
Copy link
Author

morremeyer commented Dec 14, 2021

And considering how long it took for 1.0 to "come to market" (so to say) how many years will it be before 2.0 will come to market and this "sane by default" security aspect be introduced? It was "deemed acceptable" (so to say) to revert (disabled by default) it in 1.1, why is it "deemed unacceptable" to have it "on by default" in 1.2? This still represents a very real security threat.

I think this is a misunderstanding about how version bumping works with Semantic Versioning.

You don't plan to release version X at date Y, you plan features and bump the versions accordingly.

If you need to introduce breaking changes, you bump the major version, no matter the plans.

So if there was no other plans to secure ingress-nginx (thanks for updating us on this, @strongjz), version 2.0.0 would be happening as soon as the blocklist would get a non-empty default.

That would happen as soon as the change is accepted and not be artificially delayed until some milestone is reached.

I know that this is very different from "traditional" methods where you bump to a specific version at a specific milestone, so this might be the source of the misunderstanding here.

@BloodyIron
Copy link

BloodyIron commented Dec 14, 2021

And considering how long it took for 1.0 to "come to market" (so to say) how many years will it be before 2.0 will come to market and this "sane by default" security aspect be introduced? It was "deemed acceptable" (so to say) to revert (disabled by default) it in 1.1, why is it "deemed unacceptable" to have it "on by default" in 1.2? This still represents a very real security threat.

I think this is a misunderstanding about how version bumping works with Semantic Versioning.

You don't plan to release version X at date Y, you plan features and bump the versions accordingly.

If you need to introduce breaking changes, you bump the major version, no matter the plans.

So if there was no other plans to secure ingress-nginx (thanks for updating us on this, @strongjz), version 2.0.0 would be happening as soon as the blocklist would get a non-empty default.

That would happen as soon as the change is accepted and not be artificially delayed until some milestone is reached.

I know that this is very different from "traditional" methods where you bump to a specific version at a specific milestone, so this might be the source of the misunderstanding here.

Nah, I understand that the workflow for releases here is tied to actual changes, not time-based release cycles. I was simply saying "years" as "napkin math". Whereby a release with the moniker 2.0.0 would often "require"/warrant a lot more to justify it than just a single CVE compliance aspect. As such, leading to delay waiting for more "justification" (release content) before a 2.0.0 would be released. I took into further consideration that 1.0.0 released in August of this year (2021) and the first ever release on this github repo was in 2016. It stood to reason, logically, that a 2.0.0 release within 12 months from right now would be improbable.

But hey, I'd love to be proven wrong :P

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants