IPSEC backend, continued #637

Closed
wants to merge 18 commits into from

mkutsevol
Contributor

@mkutsevol mkutsevol commented Mar 13, 2017

!THIS IS STILL A WORK IN PROGRESS!

This is based on @eyakubovich @MohdAhmad PR #516

I've packaged it correctly, so you can get a working docker image from this commit.
I've fixed dockerfile only for amd64 for now. done
I've added IKEv2 support. What I said in my comment on #516 (comment) still holds, but dead peer detection in v2 won out.

Now you can supply config to charon, e.g. to enable debug logging.

Just mount a logging config via a volume at /usr/local/strongswan-amd64/etc/strongswan.d/charon-logging.conf
Example available here https://wiki.strongswan.org/projects/strongswan/wiki/LoggerConfiguration
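
A minimal sketch of what that could look like, assuming the `filelog`/`stderr` layout from the strongSwan logger documentation linked above. The helper name `write_charon_logging_conf`, the chosen subsystems and log levels, and the host directory are illustrative assumptions, not part of this PR.

```shell
#!/bin/sh
# Hypothetical helper: writes an example charon logging config into DIR.
# The subsystems (ike, knl) and levels below are example values only.
write_charon_logging_conf() {
    conf_dir="$1"
    mkdir -p "$conf_dir"
    cat > "$conf_dir/charon-logging.conf" <<'EOF'
charon {
    filelog {
        stderr {
            default = 1
            ike = 2
            knl = 2
            flush_line = yes
        }
    }
}
EOF
}

# Usage (not run here): generate the file, then mount it at the path above.
#   write_charon_logging_conf "$PWD/conf"
#   docker run <your usual flags> \
#       -v "$PWD/conf/charon-logging.conf:/usr/local/strongswan-amd64/etc/strongswan.d/charon-logging.conf" \
#       mkutsevol/flannel:amd64
```

Logging to stderr keeps the charon output in the container log, so no extra log volume is needed.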

Didn't rebase. Will do when needed. done

I've pushed this build to mkutsevol/flannel:amd64, please test it (beware, it is temporary; I'll delete it later).

Please review/comment.

@mkutsevol mkutsevol mentioned this pull request Mar 13, 2017
@mkutsevol
Contributor Author

I ran the previous (non-rebased) version for a couple of days with debug logging, pinging among participating nodes and a tcpdump. Everything is fine & stable for me.

I've rebased it. Please review/comment.

@mkutsevol
Contributor Author

I've updated mkutsevol/flannel:amd64 with the current build.

@mkutsevol
Contributor Author

Switched default esp proposal to Suite-B-GCM-128 and made it configurable instead of hard coded.

@mkutsevol mkutsevol mentioned this pull request Mar 15, 2017
@mkutsevol
Contributor Author

@tomdee hi!
What has to be done to merge it?

Thanks! :D

@philips
Contributor

philips commented Mar 16, 2017

Do you plan on using Kubernetes with this?

The last time I looked at this the flannel code was fork/exec'ing the swan code. One thing I would love to see is if swan could be put in a pod alongside flanneld for Kubernetes.

@mkutsevol
Contributor Author

@philips, no, I need it for a standalone thing.
For sure, vici can be used just through a socket from another pod, but I won't do that for this PR.

@mkutsevol
Contributor Author

@tomdee, @philips hi!
So what will be the destiny of this PR?
I've seen discussions and many people will benefit from this code.

@nesc58

nesc58 commented Jul 5, 2017

Is there a plan to merge this?

@mkutsevol are you using your implementation in a production environment or still testing it? I am thinking about forking the current release, merging your commits and testing it in our test environment. IPSec, or encryption in general, is such a necessary part of an infrastructure.

@mkutsevol
Contributor Author

@nesc58 I've been using this in production for several months now. It's stable.

@nesc58

nesc58 commented Jul 6, 2017

@mkutsevol thank you. I changed the flannel image and changed the backend type to ipsec. It works, but not in combination with Kubernetes. The configuration is otherwise the same; nothing else changed.
E.g. I cannot access the Kubernetes dashboard using the URL <apiserver>/ui. Is there something I need to change, in the kube-proxy configuration or elsewhere?
It is configured with proxy mode iptables. I tried both enabling and disabling masquerading for all traffic; nothing changed.
Pods with a NodePort service can be accessed from outside the cluster by using the host IP of the server where the pod is running. This is fine.

The internal routing between an ingress haproxy with SSL termination and an application behind it also works.

Internally, the dashboard is accessible, e.g. from a busybox pod in another namespace, so the routing seems to be okay. I can access it with the URL kubernetes-dashboard.kube-system, so the DNS addon (kube-dns) also works.

Can you help me?

Tomorrow I will test the cluster internal communication.

@mkutsevol
Contributor Author

@nesc58 I can definitely say that the lack of connectivity is not due to the flannel/ipsec code; you'd be better off asking for help on the k8s forums.

@nesc58

nesc58 commented Jul 11, 2017

@mkutsevol thanks. I will test it soon. And again, thanks for this implementation.

@tomdee
Contributor

tomdee commented Jul 12, 2017

@mkutsevol Thanks for all your work on this PR, it's a really important feature that lots of people want.

I've taken a really quick look at it and have a few comments

  • It would be great to get it rebased onto v0.8.0 so that it could potentially be included in the v0.9.0 release
  • The increase to the size of the binary isn't great. On k8s, it would make sense to package and run it as a different container (but in the same pod).
  • And speaking of k8s, I think it would need to work on k8s too (but maybe I could help implement that if you're not a k8s user)
  • Could you write a little about why you needed to create createBackendData and getBackendData?

@mkutsevol
Contributor Author

mkutsevol commented Jul 12, 2017

Hi @tomdee,
thanks for the review.

  1. Sure, it needs rebasing.
  2. That can be done; it really just uses a socket to control the charon daemon. But it needs some discussion: how do we organize it when it is not run in a k8s pod?
  3. I haven't seen k8s netmanager at all (or whatever it is called, that thing that is used to store config data instead of etcd).
  4. createBackendData/getBackendData are used to store/retrieve additional net configuration, specifically the network password.

Let's discuss some of the available approaches to separate things here.
Flannel does two things for ipsec to work. It configures the charon daemon via vici over a socket and it sets up policies. So those two processes should be in one network namespace.
k8s pods are just the thing. Different images, different disks, one net ns.
But how will we run in standalone mode?

  1. We can have multiple builds (image size concerns), w/ and w/o strongswan.
  2. We can put the burden of getting a working strongswan daemon on the user. Just add a param '--vici-socket' and assume that a correctly configured strongswan is there ;)

Using the second approach, users can install strongswan with the OS package manager; for CoreOS and friends, the user can create a systemd service that starts a container with strongswan (I didn't find an official one, though) before starting flannel.
Also, it will remove the need to build strongswan.
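
The second option could look roughly like the sketch below: a small wrapper that waits for an externally managed charon to expose its vici socket before flannel is started. `--vici-socket` is only the flag proposed in this comment, `wait_for_vici_socket` is a made-up helper name, and `/var/run/charon.vici` is assumed to be the socket path (it is strongSwan's usual default).

```shell
#!/bin/sh
# wait_for_vici_socket PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists and is a unix socket.
# Returns 0 when the socket appears, 1 when the timeout expires.
wait_for_vici_socket() {
    sock="$1"
    remaining="${2:-30}"
    while [ ! -S "$sock" ]; do
        if [ "$remaining" -le 0 ]; then
            echo "vici socket $sock did not appear" >&2
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage (not run here; --vici-socket is the hypothetical flag from above):
#   wait_for_vici_socket /var/run/charon.vici 60 &&
#       exec flanneld --vici-socket /var/run/charon.vici
```

Failing fast when the socket never shows up keeps a misconfigured strongswan from turning into a silent connectivity problem later.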

All that being said, this turns the work into its fourth(?) reincarnation :D

@tomdee
Contributor

tomdee commented Jul 13, 2017

I'll keep thinking this over, but my initial thought on multiple images vs. putting an extra burden on the user to configure it is that we just go for multiple images. Though, I guess it would be nice to have the option of using an OS strongswan package if users really want to do that.

@tomdee
Contributor

tomdee commented Aug 14, 2017

I've rebased this onto master and pushed to https://github.com/tomdee/flannel/tree/ipsec

@roffe

roffe commented Aug 15, 2017

Looking forward to this hitting upstream. I have started testing the custom images discussed here in my dev env and will report back if I run into any issues.

This is awesome btw!

@klausenbusk

What exactly is the status of this? Are we just waiting on ?

@dcowden

dcowden commented Oct 11, 2017

FWIW, we need this functionality. We use weave because it supports encryption (a requirement for us). We think flannel would be much more stable, and we'd love to switch.

@mkutsevol
Contributor Author

I'm doing my best to allocate time to this in the near future.

@dcowden

dcowden commented Oct 12, 2017

@mkutsevol that's awesome! we'll be happy to be testers when you need that!

@RRAlex

RRAlex commented Oct 12, 2017

Is there any chance Wireguard could be used / integrated as a swappable lower layer of encryption for the overlay?
It seems to be moving forward as a simple, fast and audited (secure) VPN solution.
Some people seem to be starting to think about it:

@mkutsevol
Contributor Author

@RRAlex definitely not in this PR. strongSwan is used here only to control the kernel's IPsec; the kernel itself does the magic, the 'protocols', the 'fast', etc.

@mkutsevol
Contributor Author

Hi @tomdee,
So, the packaging.
For the tarballs, it will include the charon daemon, the full dist/ folder.
I'll produce 3 docker images for every arch: flannel only, flannel+charon, charon only.

  • flannel only + charon only = pods in k8s.
  • flannel+charon = standalone.

Basically, the charon-only image can be replaced by any charon/strongswan image with vici enabled.

I need your opinion on tagging:
$(TAG)-$(ARCH) -> $(TAG)-$(VARIANT)-$(ARCH)

where VARIANT is one of: light, full, strongswan (or, as an alternative naming: standalone, ipsec, strongswan)
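
To make the proposed scheme concrete, here is a small sketch that builds such a tag and rejects unknown variants. The function name `image_tag` is made up for illustration, and the sketch arbitrarily picks the first set of variant names from the proposal.

```shell
#!/bin/sh
# image_tag TAG VARIANT ARCH
# Prints TAG-VARIANT-ARCH; returns 1 for a variant outside the proposed set.
image_tag() {
    case "$2" in
        light|full|strongswan)
            printf '%s-%s-%s\n' "$1" "$2" "$3"
            ;;
        *)
            echo "unknown variant: $2" >&2
            return 1
            ;;
    esac
}

# Usage:
#   image_tag v0.9.0 full amd64
```

Validating the variant in one place would also give the per-variant Dockerfiles a single list to stay in sync with.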

We will have more dockerfiles, which is unfortunate, as they will require synchronous changes. But templating dockerfiles seems like overkill. What's your opinion?


Status: I rebased your rebase onto the current master and it's broken and needs rework to adapt not only to the packaging changes, but to the changes in flannel itself.

@mkutsevol
Contributor Author

mkutsevol commented Oct 14, 2017

Please see create-dockerfiles-$(ARCH) target in the https://github.com/mkutsevol/flannel/commit/3603410a3a94d322d738f9ced1485759bae6cd82 commit, as an example.
Docker uses the last entrypoint directive.

mkutsevol and others added 12 commits October 29, 2017 20:12
 * As flannel doesn't support multiple networks, this is not needed any more
 * Transitional, (WIP)
 * Support for bundled/remote charon.
 * Cleanup after removal of CreateBackendData/GetBackendData
 * It builds amd64.
 * Still much to do.
 * Added sync.WaitGroup, so spawned processes can correctly shutdown.
 * Bundled charon daemon correctly shuts down.
 * Removed the build of strongswan completely. Even alpine linux has it built already.
 * We package docker with strongswan as a separate image.
 * DRYer
 * Correct stop if hadn't finished init sequence.
@tomdee
Contributor

tomdee commented Nov 10, 2017

@mkutsevol I merged the PR to switch over to Alpine so that should greatly simplify the build and packaging part of this PR.

@klausenbusk

I'm wondering if this will be superseded by a Wireguard backend at some point. Wireguard isn't upstreamed yet, but a lot can happen in 6-18 months.

Wireguard seems way easier to work with than IPSEC (and all the packing logic), and someone could probably tweak the ipip backend to use Wireguard.

Just my two cents..

@klausenbusk

klausenbusk commented Nov 14, 2017

> Wireguard seems way easier to work with than IPSEC (and all the packing logic)

I think implementing could be as simple as (pseudo code):

WG_PRIVATE_KEY_PATH="/foo/wg.key"
if not exist $WG_PRIVATE_KEY_PATH; then
    wg genkey > $WG_PRIVATE_KEY_PATH
    set flannel.alpha.coreos.com/wg-pubkey $(wg pubkey < $WG_PRIVATE_KEY_PATH)
fi

ip link add dev wg0 type wireguard
ip address add dev wg0 <node.PodCIDR (10.2.0.0 -> 10.2.0.1)>

for externalIP, node.podCIDR, flannel.alpha.coreos.com/wg-pubkey in <nodes>; do
    wg set wg0 peer <flannel.alpha.coreos.com/wg-pubkey> allowed-ips <node.podCIDR> endpoint <externalIP>
    ip route add <node.podCIDR> dev wg0
done

@tomdee
Contributor

tomdee commented Dec 7, 2017

@klausenbusk See #898

@JasonGiedymin

Keep up the good work. Either impl is desirable over nothing.

@zx2c4

zx2c4 commented Dec 13, 2017

Indeed, doing this with WireGuard means it's a tiny shell script, as opposed to a massive kludge. Let me know if you guys need help doing this.

@klausenbusk

> Indeed, doing this with WireGuard means it's a tiny shell script, as opposed to a massive kludge.

But at least all the kernel modules are already in place. Compiling and loading the WG kernel module on CoreOS isn't complicated [1] [2], but how do you automate the compiling? A shell script which runs on every boot seems to be the only solution, but that feels a bit hackish, and it would need to block kubelet/flannel from starting until the kernel module is ready. Another solution is torcx, but that isn't ready yet.

@zx2c4

zx2c4 commented Dec 13, 2017

> but how do you automate the compiling?

Ensure it's shipped for the kernel with the package manager, ensure it's built into the kernel, or just use dkms, which is what's done on ubuntu/debian/arch/fedora/etc.

@klausenbusk

> Ensure it's shipped for the kernel with the package manager

CoreOS contains no package manager.

> ensure it's built into the kernel,

CoreOS does not ship out-of-tree modules. See coreos/bugs#2225

> or just use dkms

dkms isn't available in CoreOS..

So.. Limited by the OS.

@zx2c4

zx2c4 commented Dec 13, 2017

> ensure it's built into the kernel,
>
> CoreOS does not ship out-of-tree modules. See coreos/bugs#2225

Shucks. I guess you could make a "policy exception" if you wanted. I'll reiterate that over in the bug you linked, though, so as not to clutter this one.

So, I guess you can rig up a wireguard-autobuilder shell script then, like you originally suggested? Hate to come to that, but if that's our only solution...
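
A rough sketch of the skeleton such a script might have, assuming a loaded module shows up as a directory under /sys/module. The function names and the build command are placeholders, and the actual CoreOS build steps are deliberately left out.

```shell
#!/bin/sh
# module_loaded NAME [SYSFS_ROOT]
# Succeeds if SYSFS_ROOT/NAME exists (SYSFS_ROOT defaults to /sys/module).
module_loaded() {
    [ -d "${2:-/sys/module}/$1" ]
}

# ensure_module NAME BUILD_CMD [SYSFS_ROOT]
# Runs BUILD_CMD NAME only when the module is absent, then re-checks, so a
# unit ordered before kubelet/flannel can block on the exit status.
ensure_module() {
    name="$1"
    build_cmd="$2"
    root="${3:-/sys/module}"
    module_loaded "$name" "$root" && return 0
    "$build_cmd" "$name" || return 1
    module_loaded "$name" "$root"
}

# Usage (not run here; build_and_load_wireguard is a placeholder you would
# have to write yourself):
#   ensure_module wireguard build_and_load_wireguard || exit 1
```

Skipping the build when the module is already present keeps the common boot path cheap; only a kernel update pays the compile cost.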

@eyakubovich
Contributor

IPSec is definitely not the easiest to work with but it's really just a building block for a system that exposes more user friendly concepts. That's why we have the swans and the like.

If flannel can use IPSec to implement the concepts it provides then it just ends up being an implementation detail. The implementation can be based on WG but if flannel wraps it anyway, WG's simplicity is not going to matter to the end user.

@tomdee
Contributor

tomdee commented Jan 1, 2018

@mkutsevol I've rebased this again onto master and pushed it to https://github.com/tomdee/flannel/tree/feature/ipsec

Unfortunately, I can't get it to work. The logs look fine to me, but I can't ping when running the e2e test. Would you be able to take a look and see what's going wrong? Ping me on slack if you need help getting it running.

@mkutsevol
Contributor Author

mkutsevol commented Jan 2, 2018

@tomdee, I know, that's because of the routing setup the tests do.
I've moved to another city once again and am struggling to keep up with my work; I'll try to devote some time to it this weekend.

@tomdee
Contributor

tomdee commented Jan 25, 2018

IPSec support has just been merged to master in #929 🎆

It's still classed as "experimental" but it would be great if everyone could try it out, and provide feedback (and ideally PRs!) on the user experience, the code, the documentation, the tests etc...

Docs: https://github.com/coreos/flannel/blob/master/Documentation/backends.md#ipsec
Image: quay.io/coreos/flannel-git:v0.10.0-8-g6b98346d

@tomdee tomdee closed this Jan 25, 2018
@rektide rektide mentioned this pull request Dec 1, 2019