
proxy: Fix Inotify falling back to polling when files don't exist yet #1119

Merged
merged 31 commits into master
Jun 15, 2018

Conversation

hawkw
Contributor

@hawkw hawkw commented Jun 13, 2018

This PR changes the proxy's Inotify watch code to avoid always falling back to
polling the filesystem when the watched files don't exist yet. It also contains
some additional cleanup and refactoring of the inotify code, including moving
the non-TLS-specific filesystem watching code out of the tls::config module
and into a new fs_watch module.

In addition, it adds tests for both the polling-based and inotify-based watch
implementations, and changes the polling-based watches to hash the files rather
than using timestamps from the file's metadata to detect changes. These changes
are originally from #1094 and #1091, respectively, but they're included here
because @briansmith asked that all the changes be made in one PR.

Closes #1094. Closes #1091. Fixes #1090. Fixes #1097. Fixes #1061.

Signed-off-by: Eliza Weisman eliza@buoyant.io

@hawkw hawkw added this to the 0.5.0 milestone Jun 13, 2018
@hawkw hawkw self-assigned this Jun 13, 2018
@hawkw hawkw requested review from briansmith and olix0r June 13, 2018 23:32
Contributor

@briansmith briansmith left a comment

In general, I think this looks pretty good. The main concern I have is that this is basically +800 new lines of code in tls/config.rs that's mostly not TLS-specific. Even if it's not perfect, can we move as much of the generic file watching logic into some top-level module?

Ok(true)
} else {
Ok(false)
}
Contributor

We can just do Ok(changed) here; i.e. take the return value out of the if/else.
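The suggested simplification, as a self-contained sketch (the function names here are illustrative, not from the PR):

```rust
// Before: branching on a boolean just to rebuild it.
fn detect_change_branching(changed: bool) -> Result<bool, ()> {
    if changed {
        Ok(true)
    } else {
        Ok(false)
    }
}

// After: return the value directly, as the review suggests.
fn detect_change(changed: bool) -> Result<bool, ()> {
    Ok(changed)
}
```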

use futures_watch::Watch;

// These functions are a workaround for Windows having separate API calls
// for symlinking files and directories. You're welcome, Brian ;)
Contributor

Thanks! But, please remove the "You're welcome, Brian" comment.

Does this work on Windows? Usually Windows won't let a non-privileged process create symlinks. I wouldn't mind if we just didn't run these on Windows.

Contributor Author

Usually Windows won't let a non-privileged process create symlinks.

Ah, I wasn't aware of that. In that case, I'll just disable these tests on Windows.


let f = File::create(cfg.trust_anchors)
.expect("create trust anchors");
f.sync_all().expect("create trust anchors");
Contributor

Why do we need to call sync_all(), especially so many times?

}

impl BlockOnFor for Runtime {
fn block_on_for<F>(&mut self, duration: Duration, f: F) -> Result<F::Item, F::Error>
Contributor

I think duration is maybe better named timeout?

Why do we need a timeout at all? If we removed the timeout, would any/all of the tests wait forever?

Contributor Author

Why do we need a timeout at all? If we removed the timeout, would any/all of the tests wait forever?

Not when the functionality we're testing for works correctly; however, before everything was working correctly, the failure mode was waiting forever. I found that this made debugging failing tests more difficult, as killing the tests when one has been waiting for over a minute prevents any remaining tests from running, and doesn't print any output from the killed test.
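The failure-mode argument above can be illustrated with a dependency-free stand-in: a bounded wait turns an indefinite hang into a visible test failure. Here std's `recv_timeout` stands in for the futures-based `block_on_for` helper in the PR:

```rust
use std::sync::mpsc;
use std::time::Duration;

// Waits for an event with an upper bound, so a regression that stops
// events from ever arriving fails loudly instead of hanging the test
// run (and swallowing its output when the run is killed).
fn wait_for_event<T>(
    rx: &mpsc::Receiver<T>,
    timeout: Duration,
) -> Result<T, mpsc::RecvTimeoutError> {
    rx.recv_timeout(timeout)
}
```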

Contributor

OK, I think it's fine to leave the timeouts in. We can remove them later if we find they get in the way. It would be good to add a comment to the constructor function that explains basically what you just said here.

Contributor Author

Okay, I'll add a comment here.

I was considering moving this to another module, as I think we currently have a number of tests whose failure modes are waiting forever rather than panicking, and changing them to use this might make debugging any regressions that break those tests a bit easier.

println!("created {:?}", f);

let next = watch.into_future().map_err(|(e, _)| e);
let (item, _) = rt.block_on_for(Duration::from_secs(2), next)
Contributor

Why are some of the timeouts 2 seconds and others 5 seconds?


// Sleep briefly between changing the fs, as a workaround for
// macOS reporting timestamps with second resolution.
// https://github.com/runconduit/conduit/issues/1090
Contributor

Since we're not using timestamps any more, can we clean this up?

println!("created {:?}", trust_anchors);
trust_anchors.write_all(b"I am not real trust anchors")
.expect("write to trust anchors");
trust_anchors.sync_all().expect("sync trust anchors");
Contributor

Is it really valuable to have the expect() messages everywhere? Why not just unwrap() and make the code easier to read? If we need to know what failed we can use the backtrace.

It seems like this File::create...write_all...sync_all() pattern should be encapsulated into a function that's reused.

Similarly, we should encapsulate the larger logic of "create a file in one directory and then create a symlink to it in another directory" and the larger logic "create the file and the first symlink and then create the second symlink."

More generally, I can't really carefully review this much repetitive testing code. I tried last night and again today and it's just overwhelming. So, the less code there actually is in the test, and the more reusable functions are reused, the better.

Contributor Author

It seems like this File::create...write_all...sync_all() pattern should be encapsulated into a function that's reused.

I agree. I intended to do this in a separate follow-up PR, as I wanted to get this up for review sooner, but I'd be happy to make that change now if you prefer.

Contributor

I agree. I intended to do this in a separate follow-up PR, as I wanted to get this up for review sooner, but I'd be happy to make that change now if you prefer.

Yeah, let's just do it now. It's basically the difference between me actually reviewing the tests and pretending that I reviewed the tests. :) I have already done my rebasing so it will be easy to do the next round of rebasing since I don't expect to conflict with changes to the tests.
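The helper the PR ends up adding for this pattern (`create_and_write`, visible in later hunks) can be sketched with std alone; the `sync_all()` call is omitted here, matching the direction the review takes:

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

// Encapsulates the repeated File::create(...).write_all(...) pattern
// from the watch tests in a single reusable helper.
fn create_and_write<P: AsRef<Path>>(path: P, contents: &[u8]) -> io::Result<File> {
    let mut file = File::create(path)?;
    file.write_all(contents)?;
    Ok(file)
}
```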

@briansmith
Contributor

+800 new lines

I realize most of this is testing, which definitely helps. I'm hoping if we move the file watching code to its own module, and create some utility functions that serve as building blocks for the tests and put them in that module, then the tests themselves will get smaller.

@hawkw
Contributor Author

hawkw commented Jun 14, 2018

I'm hoping if we move the file watching code to its own module, and create some utility functions that serve as building blocks for the tests and put them in that module, then the tests themselves will get smaller.

I completely agree that a lot of the repeated testing code needs to be factored out.

Initially, I wasn't sure if the watching code should go in its own module, as it seemed like making this code less TLS-config-specific might introduce some additional complexity, but I can do that if you'd prefer.

@briansmith
Contributor

Initially, I wasn't sure if the watching code should go in its own module, as it seemed like making this code less TLS-config-specific might introduce some additional complexity, but I can do that if you'd prefer.

I think if it's a significant amount of effort to move the generic logic to its own module, then we shouldn't do it. However, during the review I noticed that once we map things to a list of PathBufs then it seems like it's all totally generic anyway.

The test fixture is definitely specific to our TLS stuff. In an ideal world we'd refactor the tests so that we have generic file watching tests in the new module, and then TLS-specific tests in the tls/config.rs module. However, that's not reasonable to do now, so we can just leave the tests in the tls/config.rs module for now (probably forever) without trying to make them generic. The utilities like "create a file with two symlinks" can/should go in the generic module, though, I think.

@briansmith
Contributor

(Moving it to a module outside of transport/tls does mean that the ring and untrusted extern crate declarations need to be moved to lib.rs. That's OK with me.)

@hawkw
Contributor Author

hawkw commented Jun 14, 2018

@briansmith, I think as of ebe7b3d, I've now made all the changes you've requested?

@hawkw hawkw force-pushed the eliza/fix-fallback-when-files-don't-exist branch from 284e2ca to ebe7b3d Compare June 14, 2018 21:59

fn create_file<P: AsRef<Path>>(path: P) -> io::Result<File> {
let f = File::create(path)?;
f.sync_all()?;
Contributor

Why sync_all()? I don't think it should be necessary and if it is there should be a comment.

Contributor Author

I don't think it's strictly necessary here, but I thought flushing every filesystem operation immediately might help prevent the tests from being flaky.

Contributor

This flushing, IIUC, is telling the kernel to flush its internal state to disk, not flushing the process state to the kernel. So, I expect that these sync_all()s will not have any effect, since they won't change anything at the kernel-userspace boundary, except slow the tests (actually your whole system) down due to fsyncs. So AFAICT they are actually harmful and should be removed.

Contributor Author

Okay, will remove them in that case.

assert!(item.is_some());
}

// Test for when the watched files are symlinks to a file insdie of a
Contributor

s/insdie/inside/


let mut private_key = create_and_write(
&private_key_path,
b"I am not a realprivate key",
Contributor

s/realprivate/real private/

rt.spawn(bg);

trust_anchors.write_all(b"Trust me on this :)").unwrap();
trust_anchors.sync_all().unwrap();
Contributor

Why do we need sync_all()?


rt.spawn(bg);

trust_anchors.write_all(b"Trust me on this :)").unwrap();
Contributor

Instead of having all these cute(!) strings, it would be clearer if we just had a few constants that are used throughout the tests, something like const A: &str = "A"; const B: &str = "B";. That would be much less distracting than reading all these strings and then trying to determine whether the string is the same or different than the previous value.
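As a sketch, the suggested convention amounts to something like the following (byte strings rather than `&str` here, since the tests write raw file contents; the helper name is illustrative):

```rust
// Fixed payloads reused across the tests, so whether two writes differ
// is obvious at the call site instead of hidden in prose-like strings.
const A: &[u8] = b"A";
const B: &[u8] = b"B";

// A "change" is then simply: the payload differs from last time.
fn contents_changed(prev: &[u8], next: &[u8]) -> bool {
    prev != next
}
```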

trust_anchors: dir.path().join(TRUST_ANCHORS),
end_entity_cert: dir.path().join(END_ENTITY_CERT),
private_key: dir.path().join(PRIVATE_KEY),
};
Contributor

OK, now I can read the tests much better. I see that there's actually nothing TLS-specific about these tests except that they use CommonSettings to determine which files to look at, right?

Imagine that you replaced cfg with paths: Vec<PathBuf> or similar. Then all these tests would be completely independent of the tls submodule and they could all easily be moved to the fs_watch submodule. Not only that, but even more repetitive code in the tests could be replaced because you could just loop over the contents of paths (even now, they could be replaced with for x in &[&fixture.cfg.trust_anchors, ...] or similar).

Besides these tests that verify that the FS watching code works, in tls/config.rs (or, hopefully, somewhere under tests/) we're going to need to add tests that verify (1) When the inputs are initially not valid certificates/ca-bundles/private-keys, we ignore them, (2) When the inputs become valid, we construct a valid configuration from them, and (3) When we have a valid configuration and the files change to an invalid configuration, we ignore the change.

Contributor

Besides these tests that verify that the FS watching code works, in tls/config.rs (or, hopefully, somewhere under tests/) we're going to need to add tests that verify (1) When the inputs are initially not valid certificates/ca-bundles/private-keys, we ignore them, (2) When the inputs become valid, we construct a valid configuration from them, and (3) When we have a valid configuration and the files change to an invalid configuration, we ignore the change.

^ Not in this PR, though.

Contributor Author

This is a fair point. I'll make the tests more generic and move them to the fs_watch module.

@hawkw
Contributor Author

hawkw commented Jun 14, 2018

@briansmith as of acdc91e, the tests are now no longer TLS-specific, and have been moved to the fs_watch module.

I'll start on adding integration tests for the TLS specific functionality in an additional branch, as well.

@hawkw
Contributor Author

hawkw commented Jun 14, 2018

I think the Travis build of acdc91e errored for unrelated reasons.
[screenshot: Travis CI build failure]

I'll go ahead and restart it.

Contributor

@briansmith briansmith left a comment

LGTM. The only thing I'm nervous about is that, in the inotify case, it's not clear which filesystem action is actually triggering the notification. I called out one case where we expect that the symlink call will trigger a notification. This assumes that inotify will not give us any spurious notifications at the beginning of a watch just due to the file existing. I read the inotify man page and there was no indication that that would happen. So, if there is an easy way to drain the stream before the action that we expect to trigger a notification, it would be great to add that, but OK if not.

/// This will calculate the SHA-384 hash of each of files at the paths
/// described by this `CommonSettings` every `interval`, and attempt to
/// load a new `CommonConfig` from the files again if any of the hashes
/// has changed.
Contributor

Something to think about as a future improvement: Should we do this contents checking even when using inotify, in case inotify gives us any kind of spurious change (e.g. the symlink was recreated for some reason but the file contents didn't actually change)?

Contributor Author

Yeah, I thought about that as well. It seemed out of scope for this branch, though, especially given that I think false negatives are much more likely to be a problem than false positives.
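The polling path described in the doc comment above can be sketched dependency-free; std's `DefaultHasher` stands in here for the SHA-384 digest the PR actually computes:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Hashes file contents so the poller can compare digests across
// intervals instead of relying on metadata timestamps (which macOS
// reports with only second resolution; see #1090).
fn content_hash(contents: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(contents);
    hasher.finish()
}

// Returns whether the contents changed, plus the new hash to store.
fn poll_once(prev: Option<u64>, contents: &[u8]) -> (bool, u64) {
    let next = content_hash(contents);
    (prev != Some(next), next)
}
```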

fs::remove_dir_all(&data_path).unwrap();
symlink(&real_data_path_2, &data_path).unwrap();

let (item, _) = next_change(&mut rt, watch).unwrap();
Contributor

Did you verify that if you comment out the two lines above this, then next_change() will hang because there's no notifications? That is, did you verify that the symlink() call above is actually what causes the notification?

Contributor Author

Ah, you're right, this test still passes with those lines removed. Thanks for catching that!

I've added a call to next_change(...) prior to creating the symlink, after which the test will fail if we don't create the symlink.

| WatchMask::DELETE
| WatchMask::DELETE_SELF
| WatchMask::MOVE
| WatchMask::MOVE_SELF
Contributor

Shouldn't we include IN_ONLYDIR too? Otherwise, this looks right to me, though it's not clear to me whether or not we want to use IN_EXCL_UNLINK. Something to figure out later.

Contributor Author

Shouldn't we include IN_ONLYDIR too?

When a file exists when starting the watch, we do watch the actual file, so I didn't add that mask flag.

Contributor

When a file exists when starting the watch, we do watch the actual file, so I didn't add that mask flag.

Ah, yeah. Why do we do that? It seems like it would always be correct to watch the parent directory regardless of whether the file initially exists. Generally unnecessary special cases are bad and "file existed at the beginning" seems like a special case.

Contributor Author

We always watch the files we care about directly, if they exist. We only watch their parent directories if they don't currently exist. This is so that we see fewer events on unrelated files, if a file we care about is in a directory that also contains files we don't care about watching.

Contributor Author

Sorry if my earlier comment wasn't clear; by "when starting a watch", I meant when starting a new inotify watch, not when the watch stream is created.

Contributor

We always watch the files we care about directly, if they exist. We only watch their parent directories if they don't currently exist. This is so that we see fewer events on unrelated files, if a file we care about is in a directory that also contains files we don't care about watching.

In our case, won't the directory we are watching only contain files that we care about? I feel like this is an optimization for a general case that isn't actually helpful for us. I would prefer to always do it one way. If we find we're getting too many notifications IRL then we can always add the optimization later.

Contributor Author

I made an attempt to rewrite this to only watch directories, but I realised that if we're given a path to a file in the CWD (which seems like a fairly plausible case at least while testing locally), we will still sometimes need to watch file paths rather than dirs, so I think there will end up being some special-casing regardless.

The rewritten code looks like this:

        fn add_paths(&mut self) -> Result<(), io::Error> {
            let dir_mask
                = WatchMask::CREATE
                | WatchMask::MODIFY
                | WatchMask::DELETE
                | WatchMask::DELETE_SELF
                | WatchMask::MOVE
                | WatchMask::MOVE_SELF
                | WatchMask::ONLYDIR
                ;
            let file_mask
                = WatchMask::CREATE
                | WatchMask::MODIFY
                | WatchMask::DELETE
                | WatchMask::MOVE
                ;
            for path in &self.paths {
                let canonical = path
                    .canonicalize()
                    .unwrap_or_else(|e| {
                        trace!("canonicalize({:?}): {:?}", &path, e);
                        path.to_path_buf()
                    });
                if let Some(ref parent) = canonical.parent() {
                    // If the path has a parent directory, watch that.
                    self.inotify.add_watch(&parent, dir_mask)?;
                    trace!("watch {:?} (for {:?})", parent, path);
                } else {
                    // Otherwise, we're watching the path directly.
                    self.inotify.add_watch(&canonical, file_mask)?;
                    trace!("watch {:?} (for {:?})", &canonical, path);
                };

            }
            Ok(())
        }

Let me know if you prefer that over the current implementation, or can think of any ways around having to special-case something.

Contributor

I made an attempt to rewrite this to only watch directories, but I realised that if we're given a path to a file in the CWD (which seems like a fairly plausible case at least while testing locally), we will still sometimes need to watch file paths rather than dirs, so I think there will end up being some special-casing regardless.

I think it's fine to punt on this now. I will file an issue about simplifying it. Basically, I think that we should enforce that these env vars are absolute paths, so that the issue of having a directory component of "" goes away. Enforcing them to be absolute paths seems like a good idea anyway. But, I don't want to block landing this on that.
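The proposed enforcement is straightforward to sketch (hypothetical helper; the issue mentioned above would track the real change):

```rust
use std::io;
use std::path::Path;

// Rejects relative paths up front, so every watched file is guaranteed
// to have a real parent directory and the CWD special case goes away.
fn require_absolute(path: &Path) -> io::Result<()> {
    if path.is_absolute() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("path must be absolute: {}", path.display()),
        ))
    }
}
```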


for path in &paths {
fs::remove_file(path).unwrap();
println!("deleted {:?}", path);
Contributor

It seems like here we need assert!(item.is_some()); too. And maybe other places?

Contributor Author

Sorry, what do you mean? I don't see an item in scope there?

Contributor

I mean, shouldn't there be something like this here:

let (item, watch) = next_change(&mut rt, watch).unwrap();
assert!(item.is_some());

Otherwise, how do we know if the notifications received in the next_change() calls below are from the create_and_write and not from the remove_file?

Contributor Author

Ah, yes, you're right; when I rewrote this test, I think I must've briefly forgotten that the filtering out of deletions happens in the TLS-specific code and not here. Will fix.

-> Result<(Option<()>, Watch<()>), WatchError>
{
let next = watch.into_future().map_err(|(e, _)| e);
rt.block_on_for(Duration::from_secs(2), next)
Contributor

Another thing I just noticed: Now that everything is working as intended, should we really have this timeout? It seems like it will result in a flaky test?

Contributor Author

I think it's good to have the timeout, so that we can get better errors out of regressions. Since this touches the fs, I think it's reasonable to treat it as an integration test, and we have similar timeout behaviour in some of our integration tests.

I think a 2-second timeout is fairly reasonable here, but I could make it more generous if you think that's likely to reduce flakiness?

Contributor

I think it's good to have the timeout, so that we can get better errors out of regressions.

I don't think that's worth risking flakiness. If the test is otherwise working but we fail to receive the notifications, that's a major bug either in our code or in Linux, and neither should happen with any likelihood. If they were to happen then we'd need to fix the code to stop the tests from ever stalling.

I think a 2-second timeout is fairly reasonable here, but I could make it more generous if you think that's likely to reduce flakiness?

IMO we should never have timeouts in tests. That's something that can also be addressed more generally later. If we really want to keep a timeout then let's make it 30s or so.

Contributor Author

Okay, I'll make the timeout longer here. I think we ought to eventually have a more general discussion about timeouts in tests, as I think @olix0r has some opinions on that as well.

@hawkw hawkw merged commit 0eadcc5 into master Jun 15, 2018
olix0r added a commit that referenced this pull request Jun 21, 2018
* Propagate errors in conduit containers to the api (#1117)

- It would be nice to display container errors in the UI. This PR gets the pod's container 
statuses and returns them in the public api

- Also add a terminationMessagePolicy to conduit's inject so that we can capture the 
proxy's error messages if it terminates

* proxy: Update prost to 0.4.0 (#1127)

prost-0.4.0 has been released, which removes unnecessary dependencies.
tower-grpc is being updated simultaneously, as this is the proxy's
primary use of prost.

See: https://github.com/danburkert/prost/releases/tag/v0.4.0

* Simplify & clarify "No TLS" server configuration (#1131)

The same pattern will be used for the "No TLS" client configuration.

Signed-off-by: Brian Smith <brian@briansmith.org>

* proxy: Fix Inotify falling back to polling when files don't exist yet (#1119)

This PR changes the proxy's Inotify watch code to avoid always falling back to
polling the filesystem when the watched files don't exist yet. It also contains
some additional cleanup and refactoring of the inotify code, including moving
the non-TLS-specific filesystem watching code out of the `tls::config` module
and into a new `fs_watch` module.

In addition, it adds tests for both the polling-based and inotify-based watch
implementations, and changes the polling-based watches to hash the files rather
than using timestamps from the file's metadata to detect changes. These changes
are originally from #1094 and #1091, respectively, but they're included here
because @briansmith asked that all the changes be made in one PR.

Closes #1094. Closes #1091. Fixes #1090. Fixes #1097. Fixes #1061.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

* test: Use proxy instead of lb for external test traffic (#1129)

* test: Use proxy instead of lb for external test traffic
* Adjust timeouts on install and get tests

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Display proxy container errors in the Web UI (#1130)

* Display proxy container errors in the Web UI

Add an error modal to display pod errors
Add icon to data tables to indicate errors are present
Display errors on the Service Mesh Overview Page and all the resource pages

* Start running integration tests in CI (#1064)

* Start running integration tests in CI
* Add gcp helper funcs
* Split integration test cleanup into separate phase

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Fix conduit version issue in integration tests (#1139)

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Keep accepting new connections after TLS handshake error. (#1134)

When a TLS handshake error occurs, the proxy just stops accepting
requests. It seems my expectations of how `Stream` handles errors
were wrong.

The test for this will be added in a separate PR after the
infrastructure needed for TLS testing is added. (This is a chicken
and egg problem.)

Signed-off-by: Brian Smith <brian@briansmith.org>

* Add optional TLS client certificate authentication. (#1135)

Refactor the way the TLS trust anchors are configured in preparation
for the client and server authenticating each others' certificates.

Make the use of client certificates optional pending the implementation
of authorization policy.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Attempt to load TLS settings immediately prior to starting watch (#1137)

Previously, the proxy would not attempt to load its TLS certificates until a fs
watch detected that one of them had changed. This means that if the proxy was
started with valid files already at the configured paths, it would not load 
them until one of the files changed.

This branch fixes that issue by starting the stream of changes with one event
_followed_ by any additional changes detected by watching the filesystem.

I've manually tested that this fixes the issue, both on Linux and on macOS, and
can confirm that this fixes the issue. In addition, when I start writing 
integration tests for certificate reloading, I'll make sure to include a test
to detect any regressions.

Closes #1133.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

* Proxy: Make the control plane completely optional. (#1132)

Proxy: Make the control plane completely optional.

* Update Rustls to the latest Git version to fix a bug. (#1143)

Using MS Edge and probably other clients with the Conduit proxy when
TLS is enabled fails because Rustls doesn't take into consideration
that Conduit only supports one signature scheme (ECDSA P-256 SHA-256).
This bug was fixed in Rustls when ECDSA support was added, after the
latest release. With this change MS Edge can talk to Conduit.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Enable get for nodes/proxy for Prometheus RBAC (#1142)

The `kubernetes-nodes-cadvisor` Prometheus queries node-level data via
the Kubernetes API server. In some configurations of Kubernetes, namely
minikube and at least one baremetal kubespray cluster, this API call
requires the `get` verb on the `nodes/proxy` resource.

Enable `get` for `nodes/proxy` for the `conduit-prometheus` service
account.

Fixes #912

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

* Grafana: remove fill and stack from individual resource breakouts (#1092)

Remove the filling and stacking in request rate graphs that combine resources, 
to make it easier to spot outliers.

* Grafana: remove fill and stack from individual resource breakouts
* Remove all the stacks and fills from request rates everywhere

* Build CLI only for host platform (#884)

* Build CLI only for host platform

Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>

* Changes after code review

Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>

* Fix unbound variable issue in docker-build script (#1146)

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* v0.4.4 release notes (#1145)

* v0.4.4 release notes

* Tweak wording about adblocker fix

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Upgrade to webpack 4 and webpack-dev-server 3 (#1138)

Speeds up performance of webpack-dev-server.

* proxy: Upgrade h2 to 0.1.10 (#1149)

This picks up a fix for hyperium/h2#285

* Proxy: Make TLS server aware of its own identity. (#1148)

* Proxy: Make TLS server aware of its own identity.

When validating the TLS configuration, make sure the certificate is
valid for the current pod. Make the pod's identity available at that
point in time so it can do so. Since the identity is available now,
simplify the validation of our own certificate by using Rustls's API
instead of dropping down to the lower-level webpki API.

This is a step towards the server differentiating between TLS
handshakes it is supposed to terminate vs. TLS handshakes it is
supposed to pass through.

This is also a step toward the client side (connect) of TLS, which will
reuse much of the configuration logic.

Signed-off-by: Brian Smith <brian@briansmith.org>

* proxy: Add `tls="true"` metric label to connections accepted with TLS (#1050)

Depends on #1047.

This PR adds a `tls="true"` label to metrics produced by TLS connections and
requests/responses on those connections, and a `tls="no_config"` label on 
connections where TLS was enabled but the proxy has not been able to load
a valid TLS configuration.

Currently, these labels are only set on accepted connections, as we are not yet
opening encrypted connections, but I wired through the `tls_status` field on 
the `Client` transport context as well, so when we start opening client 
connections with TLS, the label will be applied to their metrics as well.

Closes #1046

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
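As a rough sketch of the labeling scheme this commit describes (the enum and function below are assumptions for illustration, not the proxy's actual types):

```rust
// Illustrative only: stand-in types for the proxy's TLS status tracking.
enum TlsStatus {
    /// The connection was accepted with TLS.
    Enabled,
    /// TLS was enabled but no valid config has been loaded yet.
    NoConfig,
}

fn tls_label(status: &TlsStatus) -> &'static str {
    match status {
        TlsStatus::Enabled => "tls=\"true\"",
        TlsStatus::NoConfig => "tls=\"no_config\"",
    }
}

fn main() {
    assert_eq!(tls_label(&TlsStatus::Enabled), "tls=\"true\"");
    assert_eq!(tls_label(&TlsStatus::NoConfig), "tls=\"no_config\"");
}
```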

* Truncate very long error messages, small tweaks to error messages (#1150)

- If error messages are very long, truncate them and display a toggle to show the full message
- Tweak the headings - remove Pod, Container and Image - instead show them as titles
- Also move over from using Ant's Modal.method to the plain Modal component, which is a 
little simpler to hook into our other renders.

* proxy: Clarify Outbound::recognize (#1144)

The comments in Outbound::recognize had become somewhat stale as the
logic changed. Furthermore, this implementation may be easier to
understand if broken into smaller pieces.

This change reorganizes the Outbound:recognize method into helper
methods--`destination`, `host_port`, and `normalize`--each with
accompanying docstrings that more accurately reflect the current
implementation.

This also has the side-effect benefit of eliminating a string clone on
every request.
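As a toy illustration of the kind of helper the commit names (`normalize`) and of how an explicit-port fast path can avoid a per-request string clone — the body below is assumed, not taken from the proxy, and IPv6 literals are ignored for brevity:

```rust
use std::borrow::Cow;

// Assumed sketch: ensure an authority carries an explicit port.
// IPv6 literals are ignored for brevity.
fn normalize(authority: &str, default_port: u16) -> Cow<str> {
    if authority.contains(':') {
        // Port already explicit: borrow, no allocation — the kind of
        // avoided string clone the commit message mentions.
        Cow::Borrowed(authority)
    } else {
        Cow::Owned(format!("{}:{}", authority, default_port))
    }
}

fn main() {
    assert_eq!(normalize("example.com", 80), "example.com:80");
    // When the port is already explicit, the authority passes through.
    assert_eq!(normalize("example.com:8080", 80), "example.com:8080");
}
```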

* Add integration tests for tap (#1152)

* Add integration tests for tap
* Collect fewer tap events

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* dest service: close open streams on shutdown (#1156)

* dest service: close open streams on shutdown
* Log instead of print in pkg packages
* Convert ServerClose to a receive-only channel

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Don't panic on stats that aren't included in StatAllResourceTypes (#1154)

Problem
`conduit stat` would cause a panic for any resource that wasn't in the list
of StatAllResourceTypes.
This bug was introduced by https://github.com/runconduit/conduit/pull/1088/files

Solution
Fix writeStatsToBuffer to not depend on what resources are in StatAllResourceTypes
Also adds a unit test and integration test for `conduit stat ns`

* Fix dashboard integration test (#1160)

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Proxy: Add TLS client infrastructure. (#1158)

Move TLS cipher suite configuration to tls::config.

Use the same configuration to act as a client and a server.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Proxy: More carefully keep track of the reason TLS isn't used. (#1164)

* Proxy: More carefully keep track of the reason TLS isn't used.

There is only one case where we dynamically don't know whether we'll
have an identity to construct a TLS connection configuration. Refactor
the code with that in mind, better documenting all the reasons why an
identity isn't available.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Don't allow stat requests for named resources in --all-namespaces  (#1163)

Don't allow the CLI or Web UI to request named resources if --all-namespaces is used.

This follows kubectl, which also does not allow requesting named resources
over all namespaces.

This PR also updates the Web API's behaviour to be in line with the CLI's. 
Both will now default to the default namespace if no namespace is specified.

* Enable optional parallel build of docker images (#978)

* Enable optional parallel build of docker images

By default, docker does image builds in a single thread. For our containers, this is a little slow on my system. Using `parallel` allows for *optional* improvements in speed there.

Before: 41s
After: 22s

* Move parallel help text to stderr

* proxy: re-enabled vectored writes through our dynamic Io trait object. (#1167)

This adds `Io::write_buf_erased` that doesn't required `Self: Sized`, so
it can be called on trait objects. By using this method, specialized
methods of `TcpStream` (and others) can use their `write_buf` to do
vectored writes.

Since it can be easy to forget to call `Io::write_buf_erased` instead of
`Io::write_buf`, the concept of making a `Box<Io>` has been made
private. A new type, `BoxedIo`, implements all the super traits of `Io`,
while making the `Io` trait private to the `transport` module. Anything
hoping to use a `Box<Io>` can use a `BoxedIo` instead, and know that
the write buf erase dance is taken care of.

Adds a test to `transport::io` checking that the dance we've done does
indeed call the underlying specialized `write_buf` method.

Closes #1162
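The erasure dance described above can be sketched with std's `Write` in place of the real `Buf`-based API; the names are simplified and `FakeTcp` is a stand-in for a transport with a specialized write path.

```rust
use std::io::{self, Write};

// Hedged sketch: an object-safe `write_buf_erased` (no `Self: Sized`
// bound) lets boxed trait objects reach a type's specialized write path.
trait Io: Write {
    fn write_buf_erased(&mut self, buf: &[u8]) -> io::Result<usize>;
}

// Stand-in for a transport with a specialized method (think
// `TcpStream::write_buf` doing vectored writes).
struct FakeTcp {
    out: Vec<u8>,
    specialized_calls: usize,
}

impl Write for FakeTcp {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.out.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Io for FakeTcp {
    fn write_buf_erased(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Delegate to the specialized path, recording that the erased
        // entry point actually reached it.
        self.specialized_calls += 1;
        self.write(buf)
    }
}

// Public wrapper hiding `Box<dyn Io>` so callers can't forget to use
// the erased method — the role `BoxedIo` plays in the proxy.
struct BoxedIo(Box<dyn Io>);

impl BoxedIo {
    fn write_buf(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.write_buf_erased(buf)
    }
}

fn main() -> io::Result<()> {
    let tcp = FakeTcp { out: Vec::new(), specialized_calls: 0 };
    let mut io = BoxedIo(Box::new(tcp));
    io.write_buf(b"hello")?;
    Ok(())
}
```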

* proxy: add HTTP/1.1 Upgrade support automatically (#1126)

Any HTTP/1.1 request seen by the proxy is automatically prepared so
that, if the proxied response agrees to an upgrade, the two connections
are converted into a standard TCP proxy duplex.

Implementation
-----------------

This adds a new type, `transparency::Http11Upgrade`, which is a sort of rendezvous type for triggering HTTP/1.1 upgrades. In the h1 server service, if a request looks like an upgrade (`h1::wants_upgrade`), the request body is decorated with this new `Http11Upgrade` type. It is actually a pair, and so the second half is put into the request extensions, so that the h1 client service may look for it right before serialization. If it finds the half in the extensions, it decorates the *response* body with that half (if it looks like a response upgrade (`h1::is_upgrade`)).

The `HttpBody` type now has a `Drop` impl, which will look to see if it's been decorated with an `Http11Upgrade` half. If so, it will check for hyper's new `Body::on_upgrade()` future, and insert that into the half.

When both `Http11Upgrade` halves are dropped, the internal `Drop` checks whether both halves have supplied an upgrade. If so, the two `OnUpgrade` futures from hyper are joined, and when they succeed, a `transparency::tcp::duplex()` future is created. This chain is spawned into the default executor.

The `drain::Watch` signal is carried along, to ensure upgraded connections still count towards active connections when the proxy wants to shutdown.

Closes #195
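The rendezvous-on-drop idea can be sketched in miniature: two halves share inner state, and when the last half is dropped the shared state's own `Drop` checks whether both sides supplied an upgrade. Strings stand in for hyper's `OnUpgrade` futures and an `AtomicBool` stands in for spawning the TCP duplex; all names here are illustrative, not the proxy's exact API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

// Toy stand-in for the state shared by the two `Http11Upgrade` halves.
struct Inner {
    server: Option<String>,
    client: Option<String>,
    joined: Arc<AtomicBool>,
}

impl Drop for Inner {
    fn drop(&mut self) {
        // Runs once both halves are gone (last Arc dropped): if both
        // sides supplied an upgrade, this is where the real code joins
        // the two `OnUpgrade` futures and spawns the TCP duplex.
        if self.server.is_some() && self.client.is_some() {
            self.joined.store(true, Ordering::SeqCst);
        }
    }
}

struct Half {
    inner: Arc<Mutex<Inner>>,
    is_server: bool,
}

impl Half {
    fn insert(&self, upgrade: &str) {
        let mut inner = self.inner.lock().unwrap();
        if self.is_server {
            inner.server = Some(upgrade.to_string());
        } else {
            inner.client = Some(upgrade.to_string());
        }
    }
}

fn pair(joined: Arc<AtomicBool>) -> (Half, Half) {
    let inner = Arc::new(Mutex::new(Inner { server: None, client: None, joined }));
    let server = Half { inner: inner.clone(), is_server: true };
    let client = Half { inner, is_server: false };
    (server, client)
}

fn main() {
    let joined = Arc::new(AtomicBool::new(false));
    let (server, client) = pair(joined.clone());
    server.insert("server agreed");
    client.insert("client agreed");
    drop(server);
    assert!(!joined.load(Ordering::SeqCst)); // one half still alive
    drop(client);
    assert!(joined.load(Ordering::SeqCst)); // both dropped: rendezvous fires
}
```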

* Add controller admin servers and readiness probes (#1168)

* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* bin: Remove unused script (#1153)

Committed in error.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

* Proxy: Implement TLS conditional accept more like TLS conditional connect. (#1166)

* Proxy: Implement TLS conditional accept more like TLS conditional connect.

Clean up the accept side of the TLS configuration logic.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Upgrade prometheus to v2.3.1 (#1174)

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* proxy: Document tls::config::watch_for_config_changes (#1176)

While investigating TLS configuration, I found myself wanting a
docstring on `tls::config::watch_for_config_changes`.

This has one minor change in functionality: now, `future::empty()`
is returned instead of `future::ok(())` so that the task never completes.
It seems that, ultimately, we'll want to treat it as an error if we lose
the ability to receive configuration updates.

* Add CA certificate bundle distributor to conduit install (#675)

* Add CA certificate bundle distributor to conduit install
* Update ca-distributor to use shared informers
* Only install CA distributor when --enable-tls flag is set
* Only copy CA bundle into namespaces where inject pods have the same controller
* Update API config to only watch pods and configmaps
* Address review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Add probes and log termination policy for distributor (#1178)

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>