From f2eeb0ba18b4da40b8334910ee56d3d806fa3f43 Mon Sep 17 00:00:00 2001 From: Michael Rebello Date: Thu, 1 Aug 2019 17:44:42 -0700 Subject: [PATCH 1/3] docs: update iOS performance findings Updating based on https://github.com/lyft/envoy-mobile/issues/128#issuecomment-516260951. Signed-off-by: Michael Rebello --- .../performance/device_conditions.rst | 102 ++++++++++++------ 1 file changed, 72 insertions(+), 30 deletions(-) diff --git a/docs/root/development/performance/device_conditions.rst b/docs/root/development/performance/device_conditions.rst index 772bc9a63c..bc0451f56d 100644 --- a/docs/root/development/performance/device_conditions.rst +++ b/docs/root/development/performance/device_conditions.rst @@ -11,9 +11,12 @@ iOS Valid through SHA :tree:`f05d43f `. -We did not observe any issues when switching between background/foreground or between WiFi/cellular. -**However**, we believe that this will become problematic when we change to calling Envoy directly, -rather making requests through ``URLSession`` and having Envoy proxy them. +.. warning:: + + Envoy Mobile is currently unable to properly reconnect or switch between connections when the device + is switched between WiFi and cellular. This is a common issue with libraries that use BSD sockets, + and it is being fixed as part of :issue:`this issue <13>`. + This test will need to be re-run after those changes. See below for more details. Android @@ -30,7 +33,45 @@ Experimentation method Modified versions of the "hello world" example apps were used to run the following experiments. See the iOS/Android sections below for instructions on building the examples. -Throughout all of the following steps, we tested to make sure that network requests succeeded +iOS +--- + +The original investigation was completed as part of :issue:`this issue <128#issuecomment-516260951>`. 
+ +We built the Envoy Mobile library using the following flags to allow us to build to +a device with debugging symbols: + +``bazel build ios_dist --config=ios --ios_multi_cpus=arm64 --copt=-ggdb3`` + +In the active scheme of the app's ``Environment Variables``, set ``CFNETWORK_DIAGNOSTICS=3`` +to enable more verbose ``CFNetwork`` logs. Additionally, set Envoy's logs to ``trace``. + +Within the app, we performed the following: + +- Made single requests to a simple Python server from which we could see when the client connected, disconnected, and made requests +- Routed these requests through Envoy **using the socket implementation of Envoy Mobile via URLSession** (not calling directly into the Envoy Mobile library) + +Whenever we switched from WiFi to cellular (or vice versa), the next request would consistently fail +with ``-1001 kCFURLErrorTimedOut``. At the same time, we'd see the connection terminate on the server, +and we'd see logs from both Envoy and CFNetwork indicating that a new connection was established. + +When we executed the next request, it would complete successfully. + +If we did this more quickly and sent several requests in rapid succession, +**the first would still fail and the subsequent requests would complete normally**. + +Setting the ``URLSessionConfiguration``'s ``httpMaximumConnectionsPerHost`` to ``1`` +(preventing concurrent connections) and sending several requests in rapid succession resulted in all +of them failing after the ``timeoutIntervalForRequest`` specified on the ``URLSessionConfiguration``. +This is the same behavior seen with libraries like gRPC which use BSD sockets. + +Additionally, putting the phone in airplane mode results in all requests failing immediately +(instead of waiting for the specified timeout) because iOS was aware that it had no connectivity. 
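The iOS experiment depends on a simple Python server that reports when a client connects, disconnects, and makes requests. The server used in the original investigation is linked from the docs, but its URL is not preserved here. The following is a rough sketch of what such a server could look like, not the server that was actually used. It assumes only the Python 3.7+ standard library. The names `make_server`, `ConnectionLoggingHandler`, and `EVENTS` are hypothetical and are not part of Envoy Mobile. The handler logs one line per connect, per request, and per disconnect:

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical event log: (event, "host:port") tuples in arrival order.
EVENTS = []
_events_lock = threading.Lock()


def log_event(event, client_address):
    client = f"{client_address[0]}:{client_address[1]}"
    with _events_lock:
        EVENTS.append((event, client))
    print(f"{event:<12} {client}", flush=True)


class ConnectionLoggingHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 keeps connections alive, so several requests can share one
    # TCP connection and connects/disconnects show up as separate events.
    protocol_version = "HTTP/1.1"

    def setup(self):
        super().setup()
        log_event("connect", self.client_address)

    def finish(self):
        try:
            super().finish()
        finally:
            # Runs once the client closes the connection (or it errors out).
            log_event("disconnect", self.client_address)

    def do_GET(self):
        log_event(f"GET {self.path}", self.client_address)
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default access log; log_event() replaces it


def make_server(host="0.0.0.0", port=8000):
    return ThreadingHTTPServer((host, port), ConnectionLoggingHandler)
```

To serve it on a port the device can reach, you could call `make_server("0.0.0.0", 8000).serve_forever()`; the host and port are placeholders. Keep-alive is enabled on purpose. With HTTP/1.0 semantics every request would arrive on a fresh connection, which would make it harder to tell whether an existing connection was reused or replaced.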
+ +Android +------- + +Throughout all of the following steps, we tested to validate whether network requests succeeded when making each of the lifecycle/network changes listed below. Lifecycle experiment steps: @@ -48,20 +89,6 @@ Network experimentation steps: 5. Turn on airplane mode 6. Turn off airplane mode -iOS ---- - -The original investigation was completed as part of :issue:`this issue <128>`. - -Reproducing the Envoy example app: - -1. Build the library using ``bazel build ios_dist --config=ios --config=fat`` -2. Copy ``./dist/Envoy.framework`` to the example's :repo:`source directory ` -3. Build/run the example app on a physical device - -Android -------- - Reproducing the Envoy example app: 1. Build the library using ``bazel build android_dist --config=android`` @@ -73,22 +100,37 @@ Analysis iOS --- -With the current configuration of sending traffic over ``URLSession`` and having Envoy proxy it through, -there seems to be no issues on iOS when switching between WiFi/cellular or background/foreground. +Envoy is currently configured as follows: + +``[URLSession] --> [Socket] --> [Envoy Mobile] --> [Socket] --> [Internet]`` + +With the current configuration of sending traffic over URLSession and having Envoy proxy it through, +we identified issues with Envoy's ability to reconnect or switch between connections when the device +underwent various network changes such as toggling between WiFi and cellular. + +The experiment above indicates that when a working connection changes to inactive (i.e., by disabling WiFi and +forcing the phone to switch to cellular), the sockets aren't notified of the change. +This is a commonly understood issue with BSD sockets on iOS, and is why Apple strongly advises against using them. + +Switching networks then executing a request through URLSession would result in the request timing out.
+Executing another network request resulted in the following, +which could make it seem like Envoy was working properly at first glance (even though it wasn't): + +- iOS realized that the connection was dead and terminated its socket connection with Envoy, then re-established it +- When the connection with Envoy was terminated, Envoy in turn terminated its socket connection with the outside Internet +- When iOS reconnected to Envoy, Envoy also reconnected and selected the first available connection (cellular in this case) +- Future requests succeeded because they're sent over the new/valid connection -However, these findings are strange given that -`issues have been observed `_ with the cellular radio and gRPC -in the past. We were able to reproduce this issue in the :issue:`investigation <128>`, which showed that gRPC -channels (using BSD sockets under the hood) stalled when switching from WiFi to cellular, while requests made -through ``URLSession`` and proxied through Envoy continued to succeed. +Essentially, URLSession forced Envoy to reconnect/switch to a valid connection when a request failed due to +the fact that it was disconnecting from Envoy and reconnecting to it. -The current theory is that ``URLSession`` is doing something smart under the hood to handle these changes. +This means: -We will need to re-run these tests when we switch to calling Envoy Mobile directly as a library -(rather than running on top of calls to ``URLSession``). 
+- When Envoy is called as a library (instead of proxying URLSession over a socket), it will break because nothing will force it to reconnect to a valid connection +- Restricting URLSession's concurrent connections makes this problem immediately apparent even in today's setup because the only existing connection becomes invalid -Depending on the outcome of :issue:`#13 <13>`, we shouldn't have problems as long as we use Apple-approved -network solutions for the transport on iOS (such as ``CFNetwork``/``Network.framework``/etc.). +:issue:`Issue #13 <13>` will be implementing Apple-approved network solutions for the transport layer +on iOS (such as CFNetwork/Network.framework/etc.), which will resolve these problems. Android ------- From 217e8a65fbbe1eda2d2696dab4a767be9ef90d84 Mon Sep 17 00:00:00 2001 From: Michael Rebello Date: Thu, 1 Aug 2019 17:49:37 -0700 Subject: [PATCH 2/3] grammar Signed-off-by: Michael Rebello --- docs/root/development/performance/device_conditions.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/root/development/performance/device_conditions.rst b/docs/root/development/performance/device_conditions.rst index bc0451f56d..b1fecc78f7 100644 --- a/docs/root/development/performance/device_conditions.rst +++ b/docs/root/development/performance/device_conditions.rst @@ -65,7 +65,7 @@ Setting the ``URLSessionConfiguration``'s ``httpMaximumConnectionsPerHost`` to ` of them failing after the ``timeoutIntervalForRequest`` specified on the ``URLSessionConfiguration``. This is the same behavior seen with libraries like gRPC which use BSD sockets. -Additionally, putting the phone in airplane mode results in all requests failing immediately +Putting the phone in airplane mode resulted in all requests failing immediately (instead of waiting for the specified timeout) because iOS was aware that it had no connectivity. 
Android @@ -112,14 +112,14 @@ The experiment above indicates that when a working connection changes to inactiv forcing the phone to switch to cellular), the sockets aren't notified of the change. This is a commonly understood issue with BSD sockets on iOS, and is why Apple strongly advises against using them. -Switching networks then executing a request through URLSession would result in the request timing out. +Switching networks then executing a request through URLSession resulted in the request timing out. Executing another network request resulted in the following, which could make it seem like Envoy was working properly at first glance (even though it wasn't): - iOS realized that the connection was dead and terminated its socket connection with Envoy, then re-established it - When the connection with Envoy was terminated, Envoy in turn terminated its socket connection with the outside Internet - When iOS reconnected to Envoy, Envoy also reconnected and selected the first available connection (cellular in this case) -- Future requests succeeded because they're sent over the new/valid connection +- Future requests succeeded because they were sent over the new/valid connection Essentially, URLSession forced Envoy to reconnect/switch to a valid connection when a request failed due to the fact that it was disconnecting from Envoy and reconnecting to it. 
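The step that sends several requests in rapid succession can be mirrored on a desktop. A small harness there can record each request's outcome in the same terms as the findings (success versus timeout). The following is an illustrative sketch under stated assumptions, not part of the original experiment. The hypothetical `probe` helper sends every request through one persistent `http.client.HTTPConnection`, which loosely mirrors setting `httpMaximumConnectionsPerHost` to `1`. After a failure it drops the connection so that the next request reconnects. It does not reproduce an iOS WiFi/cellular switch; it only shows one way to label per-request results:

```python
import http.client
import socket


def probe(host, port, paths, timeout=10.0):
    """Send one GET per path over a single persistent connection.

    Returns one label per request: "ok", "http <status>", "timeout", or "error".
    Hypothetical helper for illustration only.
    """
    # A single HTTPConnection is a loose analogue of capping the client at one
    # connection per host: every request reuses it until it fails.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    results = []
    try:
        for path in paths:
            try:
                conn.request("GET", path)
                resp = conn.getresponse()
                resp.read()  # drain the body so the connection can be reused
                if 200 <= resp.status < 300:
                    results.append("ok")
                else:
                    results.append(f"http {resp.status}")
            except socket.timeout:
                results.append("timeout")
                conn.close()  # discard the stalled connection; next request reconnects
            except (OSError, http.client.HTTPException):
                results.append("error")
                conn.close()
    finally:
        conn.close()
    return results
```

As an example, consider `probe("192.0.2.10", 8000, ["/"] * 5)`, where the address is a placeholder. If only the first request stalls, it would return `["timeout", "ok", "ok", "ok", "ok"]`. If every request stalls, every entry would be `"timeout"`.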
From 4280f218ed3054e12ecc22d9c35eb51291b49362 Mon Sep 17 00:00:00 2001 From: Michael Rebello Date: Fri, 2 Aug 2019 13:30:22 -0700 Subject: [PATCH 3/3] CR Signed-off-by: Michael Rebello --- .../performance/device_conditions.rst | 27 ++++++++++++++----- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/docs/root/development/performance/device_conditions.rst b/docs/root/development/performance/device_conditions.rst index b1fecc78f7..f79048c875 100644 --- a/docs/root/development/performance/device_conditions.rst +++ b/docs/root/development/performance/device_conditions.rst @@ -38,18 +38,31 @@ iOS The original investigation was completed as part of :issue:`this issue <128#issuecomment-516260951>`. -We built the Envoy Mobile library using the following flags to allow us to build to -a device with debugging symbols: +**Configuration:** + +1. Build the library using the following flags to allow us to build to a device with debugging symbols: ``bazel build ios_dist --config=ios --ios_multi_cpus=arm64 --copt=-ggdb3`` -In the active scheme of the app's ``Environment Variables``, set ``CFNETWORK_DIAGNOSTICS=3`` -to enable more verbose ``CFNetwork`` logs. Additionally, set Envoy's logs to ``trace``. +2. Add the resulting ``Envoy.framework`` to the example app + +3. In the active scheme of the app's Xcode ``Environment Variables``, set ``CFNETWORK_DIAGNOSTICS=3`` to enable more verbose ``CFNetwork`` logs + +4. Set Envoy's logs to ``trace`` + +**Experiment:** + +Make single requests to a +`simple Python server `_ +that shows when the client connects, disconnects, and makes requests. These requests should be routed +through Envoy **using the socket implementation of Envoy Mobile via URLSession** +(not calling directly into the Envoy Mobile library).
-Within the app, we performed the following: +- Send a request on WiFi, then switch to cellular (disabling WiFi) and make more requests +- Set ``URLSessionConfiguration``'s ``httpMaximumConnectionsPerHost`` to ``1`` and try this again +- Switch the phone to airplane mode and try making requests -- Made single requests to a simple Python server from which we could see when the client connected, disconnected, and made requests -- Routed these requests through Envoy **using the socket implementation of Envoy Mobile via URLSession** (not calling directly into the Envoy Mobile library) +**Findings:** Whenever we switched from WiFi to cellular (or vice versa), the next request would consistently fail with ``-1001 kCFURLErrorTimedOut``. At the same time, we'd see the connection terminate on the server,