Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggested changes for LambdaBootstrap testability #540

Merged
merged 5 commits into from
Nov 6, 2019
Merged

Suggested changes for LambdaBootstrap testability #540

merged 5 commits into from
Nov 6, 2019

Conversation

martincostello
Copy link
Contributor

This PR is initially more for discussion/suggestion as the use case might be too niche to actually change the LambdaBootstrap class that ships to NuGet.org, or the change to enable the use case might be better achieved in other ways for changing the public API surface.

Scenario

I've recently started some work on porting a .NET Core 3.0 worker service which processes SQS messages to be an AWS Lambda function instead.

As the project is already targeting .NET Core 3.0, I decided to go with the custom runtime support rather than downgrade to .NET Core 2.1 to use the built-in support.

As part of the work I looked at creating some "end-to-end" or "integration" style tests I've used before with HTTP APIs to run the project as a "black box" in-memory and exercise the code from the outside so things like configuration, serialization etc. are covered by tests as well.

This has lead to the following open source project which I put together yesterday and have published as a NuGet package as a v0.1.0 which builds on top of ASP.NET Core's TestServer: https://github.com/martincostello/lambda-test-server

Essentially it emulates the Lambda runtime that Amazon.Lambda.RuntimeSupport calls internally over HTTP to deliver requests to the message loop and listens for responses and errors from the function for inspection by the test code.

Here's some example code of what you can achieve with it:

Problem

However, there's two things I've had "workaround" in the way LambdaBootstrap is designed to get things working.

The first is that there's no way to inject a custom HttpClient into LambdaBootstrap which provides the hook to redirect the HTTP calls in-memory. This could be achieved by hosting the Lambda test server on an actual HTTP port and setting the AWS_LAMBDA_RUNTIME_API environment variable to its host and port, but this increases the complexity to some degree rather than simple use of TestServer and makes it less lightweight.

For the time being this can be worked around in test code by using Reflection to change the runtime API client that's being used, but it's a bit of a hack. It would be cleaner if it was possible to specify a custom HttpClient as a constructor argument to LambdaBootstrap.

https://github.com/martincostello/lambda-test-server/blob/da1ac1800af1a10eff2a74a2aa192140a9ae26a6/tests/AwsLambdaTestServer.Tests/FunctionRunner.cs#L24-L32

The second is that the internal loop will invariably always be listening for at least one more message from the runtime using what's coded like long-polling after a single invocation. This can be minimised by getting the timing "just-so" so the CancellationToken passed to LambdaBootstrap.RunAsync() is cancelled before the loop starts again, but if the timing is off you need to deliver at least one more message to get the polling to stop, otherwise it hangs.

You also can't just throw an exception or return no content or an HTTP error, as it causes the loop in the bootstrapper to throw an exception either by receiving a non 2xx status code, throws the exception from the TestServer out to the caller or throws because one of the required headers isn't set.

This required me to deliver a "dummy" empty JSON response back to the Lambda runtime to get the loop to terminate:

https://github.com/martincostello/lambda-test-server/blob/da1ac1800af1a10eff2a74a2aa192140a9ae26a6/src/AwsLambdaTestServer/RuntimeHandler.cs#L125-L137

This then delivers a dud message to the function handler, which may or may not handle it correctly (though it can be worked around), but having to provide an "extra" fake message to get the loop to terminate is also a bit of a hack considering that the loop has an argument for a CancellationToken, but that it isn't flowed through to the HttpClient to cancel the request to get a new message.

One way of improving this might be to flow the CancellationToken from RunAsync() through to the HttpClient's requests instead of the None value that is used today, and then catching the OperationCancelled exception to stop the loop.

I realise that this isn't an issue with a production Lambda runtime because the process is frozen when there are no messages so as far as the function's concerned there's a continuous loop of messages that don't need to timeout/cancel/abort, but it seems like a small change to make testing/simulation outside of the Lambda environment itself easier to work with.

Possible Solution

This PR proposes one possible set of changes to Amazon.Lambda.RuntimeSupport that would improve the scenario described above.

The first is to provide two new constructors that accept an HttpClient parameter to allow for the in-process test server to be used. I considered making the Client property public instead, but I thought this approach neater as it removes the extra HttpClient being allocated that's then no longer used. The new constructors are used rather than another optional parameter as it's a non-breaking API surface area change.

The second is more contentious as it consists of two parts.

The first simpler part is catching OperationCanceled exception in the message loop and swallowing it, which then causes the loop to observed the CancellationToken has been signalled and return without having to provide "one more" message to process for the function before the token is observed.

The second part is trickier as due to the use of the public IRuntimeApiClient interface, it wasn't possible to pass through the CancellationToken without making a breaking change to the interface by introducing a new optional parameter to the GetNextInvocationAsync() method.

I'm open to suggestions on a better way to achieve this without having to change the public API surface area.

Next Steps

As I said at the top of the issue, this PR is for discussion and it might not be a correct solution or enough of a real-world problem to actually change the library code.

I could expect the outcode of this PR to be one of the following outcomes (there may be others):

  1. Accept as-is for a v2.0.0 with the breaking change to IRuntimeApiClient
  2. Accept in-principle, but change the approach to improve the scenario in another way that may or may not make breaking changes
  3. Close the PR without merging as the changes do not fit with the overall architecture and goals of the Lambda runtime support package and/or Lambda runtime in production environments
  4. Close the PR without merging as the use-case is too niche

Thanks for taking the time to consider the above and this PR!

Support a user-provided HttpClient with LambdaBootstrap.
Pass the CancellationToken from LambdaBootstrap.RunAsync() through to IRuntimeApiClient.GetNextInvocationAsync() and catch OperationCanceledException in LambdaBootstrap and return.
@martincostello
Copy link
Contributor Author

And something I forgot to include, here's a commit that shows how the test server and function code could be tidied up with these changes in place: martincostello/lambda-test-server@d7dff32

Copy link
Member

@normj normj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with the idea of the PR but need a few changes done first.

{
await InvokeOnceAsync(cancellationToken);
}
catch (OperationCanceledException)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we only swallow OperationCanceledException if cancellationToken.IsCancellationRequested is true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair - I did tinker with inspecting the OperationCanceledException but it wasn't 100% reliable in cases where different cancellation tokens where combined. I think that'd be a good compromise against accidentally swallowing some exception coming from inside the library that wasn't prompted by the caller cancelling.

{
_httpClient = httpClient ?? throw new ArgumentNullException(nameof(httpClient));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to keep track if bootstrap created the httpclient or if it was given to bootstrap. If it was given to bootstrap then we should not dispose it as that is the responsibility of the caller.

/// <returns>A Task representing the asynchronous operation.</returns>
public async Task<InvocationRequest> GetNextInvocationAsync()
public async Task<InvocationRequest> GetNextInvocationAsync(CancellationToken cancellationToken)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this slight breaking change but we should make it cancellationToken = default so it won't break users that recompile their application.

Also we might as well add cancellationToken to all of the methods in IRuntimeApiClient to to make the one "breaking" change all at once.

I don't really anticipate users calling IRuntimeApiClient directly which is why I'm okay making the breaking change.

Only swallow OperationCanceledException if IsCancellationRequested is true on the CancellationToken passed into the method by the caller.
Do not dispose of HttpClient if a value is provided through the constructor by the user. Only dispose if created by the instance itself.
Add an optional CancellationToken argument to all IRuntimeApiClient methods.
@martincostello martincostello requested a review from normj November 4, 2019 10:40
@normj
Copy link
Member

normj commented Nov 5, 2019

I have merged the PR into the dev branch to go out with the next release.

@martincostello
Copy link
Contributor Author

Awesome - thanks!

@martincostello
Copy link
Contributor Author

Are you able to share a rough estimate of when the next SDK release will ship?

@normj normj merged commit f2f3b97 into aws:master Nov 6, 2019
@normj
Copy link
Member

normj commented Nov 6, 2019

Version 1.1.0 of Amazon.Lambda.RuntimeSupport is out with this PR. Thanks for your contribution!

@martincostello martincostello deleted the Lambda-Bootstrap-Testability branch November 6, 2019 07:19
martincostello added a commit to martincostello/lambda-test-server that referenced this pull request Nov 6, 2019
React to changes from aws/aws-lambda-dotnet#540 and remove workarounds required to plug in the Lambda test server that were needed with v1.0.0 of Amazon.Lambda.RuntimeSupport that aren't required with changes made in v1.1.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants