A bearer-token-authenticated dropbox that drops its payloads into an S3 bucket, designed to run as an AWS Lambda function via a Lambda Function URL.
This is a simple single-function dropbox. Payloads are expected to be small: a limit of 10 KB is enforced.
Tip
Looking for the FastAPI-based s3-dropbox? We moved away from that to AWS Lambda. You can access the most recent FastAPI-based code in the git history.
- Authentication
- Deployment
- Required environment variables
- Optional environment variables
- Permissions
- Running type checking and tests
- Running locally outside of tests
There are 2 layers of authentication:
- Clients that send data must pass an Authorization HTTP header in the format Bearer <token>, where <token> is the client token generated by the create_token.py script. The corresponding server token, which is a hashed form of the client token, should be stored in the AUTH_TOKEN environment variable.
- The CDN in front of the Lambda Function URL must add an HTTP header in the format Bearer <token>, where <token> is the client token generated by the create_token.py script. The corresponding server token, which is a hashed form of the client token, should be stored in the CDN_TOKEN environment variable. The name of the HTTP header should be stored in the CDN_TOKEN_HTTP_HEADER_NAME environment variable. A typical name would be x-cdn-authorization.
  This is in place so that a web application firewall (WAF) can be attached to the CDN with additional protections, for example a web access control list (web ACL) that limits connections to those from certain IP addresses.
  The CDN_TOKEN should be generated by a separate run of the create_token.py script from the run that generates AUTH_TOKEN.
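The two checks above can be sketched as follows. This is a minimal illustration, not the code in main.py: it assumes the server token is the hex SHA-256 digest of the client token, which may not match what create_token.py actually does, and the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Mapping, Optional


def hash_token(client_token: str) -> str:
    # ASSUMPTION: hex SHA-256. create_token.py may use a different scheme.
    return hashlib.sha256(client_token.encode("utf-8")).hexdigest()


def bearer_matches(header_value: Optional[str], server_token: str) -> bool:
    """Return True if the header is 'Bearer <token>' and <token> hashes to server_token."""
    if not header_value:
        return False
    scheme, _, client_token = header_value.partition(" ")
    if scheme != "Bearer" or not client_token:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(
        hash_token(client_token).encode("utf-8"),
        server_token.encode("utf-8"),
    )


def is_authorized(
    headers: Mapping[str, str],
    auth_token: str,
    cdn_token: str,
    cdn_header_name: str,
) -> bool:
    """Both layers must pass: the client's header and the header added by the CDN."""
    lowered = {name.lower(): value for name, value in headers.items()}
    client_ok = bearer_matches(lowered.get("authorization"), auth_token)
    cdn_ok = bearer_matches(lowered.get(cdn_header_name.lower()), cdn_token)
    return client_ok and cdn_ok
```

Header names are lowercased before lookup because HTTP header names are case-insensitive.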
Deployment is fairly manual (AKA ClickOps):
- Create an S3 bucket for the data to be dropped in.
- Create a Python Lambda function and copy and paste the code from main.py into the AWS Console, making sure to "deploy" the code. The only Python dependency is boto3, which is already available in AWS Lambda.
- Enable the Function URL for the Lambda function, disabling authentication.
- Create a CloudFront distribution with the Lambda Function URL as an origin, and a behaviour pointing to this origin at a specific path. A typical path would be "/v1/drop".
  The default behaviour, i.e. for other paths, can be set to a "blackhole" origin that points to a non-existent domain, for example "blackhole.invalid".
- Create a WAF attached to the CloudFront distribution with appropriate protections. For example, one that allows only specific IP addresses through.
- Set environment variables on the Lambda function as described below.
- (Safely) pass the client token to the client that will be connecting.
- Configure the origin of the CloudFront distribution to send its client token, in the format Bearer <token>, in the HTTP header named by CDN_TOKEN_HTTP_HEADER_NAME.
- Set permissions on the execution role of the Lambda function as described below.
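Once deployed, a client request could be built along these lines. This is a sketch under assumptions: the HTTP method (POST), the example URL and the function name are not stated anywhere in this README and are purely illustrative. Only the Authorization header format comes from the authentication section.

```python
import urllib.request


def build_drop_request(url: str, client_token: str, payload: bytes) -> urllib.request.Request:
    """Build (but do not send) a request carrying the client's bearer token."""
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",  # ASSUMPTION: the method accepted by the function is not documented here
        headers={"Authorization": f"Bearer {client_token}"},
    )


if __name__ == "__main__":
    # Hypothetical CloudFront domain and token. The CDN adds its own header on the way through.
    request = build_drop_request(
        "https://dropbox.example.com/v1/drop",
        "my-client-token",
        b'{"hello": "world"}',
    )
    with urllib.request.urlopen(request) as response:
        print(response.status)
```

The client never sends the CDN header itself; that one is added by CloudFront at the origin.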
Configuration is via environment variables, and 4 must be explicitly set for s3-dropbox to function.
- BUCKET
  The name of the S3 bucket to upload files to.
- AUTH_TOKEN
  The server-side token created by create_token.py, corresponding to the plain text bearer token given to the client.
- CDN_TOKEN
  The server-side token created by create_token.py, corresponding to the plain text token set in the CDN in the HTTP header with name CDN_TOKEN_HTTP_HEADER_NAME.
  The CDN_TOKEN should be generated by a separate run of the create_token.py script from the run that generates AUTH_TOKEN.
- CDN_TOKEN_HTTP_HEADER_NAME
  The name of the HTTP header that contains the client token created by create_token.py. This is typically added by the CDN running in front of the Lambda Function URL, for example CloudFront.
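As an illustration of how the four required variables might be read and validated at startup, here is a minimal sketch. The Settings class and load_settings function are hypothetical names and are not taken from main.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("BUCKET", "AUTH_TOKEN", "CDN_TOKEN", "CDN_TOKEN_HTTP_HEADER_NAME")


@dataclass(frozen=True)
class Settings:
    bucket: str
    auth_token: str
    cdn_token: str
    cdn_token_http_header_name: str


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read the required variables, failing early with a clear message if any are missing."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        bucket=environ["BUCKET"],
        auth_token=environ["AUTH_TOKEN"],
        cdn_token=environ["CDN_TOKEN"],
        cdn_token_http_header_name=environ["CDN_TOKEN_HTTP_HEADER_NAME"],
    )
```

Failing at load time, rather than on the first request, makes a misconfigured function easier to spot.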
These environment variables can be used to configure s3-dropbox, but typically do not have to be explicitly set.
- S3_ENDPOINT_URL
  The endpoint of S3 or the S3-compatible service. If this is unset, the AWS S3 endpoints are used, so it should typically only need to be set during testing outside of AWS Lambda.
-
  These are technically required because boto3, the Python AWS SDK, uses them. However, they are automatically populated by the AWS Lambda environment, so they would typically only need to be explicitly set during testing outside of AWS Lambda.
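The S3_ENDPOINT_URL override could be wired into boto3 roughly as below. This is a sketch with hypothetical helper names, not the code in main.py. It relies only on the endpoint_url argument of boto3.client, and the local URL shown in the comment is an example.

```python
import os
from typing import Any, Dict, Mapping


def s3_client_kwargs(environ: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Return extra boto3 client arguments: an endpoint override only when one is set."""
    endpoint_url = environ.get("S3_ENDPOINT_URL")
    # e.g. "http://127.0.0.1:9000" for a local S3-like service (example value)
    return {"endpoint_url": endpoint_url} if endpoint_url else {}


def make_s3_client(environ: Mapping[str, str] = os.environ):
    # Imported lazily so s3_client_kwargs can be used where boto3 is not installed.
    import boto3

    # With no override, boto3 resolves the regular AWS S3 endpoint itself.
    return boto3.client("s3", **s3_client_kwargs(environ))
```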
The only permission that the Lambda IAM execution role needs (in addition to any usual logging permissions) is s3:PutObject on the bucket specified in the BUCKET environment variable.
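A policy statement granting that permission might look like the one generated below. This is a sketch: the bucket name is a placeholder, the helper name is hypothetical, and the ARN assumes the standard aws partition.

```python
import json
from typing import Any, Dict


def put_object_policy(bucket: str) -> Dict[str, Any]:
    """Build an IAM policy document allowing s3:PutObject on every key in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                # s3:PutObject applies to objects, so the resource is the bucket's keys.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    # "my-dropbox-bucket" is a placeholder for the value of BUCKET.
    print(json.dumps(put_object_policy("my-dropbox-bucket"), indent=2))
```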
Python requirements must be installed and a local S3-like service started:
pip install -r requirements-dev.txt
./start-services
Then to run type checking:
mypy .
Then to run the tests:
pytest
Or to run the tests with more verbose output:
pytest -s
At the time of writing, it hasn't been necessary to run the function locally outside of tests.