ADR: Download Strategy #821

Merged 11 commits on Apr 21, 2021
40 changes: 40 additions & 0 deletions docs/Architecture Decision Record/013-download-strategy.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
# 13. Download Strategy

Date: 2021-04-06 (yyyy-mm-dd)

## Status

Approved

## Context

In [this issue](https://github.com/raft-tech/TANF-app/issues/771) we are investigating the use of pre-signed URLs to determine whether there are security issues with the approach.

We originally implemented pre-signed URLs for downloading files because the alternative required the backend to download each file from S3 and then send it on to the client, consuming backend resources for every download. This would not cause a problem at this stage of development, but once the backend is responsible for parsing data from potentially large files, those system resources will become more precious. Using pre-signed URLs takes that added pressure off the backend entirely.

## Decision

We believe that time- and IP-address-limited signed URLs are a reasonably secure approach to downloading files from S3. However, we also believe that they may cause issues with our ATO approval, as the data is highly sensitive. Furthermore, 18F published a recommendation today [against using pre-signed URLs](https://engineering.18f.gov/security/cloud-services/) for FISMA High projects.

In our investigation we discovered that we can securely download the files from the backend while [streaming the files](https://github.com/jschneier/django-storages/blob/master/storages/backends/s3boto3.py#L83) directly from S3 to the client, which relieves the pressure on the backend resources needed for parsing files.

In light of these facts we have decided to shift our efforts to download files from the backend.

[Technical Documentation](../Technical-Documentation/data-file-downloads.md)

## Consequences

**Pros**
- Ensures access to S3 is completely hidden from the frontend.
- Cuts down on latency in passing signatures back and forth.
- Simplifies frontend logic.
- Fewer requests get made, reducing the complexity of redux actions.
- It is more obvious to the user where the file is (a file can no longer be uploaded but left unsubmitted).
- Allows us to leverage `django-storages`, which makes it easier to reason about the files in our database.
- Reduces the number of endpoints required on the backend.
- Eases the path to ATO.


**Cons**
- Additional effort on the frontend and backend will be required for this shift. However, it's a fairly small lift for both.
- The original con of added resource expenditure on the backend is no longer an issue.
49 changes: 49 additions & 0 deletions docs/Technical-Documentation/data-file-downloads.md
@@ -0,0 +1,49 @@
# Data Files Download Architecture

This application provides a secure means to both store and download files from
AWS S3 through the use of `django-storages`, an open-source Django plugin. By
using built-in Django classes in conjunction with this plugin, we can enable
downloading of these files through an API endpoint without having to write
the files to the server's local storage, thus removing the performance
penalty that would be incurred by essentially downloading the file twice.
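
The idea of relaying a file through the backend without first saving it locally can be sketched with the standard library alone. This is an illustration only, not the application's code: `iter_chunks`, `relay`, and `CHUNK_SIZE` are hypothetical names, and in-memory `io.BytesIO` objects stand in for the S3 object and the client connection.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, chosen only for illustration


def iter_chunks(source, chunk_size=CHUNK_SIZE):
    """Yield successive chunks from a readable binary file-like object."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def relay(source, sink, chunk_size=CHUNK_SIZE):
    """Copy source to sink one chunk at a time and return the bytes sent.

    Only one chunk is held by this function at any moment, and nothing is
    written to the local filesystem.
    """
    sent = 0
    for chunk in iter_chunks(source, chunk_size):
        sink.write(chunk)
        sent += len(chunk)
    return sent


# Stand-ins: a real deployment would read from S3 and write to an HTTP response.
payload = b"header,row\n" * 50_000
remote_object = io.BytesIO(payload)
client_connection = io.BytesIO()

sent = relay(remote_object, client_connection)
```

In the real application this chunking is handled by Django and `django-storages` rather than by hand-written loops.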

## Process Flow
![](diagrams/tdp-data-file-download-api.png)

## S3 File Storage
`django-storages` provides a custom storage backend for Django that enables
storing files in S3 instead of on the local Django server. This application
has historically used this library to collect and store the static files
served for the Django admin. With this change, we will also use it to
interface with the Data Files.
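
As a rough sketch, pointing Django's default file storage at S3 is done through settings like the following. The setting names are those documented by `django-storages` for the Django 3.2 era; the environment variable names and the values are assumptions, and the project's actual settings may differ (for example, it may use a custom storage subclass).

```python
import os

# Route FileField storage through the django-storages S3 backend.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

# Hypothetical environment variable names; credentials should never be hard-coded.
AWS_STORAGE_BUCKET_NAME = os.environ.get("DATA_FILES_BUCKET", "")
AWS_S3_REGION_NAME = os.environ.get("DATA_FILES_REGION", "")
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", "")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")

# Assumption: objects are kept private so they are only reachable via the backend.
AWS_DEFAULT_ACL = "private"
```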

### S3Boto3Storage
This storage backend provides the support for opening files in read or write
mode and supports streaming (buffering) data in chunks to S3 when writing.

[Source code](https://github.com/jschneier/django-storages/blob/master/storages/backends/s3boto3.py#L233)

### S3Boto3StorageFile
This class extends Django's File class to support file streaming using the
[boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)
library's multipart uploading functionality. It provides a wrapper to access
the buffered file contents.

[Source code](https://github.com/jschneier/django-storages/blob/master/storages/backends/s3boto3.py#L79)

### InMemoryUploadedFile
This file class is provided by Django and represents a file that has been
uploaded into memory via streaming. The file returned by `django-storages` for
a given FileField associated with an S3 object will leverage the functionality
of this class to prevent needing to write the file to disk, resulting in a more
performant download experience.

[Source code](https://github.com/django/django/blob/main/django/core/files/uploadedfile.py#L78)

## ReportFile model
This is a custom model for the application that stores information about a
Data File that has been uploaded to the system. To leverage the features
described above, this model will have a [FileField](https://docs.djangoproject.com/en/3.2/ref/models/fields/#filefield)
that is linked to S3 via `django-storages`. From the perspective
of the API, this will make downloading the file as simple as calling the `open`
method on the file property of a `ReportFile` instance.
@@ -0,0 +1 @@
<mxfile host="app.diagrams.net" modified="2021-04-16T15:22:10.545Z" agent="5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.152 Safari/537.36" etag="7lQ22vyzSLjkEvPILRMg" version="14.6.0" type="device"><diagram id="Uazb6U9cwz0oTInOb4sc" name="Page-1">7Vrbcts2EP0azbSdUYZXWXq0LKvJjN26sXvJI0RCJGKQoEHQkvL1XRAAJRF0rMai5aaNMyNycSP27NkLyIF/ka1/5qhIr1mM6cBz4vXAnw08zw08byD/O/FGSUZjVwkSTmLdaSu4JV+wFjpaWpEYl3sdBWNUkGJfGLE8x5HYkyHO2Wq/25LR/VULlGBLcBshakv/JLFIlXQcOlv5e0yS1KzsOrolQ6azFpQpitlqR+RfDvwLzphQV9n6AlOpPKMXNW7+RGvzYBzn4pAB7x9Ww4d5ML3zR3/E9Dq6XLHVUM/yiGilN3w3uwHBDAkEP3MCeoA7tsopQzFcnt980NsRG6MjgdfwBNNUZBQELlyWgrN7fMEo4yDJWQ49p0tCaUuEKElyuI1gDxjk00fMBQHtn+uGjMSxXGa6SonAtwWK5JorsDWQcVblMZbbc+BO7wQmwOsnVeQ2igeLxSzDgm+gix4QBqEaoo010NCttsj7ju6S7qBuIEba2JJm5i0ecKEh+SfweBY+lvZxDAarbxkXKUtYjujlVtrS07bPFWOFBuwzFmKj2YcqwfbhVGvKhb6uWOAm4gkWX9vREwhwTJEgj/sLdOlTD71hBJZukHMNVBo5d9yCpGQVj7Ae1UKleYwXAGUTqWaKU2LEoxQ8mPQ9XPmFKCV5ApcfcQFgaI5xHDEeW+husXOfYIGF04so0Dheo8jQ5oBr7H2XA6O+OGAW+9dwQJna14wl7IUD4WgfOq/NAUXO3jjgTf4H6iCgRqNnnFXPQLmhBdS19EoDb0RlIF+Anxol8mouwRj4c9stpSxbVOWruKTQa6mrIyyPOzxSW6vHi8pjS3+fpIf/XtOikdMCYGQDEHQA0FtaZBzLDgB2pvoRP1S4FNiOqbAClA9Pqe/I8fR52+0Mp70Zr2eT/xReGtTHN3/p8fXNJ3nzLjS3s/Vu42yzZ85H8e6ePzgwZ31pGHgZYL4F2A0rRcLx7W9XFnRQZRbyMtpQAghx/3krXygsrxaNAEX3SY3wr5WAabBxXwpMNzwONfxJixt+aHHjrIMak96oEbwRasTn8uRiGw+OmdIcWqgpszt67uN/Y6FmTRSEp02ifLss+YhFxXOQBQ5YkvP+7u6mjkNlwXIIN6cMQ6HzRG1wsjjk2zH8F/b95lCh3wKgIxF41RzKfxt5QAk0Ff25u+DQGO8Hp4zxvh3j1fFRVdZHRxEDI86FvGTLgTqXnRNMZZ4LGvWcGIORZzJUg0WlmMtfIl3RLczsLFVyTMoaXsY7cuLXPmeyauCTnzP5Z2+CEG8gMfaDQ0kz6SVHcJ3x/jn8q8d2OxGc71Bo4I0eKvnaRgYjjDJJJyORbOMsM8xDsn9WUUGGgITcY5RW+X1Z07NmLpIs/ZBf44zxze+FLF5xrFdji8/yndapuToZt4tYO1Fv+LvL1bPeuNp11KgOr2ABVKsZ8VIbr8GmEsvheBcqdchlRpZFjcVW0aZjpDKLc2nZyeIH2BQ8tWN+ftSog4seLlFG6EZ1TTF9xDLj2GlXFZRsdb1ivdugFpUtOeMZojttK63R8zqzVFx3KDgRzIfwzJF8mWCNlInTUOdAsq1Jg0ybrA5zPatjnqVuERzl5RLmMrPqoOysGI/3V2wGbkvGYUtbXjBWivKCib4Ijc5iUhYUaX2RXJWa9b6BBqK1fBu2XUYWqCwlC2FGDoyhmy0NZ59RnkiiZTXDTMC0kn
NlA2CNygyMaZwwZ3fddoHcdXjkd7yP7C1pD+zXXD8NmtfENRQ5hk3LZ+QELDRvEhRS3veQ3VO8FCfP7dv+0fM6/OOoK7kP+8LJLq4kTndpDRGiVH4XIbNLA0+JHiWCBWcRLkv1elKQzC5av1fIutLPLsj83qjVUQJk6AszSXwLB6ZO5i6az16kYmS4MFofeD78zeUTTBOOYoK3bRqRfeRgwLL+14Jvd6IYlWkDg0HwCi0wvWElEYTtIdmGeMGEAJ/cUZvvmFMX/HJfOmF2zT5NGu6qyK1i0kwed5oj0GydyG+S3qFVGbwDy46rSHyI5CNOC64u9vuU/nGMa9w+4/Jt4+p6YdV8O3R84/oPnGz6h5b6QT8nm2ftLPlbTzYnZ89M9M3VD9xuvzdT3bdf7fmXfwM=</diagram></mxfile>