
Some Source Interface uploads fail with Internal Server Error #4078

Closed
zenmonkeykstop opened this issue Jan 28, 2019 · 29 comments

@zenmonkeykstop
Contributor

Description

The SD Source Interface should support file uploads of up to 500MB, with larger uploads failing immediately. Instead, large uploads under 500MB are frequently failing after a long timeout.

(This was initially reported as happening with files of 50MB or more, but it isn't consistently reproducible at that size.)

Steps to Reproduce

Using VMs or a hardware instance:

  1. Visit the Source Interface using the Tor Browser
  2. Click through to the /lookup page
  3. Upload a large file (50MB+)

Expected Behavior

File upload completes with success message on page.

Actual Behavior

The upload has a good chance of failing with the message below:

(Screenshot: "Internal Server Error" page, 2019-01-28.)

@redshiftzero
Contributor

Corresponding log lines in the Source Interface Apache error logs:

[Mon Jan 28 23:16:55.007331 2019] [reqtimeout:info] [pid 5161:tid 3478096422656] [client 127.0.0.1:32854] AH01382: Request body read timeout
[Mon Jan 28 23:16:55.008916 2019] [:error] [pid 5161:tid 3478096422656] (70007)The timeout specified has expired: [client 127.0.0.1:32854] mod_wsgi (pid=5161): Unable to get bucket brigade for request., referer: http://<my staging server>.onion/lookup
[Mon Jan 28 23:16:55.048291 2019] [:error] [pid 5159:tid 3478277211904] [remote 127.0.0.1:36570] mod_wsgi (pid=5159): Exception occurred processing WSGI script '/var/www/source.wsgi'.
[Mon Jan 28 23:16:55.048872 2019] [:error] [pid 5159:tid 3478277211904] [remote 127.0.0.1:36570] IOError: failed to write data

The connection is dropping for some reason before the file upload finishes; this could be an Apache timeout or a Tor-level issue...

@redshiftzero
Contributor

It might be worth experimenting with the Timeout and KeepAliveTimeout directives. I've tried increasing them from the defaults, which doesn't resolve the situation entirely, but resolving this issue may be a matter of tuning these values. I don't yet have enough data to say whether increasing these timeouts actually improves the situation.

@zenmonkeykstop
Contributor Author

One initial thought was that this was a recently introduced regression. Testing against the 0.8.0 tag on staging VMs with a 100MB test file, I managed to reproduce the error, so it's been around for a while. If we can tweak keepalives and the like without increasing the risk of DoS attacks, that might be the simplest way to resolve this bug.

@zenmonkeykstop
Contributor Author

Looking at:

https://metrics.torproject.org/torperf.html?start=2018-10-31&end=2019-01-29&server=onion&filesize=5mb

it seems like a 500MB file could take 25min+ on average (5s to set up, 3s/MB), if that informs timeout settings.
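As a sanity check of that arithmetic (the ~5 s setup and ~3 s/MB figures are rough estimates read off the torperf graphs, not measurements of a SecureDrop instance):

```python
# Rough estimate of onion-service upload time, using the figures quoted
# above (~5 s connection setup, ~3 s per MB). These are assumptions drawn
# from the torperf data, not measurements of this instance.
SETUP_SECONDS = 5
SECONDS_PER_MB = 3

def estimated_upload_minutes(size_mb: int) -> float:
    """Estimated transfer time in minutes for a file of size_mb megabytes."""
    return (SETUP_SECONDS + SECONDS_PER_MB * size_mb) / 60

print(round(estimated_upload_minutes(500)))  # ~25 minutes for a 500MB file
print(round(estimated_upload_minutes(100)))  # ~5 minutes for a 100MB file
```

Any server-side timeout shorter than this window would kill a large upload in progress.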

@kushaldas
Contributor

I am testing various numbers in those configuration files with a 125MB file.

@kushaldas
Contributor

Timeout 120
KeepAlive On
MaxKeepAliveRequests 400
KeepAliveTimeout 300

This allows me to submit a 125MB file without problems, but not a 250MB file. I am still trying other values.

@kushaldas
Contributor

Even after doubling every value, a 250MB file still fails. Someone else please pick this up from here, as I will be logging off soon.

@zenmonkeykstop
Contributor Author

This morning I tried:

KeepAlive On
KeepAliveTimeout 60

(which I now see was a bit more conservative than @kushaldas' settings) and it made no difference; 100MB+ uploads are still failing on staging VMs for me.

After reading a bit more, I don't think the KeepAliveTimeout value is relevant; AFAICT it affects subsequent connections, not a single long-running one. The Timeout value is probably more important.

@zenmonkeykstop
Contributor Author

zenmonkeykstop commented Jan 29, 2019

Looking into mod_reqtimeout in /etc/apache2/mods-enabled/reqtimeout.conf: the settings are the defaults, which include a minimum bitrate of 500 bytes/s. Lowering that to 100 bytes/s got me my first successful 100MB upload!

(Again, depending on how much protection we need against DoS attacks, that minimum bitrate setting could be turned off altogether. The Apache Timeout directive would then be the relevant one to tweak.)
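For reference, a sketch of what that change might look like in /etc/apache2/mods-enabled/reqtimeout.conf. The "stock" values shown are from memory of the Debian/Ubuntu default file and should be verified against the installed configuration:

```apache
# Stock configuration ships with a 500 bytes/s minimum rate, approximately:
#   RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=500
#
# Lowering the minimum body read rate to 100 bytes/s, per the experiment above:
RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=100
```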

eloquence added this to the 0.12.0 milestone Jan 29, 2019
@zenmonkeykstop
Contributor Author

  • 250MB upload with minrate disabled: connection never timed out; no network activity after 30 minutes or so; killed it after 1 hour.
  • 250MB upload with minrate set to 100: connection timed out after 30 minutes with an "Internal Server Error" message.
  • 250MB upload with header and body timeouts set to 0 (effectively disabling mod_reqtimeout): connection stays open after the upload completes and never closes.
  • 250MB upload with mod_reqtimeout disabled via a2dismod reqtimeout && service apache2 restart: connection stays open after the upload completes and never closes.

@kushaldas
Contributor

I also tried disabling GPG encryption entirely, just as a test; the file uploads were still getting connection resets.

@kushaldas
Contributor

I modified the following in store.py:

            import shutil
            shutil.copyfile(stf, encrypted_file_path)
            #current_app.crypto_util.encrypt(
            #    stf, self.__gpg_key, encrypted_file_path)

@redshiftzero
Contributor

I applied the diff below in a staging VM to determine whether offloading work to an async worker would address this issue. The diff disables both GPG encryption and moving files around on disk, in case the time these tasks take is causing the timeouts. My expectation was that if the duration of these tasks were the sole cause, then with the diff applied no timeouts should occur. Unfortunately, for a 70MB file, I still saw the timeout 50% of the time (warning: low-number statistics; I only ran this 4 times). So we need to investigate further: there is another cause, either instead of the hypothesized one above or, if there are multiple contributing factors, in addition to it.

Here's the diff:

diff --git a/securedrop/source_app/main.py b/securedrop/source_app/main.py
index e3dd66b4..649cff3e 100644
--- a/securedrop/source_app/main.py
+++ b/securedrop/source_app/main.py
@@ -157,6 +157,8 @@ def make_blueprint(config):
                     fh.filename,
                     fh.stream))

+        current_app.logger.error('we got back into the view function!')
+
         if first_submission:
             msg = render_template('first_submission_flashed_message.html')
             flash(Markup(msg), "success")
diff --git a/securedrop/store.py b/securedrop/store.py
index 924f8d5e..f0e909f2 100644
--- a/securedrop/store.py
+++ b/securedrop/store.py
@@ -71,8 +71,8 @@ class Storage:
             if ext != '.gpg':
                 # if there's an extension, verify it's a GPG
                 raise PathException("Invalid file extension %s" % (ext, ))
-            if not VALIDATE_FILENAME(filename):
-                raise PathException("Invalid filename %s" % (filename, ))
+            #if not VALIDATE_FILENAME(filename):
+            #    raise PathException("Invalid filename %s" % (filename, ))

     def path(self, *s):
         """Get the normalized, absolute file path, within
@@ -136,7 +136,8 @@ class Storage:
             count,
             journalist_filename)
         encrypted_file_path = self.path(filesystem_id, encrypted_file_name)
-        with SecureTemporaryFile("/tmp") as stf:  # nosec
+        current_app.logger.error('file starting to stream to disk!')
+        with SecureTemporaryFile("/var/lib/securedrop/store") as stf:  # nosec
             with gzip.GzipFile(filename=sanitized_filename,
                                mode='wb', fileobj=stf, mtime=0) as gzf:
                 # Buffer the stream into the gzip file to avoid excessive
@@ -147,10 +148,20 @@ class Storage:
                         break
                     gzf.write(buf)

-            current_app.crypto_util.encrypt(
-                stf, self.__gpg_key, encrypted_file_path)
+            current_app.logger.error('file finished streaming to disk!')

-        return encrypted_file_name
+            # Disable gpg encryption as this might take some time and cause
+            # a timeout.
+            #current_app.crypto_util.encrypt(
+            #    stf, self.__gpg_key, encrypted_file_path)
+
+        # Touch a file in place instead of copying, as copying might take some
+        # time and cause a timeout.
+        with open(stf.name + '.gpg', 'a'):
+            os.utime(stf.name + '.gpg', None)
+
+        # Return this dummy file so success is reported back to the user/tester.
+        return stf.name + '.gpg'

     def save_pre_encrypted_reply(self, filesystem_id, count,
                                  journalist_filename, content):

@redshiftzero
Contributor

Likely related: onionshare/onionshare#899 (courtesy @micahflee)

@redshiftzero
Contributor

OK, I've now tested two 300MB file uploads to a staging instance after unchecking the "Sanitize cross-site suspicious requests" option in NoScript:

(Screenshot: NoScript advanced settings showing the "Sanitize cross-site suspicious requests" checkbox, 2019-02-12.)

and they both worked, so we're indeed hitting the same problem as onionshare.

@redshiftzero
Contributor

redshiftzero commented Feb 13, 2019

Until upstream resolves this issue, choices are:

  1. Drastically restrict file upload size.
  2. Enable JS and use an AJAX-based approach similar to what onionshare is doing in https://github.com/micahflee/onionshare/pull/901/files as a workaround (this involves instructing sources to set the security slider to Safer instead of Safest). This is obviously a big departure from SecureDrop's prior avoidance of JavaScript.
  3. Provide a page walking users through disabling the sanitization option above. This involves going into NoScript's advanced settings and seems somewhat scary and suspicious, so the messaging needs to be carefully written (and decided on ASAP, because the strings need to get translated).

Option 1 or 3 is something we could realistically do for 0.12.0; I prefer 3, since it preserves functionality. Tomorrow we should evaluate the implications of disabling this protection in more detail, but from a first analysis it seems to be the lowest-risk option that preserves SecureDrop's main functionality, which is, after all, the uploading of files.

@eloquence
Member

eloquence commented Feb 13, 2019

Tested with the NoScript "Sanitize cross-site suspicious requests" checkbox unchecked and a 271MB file on a 0.11.1 prod/hardware instance: success! Let's indeed discuss how best to engage with the associated upstream issues, but I agree that, in the absence of a fix, option 3 makes the most sense.

@micahflee
Contributor

It looks like you're probably not going with option 2, but if you do: I just discovered that it will only work with the Tor Browser security slider set to Standard, not Safer or Safest (except on SecureDrop servers that use HTTPS, where Safer will work). The slider set to Safer blocks JavaScript on non-HTTPS sites, including non-HTTPS onion sites.

@ninavizz
Member

^ Design ticket to work on messaging/walkthrough for Option 3, FYI. I'll post what the team decides here once the design issue is resolved.

eloquence changed the title from "Large file uploads (> 70MB) fail on the Source Interface" to "Some Source Interface uploads fail with Internal Server Error" Feb 13, 2019
@eloquence
Member

eloquence commented Feb 13, 2019

A few notes:

  • Given that the root cause is POST data sanitization by a browser extension, it's unlikely that this error is only triggered above a certain size threshold. It is more likely that the error is triggered with higher probability at larger sizes. I have changed the issue title accordingly.

  • Our plan of record is to add an explanation to the Source Interface. We've also engaged upstream NoScript, and may reach out to upstream Mozilla as well, but that's unlikely to help in the near term. @rmol is taking the lead on implementation of the explanatory page, starting with some preliminary investigations, and @ninavizz is taking a design stab at different integration options.

  • As a reminder, we need to get Source Interface text changes in before EOD 2/19 (string freeze), and 2/18 is a US holiday. So we're in a bit of a crunch on this, time-wise. This argues for the simplest possible implementation. I suggest we aim to have a first PR at least in WIP state (with text/design possibly in flux) by EOD tomorrow (Thursday).

@ninavizz
Member

Decided Upon Source Interface Messaging (ignore the pink prototype nav arrows in the mocks)

  • All text in items 2 and 3, cut-and-pasted verbatim from existing experience to preserve translations
  • @eloquence and @redshiftzero on the fence about ordering of items 2 and 3; see discussion-points in "Feedback" in ux Issue comment.

UX Action Items remaining

  • Nina will post final text & PNG assets for dev(s) to build page(s) and UI edits in new comment, here, by Thurs Eve (PST)

(Mockup images.)

@eloquence
Member

eloquence commented Feb 14, 2019

We've agreed to keep the ordering as in the mock above. Per @emkll we'll also add a recommendation to re-enable the setting post-upload to the instructions, and of course we'll need to monitor the NoScript situation carefully, since asking users explicitly to turn off a security setting is far from ideal, even if it has no effect for users with JavaScript disabled.

@redshiftzero
Contributor

Chatted with @emkll a bit about this and he made two other points that I think we should include:

  1. Can we indicate in the text that this disabling of the XSS filter is a temporary situation, and not the New Way that SecureDrop works?
  2. Can we say: "(Required for submission of files larger than 50MB, or if you have trouble uploading)" instead of "(Required)"? If someone has a small file, chances are they will be fine, and we can have them just upload and then disable the XSS filter if there is an issue.

Thoughts?

@eloquence
Member

  1. Can we indicate in the text that this disabling of the XSS filter is a temporary situation, and not the New Way that SecureDrop works?

That makes sense to me!

  2. Can we say: "(Required for submission of files larger than 50MB, or if you have trouble uploading)" instead of "(Required)"? If someone has a small file, chances are they will be fine, and we can have them just upload and then disable the XSS filter if there is an issue.

How about "required for uploads to work reliably"? My understanding is that this is technically the most accurate description. The "50MB" really just traces back to our early investigation and we don't have sufficient empirical grounding for using it in messaging, IMO.

Users will see an "Internal Server Error" if their uploads fail, without any obvious explanation of how to mitigate. Some uploads will work, some will fail. I think we should encourage all users to uncheck the box so they don't hit that error wall, and then re-check it when they're done.

@rmol
Contributor

rmol commented Feb 15, 2019

How about "required for uploads to work reliably"? My understanding is that this is technically the most accurate description. The "50MB" really just traces back to our early investigation and we don't have sufficient empirical grounding for using it in messaging, IMO.

I like that. Could we also reassure the source that this isn't going to endanger them, e.g. "This is required for uploads to work reliably, and is safe with our recommended Tor Browser settings." or is that too verbose here, and better left for the how-to page?

@eloquence
Member

I've taken the text from Nina's designs, edited it a bit, and put it in a GDoc for easier collaboration:

https://docs.google.com/document/d/1HNrRaOrRJfUX61DDNRpQcc9UKs6y80Y9DWUf3PxypU4/edit#

Please make edits in suggest mode. I've tried to keep the text as short as possible, to minimize the workload for translators, and to keep things as simple as possible from the user's point of view.

@eloquence
Member

@ninavizz made a good case for keeping the language in the 1,2,3 list as simple as possible to ensure users actually follow it (see comment history in GDoc). The current draft has a minimal "ask" on the upload screen. The "show me how" page offers an explanation why this is needed and what the security implications are, a link to the relevant issue for people who are interested, and clear instructions for what to do.

(I recommend that we link to the NoScript issue since it provides the most relevant context.)

@ninavizz
Member

Images for 411 page. Will get SVG dialog "!" image up (an optional thing for this PR, I know) for the confirmation page, once home from the vet in 2hrs: https://drive.google.com/open?id=16DUGdVwEZvQhOkPbIYFMJ8eWpcJHP6Q5

@eloquence
Member

eloquence commented Feb 16, 2019

Just making a note here that we should test a bit more (or look at the NoScript code) whether:

  1. whether it makes any difference if the SecureDrop upload page is reloaded after the setting is unchecked.

  2. whether it makes any difference if the NoScript settings browser tab is closed after the setting is unchecked.

Preliminary testing indicates no difference for 1) (i.e. it is fine to upload immediately after unchecking the setting). I've not tested 2) yet. I'm guessing the setting takes effect immediately, but we should verify that.

rmol added a commit to rmol/securedrop that referenced this issue Feb 18, 2019
There is currently an unpredictable problem with uploads, due to a
Firefox bug. For now, the only workarounds are implementing AJAX
uploads or disabling NoScript's cross-site request sanitization. We're
recommending the latter, since it should be safe with our recommended
Tor Browser settings, and doesn't require enabling JavaScript to use
SecureDrop.

This change adds:

- instructions in the source UI for disabling NoScript cross-site
  request sanitization before uploading
- a note about the bug in the source guide
- updated screenshots of the source submission page

Fixes: freedomofpress#4078
rmol added a commit to rmol/securedrop that referenced this issue Feb 20, 2019
redshiftzero pushed a commit that referenced this issue Feb 20, 2019
Fixes: #4078
(cherry picked from commit 4d7851a)
kushaldas pushed a commit that referenced this issue Sep 25, 2019