Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MultipartReader for big files #4865

Merged
merged 16 commits into from
Apr 29, 2020
Merged

Conversation

marcosc90
Copy link
Contributor

@marcosc90 marcosc90 commented Apr 23, 2020

Fixes #4864

MultipartReader is truncating big files. This happens because instead of calling copyN with n it's using maxValueBytes.

Aside from that maxMemory was fixed to 1MB instead of 10MB which is incorrect according to the comments.

@marcosc90
Copy link
Contributor Author

marcosc90 commented Apr 23, 2020

I'm working on a test that will correctly check this since the current one is working with a small file.

You can check the correct behaviour by using https://raw.githubusercontent.com/denoland/deno/e75b476ada521a892a7abe9d52b0af120efc6fd1/std/mime/multipart.ts instead of https://deno.land/std/mime/multipart.ts"

import { serve } from "https://deno.land/std/http/server.ts";
import { MultipartReader } from "https://raw.githubusercontent.com/denoland/deno/e75b476ada521a892a7abe9d52b0af120efc6fd1/std/mime/multipart.ts";

function getHeaderValueParams(value) {
  const params = new Map();
  // Forced to do so for some Map constructor param mismatch
  value
    .split(";")
    .slice(1)
    .map((s) => s.trim().split("="))
    .filter((arr) => arr.length > 1)
    .map(([k, v]) => [k, v.replace(/^"([^"]*)"$/, "$1")])
    .forEach(([k, v]) => params.set(k, v));
  return params;
}

const s = serve({ port: 8000 });
for await (const req of s) {
    const params = getHeaderValueParams(req.headers.get("content-type"))
    
    const reader = new MultipartReader(req.body, params.get('boundary'));

    const form = await reader.readForm(1 << 20)
    console.log(form.file('file'))
    req.respond({ body: 'deno' });
}

@marcosc90 marcosc90 changed the title Fix MultipartReader [WIP] Fix MultipartReader Apr 23, 2020
@marcosc90 marcosc90 changed the title [WIP] Fix MultipartReader Fix MultipartReader Apr 23, 2020
@@ -308,22 +308,22 @@ export class MultipartReader {
}
// file
let formFile: FormFile | undefined;
const n = await copy(buf, p);
const n = await copyN(buf, p, maxValueBytes);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While testing I noticed that when parsing big files, it was crashing with: error: Uncaught Error: The buffer cannot be grown beyond the maximum size.

This happens because copy was being used instead of copyN, and was trying to fill the buffer with the whole file.

@marcosc90 marcosc90 changed the title Fix MultipartReader Fix MultipartReader for big files Apr 23, 2020
Copy link
Member

@ry ry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @marcosc90 - thanks for this patch - it looks good at first glance. I wonder if you could add a test to std/mime/multipart_test.ts to demonstrate the bug?

@marcosc90
Copy link
Contributor Author

marcosc90 commented Apr 23, 2020

Hi @marcosc90 - thanks for this patch - it looks good at first glance. I wonder if you could add a test to std/mime/multipart_test.ts to demonstrate the bug?

Hello @ry you're welcome. Just added the test, it fails on master as expected.

Regarding the max buffer size error, don't know an efficient way to create a big file (> Buffer limit) on the tests, any guidance is appreciated. If you don't think that test is worth it on this PR, I can add it later on.

@@ -281,10 +281,10 @@ export class MultipartReader {
* null value means parsing or writing to file was failed in some reason.
* @param maxMemory maximum memory size to store file in memory. bytes. @default 1048576 (1MB)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ry go uses 10MB as default in readForm. Yet the comment in the code says 1MB, just want confirmation. I think 1MB it's a good value to keep in Buffer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely follow Go's example. 10MB sounds good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Reverted back to 10MB and updated comment.

@marcosc90
Copy link
Contributor Author

marcosc90 commented Apr 26, 2020

@ry I've been working on a test to check against max buffer error, the only problem is that it takes over a minute to run.

Deno.test(async function multipartBigFile() {
  const multipartFile = path.resolve("form-data-big.dat");
  const sampleFile = await Deno.makeTempFile();
  
  const writer = await Deno.open(multipartFile, { write: true, create: true })
  const encoder = new TextEncoder();
  
  const size = 2 ** 32; // Bigger than buffer limit
  await Deno.truncate(sampleFile, size)
  const bigFile = await Deno.open(sampleFile, { read: true });

  const mw = new MultipartWriter(writer);
  await mw.writeField("deno", "land");
  await mw.writeFile("file", "sample.txt", bigFile);

  await mw.close()
  writer.close()
  bigFile.close();

  const o = await Deno.open(multipartFile);
  const mr = new MultipartReader(
    o,
    mw._boundary
  );
  // use low-memory to write "file" into temp file.
  const form = await mr.readForm(20);
  try {
    assertEquals(form.value("deno"), "land");
    const file = form.file("file");
    assert(file != null);
    assert(file.tempfile != null);
    assertEquals(file.size, size)
    assertEquals(file.type, "application/octet-stream");
    
  } finally {
    await Deno.remove(multipartFile);
    await Deno.remove(sampleFile);
    await form.removeAll();
    o.close();
  }
});

The problem is that Deno.copy takes forever, since there's no way to increase the amount of bytes read AFAIK, it always 16384, when the reader is a file. I've been trying to increase it from TS, but it seems it can only be changed in rust code, since no matter how big the buffer size is, it's always reading 16384

const b = new Uint8Array(DEFAULT_BUFFER_SIZE);

Let me know if you think it should be added even if it takes too long.

@npup
Copy link

npup commented Apr 29, 2020

Related to (big) files in multipartrequest, I have noticed that tempFiles are created at ./, hardcoded.

const { file, filepath } = await tempFile(".", {
    prefix: "multipart-",
    postfix: ext,
});

Wouldn't it be worth it to add a parameter for this in the readForm method (and default to "." for bw compat if wanted)? Possibly even an option for the prefix.

@marcosc90
Copy link
Contributor Author

marcosc90 commented Apr 29, 2020

Wouldn't it be worth it to add a parameter for this in the readForm method (and default to "." for bw compat if wanted)? Possibly even an option for the prefix.

@npup I think it's worth adding an options argument, but definitely not on this PR. If you open an issue I can work on it after this get merged :)

@marcosc90
Copy link
Contributor Author

With #4978 & #4979 merged now all tests are passing, and was able to simplify the code a bit.

From:

deno/std/mime/multipart.ts

Lines 322 to 325 in 509dce0

// write buffer to file
let size = await copyN(file, buf, n);
// Write the rest of the file
size += await copy(file, new MultiReader(buf, p));

to

const size = await copy(new MultiReader(buf, p), file);

Also added a throw if we get an error, otherwise we would be resolving for an incomplete read:

deno/std/mime/multipart.ts

Lines 331 to 334 in 669311a

} catch (e) {
await remove(filepath);
throw e;
}

Copy link
Member

@ry ry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - sorry for the delay! Thank you

@ry ry merged commit 78e0ae6 into denoland:master Apr 29, 2020
@marcosc90 marcosc90 deleted the fix-multipart-reader branch April 29, 2020 21:50
SASUKE40 pushed a commit to SASUKE40/deno that referenced this pull request May 7, 2020
caspervonb pushed a commit to caspervonb/deno_std that referenced this pull request Jan 21, 2021
caspervonb pushed a commit to caspervonb/deno_std that referenced this pull request Jan 24, 2021
caspervonb pushed a commit to caspervonb/deno_std that referenced this pull request Jan 24, 2021
caspervonb pushed a commit to caspervonb/deno_std that referenced this pull request Jan 24, 2021
caspervonb pushed a commit to caspervonb/deno_std that referenced this pull request Jan 31, 2021
caspervonb pushed a commit to caspervonb/deno_std that referenced this pull request Jan 31, 2021
caspervonb pushed a commit to caspervonb/deno_std that referenced this pull request Jan 31, 2021
caspervonb pushed a commit to caspervonb/deno_std that referenced this pull request Jan 31, 2021
caspervonb pushed a commit to caspervonb/deno_std that referenced this pull request Feb 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MultipartReader creates broken files
3 participants