High memory consumption when piping between streams #50762

Open
awvalenti opened this issue Nov 16, 2023 · 7 comments
Labels: memory, stream

Comments

@awvalenti

awvalenti commented Nov 16, 2023

Version

Both v18.17.0 and v20.9.0

Platform

Linux (hostname-omitted) 6.2.0-36-generic #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 9 15:34:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

import { createReadStream, createWriteStream } from 'fs';

const srcPath = 'path/to/some-100mb-file'
const dstPath = '/tmp/test-destination-stream'

const srcStream = createReadStream(srcPath)
const dstStream = createWriteStream(dstPath)

dstStream.on('drain', () => {
  // Simulates a slow WriteStream
  setTimeout(() => {
    srcStream.resume()
  }, 20)
})

srcStream.on('end', () => {
  console.log("srcStream.on('end')");
  dstStream.end()
})

let i = 0
srcStream.on('data', (chunk) => {
  dstStream.write(chunk, (error) => {
    // No errors occur, even if destination file is removed
    if (error) console.error(error);
  })
  srcStream.pause()
  console.log('on(data)', i++)
})

let j = 0
setInterval(() => {
  console.log('Keeping process alive', j++)
}, 3000);

How often does it reproduce? Is there a required condition?

No. It always happens. By the way, I tried the ReadStream.pipe method, the pipeline function, and the manual piping above with readStream.on('data') and writeStream.on('drain'); all showed the same results.
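
For reference, a minimal sketch of the pipe and pipeline variants mentioned above (same placeholder paths as before; run one variant at a time):

import { createReadStream, createWriteStream } from 'fs';
import { pipeline } from 'stream';

const srcPath = 'path/to/some-100mb-file'
const dstPath = '/tmp/test-destination-stream'

// Variant 1: pipe() handles pause/resume (backpressure) internally
createReadStream(srcPath).pipe(createWriteStream(dstPath))

// Variant 2: pipeline() also handles backpressure, and additionally destroys
// both streams and reports errors through the callback when it finishes
pipeline(createReadStream(srcPath), createWriteStream(dstPath), (err) => {
  if (err) console.error('pipeline failed', err)
})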

What is the expected behavior? Why is that the expected behavior?

Memory consumption should remain constant, approximately the size of a buffer, or a little more than that.

This is because there's no reason to store old, consumed buffers in memory once they have already been written to the output.
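
For reference, "the size of a buffer" here is governed by the stream's highWaterMark; a minimal sketch (assuming the 64 KiB default chunk size for fs read streams):

import { createReadStream } from 'fs';

// fs.createReadStream defaults to a 64 KiB highWaterMark, so each 'data'
// chunk should be at most that size; it can also be set explicitly:
const srcStream = createReadStream('path/to/some-100mb-file', {
  highWaterMark: 16 * 1024, // 16 KiB chunks
})

srcStream.on('data', (chunk) => {
  console.log('chunk bytes:', chunk.length)
})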

What do you see instead?

Memory consumption is ever-increasing until the end of the pipe operation. After it ends, no memory is released. So, there are two problems here: 1) high memory consumption during the pipe operation; 2) memory leak after the operation finishes.

Additional information

I found this while working with the aplay program on Linux to play sounds. However, I changed the example program to simply copy a file from input to output path, and it still happens.

@H4ad
Member

H4ad commented Nov 16, 2023

I tried on 18.17.0 and 20.9.0 and didn't see a memory leak.

I modified the script a little to print memory usage:

import { createReadStream, createWriteStream } from 'fs';
import { memoryUsage } from 'process';

const srcPath = './large-file.txt'
const dstPath = '/tmp/test-destination-stream'

const srcStream = createReadStream(srcPath)
const dstStream = createWriteStream(dstPath)

dstStream.on('drain', () => {
  // Simulates a slow WriteStream
  setTimeout(() => {
    srcStream.resume()
  }, 20)
})

srcStream.on('end', () => {
  console.log("srcStream.on('end')");
  dstStream.end()
})

let i = 0
srcStream.on('data', (chunk) => {
  dstStream.write(chunk, (error) => {
    // No errors occur, even if destination file is removed
    if (error) console.error(error);
  })
  srcStream.pause()
  console.log('on(data)', i++)
})

console.log('Memory usage: ' + memoryUsage().heapTotal / (1024 * 1024));

let j = 0
setInterval(() => {
  console.log('Memory usage: ' + memoryUsage().heapTotal / (1024 * 1024));
  console.log('Keeping process alive', j++)
}, 3000);
node memory-leak.mjs   0,40s  user 21,66s system 19% cpu 1:54,73 total
avg shared (code):         0 KB
avg unshared (data/stack): 0 KB
total (sum):               0 KB
max memory:                62 MB
page faults from disk:     1
other page faults:         28789

I used the following command to create the file:

dd if=/dev/random of=./large-file.txt bs=4k iflag=fullblock,count_bytes count=100MB

@H4ad H4ad added the stream and memory labels Nov 16, 2023
@awvalenti
Author

Thanks for running it.

I tried it here and I think memoryUsage() may be reporting data incorrectly. I used both Linux Mint's System Monitor and heaptrack / heaptrack_gui to analyze memory. Please see below.

The output for heaptrack_print *.zst was:

total runtime: 41.17s.
calls to allocation functions: 53258 (1293/s)
temporary memory allocations: 8156 (198/s)
peak heap memory consumption: 24.26M
peak RSS (including heaptrack overhead): 81.31M
total memory leaked: 13.82M
suppressed leaks: 38.48K

(some texts are in Portuguese, I couldn't change them to English)

Main parts (screenshots): vlcsnap-2023-11-21-19h44m47s469, vlcsnap-2023-11-21-19h45m10s158

Full recording:
https://github.com/nodejs/node/assets/834921/335c3128-144a-4d12-94ad-b43d88cc9533

I modified your code slightly:

import { createReadStream, createWriteStream } from 'fs';
import { memoryUsage } from 'process';

console.log('Waiting 5s to start...');

setTimeout(() => {
  console.log('Starting');

  const srcPath = '/tmp/large-file.txt'
  const dstPath = '/tmp/test-destination-stream'

  const srcStream = createReadStream(srcPath)
  const dstStream = createWriteStream(dstPath)

  dstStream.on('drain', () => {
    // Simulates a slow WriteStream
    setTimeout(() => {
      srcStream.resume()
    }, 8)
  })

  srcStream.on('end', () => {
    console.log("srcStream.on('end')");
    dstStream.end()
  })

  let i = 0
  srcStream.on('data', (chunk) => {
    dstStream.write(chunk, (error) => {
      // No errors occur, even if destination file is removed
      if (error) console.error(error);
    })
    srcStream.pause()
    if (i++ % 300 === 0) console.log('on(data)', i)
  })

  console.log('Memory usage: ' + memoryUsage().heapTotal / (1024 * 1024));

  let j = 0
  setInterval(() => {
    console.log('Memory usage: ' + memoryUsage().heapTotal / (1024 * 1024));
    console.log('Keeping process alive', j++)
  }, 3000);

}, 5000);
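
One caveat about memoryUsage(): heapTotal only covers the V8 heap, and stream chunks are Buffers whose contents live mostly outside of it, so printing the other fields may give a fuller picture. A minimal sketch of that:

import { memoryUsage } from 'process';

const mb = (n) => (n / (1024 * 1024)).toFixed(1) + ' MB'

// Buffer contents show up mainly in rss / external / arrayBuffers,
// not in heapTotal, so log all of them.
setInterval(() => {
  const mu = memoryUsage()
  console.log('rss:', mb(mu.rss),
    'heapTotal:', mb(mu.heapTotal),
    'external:', mb(mu.external),
    'arrayBuffers:', mb(mu.arrayBuffers))
}, 3000)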

@H4ad
Member

H4ad commented Nov 22, 2023

(some texts are in Portuguese, I couldn't change them to English)

I'm Brazilian, no worries hahaha


About heaptrack, I tried executing it with an empty file and this was the result:

[image attached]

Does that mean it's leaking even with empty files? Maybe, maybe not; I don't know how heaptrack reports leaks, so I can't draw any conclusion here.

I also tried using 18.18.0 with the file that you pasted:

[image attached]

The memory "leaked" is way less than before, so I think the newer version of Node is using more memory, but I have no idea whether it's leaking or not.

@H4ad
Member

H4ad commented Nov 22, 2023

I modified the script a little to include stream.close and also a forced GC:

import { createReadStream, createWriteStream } from 'fs';
import { memoryUsage } from 'process';

console.log('Waiting 5s to start...');

setTimeout(() => {
  console.log('Starting');

  const srcPath = './large-file.txt'
  const dstPath = '/tmp/test-destination-stream'

  let srcStream = createReadStream(srcPath)
  let dstStream = createWriteStream(dstPath)

  dstStream.on('drain', () => {
    // Simulates a slow WriteStream
    setTimeout(() => {
      srcStream.resume()
    }, 8)
  })

  srcStream.on('end', () => {
    console.log("srcStream.on('end')");
    dstStream.end()
  })

  let i = 0
  srcStream.on('data', (chunk) => {
    dstStream.write(chunk, (error) => {
      // No errors occur, even if destination file is removed
      if (error) console.error(error);
    })
    srcStream.pause()
    if (i++ % 300 === 0) console.log('on(data)', i)
  })

  console.log('Memory usage: ' + memoryUsage().heapTotal / (1024 * 1024));

  let j = 0
  setInterval(() => {
    console.log('Memory usage: ' + memoryUsage().heapTotal / (1024 * 1024));
    console.log('Keeping process alive', j++)
  }, 3000);


  process.on('exit', () => {
    srcStream.close();
    dstStream.close();

    srcStream = null;
    dstStream = null;

    global.gc();
  })
}, 5000);

You can run with:

heaptrack node --expose-gc test-memory-2.mjs

[image attached]

So, I think it's not leaking; it's just V8 not calling the GC when it's not needed. If we force GC, it's okay.
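
A quick way to see that effect (a sketch, assuming the process was started with --expose-gc) is to force a collection on the same interval that prints memory usage:

// Inside the existing setInterval callback, before printing memory usage:
if (typeof global.gc === 'function') {
  global.gc() // force a full GC so heapTotal reflects only live memory
}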

On 20.9.0 it's still a little bit high, but at least in the newer version it's okay.

[image attached]

@awvalenti awvalenti changed the title from "Memory leak when piping between streams" to "High memory consumption when piping between streams" Nov 28, 2023
@awvalenti
Author

Awesome! Thanks, @H4ad. (BTW, I saw that you are Brazilian ;))

I tested here and had the same results: global.gc() frees the allocated memory.

With that in mind, I renamed the issue. It's not as critical as I thought.

Anyway, with manual piping Node doesn't release the used memory after the operation finishes. It only does that if you use the pipe method (srcStream.pipe(dstStream)) and it finishes, or if you manually call global.gc(). I think there's room for memory management improvements here.

@H4ad
Member

H4ad commented Nov 28, 2023

Node doesn't release the used memory after the pipe finishes

That's expected behavior; if you want the pipe to be closed after it finishes, you should use pipeline.
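
A minimal sketch of that with the promise-based pipeline (paths are placeholders):

import { createReadStream, createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';

// pipeline forwards data with backpressure and destroys both streams once it
// settles, whether it finishes cleanly or fails with an error.
try {
  await pipeline(
    createReadStream('/tmp/large-file.txt'),
    createWriteStream('/tmp/test-destination-stream'),
  )
  console.log('pipeline finished')
} catch (err) {
  console.error('pipeline failed', err)
}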

if you manually call global.gc()

Well, this is more about V8 than Node itself. I only used global.gc to demonstrate that the memory could be cleaned; when that memory actually gets cleaned is not Node's responsibility and we don't have any control over it.

Maybe using pipeline will help you, but there's no guarantee the memory will be freed when you expect it.

I think there's room for memory management improvements here.

About this, I think we can close this issue and you can share your thoughts/findings at nodejs/performance#117. We are already trying to optimize streams, so more data/insights are always helpful.

@awvalenti
Author

Hello. Even using pipeline, when the read stream finishes and the audio stops playing, memory consumption keeps going up. This still looks like a bug to me.
