ZstdOutputStream leaking thread handles #203

Closed
ub3rmind opened this issue Jan 20, 2022 · 10 comments

@ub3rmind

ub3rmind commented Jan 20, 2022

We have a long-lived service process on Windows that periodically uses zstd-jni to create compressed files. We've noticed that this process accumulates thread handles over time that are not released, even after streams are closed and Java garbage collection is triggered. The accumulation of leaked handles is more drastic with higher numbers of concurrent compression threads, but it still occurs when configured for a single compression thread. Note that the number of actual threads at any given time is constant, but the number of HANDLES does increase over time as repeated compression tasks occur.

This issue is present in the newest release of zstd-jni as of today, which is 1.5.1-1.

How we're measuring handle accumulation

We're using Process Explorer on Windows to view the accumulation of thread handles in our Java process over time. The handle count for a given process can be viewed by right-clicking the process, opening Properties, and then clicking the "Environment" tab in the window that opens. The individual handles for a process are enumerated in the lower pane of the main Process Explorer window: select the process, then View --> Lower Pane View --> Handles. From this we can see that the accumulating handles are thread handles, not file handles or some other type.

Sample Code

Below is the simplest example we could come up with to illustrate the problem. If you run the following code against a relatively small file and examine the handles via the Process Explorer method described above, you should be able to see the growth. It may help to place a breakpoint at the end of the main method to keep the process running while you inspect it in Process Explorer. Note that we are properly closing the streams via try-with-resources.

For reference, when we run this code against a 10MB input file with 16 compression threads, we see an accumulation of about 2000 handles, and this count increases as you increase the number of for-loop iterations. If you remove the ZstdOutputStream, the count of handles generally remains constant.

package my.package;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import com.github.luben.zstd.ZstdOutputStream;

public class HandleLeakTest {
    public static void main(String[] args) {
        Path inputFile = Paths.get("path/to/input/file");
        Path outputFile = Paths.get("path/to/output/file");
        try {
            for (int i = 0; i < 100; i++) {
                compress(inputFile, outputFile);
                Files.delete(outputFile);
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }

    private static void compress(Path inputFile, Path outputFile) throws Exception {
        // The second ZstdOutputStream constructor argument (16) is the compression level;
        // setWorkers(16) below requests 16 worker threads for multi-threaded compression.
        try (InputStream inputStream = new BufferedInputStream(new FileInputStream(inputFile.toFile()));
             ZstdOutputStream outputStream = new ZstdOutputStream(new BufferedOutputStream(new FileOutputStream(outputFile.toFile())), 16)) {

            outputStream.setWorkers(16);
            byte[] buffer = new byte[4096];
            int n;
            while (-1 != (n = inputStream.read(buffer))) {
                outputStream.write(buffer, 0, n);
            }
        }
    }
}

@luben
Owner

luben commented Jan 21, 2022

Hmm, interesting. Does it still leak threads if you add outputStream.close() after the loop?

My hypothesis is that the outputStream is not being closed/GCed fast enough so threads are not released.
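
A hypothetical variant of your compress() method like the one below, with the close made explicit after the write loop instead of relying on try-with-resources, would make the close point unambiguous (same placeholder paths, compression level, and worker count as in your example):

    private static void compress(Path inputFile, Path outputFile) throws Exception {
        InputStream inputStream = new BufferedInputStream(new FileInputStream(inputFile.toFile()));
        ZstdOutputStream outputStream = new ZstdOutputStream(
                new BufferedOutputStream(new FileOutputStream(outputFile.toFile())), 16);
        try {
            outputStream.setWorkers(16);
            byte[] buffer = new byte[4096];
            int n;
            while (-1 != (n = inputStream.read(buffer))) {
                outputStream.write(buffer, 0, n);
            }
        } finally {
            // Explicit close: finishes the zstd frame and should release the
            // native compression context along with its worker threads.
            outputStream.close();
            inputStream.close();
        }
    }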

@ub3rmind
Author

Thanks for the response! I'm back to work today.

Yes, the issue still happens if I add an explicit call to close at the end of the try block, which is expected since try-with-resources does the same thing implicitly.

I also thought the leak might have to do with GC, but now I'm not convinced. Note that although the sample code I supplied is a short-lived program, our production code runs continuously as a service process for days/weeks/months, and the thread handle count is not reduced until the process is restarted. We've had customers report millions of handles held by our process after multiple days of repeated compression tasks.

I also tried triggering GC manually via jconsole and was able to see the in-use memory drop, but the open thread handle count seems to be unaffected by the cleanup. I also want to re-emphasize that we're not leaking actual threads - just thread handles, which are essentially pointers to threads that no longer exist.
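
For anyone else reproducing this: the same GC pass can be forced from code instead of jconsole, for example with a short (hypothetical) snippet like this at the end of the try block in the sample's main method; the sleep is only there to keep the process alive for inspection in Process Explorer:

                // Request a full GC so unreachable stream objects are collected; in our
                // testing the heap shrinks but the thread handle count does not go down.
                System.gc();

                // Keep the JVM alive so the handle count can still be inspected.
                Thread.sleep(10 * 60 * 1000L);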

@luben
Owner

luben commented Jan 25, 2022

My bad, yes, it's not related to GC - I didn't notice it was in a try-with-resources block. Currently the worker threads are spawned somewhere in the native library that we use from upstream. What do you mean by "thread handle"? Worker threads should not be visible to the JVM side, as the threads are handled internally by the zstd library. On the JVM side we don't create any threads.

BTW, I noticed you seem to be using Windows. Unfortunately I don't have access to Windows to check. Can you reproduce it with the zstd binaries provided by https://github.com/facebook/zstd? For example, you can use something like:

zstd -b1 -e9 -i10 -T16 BIG_FILE

to run the benchmark for levels 1-9, 10 seconds per level, using 16 threads for compression.

@ub3rmind
Author

That's a good thought - I was also thinking they might be native thread handles and not related to the zstd-jni Java code. It also seems possible that the Java code is not properly closing out some native objects, in which case it would be a bug in the Java code.

I can try a repro with the native zstd binaries, and I can also see if the issue reproduces on Linux. It will take a little time to get back to you.

@luben
Owner

luben commented Jan 26, 2022

No rush. The Java code calls ZSTD_freeCStream on close, which should release all resources.

I found the Zstd threading implementation for win32 (https://github.com/facebook/zstd/blob/12c045f74d922dc934c168f6e1581d72df983388/lib/common/threading.c#L23-L77), and it looks like it does not close the handles (CloseHandle), but I have not worked on Windows and I am not sure about the semantics and requirements.

@ub3rmind
Author

ub3rmind commented Feb 4, 2022

I completed a couple of tests on Windows using the native zstd executable (version 1.5.2).

For the first test I generated a list of 323 files to be compressed (total size 51GB) and then compressed those files individually within a single instance of the zstd process using the following command:

zstd -T16 --filelist .\FILE_LIST.TXT --output-dir-flat .\COMPRESSED_FILES

This resulted in the zstd executable opening about 80 handles and staying constant at that level for the entire duration. There was no evidence of thread handles leaking over time, and the count does not seem to vary with the number or size of the files.

For the second test I ran the command suggested by @luben above on a single 24GB file:

zstd -b1 -e9 -i10 -T16 BIG_FILE

This resulted in a total of 250 handles or so by the time the process finished. The number of open handles did increase over time as it proceeded through the benchmarks, but not as dramatically as it does with the zstd-jni code sample I shared in my original post.

I am not sure why the results are slightly different for these two test scenarios, but I suppose the results from the second test show that the number of open handles can increase over time in the native implementation. This is not a critical issue for my organization at the moment, so I don't intend to pursue this further right now, but I figured I would share these findings.

@luben
Owner

luben commented Feb 6, 2022

Thanks. I will share this with upstream and see if they have any idea how to fix it.

@luben
Owner

luben commented Jul 15, 2022

The leak was fixed upstream with facebook/zstd#3147, so the next release will include the fix.

@ub3rmind
Author

Yes, that's great! Looks like they put out 2-3 releases per year on average.

@luben
Owner

luben commented Jun 5, 2023

I think this should be fixed in 1.5.5. Please reopen if you still see the problem.

luben closed this as completed Jun 5, 2023