I have downloaded the data from http://www.gharchive.org/: 64901 files, 839 GiB in size.

But I cannot simply process this data as is:

```
$ clickhouse-local --query "SELECT * FROM file('*.json.gz', TSV, 'data String') LIMIT 10"
Code: 76, e.displayText() = DB::ErrnoException: Cannot open file /opt/milovidov/example_datasets/gharchive/2018-12-30-10.json.gz, errno: 24, strerror: Too many open files (version 20.2.1.1)
```
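errno 24 is EMFILE: the process hit its per-process limit on open file descriptors, which happens when the `*.json.gz` glob expands to tens of thousands of files. A minimal sketch of one workaround (assuming the hard limit permits it) is to raise the soft open-file limit in the shell before running clickhouse-local:

```shell
# Show the current soft limit on open file descriptors;
# errno 24 (EMFILE) means this limit was exceeded.
ulimit -n

# Raise the soft limit up to the hard limit for this shell session.
# (Raising it beyond the hard limit requires root or a change in
# /etc/security/limits.conf; the value needed here is > 64901.)
ulimit -n "$(ulimit -Hn)"
```

This only helps if the hard limit is high enough for the whole dataset; otherwise the files have to be processed in smaller batches.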
Now it works like a charm. Even with the data in a garbage format (json.gz), it is processed at more than 500 MB/sec on an old E5-2650 v2 with a RAID-5 array of 8 HDDs.
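For contrast, the pattern that avoids the descriptor limit entirely is to process the archives one at a time, so at most one file is open per iteration. This is an illustrative sketch with generated sample files, not the issue's actual dataset or clickhouse-local invocation:

```shell
# Generate a few small gzipped JSON files to stand in for the GH Archive data.
workdir=$(mktemp -d)
for i in 1 2 3; do
  printf '{"type":"PushEvent"}\n' | gzip > "$workdir/2018-12-30-$i.json.gz"
done

# Process files one by one: each file is decompressed, counted,
# and closed before the next one is opened, so the open-file count
# never exceeds a handful regardless of how many archives exist.
total=0
for f in "$workdir"/*.json.gz; do
  lines=$(gzip -dc "$f" | wc -l)
  total=$((total + lines))
done
echo "$total"
```

With clickhouse-local, the same idea would mean invoking it per file (or per batch) and aggregating the partial results, at the cost of losing a single-query view over the whole dataset.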