server: check open file ulimit for local backend #343
Conversation
cmd/tidb-lightning/main.go (outdated)

@@ -96,3 +105,47 @@ func main() {
		os.Exit(1)
	}
}

func checkSystemRequirement(cfg *config.GlobalConfig) error {
(1) This should be checked using the task config before every task starts
(2) This should be ignorable using --check-requirements=false
cmd/tidb-lightning/main.go (outdated)

	rLimit.Cur = estimateMaxFiles
	err = syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rLimit)
	if err != nil {
		return errors.Wrap(err, fmt.Sprintf("the maximum number of open file descriptors is too small, got %d, expect greater or equal to %d", prevLimit, estimateMaxFiles))
Suggested change:

- return errors.Wrap(err, fmt.Sprintf("the maximum number of open file descriptors is too small, got %d, expect greater or equal to %d", prevLimit, estimateMaxFiles))
+ return errors.Annotatef(err, "the maximum number of open file descriptors is too small, got %d, expect greater or equal to %d", prevLimit, estimateMaxFiles)
cmd/tidb-lightning/main.go (outdated)

	// we estimate each sst as 50MB; this is a fairly coarse estimate, the actual
	// fd count depends on chunk & import concurrency and each data & index key-value size
	estimateMaxFiles := totalSize / (50 * 1024 * 1024)
how is this "50 MiB" determined?
also, for a 10 TB data set this means Pebble will open up to 200,000 files simultaneously. This sounds more like a bug to me.
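The reviewer's figure follows directly from the 50 MB-per-SST estimate; a quick sanity check of the arithmetic:

```go
package main

import "fmt"

func main() {
	const sstSize = 50 * 1000 * 1000                  // 50 MB per SST, the coarse estimate under review
	const sourceSize = 10 * 1000 * 1000 * 1000 * 1000 // 10 TB data source
	fmt.Println(sourceSize / sstSize)                 // 200000 files
}
```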
In the tpcc benchmark, with a memtable size of 512MB, the generated L0 sst file sizes range from 40-450MB, so I'm not sure how to estimate the avg size. In the common case, L0 sst files are compacted into L1-Ln level ssts, so the total file count shouldn't be very large, but we disable the compaction phase in favor of better performance, so the number of generated files will be quite big. Because an index engine can't be split, if a table is huge and has many indices, it may generate hundreds of thousands of L0 sst files.
can we use the total size of the largest N tables (N = index concurrency) instead of the entire data source?
from the pebble source code it seems 50 MiB = 25 × 2 MiB, so maybe we could increase the TargetFileSize? 🤔 In TiKV Importer this is set to 1 GiB.
Seems the L0 sst file size depends on the memory table size, and the conf MaxOpenFiles is used for compaction. After explicitly setting the TargetFileSize option to 1GB, the order_line table's index engine still generates a lot of sst files of size 54MB, and almost every engine generates sst files of a different size. Maybe we should use MemoryTableSize as each sst file's size, as the memory table accounts for the raw size of the kv-pairs, and the sst file size depends on the flush compaction ratio.
So we can estimate the needed fds by: {top N table size} / {MemoryTableSize}
Okay.
rest LGTM
lightning/lightning_test.go (outdated)

	{
		Tables: []*mydump.MDTableMeta{
			{
				TotalSize: 150_800 << 20, //150_288MB
not sure what these comments mean, but the numbers look way off.
LGTM
/run-all-tests
/run-all-tests
Blocked by #348.
What problem does this PR solve?
Check process max open file limit for local backend
What is changed and how it works?
In local backend mode, pebble needs to write a lot of L0 sst files, so if the data source is fairly big, the max open file limit may not be enough. We therefore check RLIMIT_NOFILE based on the source file size, and try to raise it if it is not big enough.

Close #342
Check List
Tests
Side effects
Related changes