
Support deploy TiFlash on multi-disks with "storage" configurations since v4.0.9 #931

Merged
10 commits merged into master on Nov 25, 2020

Conversation

@JaySon-Huang (Contributor) commented Nov 24, 2020

What problem does this PR solve?

Enhancement for TiFlash multi-disk deployment.
Since v4.0.9, TiFlash has added new configurations (storage.*) to improve performance under multi-disk deployment.

This PR

  • supports the new configurations (storage.*)
  • validates whether the path configurations are valid for deploy/edit-config/scale-out
  • keeps backward compatibility when deploying clusters with versions earlier than v4.0.9

What is changed and how it works?

  • Rewrites TiFlashSpec.DataDir during unmarshalling with the directories overridden by the storage.* configuration
  • When users run deploy/scale-out/edit-config, Specification.Validate checks whether the configuration in the TiFlash spec is valid (see the sketch after this list)
  • Marks TiFlashSpec.DataDir as expandable, so users can expand the data directories of TiFlash (shrinking the directories is not allowed)
  • Refines the generated configuration file for versions >= v4.0.9, and keeps backward compatibility for versions < v4.0.9
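
The exact rules live in Specification.Validate; the following is only a minimal sketch in Go, with a hypothetical helper name and simplified rules, of the kind of path check described above (storage directories must be non-empty absolute paths, and storage.latest.dir should not be set without storage.main.dir):

package spec

import (
	"fmt"
	"path/filepath"
)

// checkTiFlashStorageDirs is an illustrative stand-in for the validation this PR adds.
func checkTiFlashStorageDirs(mainDirs, latestDirs []string) error {
	if len(latestDirs) > 0 && len(mainDirs) == 0 {
		return fmt.Errorf("storage.latest.dir is set but storage.main.dir is empty")
	}
	for _, dir := range append(append([]string{}, mainDirs...), latestDirs...) {
		if dir == "" || !filepath.IsAbs(dir) {
			return fmt.Errorf("storage dir %q must be a non-empty absolute path", dir)
		}
	}
	return nil
}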

Check List

Tests

Code changes

  • Has exported function/method change
  • Has persistent data change

Side effects

  • Possible performance regression
  • Increased code complexity

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation (TODO)

Release notes:

Support deploying TiFlash nodes with `storage` configurations

@CLAassistant commented Nov 24, 2020

CLA assistant check
All committers have signed the CLA.

@JaySon-Huang changed the title on Nov 24, 2020, dropping the [WIP] prefix: Support deploy TiFlash on multi-disks with "storage" configurations since v4.0.9
@codecov-io commented Nov 24, 2020

Codecov Report

Merging #931 (4c67275) into master (c989f9d) will decrease coverage by 29.11%.
The diff coverage is 2.45%.


@@             Coverage Diff             @@
##           master     #931       +/-   ##
===========================================
- Coverage   55.48%   26.37%   -29.12%     
===========================================
  Files         263      204       -59     
  Lines       19250    15287     -3963     
===========================================
- Hits        10681     4032     -6649     
- Misses       6864    10504     +3640     
+ Partials     1705      751      -954     
Flag Coverage Δ
cluster 26.37% <2.45%> (-17.23%) ⬇️
dm ?
integrate 26.37% <2.45%> (-23.75%) ⬇️
playground ?
tiup ?
unittest ?

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pkg/cluster/spec/spec.go 66.05% <0.00%> (-22.63%) ⬇️
pkg/cluster/spec/tiflash.go 0.87% <0.00%> (-66.08%) ⬇️
pkg/utils/diff.go 43.03% <9.52%> (-46.80%) ⬇️
pkg/cluster/spec/validate.go 54.78% <33.33%> (-37.95%) ⬇️
pkg/utils/semver.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/repository/store/store.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/cluster/template/config/config.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/cluster/template/scripts/scripts.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/repository/utils/hash.go 0.00% <0.00%> (-81.82%) ⬇️
pkg/meta/err.go 0.00% <0.00%> (-81.25%) ⬇️
... and 187 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@AstroProfundis (Contributor) left a comment

I don't think it's a good idea for TiUP Cluster to be aware of what the config of a server node is; it is designed to be an interface by intent.

Maybe we can reuse the current specification of data_dir and use the first item in the list as main.dir, so that the specification of other components can remain unchanged.

@JaySon-Huang (Contributor, Author) commented Nov 24, 2020

Maybe we can reuse the current specification of data_dir and use the first item in the list as main.dir, so that the specification of other components can remain unchanged.

@AstroProfundis The current specification of data_dir cannot exactly represent the "storage.*" configurations when users want to deploy on a multi-disk node.

data_dir: "/data0/tiflash,/data1/tiflash,/data2/tiflash"
config:
  path_realtime_mode = false

is equivalent to:

config:
  storage.main.dir: ["/data0/tiflash", "/data1/tiflash", "/data2/tiflash"]
  storage.latest.dir: ["/data0/tiflash"]

But if users want to make full use of their disks' I/O after v4.0.9, they need to manually change their configs like this:

config:
  storage.main.dir: ["/data0/tiflash", "/data1/tiflash", "/data2/tiflash"]
  storage.latest.dir: ["/data0/tiflash", "/data1/tiflash", "/data2/tiflash"]
#OR
#config:
#  storage.main.dir: ["/data0/tiflash", "/data1/tiflash", "/data2/tiflash"]

@JaySon-Huang (Contributor, Author)

These configurations determine the directories where data is stored. We need to detect conflicting directories when deploying, remove them after TiFlash is destroyed, and validate them when they are changed.
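
An illustrative sketch only (the function name is hypothetical, not part of TiUP): collect every data directory of a TiFlash instance (data_dir plus storage.main.dir and storage.latest.dir) so that deploy-time conflict checks and destroy-time cleanup operate on the same set of paths:

package spec

import "strings"

// allTiFlashDataDirs merges data_dir and the storage.* directories into one
// deduplicated list of paths.
func allTiFlashDataDirs(dataDir string, mainDirs, latestDirs []string) []string {
	seen := make(map[string]struct{})
	var dirs []string
	add := func(d string) {
		d = strings.TrimSpace(d)
		if d == "" {
			return
		}
		if _, ok := seen[d]; !ok {
			seen[d] = struct{}{}
			dirs = append(dirs, d)
		}
	}
	// data_dir may hold several comma-separated paths.
	for _, d := range strings.Split(dataDir, ",") {
		add(d)
	}
	for _, d := range mainDirs {
		add(d)
	}
	for _, d := range latestDirs {
		add(d)
	}
	return dirs
}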

@AstroProfundis (Contributor)

OK that sounds reasonable to me

@JaySon-Huang (Contributor, Author)

After a cluster was destroyed, the directories defined in "storage.*" did not get removed. This is because some methods (like Specification::CountDir in validate.go) use reflection and read the directories from TiFlashSpec::DataDir instead of TiFlashSpec::GetDataDir.

Do you have any ideas on how to solve this problem? @AstroProfundis @lucklove
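
The following is a self-contained illustration of the issue (TiFlashSpec here is a trimmed-down stand-in, not the real struct): reflection-based helpers read the DataDir field directly, so they never see the extra directories that a getter method would return.

package main

import (
	"fmt"
	"reflect"
)

type TiFlashSpec struct {
	DataDir string `yaml:"data_dir"`
}

// GetDataDir pretends to merge the "storage.*" directories into the result.
func (s TiFlashSpec) GetDataDir() string {
	return s.DataDir + ",/data1/tiflash"
}

func main() {
	s := TiFlashSpec{DataDir: "/data0/tiflash"}
	v := reflect.ValueOf(s)
	// A reflection walk only sees the raw field value:
	fmt.Println(v.FieldByName("DataDir").String()) // "/data0/tiflash"
	// ...while the getter would have returned the merged directories:
	fmt.Println(s.GetDataDir()) // "/data0/tiflash,/data1/tiflash"
}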

@lucklove (Member)

After a cluster was destroyed, the directories defined in "storage.*" did not get removed. This is because some methods (like Specification::CountDir in validate.go) use reflection and read the directories from TiFlashSpec::DataDir instead of TiFlashSpec::GetDataDir.

Do you have any ideas on how to solve this problem? @AstroProfundis @lucklove

IMO, we shouldn't add the function GetDataDir, but instead reset the DataDir field to the correct value in https://github.com/pingcap/tiup/blob/master/pkg/cluster/spec/spec.go#L279, because there are many places that use reflection to deal with DeployDir, DataDir and LogDir.
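
A minimal sketch of this suggestion under simplifying assumptions (the struct is trimmed down and storageDirsFromConfig is a hypothetical helper, not the actual TiUP code): fold the "storage.*" directories back into DataDir while the spec is being unmarshalled, so that every reflection-based helper keeps working on the single DataDir field.

package spec

import "strings"

// TiFlashSpec is a simplified stand-in for the real spec struct.
type TiFlashSpec struct {
	DataDir string                 `yaml:"data_dir"`
	Config  map[string]interface{} `yaml:"config,omitempty"`
}

// UnmarshalYAML decodes the spec as usual, then rewrites DataDir from the
// "storage.*" directories so that CountDir-style helpers see the full set of paths.
func (s *TiFlashSpec) UnmarshalYAML(unmarshal func(interface{}) error) error {
	type plain TiFlashSpec
	if err := unmarshal((*plain)(s)); err != nil {
		return err
	}
	if dirs := storageDirsFromConfig(s.Config); len(dirs) > 0 {
		s.DataDir = strings.Join(dirs, ",")
	}
	return nil
}

// storageDirsFromConfig collects the string slices stored under
// "storage.main.dir" and "storage.latest.dir", keeping order and dropping duplicates.
func storageDirsFromConfig(cfg map[string]interface{}) []string {
	seen := make(map[string]struct{})
	var dirs []string
	for _, key := range []string{"storage.main.dir", "storage.latest.dir"} {
		items, ok := cfg[key].([]interface{})
		if !ok {
			continue
		}
		for _, item := range items {
			d, ok := item.(string)
			if !ok {
				continue
			}
			if _, dup := seen[d]; !dup {
				seen[d] = struct{}{}
				dirs = append(dirs, d)
			}
		}
	}
	return dirs
}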

Add validator about tiflash storage configuration

Signed-off-by: JaySon-Huang <tshent@qq.com>
@JaySon-Huang (Contributor, Author) commented Nov 25, 2020

I have added a document about the manual test cases and their results to the PR description.
PTAL again @lucklove @AstroProfundis

@lucklove (Member) left a comment

LGTM

Labels
status/LGT1 Indicates that a PR has LGTM 1.

6 participants