Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix PD --peer-urls flag to use listen_host variable #941

Closed
wants to merge 2 commits into from

Conversation

fln
Copy link
Contributor

@fln fln commented Nov 26, 2020

What problem does this PR solve?

This allows us to use server hostname instead of IP address. Without
this change if PD service is declared using hostname it fails to start,
reporting an error that it can not bind to given interface.

Metadata variable listen_host was introduced in #495 and is already used for
the --client-urls flag.

This should resolve #337 and partially implement #691 .

What is changed and how it works?

This commit updates run_pd.sh template to use listen_host for
--peer-urls flag instead of host.

Check List

Tests

  • No code

Side effects

  • Breaking backward compatibility

If existing installations relied on host field to be used as address for accepting peer connections this change could be treated as breaking change. Default value of listen_host is 0.0.0.0 so it should not break existing clusters.

Some might want to use different listen addresses for peer and client connections. However I think this would add unnecessary complexity and not worth supporting.

Related changes

  • Need to update the documentation

Not sure where metadata file is properly documented. So far I have used source code to understand what is expected in the metadata.yaml file.

Release notes:

Support using hostname instead of IP for PD servers in host section.

@CLAassistant
Copy link

CLAassistant commented Nov 26, 2020

CLA assistant check
All committers have signed the CLA.

@codecov-io
Copy link

codecov-io commented Nov 26, 2020

Codecov Report

Merging #941 (220429b) into master (bc0270c) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #941   +/-   ##
=======================================
  Coverage   55.66%   55.67%           
=======================================
  Files         263      263           
  Lines       19392    19396    +4     
=======================================
+ Hits        10795    10799    +4     
  Misses       6880     6880           
  Partials     1717     1717           
Flag Coverage Δ
cluster 43.38% <100.00%> (+<0.01%) ⬆️
dm 23.92% <28.57%> (-0.01%) ⬇️
integrate 49.90% <100.00%> (+<0.01%) ⬆️
playground 20.21% <ø> (ø)
tiup 16.92% <100.00%> (ø)
unittest 23.03% <28.57%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/cluster/embed/autogen_pkger.go 100.00% <100.00%> (ø)
pkg/cluster/task/tls.go 65.38% <100.00%> (+2.88%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc0270c...220429b. Read the comment docs.

@9547
Copy link
Contributor

9547 commented Nov 26, 2020

@fln Could you please add some test case, such as the case of #337

@fln fln force-pushed the master branch 2 times, most recently from 245e63c to 311eb0d Compare November 29, 2020 19:34
@fln
Copy link
Contributor Author

fln commented Nov 29, 2020

@9547 Thanks for feedback and suggestions.

Updated the tests as requested. First attempt was to create a separate test just for host name support. It turned out the integration tests are quite convoluted and performs some tricks to actually work with IP addresses which are no longer necessary if hostnames are used.

In the end all IP addresses in tests were replaced with docker hostnames n1 - n5. One could argue that it is good idea to use hostnames as a default since we have a support for it already. It will also prevent us from breaking hostname support in the future. Furthermore it is more likely that code which works with hostnames will work with IP addresses without modifications, which is not true in the opposite direction.

Key changes that were made since initial commit:

  • Discovered that run_pd_scale.sh.tpl is used for scaling it needed same template change that was done in the initial commit. This made refactoring the tests worthwhile and saved us tripping on it at the later stage.
  • Made a small change to the automated TLS certificate generator, to generate correct certificates if hostname are used - IP and hostname is goes to different x509 sections.
  • Made all test topology files stable. Removed sed invocations that used to create actual topology files from the topology templates. Running tests no longer creates temporary topology files (it was kind of annoying as docker running with root created temp files owned by root). In the process file full_scale_in_tidb.yaml had to be split into two versions, apparently this file was being mutated in the middle of the tests.

This commit updates run_pd.sh template to use `listen_host` for
--peer-urls flag instead of `host`.

This allows us to use server hostname instead of IP addresse. Without
this change if PD service is declared using hostname it fails to start,
reporting an error that it can not bind to given interface.

Metadata variable `listen_host` was introduced in pingcap#495 and is already used for
the --client-urls flag.

This should resolve pingcap#337 and partially implement pingcap#691.
This commit updates TLS certificate generator to detect if IP address or
hostname was used as host value.

Another update is for tests to start using docker host names n1-n5
instead of templating topology with IP addresses.

File full_scale_in_tidb.yaml was forked into full_scale_in_tidb_2nd.yaml
to avoid rewriting existing file for second tidb scale-out attempt.
@9547
Copy link
Contributor

9547 commented Nov 30, 2020

Thanks, but could you please split this pr into two or three

  • this pr: fix the pd url
  • refact tests: use hostname instead of ip for common tests, but keep some ip cases
  • fix the x509 issue

thanks again.

@AstroProfundis AstroProfundis added good-first-issue type/enhancement Categorizes issue or PR as related to an enhancement. labels Nov 30, 2020
@fln
Copy link
Contributor Author

fln commented Nov 30, 2020

PRs are split. Just noticed that I have overlooked the request to have mixed IP and host name tests.

Mixing host names and IP addresses in tests is not that trivial.

If we mix host names and IP address in a single topology - tiup-cluster will treat them as different hosts and complain about ports being already in use by external services.

Leaving some tests with host names while other test with IP addresses also causes issue because full.yaml is shared between scale and cmd tests.

Is there any chance full test suite switch to host names would be accepted 😄 ?

Alternatively would a simple test of just building a minimal cluster and tearing it down using only IP addresses suffice? My opinion is that if we have code that works with host names it should with high confidence work with IP addresses as well without explicit testing.

@ti-chi-bot
Copy link
Member

@fln: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fln
Copy link
Contributor Author

fln commented Nov 30, 2020

@fln fln closed this Nov 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
first-time-contributor type/enhancement Categorizes issue or PR as related to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

failed to start pd with the host name in topo.yaml
7 participants