Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New config parser, HCL support, multiple bind addrs #3480

Merged
merged 70 commits into from
Sep 25, 2017
Merged

Conversation

magiconair
Copy link
Contributor

@magiconair magiconair commented Sep 18, 2017

This patch implements a new config parser for the consul agent which
makes the following changes to the previous implementation:

  • add HCL support
  • all configuration fragments in tests and for default config are
    expressed as HCL fragments
  • HCL fragments can be provided on the command line so that they
    can eventually replace the command line flags.
  • HCL/JSON fragments are parsed into a temporary Config structure
    which can be merged using reflection (all values are pointers).
    The existing merge logic of overwrite for values and append
    for slices has been preserved.
  • A single builder process generates a typed runtime configuration
    for the agent.

The new implementation is more strict and fails in the builder process if no
valid runtime configuration can be generated. Therefore, additional
validations in other parts of the code should be removed.

The builder also pre-computes all required network addresses so that no
address/port magic should be required where the configuration is used and
should therefore be removed.

As a side effect the code also supports multiple bind addresses for HTTP,
HTTPS and DNS via go-sockaddr and static ip addresses.

Fixes #895
Fixes #2217
Fixes #2825

@magiconair
Copy link
Contributor Author

There are still a handful of failing tests which I'll address tomorrow.

@magiconair
Copy link
Contributor Author

Still need to update all the docs ...

@magiconair magiconair changed the title New config parser New config parser, HCL support, multiple bind addrs Sep 20, 2017
@shilov
Copy link
Contributor

shilov commented Sep 20, 2017

👍 big thanks for adding HCL support + enabling multiple bind addresses

cfg.ACLEnforceVersion8 = Bool(false)
a := NewTestAgent(t.Name(), cfg)
a := NewTestAgent(t.Name(), `
acl_enforce_version_8 = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually set it to true when it was supposed to be false. I'll fix this and make sure this test positively ensures that things are working.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is set to a wrong key acl_enforce_version8 (not the lacking final underscore) which may contribute to the test failing.

I am working on a patch which detects invalid field names again and restores the previous functionality. However, since encoding/json and hashicorp/hcl decode embedded structs differently (map[string]interface{} vs []map[string]interface{}). Therefore, just using mapstructure as before to detect unused fields won't work. This is a bug in HCL itself for which there is no easy fix AFAICT. Also, using the embedded JSON parser in hcl will also return []map[string]interface{} which will trip up mapstructure. hashicorp/hcl#209

Unless there is an option in mapstructure to do the mapping I think there are several viable approaches:

  1. Patching all []map to map and then use mapstructure for the mapping and invalid field detection. However, watches has []map[string]interface{} so there would need to be an exception for that field and others in the future.

  2. Build a custom invalid field detector which given a map[string]interface{} and struct type+tag name can detect which fields will not be mappable.

  3. Enhance the hashicorp/hcl decoder with an invalid field detector and do the same thing with encoding/json by first forking it and then adding that to the custom code until proposal: encoding/json: reject unknown fields in Decoder golang/go#15314 is added to the stdlib.

  4. Add that capability to mapstructure.

For the 1.0 milestone, the first option seems to be the one with the least amount of work so I'll try that first.

@@ -167,13 +101,20 @@ func TestAgent_TokenStore(t *testing.T) {
}

func TestAgent_CheckPerformanceSettings(t *testing.T) {
t.Skip("this test fails because of the change in agent.consulConfig(). We need to decide what to do")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO - I'll take a look at this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've followed your suggestion and dropped the PerformanceRaftMultiplier parameter and instead compute the right values during the build step.

agent/agent.go Outdated
defer a.wgServers.Done()

err := s.ListenAndServe(p.Net, p.Addr, func() { notif <- p })
// todo(fs): does this also work for unix sockets?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add unix socket unit tests before we finish the beta.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underlying DNS lib does not support Unix sockets. We can add this at a later stage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it - yeah if DNS never could have supported it previously it's not important to get that in now (also seems like a fairly rare / low priority use case).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless you want to test DNS on the socket without exposing it on a network interface. But lets add that later.

return tc.IncomingTLSConfig()
}

func (c *RuntimeConfig) Sanitized() RuntimeConfig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a test for this - It would be good to fuzz the structure, sanitize, and make sure anything with token in there is redacted. Maybe we should just use reflect here and a small list of other places to redact and make the logic generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test and implemented it reflectively. The expected test result is static so the test should fail once a new field with token, key or secret in the name is added. gofuzz failed on me with some error but I don't think it is necessary with this approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used different values for the WAN retry join case in a later commit.

Head: []Source{DefaultSource()},
}

if b.stringVal(b.Flags.DeprecatedDatacenter) != "" && b.stringVal(b.Flags.Config.Datacenter) == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need a special case here? It's also handled in Build().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a left over from moving code around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit more complex than that but why do we want to carry on deprecated flags and fields to 1.0 in the first place? This looks like a good time to finally remove them.

CleanupDeadServers *bool `json:"cleanup_dead_servers,omitempty" hcl:"cleanup_dead_servers"`
DisableUpgradeMigration *bool `json:"disable_upgrade_migration,omitempty" hcl:"disable_upgrade_migration"`
LastContactThreshold *string `json:"last_contact_threshold,omitempty" hcl:"last_contact_threshold"`
// todo(fs): do we need uint64 here? If yes, then I need to write a special parser b/c of JSON limit of 2^53-1 for ints
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will always be much lower than the max. We could even accept an int here and cast it later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought so. will fix

rt.DataDir = dataDir
},
},
{ // todo(fs): shouldn't this be '-encrypt-key'?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this looks correct.

rt.DataDir = dataDir
},
},
{ // todo(fs): shouldn't this be '-node-name'?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct for the CLI flags (it's node_name in JSON).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, the fact that they are different is one of the issues.

rt.DataDir = dataDir
},
},
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a duplicate of the previous test case - did you have something else in mind here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, copy/paste left over when creating additional test cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed it.

},
},
{
desc: "client addr, addresses and ports < 0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paused here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got to this spot - I'll continue from here...

@slackpad
Copy link
Contributor

slackpad commented Sep 24, 2017 via email

@magiconair
Copy link
Contributor Author

I've pushed a change which removes all deprecated fields and flags and keeps a list in agent/config/doc.go The last remaining thing is recursor since we have both recursor and recursors and usually merge the single recursor into the list.

@magiconair magiconair added this to the 1.0 milestone Sep 25, 2017
Copy link
Contributor

@slackpad slackpad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it through the whole thing and I think we are super close. If we can get a check in there to catch unknown fields (the simple plan with an exception for watches sounded reasonable) before the beta I think that would a good cross-check for the all the unit tests and to help users.

patch: func(rt *RuntimeConfig) {
rt.Bootstrap = true
rt.ServerMode = true
rt.LeaveOnTerm = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In client mode the LeaveOnTerm and SkipLeaveOnInt values should be the opposite of server mode, so true and false, respectively. We should add a case that covers that behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

flags: []string{
`-data-dir=runtime_test.go`,
},
patch: func(rt *RuntimeConfig) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a patch section when it's checking for an error?

hcl: []string{`dns_config = { udp_answer_limit = 0 }`},
err: "dns_config.udp_answer_limit cannot be 0. Must be positive",
},
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate of previous test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

"dogstatsd_addr": "0wSndumK",
"dogstatsd_tags": [ "3N81zSUB","Xtj8AnXZ" ],
"filter_default": true,
"prefix_fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we can eventually divorce the runtime config from the structure with both check and checks.

@@ -2913,7 +2858,7 @@ func TestDNS_ServiceLookup_AnswerLimits(t *testing.T) {
expectedANYQuery int
expectedANYQueryID int
}{
{"0", 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
//{"0", 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what happened here? Does this fail to validate then we can just delete it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code did not allow dns_config.udp_answer_limit to be zero which made this test fail. I guess we commented this out while getting things to work. I've restored the previous behavior.

agent/node-id Outdated
@@ -0,0 +1 @@
9e8edccb-1f67-0ef6-a36f-e5121d91d8fb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be deleted (or does a test use this)?


return nil
}
// done(fs): func ValidateSegments(conf *Config) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this get moved elsewhere? There's an enterprise version of this file that's in the enterprise fork, so we had this out here so we can swap it with build tags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had incorporated this into the normal validation. However, to support different types of validation for enterprise I've split up the tests so that we can overload them. I'll send you the ent fragments you can add.


func TestDefaultConfig(t *testing.T) {
for i := 0; i < 500; i++ {
t.Run("bla", func(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bla might as well be an empty string


// GetPrivateIPv4 returns the list of private network IPv4 addresses on
// all active interfaces.
func GetPrivateIPv4() ([]*net.IPAddr, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably remove these helpers and change the default config to go-sockaddr templates that accomplishes this as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes mocking the address resolution more difficult. Let me think about a solution.

@@ -58,7 +58,8 @@ func NewIPv4Addr(ipv4Str string) (IPv4Addr, error) {
// Strip off any bogus hex-encoded netmasks that will be mis-parsed by Go. In
// particular, clients with the Barracuda VPN client will see something like:
// `192.168.3.51/00ffffff` as their IP address.
trailingHexNetmaskRe := trailingHexNetmaskRE.Copy()
// trailingHexNetmaskRe := trailingHexNetmaskRE.Copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a monkey patch? Why was this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for some reason this creates a data race which made tests fail randomly at high concurrency. I still need to understand why that is.

@slackpad
Copy link
Contributor

I'll try to sort the Raft multiplier stuff.

@slackpad
Copy link
Contributor

slackpad commented Sep 25, 2017

@magiconair I'm thinking we can get rid of PerformanceRaftMultiplier from the runtime struct and do this scaling at the end of builder.go:Build(), like we do for other derived parameters:

base.RaftConfig.HeartbeatTimeout = a.config.ConsulRaftHeartbeatTimeout * time.Duration(a.config.PerformanceRaftMultiplier)
base.RaftConfig.LeaderLeaseTimeout = a.config.ConsulRaftLeaderLeaseTimeout * time.Duration(a.config.PerformanceRaftMultiplier)
base.RaftConfig.ElectionTimeout = a.config.ConsulRaftElectionTimeout * time.Duration(a.config.PerformanceRaftMultiplier)

@slackpad
Copy link
Contributor

@magiconair Heads up I pushed my small stuff from the -slackpad branch up onto yours, so please pull that.

@magiconair
Copy link
Contributor Author

Will do and will go through the comments. I have something working for the invalid fields. Not pretty but it should do for the beta. I'll go through the rest of the comments and fix/respond

This patch implements a new config parser for the consul agent which
makes the following changes to the previous implementation:

 * add HCL support
 * all configuration fragments in tests and for default config are
   expressed as HCL fragments
 * HCL fragments can be provided on the command line so that they
   can eventually replace the command line flags.
 * HCL/JSON fragments are parsed into a temporary Config structure
   which can be merged using reflection (all values are pointers).
   The existing merge logic of overwrite for values and append
   for slices has been preserved.
 * A single builder process generates a typed runtime configuration
   for the agent.

The new implementation is more strict and fails in the builder process
if no valid runtime configuration can be generated. Therefore,
additional validations in other parts of the code should be removed.

The builder also pre-computes all required network addresses so that no
address/port magic should be required where the configuration is used
and should therefore be removed.
@magiconair
Copy link
Contributor Author

magiconair commented Sep 25, 2017

I've rebased the branch onto master and added the code for detecting invalid fields. I've immediately found two tests which were broken so this is very useful. The approach is a bit hacky but I've added a long description of why we're doing this. If I can come up with a better approach then I'll try over the next days.

The open issues are:

  • drop recursor in favor of recursors
  • look at the commented out section int TestDNS_ServiceLookup_AnswerLimits
  • document/follow up the monkey patches for hashicorp/go-sockaddr and hashicorp/hcl
  • require .json or .hcl extension for all config files. (need input/decision)

@slackpad you need to incorporate the enterprise test fragments for the segment code I've sent you.

@slackpad
Copy link
Contributor

slackpad commented Sep 25, 2017 via email

@magiconair
Copy link
Contributor Author

We also need to decide where to put the documentation that was part of agent/config.Config. We can either restore it to the Config struct, add it to the RuntimeConfig or omit it from code altogether and put it in the public documentation. Opinions?

@slackpad slackpad merged commit 1221658 into master Sep 25, 2017
@slackpad slackpad deleted the refactor-config branch September 25, 2017 18:40
@magiconair
Copy link
Contributor Author

F I N A L L Y !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants