
New Resource: azurerm_sql_failover_group #3901

Merged
merged 12 commits into hashicorp:master from simongh:sql-failover-groups on Aug 20, 2019

Conversation

simongh
Contributor

@simongh simongh commented Jul 22, 2019

I've created a resource for adding SQL Failover Groups. I've added tests & documentation. Hopefully there's not much that I've missed; I'm still a novice when it comes to Go.

(fixes #2989)

@leewilson86
Contributor

I am very interested in this as a native resource. I currently have this functionality in place using ARM templates, but a proper resource would be much nicer.

It looks like you’ve covered all the key points.

The only questions I have are:

  • does this resource work properly on deletion when you have databases synchronised into the secondary region? Failover groups create databases in the secondary region for you, and when using ARM templates TF is unaware of the secondary databases' existence, meaning an error is thrown by the ARM API on delete because the FOGs contain databases. Have you handled and tested for this eventuality?

  • does this work ok with databases in elastic pools? On creation, update and delete?

  • have you considered that in the primary region a database may be in an elastic pool, but may later be moved out as a single SQL database? Likewise, databases may be moved from one elastic pool into another, and a single SQL database may be moved into an elastic pool. Ideally the secondary region should always mirror the primary region so that when failover happens the infrastructure is identical. Does your work account for this?

@simongh
Contributor Author

simongh commented Jul 26, 2019

Failover groups do create a secondary if they don't exist, but there is no way with the API to prevent this or as far as I can tell make TF create a resource for the secondary. If you have this requirement, you can create the primary & secondary, then add the failover group. You're then tracking everything. Doing this also gives you control over the sizing etc of the secondary.

Elastic pools shouldn't affect the databases. Certainly the docs make no mention of this, but then the docs are fairly light on detail.

With regard to moving or sizing the databases, failover groups have no control over this. If you need this control, it's best to create & manage the resources as specific items.

This seems to be the intent of failover groups as defined in Azure. I'm not sure that making it do more is the right thing, but then I'm not always right ;-)
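The workaround described here, sketched in HCL (the resource names and supporting resources are illustrative, and the failover group attributes shown are an assumption based on this PR's schema):

```hcl
# Both servers are declared explicitly, so Terraform tracks the secondary
# and you control its sizing; the failover group then links the two.
resource "azurerm_sql_server" "primary" {
  name                         = "example-sql-primary"
  resource_group_name          = azurerm_resource_group.example.name
  location                     = "westeurope"
  version                      = "12.0"
  administrator_login          = "sqladmin"
  administrator_login_password = var.admin_password
}

resource "azurerm_sql_server" "secondary" {
  name                         = "example-sql-secondary"
  resource_group_name          = azurerm_resource_group.example.name
  location                     = "westus" # must differ from the primary's region
  version                      = "12.0"
  administrator_login          = "sqladmin"
  administrator_login_password = var.admin_password
}

resource "azurerm_sql_failover_group" "example" {
  name                = "example-failover-group"
  resource_group_name = azurerm_resource_group.example.name
  server_name         = azurerm_sql_server.primary.name
  databases           = [azurerm_sql_database.example.id]

  partner_servers {
    id = azurerm_sql_server.secondary.id
  }

  read_write_endpoint_failover_policy {
    mode          = "Automatic"
    grace_minutes = 60
  }
}
```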

Collaborator

@katbyte katbyte left a comment


Thank you for the new resource @simongh,

I've left some comments inline; once those are addressed, this should be good to merge.

@katbyte katbyte added this to the v1.33.0 milestone Aug 2, 2019
@simongh
Contributor Author

simongh commented Aug 4, 2019

I think I've addressed your comments

@ghost ghost removed the waiting-response label Aug 4, 2019
Collaborator

@katbyte katbyte left a comment


Thank you for making the requested changes @simongh,

It looks like you missed one, and I've left a couple of additional minor ones. Overall this is looking pretty good and nearly ready to merge 🙂

for _, server := range *input {
info := make(map[string]interface{})
info["id"] = *server.ID
info["location"] = *server.Location

This still needs to be nil checked

@simongh
Contributor Author

simongh commented Aug 5, 2019

It causes an error if you set the grace minutes and set the mode to Manual:

Failure sending request: StatusCode=400 -- Original Error: Code="FailoverGroupCreateOrUpdateRequestInvalidReadWriteEndpointFieldsForManualPolicy" Message="Grace period value should not be provided when failover policy Manual is selected for the read-write endpoint."
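A resource could surface this constraint at plan time instead of waiting for the 400 from ARM; a minimal sketch of such a check (the function and its rules here are illustrative, not necessarily what this PR implements):

```go
package main

import (
	"errors"
	"fmt"
)

// validateReadWritePolicy mirrors the API rule quoted above: a grace period
// is only meaningful for the Automatic failover policy, and the ARM API
// rejects it when the policy is Manual.
func validateReadWritePolicy(mode string, graceMinutes int) error {
	switch mode {
	case "Manual":
		if graceMinutes > 0 {
			return errors.New(`grace_minutes must not be set when mode is "Manual"`)
		}
	case "Automatic":
		if graceMinutes < 1 {
			return errors.New(`grace_minutes is required when mode is "Automatic"`)
		}
	default:
		return fmt.Errorf("unexpected failover policy %q", mode)
	}
	return nil
}

func main() {
	fmt.Println(validateReadWritePolicy("Manual", 60)) // rejected, as the API would reject it
	fmt.Println(validateReadWritePolicy("Automatic", 60))
}
```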

@ghost ghost removed the waiting-response label Aug 5, 2019
@simongh
Contributor Author

simongh commented Aug 5, 2019

I've renamed the funcs as suggested. I'm not sure I follow where the nil check you mentioned needs to go. Partner Server is required & you always need at least 1.

Collaborator

@katbyte katbyte left a comment


@simongh,

I've elaborated with suggestions inline on how to do the nil checks (and spotted another couple of minor issues), but once those are fixed up this should be good to merge!

for _, server := range *input {
info := make(map[string]interface{})
info["id"] = *server.ID
info["location"] = *server.Location

i.e.

Suggested change
info["location"] = *server.Location
if v := server.Location; v != nil {
info["location"] = *v
}
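Applied across the whole loop, the flatten helper might look like the sketch below; `PartnerInfo` here is a stand-in for the Azure SDK's partner-server type, since the SDK's pointer fields are what make the nil checks necessary:

```go
package main

import "fmt"

// PartnerInfo stands in for the Azure SDK's failover-group partner type;
// the SDK uses pointer fields, so every access needs a nil check.
type PartnerInfo struct {
	ID       *string
	Location *string
}

// flattenPartnerServers nil-checks each pointer before dereferencing,
// as suggested in the review comment above.
func flattenPartnerServers(input *[]PartnerInfo) []map[string]interface{} {
	result := make([]map[string]interface{}, 0)
	if input == nil {
		return result
	}
	for _, server := range *input {
		info := make(map[string]interface{})
		if v := server.ID; v != nil {
			info["id"] = *v
		}
		if v := server.Location; v != nil {
			info["location"] = *v
		}
		result = append(result, info)
	}
	return result
}

func main() {
	id := "/subscriptions/x/servers/secondary"
	servers := []PartnerInfo{{ID: &id, Location: nil}} // Location missing: no panic
	fmt.Println(flattenPartnerServers(&servers))
}
```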

@simongh
Contributor Author

simongh commented Aug 9, 2019

Gotcha, thanks!

I've addressed those comments & rebased to the most recent master, fixing the merge conflict.

@ghost ghost removed the waiting-response label Aug 9, 2019
Collaborator

@katbyte katbyte left a comment


Thanks for the revisions @simongh,

This LGTM now, however the tests are failing. I pushed a small change to try to fix them (hope you don't mind), but now I am getting:

------- Stdout: -------
=== RUN   TestAccAzureRMSqlFailoverGroup_basic
=== PAUSE TestAccAzureRMSqlFailoverGroup_basic
=== CONT  TestAccAzureRMSqlFailoverGroup_basic
--- FAIL: TestAccAzureRMSqlFailoverGroup_basic (502.57s)
    testing.go:568: Step 0 error: errors during apply:
        
        Error: Error issuing create/update request for SQL Failover Group "acctest190809180127363915" (Resource Group "acctestRG-190809180127363915", Server "acctestsqlserver190809180127363915-primary"): sql.FailoverGroupsClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidFailoverGroupRegion" Message="Secondary server specified in a Failover Group cannot reside in the same region."
        
          on /opt/teamcity-agent/temp/buildTmp/tf-test378278636/main.tf line 36:
          (source code not available)
        
        
FAIL

@simongh
Contributor Author

simongh commented Aug 10, 2019

Don't mind at all.

Failover groups require the partner servers to be in different regions, ideally region replication partners, but any 2 regions will do. The tests use testLocation() to get the location in which to create the resources. I'm not sure how to get 2 different locations for the test, other than hard-coding 2 different regions.

@ghost ghost removed the waiting-response label Aug 10, 2019
@katbyte
Collaborator

katbyte commented Aug 10, 2019

@simongh,

we can use testAltLocation() to get another location 🙂
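In provider acceptance tests that usually means formatting the config with both locations; a rough sketch (the real test and config in this PR will differ, and `testAccAzureRMSqlFailoverGroup_basic` here is only illustrative):

```go
package main

import "fmt"

// testAccAzureRMSqlFailoverGroup_basic formats an acceptance-test config with
// the primary and alternate locations, so the partner server lands in a
// different region as the failover group API requires.
func testAccAzureRMSqlFailoverGroup_basic(rInt int, location string, altLocation string) string {
	return fmt.Sprintf(`
resource "azurerm_sql_server" "primary" {
  name     = "acctestsqlserver%[1]d-primary"
  location = %[2]q
}

resource "azurerm_sql_server" "secondary" {
  name     = "acctestsqlserver%[1]d-secondary"
  location = %[3]q
}
`, rInt, location, altLocation)
}

func main() {
	// location would come from testLocation(), altLocation from testAltLocation()
	fmt.Println(testAccAzureRMSqlFailoverGroup_basic(1234, "westeurope", "westus"))
}
```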

@simongh
Contributor Author

simongh commented Aug 10, 2019

@leewilson86 I don't think changing the tests is hiding anything. Having tested it myself, a secondary db gets created which isn't tracked by TF, but its existence doesn't prevent the server from being deleted.

@katbyte could the tests be picking up on the fact that an untracked db was created on the secondary server? I've spotted & fixed another issue with the read-write policy. I've also removed the tags support, since although the API supports them, they don't appear anywhere in the UI.

@katbyte
Collaborator

katbyte commented Aug 10, 2019

@simongh,

First, even if the tags don't show up in the portal, I think we should leave them in. I think that test is failing because it is the only test that attempts to update the resource, and for some reason it is trying to create a new DB. If you create another test that calls update, I imagine you will see the same error.

As for insufficient items for attribute "partner_servers"; must have at least 1, I have a feeling the API isn't returning the correct number of partner_servers at some point.

@leewilson86, could you suggest a terraform config/test that we could write to ensure we have implemented it correctly?

@simongh
Contributor Author

simongh commented Aug 11, 2019

It could be. I've tried doing updates, both by changing the resource & by changing the HCL, all without failure.

If you delete a server it also appears to remove the failover group.
If you delete the failover group, it does leave behind the replicated database it made, but then running apply just re-adds the failover group ok & destroy destroys as expected.

I'm at a loss as to what's causing the errors. I'll add the tags support back in though. I'm running the acceptance tests now & I'll see what I can figure out.

@ghost ghost removed the waiting-response label Aug 11, 2019
@simongh
Contributor Author

simongh commented Aug 12, 2019

I found a couple of things: typos in the tests. The update now passes. The delete is still failing, even though I thought I'd found the cause. I'm fairly certain it's something the test is doing, since it works when I manually run the same HCL.

Since I'm only half certain of what the test is actually doing, would you mind casting an eye over the test? I've likely missed something simple.

@katbyte
Collaborator

katbyte commented Aug 13, 2019

👋 @simongh,

So I consulted core internally, and it sounds like that error means that upon read, we got zero partner servers instead of the expected 1. Looking at the requests, I see

{
	"location": "West Europe",
	"tags": {},
	"properties": {
		"readWriteEndpoint": {
			"failoverPolicy": "Automatic",
			"failoverWithDataLossGracePeriodMinutes": 60
		},
		"readOnlyEndpoint": {
			"failoverPolicy": "Disabled"
		},
		"replicationRole": "Primary",
		"replicationState": "CATCH_UP",
		"partnerServers": [{
			"id": "/subscriptions/x/resourceGroups/acctestRG-190813120733021836/providers/Microsoft.Sql/servers/acctestmssql190813120733021836-secondary",
			"location": "West US",
			"replicationRole": "Secondary"
		}],
		"databases": ["/subscriptions/x/resourceGroups/acctestRG-190813120733021836/providers/Microsoft.Sql/servers/acctestmssql190813120733021836-primary/databases/acctestdb190813120733021836"]
	},
	"id": "/subscriptions/x/resourceGroups/acctestRG-190813120733021836/providers/Microsoft.Sql/servers/acctestmssql190813120733021836-primary/failoverGroups/acctestsfg190813120733021836",
	"name": "acctestsfg190813120733021836",
	"type": "Microsoft.Sql/servers/failoverGroups"
}

a delete, and then a

{
	"location": "West Europe",
	"tags": {},
	"properties": {
		"readWriteEndpoint": {
			"failoverPolicy": "Automatic",
			"failoverWithDataLossGracePeriodMinutes": 60
		},
		"readOnlyEndpoint": {
			"failoverPolicy": "Disabled"
		},
		"replicationRole": "Primary",
		"replicationState": "CATCH_UP",
		"partnerServers": [],
		"databases": []
	},
	"id": "/subscriptions/x/resourceGroups/acctestRG-190813120733021836/providers/Microsoft.Sql/servers/acctestmssql190813120733021836-primary/failoverGroups/acctestsfg190813120733021836",
	"name": "acctestsfg190813120733021836",
	"type": "Microsoft.Sql/servers/failoverGroups"
}

So it seems like the API starts a delete and gets a:

{
	"operation": "DropFailoverGroup",
	"startTime": "2019-08-13T19:11:58.047Z"
}

and when terraform does a get on it next, it still returns the resource as its not done deleting yet. You may have to add a poll there so the delete call doesn't return until it succeeds?

@simongh
Contributor Author

simongh commented Aug 17, 2019

You may have to add a poll there so the delete call doesn't return until it succeeds?

Do you know of an existing resource that does this that I can look at to get the test working?

@ghost ghost removed the waiting-response label Aug 17, 2019
@katbyte
Collaborator

katbyte commented Aug 18, 2019

@simongh,

the cosmos DB resources do this:

// the SDK now will return a `WasNotFound` response even when still deleting
stateConf := &resource.StateChangeConf{
	Pending:    []string{"Deleting"},
	Target:     []string{"NotFound"},
	Timeout:    60 * time.Minute,
	MinTimeout: 30 * time.Second,
	Refresh: func() (interface{}, string, error) {
		resp, err2 := client.Get(ctx, resourceGroup, name)
		if err2 != nil {
			if utils.ResponseWasNotFound(resp.Response) {
				return resp, "NotFound", nil
			}
			return nil, "", fmt.Errorf("Error reading CosmosDB Account %q after delete (Resource Group %q): %+v", name, resourceGroup, err2)
		}

		return resp, "Deleting", nil
	},
}

if _, err = stateConf.WaitForState(); err != nil {
	return fmt.Errorf("Waiting for CosmosDB Account %q to delete (Resource Group %q): %+v", name, resourceGroup, err)
}

@katbyte katbyte changed the title Added SQL failover group resource New Resource: azurerm_sql_failover_group Aug 18, 2019
@simongh
Contributor Author

simongh commented Aug 19, 2019

I think I've found the problem. The resource does a future.WaitForCompletionRef, but the acceptance test didn't. I modified the test to use this and it's working as expected.

I've also updated the documentation to outline what happens with secondary databases

@ghost ghost removed the waiting-response label Aug 19, 2019
Collaborator

@katbyte katbyte left a comment


Awesome! thanks for fixing that @simongh 🙂 this LGTM now 🚀

@katbyte
Collaborator

katbyte commented Aug 20, 2019

I hope you don't mind but i pushed some changes to fix the merge errors.

@katbyte katbyte merged commit a1483fd into hashicorp:master Aug 20, 2019
katbyte added a commit that referenced this pull request Aug 20, 2019
@simongh
Contributor Author

simongh commented Aug 20, 2019 via email

@ghost

ghost commented Aug 22, 2019

This has been released in version 1.33.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 1.33.0"
}
# ... other configuration ...

@ghost

ghost commented Sep 20, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators Sep 20, 2019
@simongh simongh deleted the sql-failover-groups branch August 21, 2020 18:54
Successfully merging this pull request may close these issues.

New Resource: Azure SQL Failover Groups