Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Support AWS Glue #1416

Closed
darrenhaken opened this issue Aug 15, 2017 · 36 comments
Closed

Feature request: Support AWS Glue #1416

darrenhaken opened this issue Aug 15, 2017 · 36 comments
Labels
new-resource Introduces a new resource. service/glue Issues and PRs that pertain to the glue service.

Comments

@darrenhaken
Copy link
Contributor

AWS Glue has now become GA https://aws.amazon.com/glue/

It would be nice to have this running through Terraform. Do you accept PR's as I could look to add it?

@Ninir
Copy link
Contributor

Ninir commented Aug 15, 2017

Hey @darrenhaken

It seems the service is available in the Go SDK and CRUDs operations are also available.
Feel free to open a PR for that, we'll be more than happy to review & help on that :)

@Ninir Ninir added the enhancement Requests to existing resources that expand the functionality or scope. label Aug 15, 2017
@darrenhaken
Copy link
Contributor Author

darrenhaken commented Aug 15, 2017 via email

@Ninir
Copy link
Contributor

Ninir commented Aug 15, 2017

Vendor dependencies

Generally speaking, you first need to vendor the dependencies, as exposed here: #1417.

Connection management

To be able to interact with the SDK and thus the API, you would need to hook the connection, in the aws/config.go file. Then, you would be able to get & use it in the resource.

Resource creation

A good start is the plugin provider documentation page, along with the core repository documentation.

At first you would need to create a HCL structure that matches the API. This is about creating a schema, which is an abstraction of the resource itself.

Tests

Adding acceptance tests is a good way of testing without the need of compiling yourself but reproducing a real-world use-case.
I encourage you to check what was already made by other contributors, which is a good start.

Once the test is written, run it:

make testacc TEST=./aws TESTARGS='-run=TestAccAWSResourceGlueClassifier_'

This will run only the test with name prefix being TestAccAWSResourceGlueClassifier_.

Documentation

The documentation is a required step for people to use it. It should be crystal clear about how to use each options, what is required, optional.
Again, I encourage you to check what was already done, will help a lot 😄

Global example

Check out this example about hooking into the SDK, resource, tests, documentation, etc.

Also, as per the 0.10 providers split, you may then need to build your local provider version using the README page to check it locally.

Last bit, if you could open a PR for the vendoring, and then 1 for each resource, it would be awesome.
For instance, 1 PR for the classifier + tests + doc, 1 for the devendpoint + tests + doc, etc.
If you need any review (even in a WIP), just ask :)

Hope all of this is good to you!
Have fun :) 👍

@drewsonne
Copy link
Contributor

drewsonne commented Jan 12, 2018

I'm (probably) going to work my way through the resources. If anyone has any wishes as to which glue resource they would want next after the above database resource, let me know.

I'm taking requests, but can't promise anything.

@darrenhaken
Copy link
Contributor Author

@drewsonne I'd like the crawlers next and complete the data catalogue part of the product.

Do you need any help working on it?

@bflad
Copy link
Contributor

bflad commented Jan 12, 2018

The aws_glue_catalog_database resource has been released in terraform-provider-aws version 1.7.0. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

@darrenhaken
Copy link
Contributor Author

darrenhaken commented Jan 12, 2018 via email

@darrenhaken
Copy link
Contributor Author

@drewsonne have you done any work on the crawlers? If not I will pick this up.

@drewsonne
Copy link
Contributor

drewsonne commented Jan 15, 2018

@darrenhaken I have not yet. My first plan was to build a test case with test steps in a similiar manner to the database catalogue. I might get onto this in the week, but let me know how you go and if you start work on this in another repo

@darrenhaken
Copy link
Contributor Author

@drewsonne I'll take a look at the Crawlers. Is the vendor code up to date and contains the AWS SDK API for it?

@drewsonne
Copy link
Contributor

Can't remember, but it's in the vendor json file

@drewsonne
Copy link
Contributor

@darrenhaken I've added a skeleton test case for the crawler here. drewsonne@2dcd3c6 . It's by no means exhaustive, but would definitely be a start.

Not sure how you want to do this, but that's a start for you.

@darrenhaken
Copy link
Contributor Author

@drewsonne thanks, I'll use that to start looking at it.

@darrenhaken
Copy link
Contributor Author

@drewsonne I'm pretty close to having the crawlers finished

@drewsonne
Copy link
Contributor

@darrenhaken sweet, I assume it's the "glue_crawlers" branch in your repo?

@darrenhaken
Copy link
Contributor Author

darrenhaken commented Jan 18, 2018

@drewsonne that's the one. I haven't pushed it because I've been wrangling some issues with defining a schema.

Have you done much with TF before? Maybe you can provide a pointer or two if I show you.

WIP has been pushed now if interested to see progress

@drewsonne
Copy link
Contributor

@darrenhaken I've got a bit of experience writing providers. What are the issues you've been wrangling with the schema specifically?

@darrenhaken
Copy link
Contributor Author

darrenhaken commented Jan 21, 2018

@drewsonne I'm working on creating the Crawler with the required fields Name, DatabaseName and Targets. See https://docs.aws.amazon.com/sdk-for-go/api/service/glue/#CreateCrawlerInput

Here is the struct definition for Targets which is a nested struct within CreateCrawlerInput:

type CrawlerTargets struct {

    // Specifies JDBC targets.
    JdbcTargets []*JdbcTarget `type:"list"`

    // Specifies Amazon S3 targets.
    S3Targets []*S3Target `type:"list"`
    // contains filtered or unexported fields
}

What I am trying to work out is what is the correct TF schema to map to the CrawlerTargets struct. I made an attempt in the acceptance test to do this but I didn't get any luck.

FYI the acceptance test isn't complete yet - you need to have an AWS account with an S3 bucket and an AWS Glue DB.

@drewsonne
Copy link
Contributor

drewsonne commented Jan 22, 2018

In these cases (and I think there's some precedence for this, but I can not remember where), I would flatten the schema a bit, eg:

"s3_targets": {
	Type:          schema.TypeList,
	Optional:      true,
	MinItems:      1,
	Elem: &schema.Resource{
		Schema: map[string]*schema.Schema{
			"path": {
				Type:     schema.TypeString,
				Required: true,
			},
			"exclusions": {
				Type:     schema.TypeSet,
				Optional: true,
				Elem:     schema.TypeString,
			},
		},
	},
},
"jdbc_targets": {
	Type:          schema.TypeList,
	Optional:      true,
	MinItems:      1,
	Elem: &schema.Resource{
		Schema: map[string]*schema.Schema{
			"connection_name": {
				Type:     schema.TypeString,
				Required: true,
			},
			"path": {
				Type:     schema.TypeString,
				Required: true,
			},
			"exclusions": {
				Type:     schema.TypeSet,
				Optional: true,
				Elem:     schema.TypeString,
			},
		},
	},
},

And then put them back into the schema structure back in the Read/Write/Update handlers. I think this is simpler as the two top-level attributes have very specific schemas, in comparison to either having two different types of objects (jdbc and s3) smushed into one schema, or having an extra layer in the config tree.

Hoping maybe @Ninir, @bflad, or @radeksimko would weigh in on this?

@darrenhaken
Copy link
Contributor Author

darrenhaken commented Jan 22, 2018

@drewsonne I'd thought about flattening the schema. The only drawback is that it affects Required: true fields, you only need s3_targets or jdbc_targets so declaring them as both being required doesn't make sense.

@radeksimko radeksimko added new-resource Introduces a new resource. service/glue Issues and PRs that pertain to the glue service. and removed enhancement Requests to existing resources that expand the functionality or scope. labels Jan 22, 2018
@bflad
Copy link
Contributor

bflad commented Jan 22, 2018

In general, we tend to prefer to follow the AWS API/schema unless the user experience can be heavily improved by doing our own implementation due to future maintenance costs.

While we cannot set a Required: true type schema definition to say require one of s3_targets or jdbc_targets (at the moment anyways -- its a decent request we could use in other places as well and probably worth submitting a feature request upstream into Terraform core), we can do the logic in create/update to essentially require one or the other.

e.g. something like this should do the trick

jdbc_targets, jdbc_targets_ok := d.GetOk("jdbc_targets")
s3_targets, s3_targets_ok := d.GetOk("s3_targets")
if !jdbc_targets_ok && !s3_targets_ok {
    return fmt.Errorf("jdbc_targets or s3_targets configuration is required")
}

@darrenhaken
Copy link
Contributor Author

darrenhaken commented Jan 22, 2018

@bflad that makes sense, I think deviating too much from the AWS API would only cause confusion in the long run. I'll look at combining both yours and @drewsonne suggestions.

@drewsonne
Copy link
Contributor

btw, @darrenhaken if you keep your git branch up to date, I can write some acceptance tests for you with the S3 bucket and the glue db and even run them.

@darrenhaken
Copy link
Contributor Author

darrenhaken commented Jan 25, 2018 via email

@darrenhaken
Copy link
Contributor Author

darrenhaken commented Jan 25, 2018 via email

@cemo
Copy link

cemo commented Feb 16, 2018

@darrenhaken what is the situation of your work? Any progress?

@darrenhaken
Copy link
Contributor Author

@cemo I’ll start looking at it again this week. I had to flip back to finishing off another TF PR which has been accepted now. Sorry for the delay!

@chwevans
Copy link

chwevans commented Mar 2, 2018

I'm interested in using this as well, is there any expected timeline?

@darrenhaken
Copy link
Contributor Author

I’ll add crawlers this week (hopefully) with the required fields added. I will link the repo/branch.

@darrenhaken
Copy link
Contributor Author

@drewsonne did you still want to work on this?

@darrenhaken
Copy link
Contributor Author

darrenhaken commented Mar 14, 2018

I've been working on this some more and I have an acceptance test with the following TF:

resource "aws_glue_catalog_crawler" "test" {
	  name = "test"
	  database_name = "db_name"
	  role = "${aws_iam_role.glue.arn}"
	  s3_target {
		path = "s3://bucket"
	  }
	}
	
	resource "aws_iam_role_policy_attachment" "aws-glue-service-role-default-policy-attachment" {
  		policy_arn = "arn:aws:iam::aws:policy/AWSGlueServiceRole"
  		role = "${aws_iam_role.glue.arn}"
	}
	
	resource "aws_iam_role" "glue" {
  		name = "AWSGlueServiceRoleDefault"
  		assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
	}

I receive the following error:

=== RUN   TestAccAWSGlueCrawler_full
--- FAIL: TestAccAWSGlueCrawler_full (13.25s)
	testing.go:513: Step 0 error: Error applying: 2 error(s) occurred:

		* aws_iam_role_policy_attachment.aws-glue-service-role-default-policy-attachment: 1 error(s) occurred:

		* aws_iam_role_policy_attachment.aws-glue-service-role-default-policy-attachment: [WARN] Error attaching policy arn:aws:iam::aws:policy/AWSGlueServiceRole to IAM Role arn:aws:iam::{account_id}:role/AWSGlueServiceRoleDefault: ValidationError: The specified value for roleName is invalid. It must contain only alphanumeric characters and/or the following: +=,.@_-
			status code: 400, request id: 43da944e-27a7-11e8-8481-452f34f141d6
		* aws_glue_catalog_crawler.test: 1 error(s) occurred:

		* aws_glue_catalog_crawler.test: error creating Glue crawler: InvalidInputException: Service is unable to assume role arn:aws:iam::{account_id}:role/AWSGlueServiceRoleDefault. Please verify role's TrustPolicy
			status code: 400, request id: 4405c2e7-27a7-11e8-9a99-111706a35ce0
FAIL
FAIL	github.com/terraform-providers/terraform-provider-aws/aws	13.293s
make: *** [testacc] Error 1

Any ideas?

@bflad
Copy link
Contributor

bflad commented Mar 15, 2018

This error is being generated from the AWS API:

		* aws_iam_role_policy_attachment.aws-glue-service-role-default-policy-attachment: [WARN] Error attaching policy arn:aws:iam::aws:policy/AWSGlueServiceRole to IAM Role arn:aws:iam::697329683179:role/AWSGlueServiceRoleDefault: ValidationError: The specified value for roleName is invalid. It must contain only alphanumeric characters and/or the following: +=,.@_-
			status code: 400, request id: 43da944e-27a7-11e8-8481-452f34f141d6

Relevant code:

https://github.com/terraform-providers/terraform-provider-aws/blob/cc36c1d6c312b0e2cc024f97bb35ba6b59c3f8c2/aws/resource_aws_iam_role_policy_attachment.go#L41-L44

It looks like we're missing some validation to ensure its an IAM role name being passed in instead of a IAM role ARN (or properly handle this situation by parsing the name out of the ARN). We should probably make a separate issue for that. 😉

You can fix your Terraform configuration by changing the role attribute to use aws_iam_role.role.name instead of aws_iam_role.role.arn for now 👍

resource "aws_iam_role_policy_attachment" "aws-glue-service-role-default-policy-attachment" {
  policy_arn = "arn:aws:iam::aws:policy/AWSGlueServiceRole"
  role       = "${aws_iam_role.glue.name}"
}

@darrenhaken
Copy link
Contributor Author

I've tried the name and I still had the same issue. I'll be in the office in a couple of hours, I'll push my WIP on my fork if you wouldn't mind taking a look? The error is raised by Amazon around the crawler input and something about trust policy.

Any help is appreciated :-)

@darrenhaken
Copy link
Contributor Author

darrenhaken commented Mar 15, 2018

@bflad here is the changed code:

resource "aws_glue_catalog_crawler" "test" {
	  name = "test"
	  database_name = "db_name"
	  role = "${aws_iam_role.glue.name}"
	  s3_target {
		path = "s3://bucket"
	  }
	}
	
	resource "aws_iam_role_policy_attachment" "aws-glue-service-role-default-policy-attachment" {
  		policy_arn = "arn:aws:iam::aws:policy/AWSGlueServiceRole"
  		role = "${aws_iam_role.glue.name}"
	}
	
	resource "aws_iam_role" "glue" {
  		name = "AWSGlueServiceRoleDefault"
  		assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
	}

Error is same as the original question and mentions:

Service is unable to assume role arn:aws:iam::{account}:role/AWSGlueServiceRoleDefault. Please verify role's TrustPolicy

You can see the code at:
git@github.com:darrenhaken/terraform-provider-aws.git and the branch is glue_crawlers

@bflad
Copy link
Contributor

bflad commented Mar 22, 2018

Hi, everyone! 👋 I'm going to split this work into more manageable issues so that there can be a clear definition of done for various support. Looking at the API, here are what look like relevant resources and data sources for Terraform to manage with the AWS Glue service:

Please follow along the split issues and note any that you would like to contribute in the new issues. Personally, I will be picking up connections and jobs shortly. Thanks!

@bflad bflad closed this as completed Mar 22, 2018
@ghost
Copy link

ghost commented Apr 7, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Apr 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
new-resource Introduces a new resource. service/glue Issues and PRs that pertain to the glue service.
Projects
None yet
Development

No branches or pull requests

7 participants