Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Netkan for SQS mode #2798

Merged
merged 2 commits into from
Jun 25, 2019
Merged

Conversation

HebaruSan
Copy link
Member

@HebaruSan HebaruSan commented Jun 23, 2019

Motivation

See #2789; we want to refactor the NetKAN-bot to use AWS SQS so it can be managed and iterated more easily.

Changes

This pull request builds upon #2788 and should not be merged until after that one is merged. Click on the second commit to view the diff of the changes that are unique to this PR.

Now a new command format is supported. Note that you must set the credentials environment variables for this to work, values available upon request from @techman83:

export AWS_ACCESS_KEY_ID='(access key here)'
export AWS_SECRET_ACCESS_KEY='(secret key here)'
export AWS_REGION='us-west-2'
mono netkan.exe --queues InboundDev.fifo,OutboundDev.fifo

This new command will authenticate with AWS SQS and retrieve inflation requests from the input queue in this format:

Payload: JSON contents of a .netkan file
Attr Releases: Number of releases to retrieve (optional, 1 if omitted)

Such requests may be generated for testing with this Python script, which also requires the above environment variables and accepts a path to a .netkan file as its parameter:

This JSON is then inflated via existing Netkan logic. A message is then sent to the output queue in this format:

Payload: JSON contents of the generated .ckan file
Attr ModIdentifier: Identifier of the module
Attr Staged: True if the netkan uses staging, false for immediate commit to master
Attr Success: True if successful, false if there were errors
Attr ErrorMessage: Error message if any, otherwise this attribute is omitted
Attr CheckTime: Timestamp when message was sent
Attr FileName: output-dir/identifier-version.ckan string explaining where to save the file

The output queue may be inspected with this Python script:

After this it will check the input queue again for more inflation requests, forever.

Fixes #2789.

Open questions / left to-do

This isn't 100% complete yet.

  • Merge Migrate Perl validation checks into netkan.exe #2788
  • Releases as an input attribute (to be confirmed with design committee first)
    • Releases as a parameter internally without re-creating the NetkanTranslator
  • Get region from environment or config file instead of hard coding in C#
  • Do we need a way to stop the loop? Currently it's a while(true)
  • Unknown unknowns

@HebaruSan HebaruSan added Enhancement New features or functionality Pull request In progress We're still working on this Netkan Issues affecting the netkan data Infrastructure Issues affecting everything around CKAN (the GitHub repos, build process, CI, ...) labels Jun 23, 2019
@HebaruSan HebaruSan requested a review from techman83 June 23, 2019 20:19
@HebaruSan HebaruSan changed the title Feature/netkan sqs Refactor Netkan for SQS mode Jun 23, 2019
@HebaruSan HebaruSan force-pushed the feature/netkan-sqs branch from 4315c5c to 6d7134f Compare June 23, 2019 20:26
@DasSkelett
Copy link
Member

If I spot this right, when the queue is empty getNetkan() returns null, and the if-clause makes the loop repeat immediately.
What about a short break, to not spam out requests (and maybe further save some cpu credits)?

@HebaruSan
Copy link
Member Author

The break/delay happens during the client.ReceiveMessage call already; it will sit there for 20 seconds before returning unless a message comes in. It's not obvious how that works without seeing it; I'll PM you the credentials on the forum for testing.

@DasSkelett
Copy link
Member

Thanks. And I should read the documentation now ^^

@DasSkelett
Copy link
Member

The queue names are also hardcoded atm, IMO it makes sense to outsource them too (they have to be changed before merge anyway).
They aren't listed as supported env-vars, so either create own env-vars and read them, or with a config file, or whatsoever.

@HebaruSan
Copy link
Member Author

Well we're going to have to hard-code something. Either we know the queue names and use them to look up the URLs, or we know some other keys and use them to look up the names and use them to look up the URLs, etc. I'll defer to @techman83 on how this is typically handled.

@DasSkelett
Copy link
Member

DasSkelett commented Jun 23, 2019

Okay.
Just tested a failing netkan (WoodSpaceprogram).
First, the log message at line 50 in Queuehandler.cs, Sending failure: Could not find Wood Space Program entry in zipfile to install is a bit misleading, it sounds like the send process failed, which isn't the case at that point.
I would word it Sending inflation failure: ... or alike.

Second, sending fails indeed: Send failed: The request must contain the parameter MessageBody.
So SendMessageRequest doesn't like a real null an empty string for MessageBody.

@HebaruSan
Copy link
Member Author

Good points, I'm going to try deleting all the messages so I can see new ones...

@HebaruSan
Copy link
Member Author

I have no idea why Amazon decided an empty body was so unacceptable that it should throw an exception; it invites silly things like sending "{}" as the body. Which is what we do now.

@DasSkelett
Copy link
Member

Oh, I thought it was a null object that was passed in SendMessageRequest(), but it was an empty string. That's really picky, yep.

Otherwise, the PR looks fine for me. I'm impressed that you were so fast.

@techman83
Copy link
Member

Well we're going to have to hard-code something. Either we know the queue names and use them to look up the URLs, or we know some other keys and use them to look up the names and use them to look up the URLs, etc. I'll defer to @techman83 on how this is typically handled.

I think we should pass the queue names in as CLI argments or environment variables, I think it's acceptable to look up the queue URLs when constructing the class.

Releases

I have a few ideas, but I think we should open a separate feature. There is a lot to do in the first pass, but we'll be in a much better place to rapidly get changes into the wild.

Code review

I've had a none C# dev look over the code and I think this is excellent! Likely some extra requirements will shake out as we use it in anger, but this definitely meets and surpasses the MVP.

Netkan/Processors/QueueHandler.cs Outdated Show resolved Hide resolved
Netkan/Processors/QueueHandler.cs Show resolved Hide resolved
Netkan/Processors/QueueHandler.cs Show resolved Hide resolved
@HebaruSan
Copy link
Member Author

HebaruSan commented Jun 24, 2019

InboundDev.fifo isn't working for me anymore. Nothing shows up after I run send_netkan.py. Did we break it?

EDIT: Working again now. Odd.

@HebaruSan HebaruSan force-pushed the feature/netkan-sqs branch from 8034d9a to 6db85bc Compare June 24, 2019 19:13
@HebaruSan
Copy link
Member Author

HebaruSan commented Jun 24, 2019

Changes in latest commit:

  • --sqs flag replaced with --queues InboundDev.fifo,OutboundDev.fifo to specify queue names
  • Retrieve and process up to 10 messages at once
  • If input message contains Releases attribute, will process that many releases
  • Delete messages after processing rather than after receiving to make killing the process safer

@HebaruSan HebaruSan force-pushed the feature/netkan-sqs branch from 6db85bc to 6efde20 Compare June 24, 2019 20:17
@HebaruSan
Copy link
Member Author

Pushed just now:

  • Get the region from the environment. Turns out this works, but you need to use AWS_REGION instead of AWS_DEFAULT_REGION; instructions in PR text updated.

@HebaruSan HebaruSan force-pushed the feature/netkan-sqs branch from 6efde20 to c1b6fe6 Compare June 24, 2019 20:32
@techman83
Copy link
Member

techman83 commented Jun 25, 2019

InboundDev.fifo isn't working for me anymore. Nothing shows up after I run send_netkan.py. Did we break it?

EDIT: Working again now. Odd.

That's likely based around our discussion on the MessageDeduplicationId, my little script hashes the body. I think this is likely the correct for generating content for the inbound queue in production. But for testing, a random string would be better.

Changes in latest commit:

  • --sqs flag replaced with --queues InboundDev.fifo,OutboundDev.fifo to specify queue names
  • Retrieve and process up to 10 messages at once
  • If input message contains Releases attribute, will process that many releases
  • Delete messages after processing rather than after receiving to make killing the process safer

Those are some awesome additions! I think for now we should let this shake out and I'm happy for it to be merged when ready and make any improvements as needed. My primary motivator for this re-architecture is to make deploying updates to production much quicker and easier.

We can almost fully automate the process, meaning:

  • Won't need to wait for an indexing run to finish
  • Copy a package to a box, install it
  • Deal with complicated test failures
  • Get side tracked waiting and forget to do it

Pushed just now:

  • Get the region from the environment. Turns out this works, but you need to use AWS_REGION instead of AWS_DEFAULT_REGION; instructions in PR text updated.

I see. Looks like a bunch of the SDKs use AWS_REGION, just not the CLI or the boto3 library. Good catch :-)

@HebaruSan HebaruSan force-pushed the feature/netkan-sqs branch from c1b6fe6 to 6c67e6a Compare June 25, 2019 16:55
@HebaruSan
Copy link
Member Author

The last blocker for this PR is code review for dependency #2788, which we need to ensure that the generated metadata is valid. If anybody is interested in having these changes merged, consider taking a look at that PR.

@HebaruSan HebaruSan removed the In progress We're still working on this label Jun 25, 2019
@DasSkelett
Copy link
Member

Yeah, I didn't really jump on #2788 because it's line-wise a lot, but I will look at it now.

@HebaruSan
Copy link
Member Author

Yeah, I didn't really jump on #2788 because it's line-wise a lot, but I will look at it now.

Thanks!! Yes, it looks like a lot, but it has the advantage of being made up of many much smaller pieces, each of which is pretty simple.

@HebaruSan HebaruSan force-pushed the feature/netkan-sqs branch from 6c67e6a to 5b97364 Compare June 25, 2019 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New features or functionality Infrastructure Issues affecting everything around CKAN (the GitHub repos, build process, CI, ...) Netkan Issues affecting the netkan data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NetKAN Bot Re-Architecture
3 participants