Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metric emitter lua module #2611

Merged
merged 1 commit into from
Jun 14, 2018

Conversation

fmejia97
Copy link
Contributor

@fmejia97 fmejia97 commented Jun 6, 2018

What this PR does / why we need it:
This PR adds the metric_emitter lua module that emits UDP messages with NGINX stats to the controller.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 6, 2018
@aledbf
Copy link
Member

aledbf commented Jun 6, 2018

@fmejia97 fmejia97 force-pushed the add-metric-emitter-module branch from a735193 to d8398b4 Compare June 6, 2018 18:19

table.insert(_M.queue, jsonPayload)

local ok, err = ngx.timer.at(0, flush_queue)
Copy link
Member

@aledbf aledbf Jun 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the behavior of ngx.timer.at when nginx is receiving more than 1024 rps? (the default for pending timers)
https://github.com/openresty/lua-nginx-module#lua_max_pending_timers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will return nil, together with an error (from the documentation you linked)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aledbf we chatted about this, we'll be addressing it. Great catch!

We need to follow the way we did it in our fork - where we essentially queue up request logs and pop it all off at a timer event that runs every second.

@fmejia97 will be addressing that today :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aledbf Also worth mentioning - we use a nginx module to expose the active number of running timers; possibly worth merging this upstream to https://github.com/openresty/lua-nginx-module? Wdyt?

Copy link
Member

@aledbf aledbf Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wdyt?

That sounds like a good idea 👍

Copy link
Member

@ElvinEfendi ElvinEfendi Jun 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmejia97 @andrewlouis93 why don't you fork lift defer.lua from our fork and just do defer.to_timer_phase(<you function that sends the json payload>) in this middleware

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did :) @ElvinEfendi

@codecov-io
Copy link

codecov-io commented Jun 6, 2018

Codecov Report

Merging #2611 into master will decrease coverage by 0.13%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2611      +/-   ##
==========================================
- Coverage   40.83%   40.69%   -0.14%     
==========================================
  Files          75       75              
  Lines        5123     5123              
==========================================
- Hits         2092     2085       -7     
- Misses       2750     2756       +6     
- Partials      281      282       +1
Impacted Files Coverage Δ
cmd/nginx/main.go 23.66% <0%> (-5.35%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 764bcd5...966e9f5. Read the comment docs.

end

function _M.call()
-- Create JSON Metrics Payload --
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to explain the obvious

upstreamStatus = ngx.var.upstream_status or "-",
namespace = ngx.var.namespace or "-",
ingress = ngx.var.ingress_name or "-",
service = ngx.var.service_name or "-",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you avoid spacing like this? later on when we add another line that's longer we then will have to adjust all the others

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this list come from? From VTS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stats on the list were given by @aledbf on the kubernetes slack channel. This does not yet cover all VTS metrics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmejia97 feel free to add all the fields you think are required to match the VTS module. I just only added the minimum to start the work with the UDP server

local assert = assert

local _M = {
queue = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

queue can be local, no? why does it have to be exported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed anymore since we are going to use the defer module to delegate deferring callbacks (with queue optimization) to the timer phase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmejia97 could you use this version of the module instead: https://github.com/Shopify/ingress/blob/master/rootfs/etc/nginx/lua/util/defer.lua?

This doesn't export things that don't need to be exported

@ElvinEfendi
Copy link
Member

rootfs/etc/nginx/lua/metric_emitter.lua

What about monitor.lua?

local jsonPayload = cjson.encode({
host = ngx.var.host or "-",
status = ngx.var.status or "-",
remoteAddr = ngx.var.realip_remote_addr or "-",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing. ngx.var.realip_remote_addr is the original client IP, but remoteAddr indicates client IP (obtained from X-Frowarded-For if set). I'd say ngx.var.remote_addr is more valuable since it will be the true client IP.

Maybe have both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll add both to avoid confusion.

bytesSent = tonumber(ngx.var.bytes_sent) or -1,
protocol = ngx.var.server_protocol or "-",
method = ngx.var.request_method or "-",
path = ngx.var.uri or "-",
Copy link
Member

@ElvinEfendi ElvinEfendi Jun 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not key it as uri instead of path to have better clarity and consistency with the rest of the mappings?

method = ngx.var.request_method or "-",
path = ngx.var.uri or "-",
requestLength = tonumber(ngx.var.request_length) or -1,
requestDuration = tonumber(ngx.var.request_time) or -1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, requestTime

path = ngx.var.uri or "-",
requestLength = tonumber(ngx.var.request_length) or -1,
requestDuration = tonumber(ngx.var.request_time) or -1,
upstreamName = ngx.var.upstream or "-",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ngx.var.upstream does not seem like a standard Nginx variable, are we setting it ourselves somewhere?

Copy link
Contributor Author

@fmejia97 fmejia97 Jun 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, that was a mistake. It should've been ngx.var.proxy_upstream_name.

requestDuration = tonumber(ngx.var.request_time) or -1,
upstreamName = ngx.var.upstream or "-",
upstreamIP = ngx.var.upstream_addr or "-",
upstreamResponseTime = tonumber(ngx.var.upstream_response_time) or -1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this gonna work? when would tonumber return nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns nil when it fails to convert its argument to a number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nil or -1 will return -1. The UDP listener expects some fields to be of type float64, and it will error when calling Unmarshal if the value of the field is not a number.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 8, 2018
@fmejia97
Copy link
Contributor Author

fmejia97 commented Jun 8, 2018

@ElvinEfendi @andrewlouis93 Addressed your comments 👍

@fmejia97 fmejia97 force-pushed the add-metric-emitter-module branch from b9e8f43 to 6c282df Compare June 8, 2018 14:28
@andrewloux
Copy link
Contributor

@fmejia97 We're still not matching all the fields published by the VTS module. I think this is the last thing we need to do before I think this can be merged in.

@ElvinEfendi
Copy link
Member

ElvinEfendi commented Jun 11, 2018

Code looks good to me! Can you add unit test for monitor.call? After that this is ready to 🚢

Regarding to matching VTS metrics I have not checked that - I'll wait for @andrewlouis93 to approve it. But we can always revisit that part as long as we have the basics.

return true
end

return _M
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline

@fmejia97
Copy link
Contributor Author

@andrewlouis93 @aledbf @ElvinEfendi Added metrics for connections(active|reading|writing|waiting) and total requests.

@andrewloux
Copy link
Contributor

Nice work @fmejia97 just some quick unit tests left now :)

@fmejia97
Copy link
Contributor Author

fmejia97 commented Jun 12, 2018

Added unit tests for monitor.lua and defer.lua. @ElvinEfendi @andrewlouis93 @aledbf

@fmejia97 fmejia97 force-pushed the add-metric-emitter-module branch 2 times, most recently from 571ee57 to a97baf9 Compare June 12, 2018 04:54
get_phase = function() return "timer" end,
var = {}
}
_G.ngx = _ngx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant indentation

describe("Monitor", function()
local monitor = require("monitor")
describe("encode_nginx_stats()", function()
ngx.var =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ElvinEfendi encode_nginx_stats() creates a JSON encoded structure using the current NGNX context stats (from ngx.var). In this test case, I'm setting a specific environment to test the method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I see you are doing it below but ngx.var = does not make sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh woops 😄didn't realize this. Removing 👍

describe("encode_nginx_stats()", function()
ngx.var =
it("successfuly encodes the current stats of nginx to JSON", function()
local nginxEnvironment = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use underscore naming convention when writing Lua code

describe("Defer", function()
describe("to_timer_phase", function()
it("executes passed callback immediately if called on timer phase", function()
assert.equal(true, defer.to_timer_phase(function() return 1 end))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this assert that the function was executed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Create a table here - and have the callback modify the table's value. E.g. Initialize defer.call_count = 0, and then in the callback - increment that value by 1.

Your assertion can check for that value having been increased.

The problem with this right now is that your test will pass regardless of whether or not the callback passed in to to_timer_phase is executed.

@ElvinEfendi
Copy link
Member

why are you putting the tests into defer/ and monitor/ folders?

assert(s:close())
end

function _M.encode_nginx_stats()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you test the public .call method with proper assertions this will be covered as well and you won't need to export this method just for testing purposes.

You can export only if you justify that testing .call is unnecessarily complex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ElvinEfendi My initial approach was to test .call method. Yesterday I spent a fair bit of the day trying to figure out how to make it work but couldn't find a clean / non-hacky way of testing it. Mostly due to the nature of the module: 1) Generate JSON struct form nginx context variables. 2) Create a timer with send_data callback. 3) Send the JSON struct as a UDP message. The main logic of the module is building the JSON structure (then use the interface of the other modules which are already tested). After pairing with Andrew we decided that this would be the cleanest way of going forward.

service = "test-app",
}

assert.are.same(decodedJSONStats,expectedJSONStats)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is some nice syntactic sugar 🍬

@aledbf
Copy link
Member

aledbf commented Jun 13, 2018

@fmejia97 @andrewlouis93 @ElvinEfendi what's missing to merge this?

Copy link
Contributor

@andrewloux andrewloux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 🚀 🚀

@andrewloux
Copy link
Contributor

@aledbf this looks good, let's ship this.

@aledbf
Copy link
Member

aledbf commented Jun 14, 2018

@fmejia97 please squash this in fewer commits and we are ready to go

@fmejia97 fmejia97 force-pushed the add-metric-emitter-module branch from dc43bdf to 966e9f5 Compare June 14, 2018 02:54
@fmejia97
Copy link
Contributor Author

@aledbf Squashed commits ✅

@aledbf
Copy link
Member

aledbf commented Jun 14, 2018

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 14, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aledbf, fmejia97

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 14, 2018
@k8s-ci-robot k8s-ci-robot merged commit 43cf56c into kubernetes:master Jun 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants