Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jolokia discovery autodiscover provider #7141

Merged
merged 9 commits into from
Jun 4, 2018

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented May 18, 2018

Jolokia Discovery as an autodiscover provider.

Jolokia discovery works using UDP multicast, services can expose their jolokia endpoints by joining a multicast group and replying a probe when requested. In this implementation, metricbeat sends the query periodically (every 10 seconds by default) to the multicast group and keeps a list of known instances. Unknown instances are reported as started the first time they are seen, and known instances that stop responding during a time (~30 seconds by default) are considered stopped. The probe is sent using a list of configured interfaces.

I was not sure if doing it as an autodiscover provider was the proper way to do it, but at the end having templates for this is pretty handy. To support Jolokia Discovery as a plain module/metricset would have required some tricky host-less configurations.

I thought about not having generic templates and having just specific Jolokia/JMX configurations as by now it mostly makes sense to use it to configure the jmx metricset. But this way we can use the same templates here, and it can also be a starting point for more creative uses, as collecting logs apart of metrics (if paths can be derived from jolokia info), or to configure future metricsets based on Jolokia (imagine for example metricsets in the kafka or tomcat modules that collect metrics using Jolokia).

It works with a configuration like this one:

metricbeat.autodiscover:
  providers:
    - type: jolokia
      interfaces:
      - name: br*
        interval: 5s
        grace_period: 10s
      - name: en*
        interval: 60s
      templates:
      - condition:
          contains:
            jolokia.server.product: "tomcat"
        config:
        - module: jolokia
          metricsets: ["jmx"]
          hosts: "${data.jolokia.url}"
          namespace: test
          jmx.mappings:
          - mbean: "java.lang:type=Runtime"
            attributes:
            - attr: Uptime
              field: uptime

With this configuration it generates events containing these fields:

    "jolokia": {
      "test": {
        "uptime": 17396313
      },
      "url": "http://172.18.0.3:8778/jolokia",
      "agent": {
        "id": "172.18.0.3-6-47ef4541-servlet",
        "version": "1.5.0"
      },
      "secured": false,
      "server": {
        "product": "tomcat",
        "vendor": "Apache",
        "version": "7.0.86"
      }
    },

@jsoriano jsoriano added in progress Pull request is currently in progress. discuss Issue needs further discussion. libbeat Metricbeat Metricbeat labels May 18, 2018
@jsoriano jsoriano requested review from ruflin and exekias May 18, 2018 18:06
p.bus.Publish(event)
}

func (p *Provider) Stop() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Provider.Stop should have comment or be unexported

}, nil
}

func (p *Provider) Start() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Provider.Start should have comment or be unexported

discovery *Discovery
}

func AutodiscoverBuilder(bus bus.Bus, c *common.Config) (autodiscover.Provider, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function AutodiscoverBuilder should have comment or be unexported

autodiscover.Registry.AddProvider("jolokia", AutodiscoverBuilder)
}

type Provider struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type Provider should have comment or be unexported

logp.Err("failed to update agent without id: " + err.Error())
return
}
agentId, ok := v.(string)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var agentId should be agentID

stop chan struct{}
}

func (d *Discovery) Start() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Discovery.Start should have comment or be unexported

Message common.MapStr
}

type Discovery struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type Discovery should have comment or be unexported

return event
}

type Instance struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type Instance should have comment or be unexported

Message common.MapStr
}

func (e *Event) BusEvent() bus.Event {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Event.BusEvent should have comment or be unexported

"url": c.Str("url"),
}

type Event struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type Event should have comment or be unexported

Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick look, this is looking awesome!! great work here, and perfect use of autodiscover. Do you see any use for it outside Metricbeat? If not, I would consider moving it to Metricbeat, and register it only there.

It would be really nice to see a blog post about this provider once it's in :)

for {
n, _, err := conn.ReadFrom(b)
if err != nil {
if !err.(net.Error).Timeout() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably you need a type check/switch here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle err here is always going to be nil or of type net.Error, but yes, just in case it can be a good idea to properly check the type :)

@@ -29,6 +29,8 @@ func TestFetch(t *testing.T) {
}

func TestData(t *testing.T) {
compose.EnsureUp(t, "jolokia")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how was this passing tests before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we are not running tests with -data in CI, and probably when manually running it, the service is also manually started.

@ruflin
Copy link
Contributor

ruflin commented May 22, 2018

+1 on moving forward with this one.

@jsoriano
Copy link
Member Author

@exekias regarding using it only in metricbeat, yes, I think that in principle it has mostly sense to be used in metricbeat, and more specifically in the jolokia jmx metricset :)
But as I mention in the description of the PR, I'd leave it open to creative uses, for example an application could have its logs in /var/log/apps/${data.jolokia.agent.id}.log and an specific prospector could be created dynamically for it, or filebeat modules could be dynamically enabled depending on the server.product.

p.bus.Publish(event)
}

func (p *Provider) Stop() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Provider.Stop should have comment or be unexported

Message common.MapStr
}

type Discovery struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type Discovery should have comment or be unexported

GracePeriod time.Duration `config:"grace_period"`
}

func (c InterfaceConfig) WithDefaults() InterfaceConfig {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method InterfaceConfig.WithDefaults should have comment or be unexported

Templates template.MapperSettings `config:"templates"`
}

type InterfaceConfig struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type InterfaceConfig should have comment or be unexported

@jsoriano
Copy link
Member Author

Interval and grace period configuration at the interface level now.

@jsoriano jsoriano force-pushed the jolokia-discovery branch 3 times, most recently from d947b5b to 9121257 Compare May 24, 2018 12:00
@jsoriano
Copy link
Member Author

jsoriano commented May 24, 2018

I have been playing with jolokia and the possibilities for autoconfiguration are quite amazing, basically anything that is configurable in the JVM is exposed, so for example from a Tomcat process we can obtain:

  • Catalina:type=Engine/baseDir: "/usr/apache-tomcat-7.0.86"
  • Catalina:host=localhost,name=AccessLogValve,type=Valve/directory: "logs"
  • Catalina:host=localhost,name=AccessLogValve,type=Valve/prefix: "localhost_access_log."
  • Catalina:host=localhost,name=AccessLogValve,type=Valve/pattern: "%h %l %u %t \"%r\" %s %b"

With that we'd know the format of the logs, and the path where they are stored.

In any case by now I'm going to get Discovery support merged and we can think on more possibilities in the future.

@jsoriano jsoriano force-pushed the jolokia-discovery branch from 8fba519 to 9b0e0ab Compare May 25, 2018 11:07

["source","yaml",subs="attributes"]
-------------------------------------------------------------------------------
metricbeat.autodiscover:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be filebeat.autodisover

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed


// WithDefaults returns an InterfaceConfig with default values
func (c InterfaceConfig) WithDefaults() InterfaceConfig {
if c.Interval == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use ucfg here somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took a while to me to understand how I could do it, but done :)


These are the available fields on every event:

* jolokia.agent.id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these fields also added to each event indirectly created through auto discovery?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are added, I think that by the Autodiscover framework.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsoriano jsoriano force-pushed the jolokia-discovery branch 3 times, most recently from d517672 to 2b3778e Compare May 28, 2018 11:08

type InterfaceConfigs []*InterfaceConfig

func (cs *InterfaceConfigs) Unpack(from interface{}) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method InterfaceConfigs.Unpack should have comment or be unexported

GracePeriod time.Duration `config:"grace_period" validate:"positive,nonzero"`
}

type InterfaceConfigs []*InterfaceConfig

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type InterfaceConfigs should have comment or be unexported

@jsoriano jsoriano force-pushed the jolokia-discovery branch 4 times, most recently from 6c2790a to 97df6e3 Compare May 28, 2018 11:26
@jsoriano
Copy link
Member Author

jenkins, test this again, please

@jsoriano jsoriano force-pushed the jolokia-discovery branch from 97df6e3 to 80166dd Compare May 28, 2018 16:16
@jsoriano jsoriano force-pushed the jolokia-discovery branch from 48759f0 to 948253d Compare May 29, 2018 10:20

// Message contains the information of a Jolokia Discovery message
var messageSchema = s.Schema{
"agent": s.Object{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is not the agent that ships the data but the jolokia agent that discovered the data. Just trying to get my head around this for ECS: https://github.com/elastic/ecs#agent The agent on the top level in these cases would still be metricbeat or filebeat, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is about the jolokia agent, under the jolokia namespace.

@ruflin
Copy link
Contributor

ruflin commented May 29, 2018

Should we add a quick example to the autodiscover part in the reference file? At the same time I think it's time to link to the auto discovery docs from our reference and config files to get all details there.

Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments

// And then one after each interval
ticker := time.NewTicker(i.Interval)
defer ticker.Stop()
cases = append(cases, reflect.SelectCase{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not particularly a fan of using reflect, it makes the code difficult to read, perhaps this could be solved with a goroutine per interface?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better without reflect, yes :) I traslated directly the select of the previous implementation to a dynamic select.

}
agentID, ok := v.(string)
if len(agentID) == 0 || !ok {
logp.Err("empty agent?")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be more verbose

@jsoriano
Copy link
Member Author

@ruflin, do you mean adding an example to the reference config of JMX?

@jsoriano jsoriano force-pushed the jolokia-discovery branch 2 times, most recently from 65ad910 to 5daa22c Compare May 29, 2018 21:56
@jsoriano jsoriano force-pushed the jolokia-discovery branch from 5daa22c to 41c6e10 Compare May 29, 2018 21:56
@ruflin
Copy link
Contributor

ruflin commented May 30, 2018

@jsoriano Yes, similar to what we have here for the docker autodiscovery: https://github.com/elastic/beats/blob/master/metricbeat/_meta/common.reference.yml#L34 But TBH I think it's better we start to add links to our config files to the docs.

In general I wonder if autodiscovery should be documented in one place and show example for each Beat or if it should be document in each beat and there only have the examples for this Beat. (cc @dedemorton )

@dedemorton
Copy link
Contributor

@ruflin First I'd consider the audience. Since the person who will set up autodiscovery is also the person who will set up the config file, I'd leave the content where it is for now (single-sourced in libbeat and reused in all the beats docs).

I already have an item on my to-do list to improve the docs about running on containers. I think we bury too much info in the "configuring" sections. I'd like to pull all the topics about running beats on containers and put them together in one section since the target audience is clear. It's on my to-do list, but maybe I need to pop the priority?

@ruflin
Copy link
Contributor

ruflin commented May 31, 2018

@dedemorton I think the problem here is that the examples for Filebeat and Metricbeat are potentially different so it can't be just reused. This here is not necessarly related to docker / K8s. For the priorities : you know best :-)

@dedemorton
Copy link
Contributor

@ruflin Not sure I'm following you completely. The way the documentation is set up, the include statement already pulls in the example that's relevant to the beat: include::../../{beatname_lc}/docs/autodiscover-jolokia-config.asciidoc[]

Maybe you're thinking about this from the perspective of how you will link to the doc examples from within the reference yml?

(I get your point about Jolokia and docker.... I guess I should understand more about Jolokia before I make broad statements. :-P)

@ruflin
Copy link
Contributor

ruflin commented Jun 1, 2018

@dedemorton I think I missed the part that we already include beat specific configs.

@jsoriano You will need an other rebase.

@jsoriano
Copy link
Member Author

jsoriano commented Jun 1, 2018

Branch updated.

@ruflin @dedemorton do you think I should change something in the docs before merging? or we can merge and revisit this later?

@ruflin
Copy link
Contributor

ruflin commented Jun 1, 2018

@jsoriano No change needed now.

@exekias
Copy link
Contributor

exekias commented Jun 4, 2018

jenkins, retest this please (WFG)

@exekias exekias merged commit 5e50828 into elastic:master Jun 4, 2018
@beirtipol
Copy link

@jsoriano - thanks for turning this around. Any eta for this going live?

@jsoriano
Copy link
Member Author

@beirtipol in principle this feature will be released as experimental on Beats 6.4.

@beirtipol
Copy link

@jsoriano thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issue needs further discussion. libbeat Metricbeat Metricbeat review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants