Add short ID matching to add_docker_metadata #6172

Merged
merged 1 commit into from
Feb 19, 2018

Conversation

boaz0
Contributor

@boaz0 boaz0 commented Jan 25, 2018

closes #6092

@elasticmachine
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

Contributor

@exekias exekias left a comment

Thank you for contributing @ripcurld0! This is an elegant solution to #6092 🎉

Could you please add some tests with long IDs?

Also please add a CHANGELOG.asciidoc entry

@boaz0
Contributor Author

boaz0 commented Jan 25, 2018

@exekias sure! That's what I meant to do (hence "WIP" in the title).
No problem, I will update CHANGELOG.asciidoc too

@exekias exekias added the in progress Pull request is currently in progress. label Jan 25, 2018
@exekias
Contributor

exekias commented Jan 25, 2018

Awesome, I totally missed the WIP part 😇

@@ -125,7 +128,7 @@ func NewWatcherWithClient(client Client, cleanupTimeout time.Duration) (*watcher
 // Container returns the running container with the given ID or nil if unknown
 func (w *watcher) Container(ID string) *Container {
 	w.RLock()
-	container := w.containers[ID]
+	container := w.containers[ID[:truncateLen]]

Contributor

I wonder if we should make this a config option or always check for the truncated ID. I assume that even with 12 digits the likelihood of overlap is very low.

Contributor

@exekias exekias Jan 26, 2018

It's an interesting point. In theory, the chances of a collision are around 0.000059%, which makes one very unlikely. That said, someone reported one here: moby/moby#28260 (comment)
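
To get a feel for how the collision risk scales with the number of containers, here is a minimal Go sketch of the standard birthday approximation. It assumes container IDs are uniformly random hexadecimal strings, which the thread does not state. The assumptions behind the 0.000059% figure above are not given either, so the sketch does not try to reproduce that number. The function name `collisionProbability` is made up for this illustration.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n
// uniformly random IDs share the same prefix of hexChars hexadecimal digits.
// It uses the birthday approximation 1 - exp(-n(n-1) / (2 * 16^hexChars)).
func collisionProbability(n int, hexChars int) float64 {
	if n < 2 {
		return 0
	}
	space := math.Pow(16, float64(hexChars))
	pairs := float64(n) * float64(n-1) / 2
	// Expm1 keeps precision when the exponent is very close to zero.
	return -math.Expm1(-pairs / space)
}

func main() {
	// 12 hex digits is the short ID length discussed in this thread.
	for _, n := range []int{100, 1000, 10000, 100000} {
		fmt.Printf("%7d containers: %.3e\n", n, collisionProbability(n, 12))
	}
}
```

The probability grows roughly with the square of the number of containers, so it stays tiny for a single host but is not zero, which matches the report linked above.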

Contributor Author

That's really interesting. I didn't think about it, because I thought it was very-very-very-very uncommon to get a collision. I guess we should consider printing a warning message if we get more than one container for a specific ID. Thanks for noticing this.

Contributor

My initial thought was that by default we should use the full ID, as there is probably a reason the ID is that long. My assumption was that the shorter ID is more for user consumption, but it seems it also shows up in some other places.

If a short ID is found / provided, it should be best effort to match it to a long one with something like contains. The problem is that this could become pretty inefficient as it's not a direct lookup in the map, which makes the implementation you did pretty nice.

It would be great if we could find an efficient method to internally store the full id but still match the short id when needed.

Thought: What if we store both?

Contributor Author

@boaz0 boaz0 Jan 28, 2018

You can have another map, map[string]*Container, that uses the short ID as a key.
When we detect that the given ID is of length 12 we use that map, otherwise we use the original one.

Or actually, you can store both the short ID and the long one in the same map and drop the if len(id) > 12 check.

The disadvantage here is obviously a space-time trade-off.

Contributor

What about keeping the current code and making the truncate length configurable? That would support the short ID use case, but you have to enable it

Contributor Author

fine by me. @ruflin WDYT?

Member

If the short IDs are consistently 12 characters then adding a secondary entry to the map SGTM. But if we need to do a prefix search where the prefixes are of arbitrary length then maybe consider a different data structure.
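
As an illustration of the "different data structure" option for prefixes of arbitrary length, here is a minimal Go sketch that keeps the full IDs in a sorted slice and resolves a prefix with a binary search. It is not what the PR implements. The type `prefixIndex` and the two error values are hypothetical names. The ambiguity error stands in for the warning discussed earlier in this thread for the case where one short ID matches several containers.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

var (
	errNotFound  = errors.New("no container matches prefix")
	errAmbiguous = errors.New("prefix matches more than one container")
)

// prefixIndex (hypothetical) keeps full container IDs sorted so that a
// prefix of any length can be resolved with a binary search.
type prefixIndex struct {
	ids []string // sorted, without duplicates
}

// Add inserts a full ID while keeping the slice sorted.
func (p *prefixIndex) Add(id string) {
	i := sort.SearchStrings(p.ids, id)
	if i < len(p.ids) && p.ids[i] == id {
		return // already present
	}
	p.ids = append(p.ids, "")
	copy(p.ids[i+1:], p.ids[i:])
	p.ids[i] = id
}

// Resolve returns the single full ID that starts with prefix.
// IDs sharing a prefix are adjacent in sorted order, so checking the
// first candidate and its successor is enough to detect ambiguity.
func (p *prefixIndex) Resolve(prefix string) (string, error) {
	i := sort.SearchStrings(p.ids, prefix)
	if i == len(p.ids) || !strings.HasPrefix(p.ids[i], prefix) {
		return "", errNotFound
	}
	if i+1 < len(p.ids) && strings.HasPrefix(p.ids[i+1], prefix) {
		return "", errAmbiguous
	}
	return p.ids[i], nil
}

func main() {
	idx := &prefixIndex{}
	idx.Add("196f8a55cb7dce6953a9f79bb137511dac12edc787d2be00861accc080ae949b")
	full, err := idx.Resolve("196f8a55cb7d")
	fmt.Println(full, err)
}
```

Lookups cost O(log n) instead of the O(1) of a map, and inserts cost O(n), so this only pays off if prefixes really can have any length.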

Contributor Author

@andrewkroh there are going to be only two kinds of IDs:

  • short ID (length between 1 and 64 theoretically)
  • ID (length of 64)

In other words, the length of a key in the map is 64 or 12. You won't find a key whose length is 23.

Contributor

I would also just add a second entry to the map so on the lookup side it still works as before. I would not expect too much overhead as this map will probably not contain more than 1000 entries?

If it becomes an overhead, I would introduce a config option to enable short IDs: when it's off, the short ID is not stored in the map; when it's on, the same map is still used. But we can do that if it causes any issues.
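
The "second entry in the same map, behind an option" idea can be sketched as follows. This is a simplified stand-in, not the watcher code from the PR: the `store` type, `newStore`, and `shortIDLen` are hypothetical, and only the `storeShortID` flag name and the 12-character length come from this thread.

```go
package main

import (
	"fmt"
	"sync"
)

// shortIDLen is the short ID length discussed in this thread.
const shortIDLen = 12

// Container is a reduced stand-in for the watcher's container type.
type Container struct {
	ID   string
	Name string
}

// store (hypothetical) indexes containers by full ID and, when
// storeShortID is set, also by the first shortIDLen characters.
type store struct {
	sync.RWMutex
	containers   map[string]*Container
	storeShortID bool
}

func newStore(storeShortID bool) *store {
	return &store{
		containers:   make(map[string]*Container),
		storeShortID: storeShortID,
	}
}

// Add registers the container under its full ID and optionally its short ID.
func (s *store) Add(c *Container) {
	s.Lock()
	defer s.Unlock()
	s.containers[c.ID] = c
	if s.storeShortID && len(c.ID) > shortIDLen {
		s.containers[c.ID[:shortIDLen]] = c
	}
}

// Remove drops both keys so that no stale short ID entry is left behind.
func (s *store) Remove(id string) {
	s.Lock()
	defer s.Unlock()
	delete(s.containers, id)
	if s.storeShortID && len(id) > shortIDLen {
		delete(s.containers, id[:shortIDLen])
	}
}

// Get is a plain map lookup, so callers can pass either form of the ID.
func (s *store) Get(id string) *Container {
	s.RLock()
	defer s.RUnlock()
	return s.containers[id]
}

func main() {
	s := newStore(true)
	s.Add(&Container{
		ID:   "196f8a55cb7dce6953a9f79bb137511dac12edc787d2be00861accc080ae949b",
		Name: "heuristic_tesla",
	})
	fmt.Println(s.Get("196f8a55cb7d") != nil)
}
```

The lookup path stays unchanged, which is the point made above. The cost is one extra map entry per container and the need to remove both keys on cleanup.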

// For example, by default container's hostname is set to the truncated container ID.
// To allow matching by truncated and non-truncated container IDs the map keys
// are the truncated version.
w.containers[event.Actor.ID[:truncateLen]] = container

Contributor

If we store both, I assume this line would be changed to:

w.containers[event.Actor.ID] = container
w.containers[event.Actor.ID[:truncateLen]] = container

@ruflin
Contributor

ruflin commented Feb 6, 2018

@ripcurld0 Do you have any time on your end to push this forward? I think it would be a great feature to have in.

@boaz0
Contributor Author

boaz0 commented Feb 6, 2018

@ruflin I am going to push changes today.
Sorry for taking so long. 😊

}

-func NewWatcherWithClient(client Client, cleanupTimeout time.Duration) (*watcher, error) {
+func NewWatcherWithClient(client Client, cleanupTimeout time.Duration, storeShortID bool) (*watcher, error) {

exported function NewWatcherWithClient should have comment or be unexported
exported func NewWatcherWithClient returns unexported type *docker.watcher, which can be annoying to use

Contributor Author

This is actually true, we might want to return "Watcher" (the interface), but I am afraid there are functions that are not included in the interface which other packages would like to use.

@@ -77,10 +81,10 @@ type Client interface {
Events(ctx context.Context, options types.EventsOptions) (<-chan events.Message, <-chan error)
}

-type WatcherConstructor func(host string, tls *TLSConfig) (Watcher, error)
+type WatcherConstructor func(host string, tls *TLSConfig, storeShortID bool) (Watcher, error)

exported type WatcherConstructor should have comment or be unexported

}, nil
}

// Container returns the running container with the given ID or nil if unknown
func (w *watcher) Container(ID string) *Container {
w.RLock()
container := w.containers[ID]
_, ok := w.deleted[ID]
if container == nil {
return nil

Contributor

⚠️ Unlock won't be called in this case

Contributor Author

😱 thanks!
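
One common way to avoid the leaked read lock flagged above is to let `defer` release it, so every return path (including the early `return nil`) unlocks. The sketch below is a simplified stand-in and not the code that was merged. The `lookup` helper is hypothetical, and refreshing the `deleted` timestamp under the write lock is an assumption about what the `deleted` map is for.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Container is a reduced stand-in for the watcher's container type.
type Container struct{ ID string }

type watcher struct {
	sync.RWMutex
	containers map[string]*Container
	deleted    map[string]time.Time
}

func newWatcher() *watcher {
	return &watcher{
		containers: make(map[string]*Container),
		deleted:    make(map[string]time.Time),
	}
}

// lookup (hypothetical helper) does the read-only work under the read lock.
// The deferred RUnlock runs on every return path, including the early one.
func (w *watcher) lookup(id string) (*Container, bool) {
	w.RLock()
	defer w.RUnlock()
	c := w.containers[id]
	if c == nil {
		return nil, false
	}
	_, deleted := w.deleted[c.ID]
	return c, deleted
}

// Container returns the container with the given ID or nil if unknown.
func (w *watcher) Container(id string) *Container {
	c, deleted := w.lookup(id)
	if deleted {
		// Assumption: a container pending cleanup gets its timestamp
		// refreshed on access. The write lock is taken only after the
		// read lock has been released.
		w.Lock()
		w.deleted[c.ID] = time.Now()
		w.Unlock()
	}
	return c
}

func main() {
	w := newWatcher()
	fmt.Println(w.Container("unknown") == nil)
	w.Lock() // would block forever if the read lock had leaked
	w.Unlock()
	fmt.Println("write lock acquired after a miss")
}
```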

@exekias exekias dismissed their stale review February 9, 2018 09:51

code was reworked

@boaz0
Contributor Author

boaz0 commented Feb 12, 2018

I ran some manual tests on my computer.

preparations

  • I ran Elasticsearch and Kibana (both version 6) inside containers:
$ docker run -d --name es -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.1.2
$ docker run -d --name kb -p 5601:5601 --link es:elasticsearch docker.elastic.co/kibana/kibana:6.1.2
  • Then, I created logs and config directories:
$ cd ~
$ mkdir {config,logs}

All the containers' log files are going to be stored in logs, and filebeat.yml is going to be stored in config.

Let's start with the following filebeat.yml:

filebeat.prospectors:
    - type: log
      paths:
          - '/home/ripcurld0/logs/*/*'
      processors:
          - add_docker_metadata:
              match_source: false

output.elasticsearch:
    hosts: ['localhost:9200']

This should read all log files in logs and send them to ES.

  • Build a container to write data to a file.
    Since I can't just:
$ docker run -v ${HOME}/logs/:/tmp/logs busybox echo "data" > /tmp/logs/${HOSTNAME}/file.log

or

$ docker run -v ${HOME}/logs/:/tmp/logs busybox sh -c "echo data > /tmp/logs/${HOSTNAME}/file.log"

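This is because in the first one the redirection is handled by my shell, so the output goes to /tmp/logs/${HOSTNAME} on my machine, and in the second one, although the file is written to the container's /tmp/logs, ${HOSTNAME} is expanded by my shell before docker runs, so it is my machine's hostname.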

Thus, I had to build an image.

  1. created ~/dockerfile
$ mkdir ~/dockerfile
$ cd ~/dockerfile
  2. create Dockerfile:
FROM busybox
ADD ./init.sh /init.sh
RUN chmod +x /init.sh
ENTRYPOINT ["/init.sh"]
  3. create the init.sh script
#!/bin/sh
mkdir -p /tmp/logs/${HOSTNAME}
while [[ true ]]; do
    date >> /tmp/logs/${HOSTNAME}/file.log
    sleep 1s
done
  4. build image
$ docker build -t fbgen ~/dockerfile/.
  5. run a container
$ docker run -d --rm -v /home/vagrant/logs:/tmp/logs fbgen

testing

  • Compiled my code:
$ cd ~/go/src/github.com/elastic/beats/filebeat
$ make clean && make filebeat
  • Running my local filebeat with the given config
$ ./filebeat -e -c ~/configs/filebeat.yml -d "publish"

The results:

2018-02-12T12:57:43.138Z        DEBUG   [publish]       pipeline/processor.go:275       Publish event: {                                                                                       
  "@timestamp": "2018-02-12T12:57:43.138Z",                                                                                                                                                    
  "@metadata": {                               
    "beat": "filebeat",                        
    "type": "doc",                             
    "version": "7.0.0-alpha1"                  
  },                                           
  "beat": {                                    
    "name": "elastic",                         
    "hostname": "elastic",                     
    "version": "7.0.0-alpha1"                  
  },                                           
  "source": "/home/ripcurld0/logs/196f8a55cb7d/data.txt",                                        
  "offset": 39266,                             
  "message": "Mon Feb 12 12:57:42 UTC 2018",   
  "prospector": {                              
    "type": "log"                              
  },                                           
  "event": {                                   
    "type": "log"                              
  }                                            
}
  • This seems right, now let's add add_docker_metadata:
cat ~/configs/filebeat.yml
filebeat.prospectors:
    - type: log
      paths:
          - '/home/ripcurld0/logs/*/*'
      processors:
          - add_docker_metadata:
              match_source: true
              match_source_index: 3

output.elasticsearch:
    hosts: ['localhost:9200']

Notice I am not enabling the short id options. So I expect that the published data won't have container's information.
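
To make the role of match_source_index in this test easier to follow, here is a small Go sketch of how an index can select the container ID from the source path. It assumes the index counts non-empty path segments starting at zero, which is consistent with the paths and results in this test but is not confirmed anywhere in the thread. `segmentAt` is a hypothetical helper, not a Beats function.

```go
package main

import (
	"fmt"
	"strings"
)

// segmentAt (hypothetical) returns the index-th non-empty segment of a
// slash-separated path, counting from zero.
func segmentAt(path string, index int) (string, bool) {
	// FieldsFunc drops empty fields, so the leading "/" adds no segment.
	parts := strings.FieldsFunc(path, func(r rune) bool { return r == '/' })
	if index < 0 || index >= len(parts) {
		return "", false
	}
	return parts[index], true
}

func main() {
	id, ok := segmentAt("/home/ripcurld0/logs/196f8a55cb7d/file.log", 3)
	fmt.Println(id, ok) // prints "196f8a55cb7d true"
}
```

With index 3 the extracted value is the directory name, which is the 12-character short ID because the container wrote its logs under ${HOSTNAME}.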

  • Restarting filebeat and the results are the same:
2018-02-12T13:57:19.444Z        DEBUG   [publish]       pipeline/processor.go:275       Publish event: {
  "@timestamp": "2018-02-12T13:57:19.444Z",                                                    
  "@metadata": {                                                                               
    "beat": "filebeat",                                                                        
    "type": "doc",                                                                                                                                                                             
    "version": "7.0.0-alpha1"                                                                                                                                                                  
  },                                                                                           
  "beat": {                                                                                    
    "name": "elastic",                                                                         
    "hostname": "elastic",                                                                     
    "version": "7.0.0-alpha1"
  },
  "source": "/home/ripcurld0/logs/196f8a55cb7d/file.log",
  "offset": 19546,
  "message": "Mon Feb 12 13:57:18 UTC 2018",
  "event": {                                                                                   
    "type": "log"                                                                                                                                                                              
  },                                           
  "prospector": {
    "type": "log"
  }
}

But to be sure this is because the container ID 196f8a55cb7d wasn't found I had to add "add_docker_metadata" into the -d option:

$ ./filebeat -e -c ~/configs/filebeat.yml -d "publish,add_docker_metadata"

and I do get the following:

2018-02-12T14:00:53.480Z        DEBUG   [add_docker_metadata]   add_docker_metadata/add_docker_metadata.go:169  Container not found: cid=196f8a55cb7d
2018-02-12T14:00:53.480Z        DEBUG   [publish]       pipeline/processor.go:275       Publish event: {
....

That means everything works as expected in add_docker_metadata, as far as I can tell.

  • Enabling short id:
cat ~/configs/filebeat.yml
filebeat.prospectors:
    - type: log
      paths:
          - '/home/ripcurld0/logs/*/*'
      processors:
          - add_docker_metadata:
              match_source: true
              match_source_index: 3
              match_short_id: true

output.elasticsearch:
    hosts: ['localhost:9200'] 
  • Running filebeat again (with add_docker_metadata in -d)
    The results are:
2018-02-12T14:05:56.854Z        DEBUG   [publish]       pipeline/processor.go:275       Publish event: {                                                                                       
  "@timestamp": "2018-02-12T14:05:56.854Z",    
  "@metadata": {                               
    "beat": "filebeat",                        
    "type": "doc",                                                                             
    "version": "7.0.0-alpha1"                                                                  
  },                                                                                           
  "event": {                                                                                   
    "type": "log"                                                                              
  },                                                                                           
  "prospector": {                                                                              
    "type": "log"                                                                              
  },                                                                                           
  "docker": {                                                                                  
    "container": {                                                                             
      "id": "196f8a55cb7dce6953a9f79bb137511dac12edc787d2be00861accc080ae949b",                
      "image": "fbgen",                                                                        
      "name": "heuristic_tesla"                                                                
    }                                                                                          
  },                                                                                           
  "beat": {                                                                                    
    "name": "elastic",                                                                         
    "hostname": "elastic",                                                                     
    "version": "7.0.0-alpha1"                                                                  
  },                                                                                           
  "message": "Mon Feb 12 14:05:56 UTC 2018",                                                   
  "source": "/home/ripcurld0/logs/196f8a55cb7d/file.log",                                        
  "offset": 34539                              
}                                              

As you can see here no error is shown on add_docker_metadata which is great.
In addition to that docker.container has the container information (name, image and the long ID).

Notes

If you have other ideas, feel free to let me know.
I used configs similar to @chrisregnier's configuration.

I think this is now more than WIP, although any design and implementation feedback/review is welcome!

(If we can have more eyes on this, that will be super awesome)
@ruflin @andrewkroh @exekias @chrisregnier

@boaz0 boaz0 changed the title [WIP] Do a prefix match on container ID in docker watcher Add short ID matching to add_docker_metadata Feb 12, 2018
Contributor

@exekias exekias left a comment

LGTM 🎉, could you update docs and add a changelog entry?

@exekias
Contributor

exekias commented Feb 12, 2018

ok to test

@boaz0
Contributor Author

boaz0 commented Feb 12, 2018

sure @exekias

@ruflin ruflin removed the in progress Pull request is currently in progress. label Feb 13, 2018
Container short IDs are a common way to represent containers.
The short ID is 12 characters long while the full one is 64,
which makes the short ID more human readable.

Prior to this patch, if the folder name was set to the
container's short ID, the match would fail.
For example, when the container logs are put in ./container/${HOSTNAME},
the docker metadata is not added to the log lines, because ${HOSTNAME}
inside the container is set to its short ID by default.

The user has to explicitly enable short ID matching by
setting "match_short_id" to true in the configuration file.

If "match_short_id" is not given in the configuration or is set
to false, this feature is disabled.

Signed-off-by: Boaz Shuster <ripcurld.github@gmail.com>
@chrisregnier

This is fantastic! Thanks @ripcurld0

Contributor

@exekias exekias left a comment

LGTM, thank you for taking the time to implement this!

@exekias exekias merged commit ef25ecf into elastic:master Feb 19, 2018
@boaz0
Contributor Author

boaz0 commented Feb 19, 2018

Thanks a lot @exekias @ruflin @chrisregnier @andrewkroh for your review.
Looking forward to contributing more to this amazing project! 🎉 🌈 🍰

@boaz0 boaz0 deleted the fix_6092 branch February 19, 2018 22:28
Development

Successfully merging this pull request may close these issues.

[Feature Request] add_docker_metadata should accept short ids for containers
7 participants