Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

Kata-runtime: fluentd log parsing #985

Closed
mcastelino opened this issue Dec 6, 2018 · 7 comments
Closed

Kata-runtime: fluentd log parsing #985

mcastelino opened this issue Dec 6, 2018 · 7 comments

Comments

@mcastelino
Copy link
Contributor

Description of problem

When parsing kata-runtime logs in the journal with fluentd we see an error.

Kata is setup using
https://github.com/clearlinux/cloud-native-setup/tree/master/clr-k8s-examples

And kata is launched using

https://github.com/clearlinux/cloud-native-setup/blob/master/clr-k8s-examples/tests/test-deploy-kata.yaml

The fluentd log filters are setup via the filter below which is current disabled.

https://github.com/clearlinux/cloud-native-setup/blob/master/clr-k8s-examples/3-efk/fluentd-es-configmap.yaml#L333

Expected result

fluentd should be able to parse the kata component logs and show up in Kibana

Actual result

error_class=Fluent::Plugin::ConcatFilter::TimeoutError error="Timeout flush: kata-containers:default" location=nil tag="kata-containers" time=2018-12-05 23:00:38.681639080 +0000 record={"transport"=>"syslog", "priority"=>"6", "syslog_identifier"=>"kata-runtime", "uid"=>"0", "gid"=>"0", "comm"=>"kata-runtime", "exe"=>"/usr/bin/kata-runtime", "cap_effective"=>"3fffffffff", "boot_id"=>"4f66d23c4ee34a9da060c3f38decf242", "machine_id"=>"5ac9126cbadf4b7f8fec2de0499aa3a1", "hostname"=>"clr-01", "systemd_slice"=>"system.slice", "systemd_cgroup"=>"/system.slice/crio.service", "systemd_unit"=>"crio.service", "systemd_invocation_id"=>"816746216f5c413bb09a1ba3fbb490be", "syslog_pid"=>"20384", "message"=>"time=\"2018-12-05T22:56:08.118478582Z\" level=info msg=\"loaded configuration\" arch=amd64 command=kill file=/usr/share/defaults/kata-containers/configuration.toml format=TOML name=kata-runtime pid=20384 source=runtimetime=\"2018-12-05T22:56:08.118783187Z\" level=info arch=amd64 arguments=\"\\\"kill e7c989838600ab33da0abd65c9f60ee9c4e293cf3000ba97b71ee98ade31ea5b TERM\\\"\" command=kill commit= name=kata-runtime pid=20384 source=runtime version=1.3.1time=\"2018-12-05T22:56:08.11893917Z\" level=info msg=\"fetch sandbox\" arch=amd64 command=kill container=e7c989838600ab33da0abd65c9f60ee9c4e293cf3000ba97b71ee98ade31ea5b name=kata-runtime pid=20384 source=virtcontainerstime=\"2018-12-05T22:56:08.121436131Z\" level=info msg=\"endpoint unmarshalled\" arch=amd64 command=kill container=e7c989838600ab33da0abd65c9f60ee9c4e293cf3000ba97b71ee98ade31ea5b endpoint=\"{{ca8d0db9-72ae-412d-ac74-4379aea38f20 br0_kata {eth0 02:00:ca:fe:00:00 [10.244.0.80/32 eth0]} {tap0_kata 8e:48:4e:a5:1d:53 []} 2 [0xc00000e170] [0xc00000e178]} {{{3 1500 0 eth0 8e:48:4e:a5:1d:53 up|broadcast|multicast 69699 83 0 <nil>  0xc0002680c0 0 0xc000037440 ether <nil> up 0 0 0} veth} [10.244.0.80/32 eth0 fe80::8c48:4eff:fea5:1d53/64] [{Ifindex: 3 Dst: <nil> Src: <nil> Gw: 169.254.1.1 Flags: [] Table: 254} {Ifindex: 3 Dst: 169.254.1.1/32 Src: <nil> Gw: <nil> Flags: [] Table: 254} {Ifindex: 3 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254} {Ifindex: 3 Dst: ff00::/8 Src: <nil> Gw: <nil> Flags: [] Table: 254}] {[]  [] []}} false virtual }\" endpoint-type=virtual name=kata-runtime pid=20384 source=virtcontainers subsystem=networktime=\"2018-12-05T22:56:08.122298435Z\" level=info msg=\"release sandbox\" arch=amd64 command=kill container=e7c989838600ab33da0abd65c9f60ee9c4e293cf3000ba97b71ee98ade31ea5b name=kata-runtime pid=20384 sandbox=1823ac75b01f637a1b60894c803735ee0025b5ccf70e2a10ddbcecdf864ee6f9 source=virtcontainers subsystem=sandboxtime=\"2018-12-05T22:56:08.122385682Z\" level=info msg=\"fetch sandbox\" arch=amd64 command=kill container=e7c989838600ab33da0abd65c9f60ee9c4e293cf3000ba97b71ee98ade31ea5b name=kata-runtime pid=20384 sandbox=1823ac75b01f637a1b60894c803735ee0025b5ccf70e2a10ddbcecdf864ee6f9 source=virtcontainerstime=\"2018-12-05T22:56:08.124017537Z\" level=info msg=\"endpoint unmarshalled\" arch=amd64 command=kill container=e7c989838600ab33da0abd65c9f60ee9c4e293cf3000ba97b71ee98ade31ea5b endpoint=\"{{ca8d0db9-72ae-412d-ac74-4379aea38f20 br0_kata {eth0 02:00:ca:fe:00:00 [10.244.0.80/32 eth0]} {tap0_kata 8e:48:4e:a5:1d:53 []} 2 [0xc00000e260] [0xc00000e268]} {{{3 1500 0 eth0 8e:48:4e:a5:1d:53 

(replace this text with the output of the kata-collect-data.sh script, after
you have reviewed its content to ensure it does not contain any private
information).

@mcastelino
Copy link
Contributor Author

/cc @jodh-intel @krsna1729

@krsna1729
Copy link

@jodh-intel running this on the node works

sudo journalctl SYSLOG_IDENTIFIER=kata-runtime + SYSLOG_IDENTIFIER=kata-proxy + SYSLOG_IDENTIFIER=kata-shim

Also about 100 lines of logs are showing up as a single message in the fluentd container running on the node with concat error above. Wondering if this is related to multi-line, flush interval etc., settings like shown here
https://github.com/clearlinux/cloud-native-setup/blob/master/clr-k8s-examples/3-efk/fluentd-es-configmap.yaml#L196

@mcastelino
Copy link
Contributor Author

mcastelino commented Dec 6, 2018

@jodh-intel @krsna1729 yes the node works properly. You may be right about the multiline issue. Also it is possible that kernel logs get in between the multi line logs?

@jodh-intel
Copy link
Contributor

The kernel boot messages are captured in the proxy log and could upset fluentd. That's currently an unsolved problem though: kata-containers/agent#255.

@grahamwhaley
Copy link
Contributor

@jodh-intel - OOI, are the kernel logs inter-mingled with the other logs still an issue? I see kata-containers/agent#255 is still open for instance :-(

@jodh-intel
Copy link
Contributor

@grahamwhaley - yep - still a problem afaik.

@grahamwhaley
Copy link
Contributor

Minor update. In order to see the fluentd config yaml that was referenced originally, you need to check out the v1 tag of that repo - I'll paste the yaml here for easier reference...

    # <source>
    #   @type systemd
    #   path /var/log/journal
    #   matches [{"SYSLOG_IDENTIFIER": "kata-runtime"}, {"SYSLOG_IDENTIFIER": "kata-proxy"}, {"SYSLOG_IDENTIFIER": "kata-shim"}]
    #   tag kata-containers
    #   <entry>
    #     fields_strip_underscores true
    #     fields_lowercase true
    #   </entry>
    # </source>

I'm going to see if I can reproduce this, and contemplate if/what/anything can be done, probably at the kata end. Our logs should be ingestible...

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants