Icinga 2 reconnects in a loop for self-signed certificates #7680

Closed
lazyfrosch opened this issue Dec 3, 2019 · 4 comments · Fixed by #7686
Labels
area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working ref/NC

@lazyfrosch
Contributor

I see a lot of reconnects in an environment with many agents that haven't been set up correctly: their certificates are either not signed yet or were signed by another CA.

Icinga 2 keeps reconnecting until the open file limit (16384) is reached!

This might be related to #7532

For now I had to turn off the master connecting to anything...

ref/NC/622991

lsof

$ (for pid in $(pidof icinga2); do lsof -p $pid; done) | wc -l
16527

$ (for pid in $(pidof icinga2); do lsof -p $pid; done) | grep TCP | awk '{ print $9 }' | cut -d'>' -f2 | sort | uniq -c
     84 win19.company.com:5665
     84 win24.company.com:5665
     84 win25.company.com:5665
     84 win26.company.com:5665
     85 win27.company.com:5665
     86 win44.company.com:5665
     84 win47.company.com:5665
     84 win49.company.com:5665
...
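
The per-endpoint count above can also be done in a single awk pass. This is only a sketch: the function name `count_remote_endpoints` is made up for illustration, and it assumes the default `lsof -p` column layout, where field 8 is the protocol and field 9 is `local->remote`.

```shell
# Read lsof output on stdin and count TCP connections per remote endpoint.
# Assumption: default lsof columns (field 8 = protocol, field 9 = local->remote).
count_remote_endpoints() {
    awk '$8 == "TCP" && index($9, "->") {
             split($9, parts, "->")      # parts[2] is the remote host:port
             count[parts[2]]++
         }
         END { for (peer in count) print count[peer], peer }' | sort -k2
}

# Usage on the master (not run here):
# (for pid in $(pidof icinga2); do lsof -p "$pid"; done) | count_remote_endpoints
```

Listening sockets have no `->` in the name column, so they are skipped rather than counted as a connection.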

log

[2019-12-03 10:21:43 +0100] information/ApiListener: Finished reconnecting to endpoint 'win53.company.com' via host '10.10.104.20' and port '5665'
[2019-12-03 10:21:52 +0100] information/ApiListener: Reconnecting to endpoint 'win53.company.com' via host '10.10.104.20' and port '5665'
[2019-12-03 10:21:53 +0100] warning/ApiListener: Certificate validation failed for endpoint 'win53.company.com': code 18: self signed certificate
[2019-12-03 10:21:53 +0100] information/ApiListener: New client connection for identity 'win53.company.com' to [10.10.104.20]:5665 (certificate validation failed: code 18: self signed certificate)
[2019-12-03 10:21:53 +0100] information/ApiListener: Finished reconnecting to endpoint 'win53.company.com' via host '10.10.104.20' and port '5665'
[2019-12-03 10:22:02 +0100] information/ApiListener: Reconnecting to endpoint 'win53.company.com' via host '10.10.104.20' and port '5665'
[2019-12-03 10:22:12 +0100] information/ApiListener: Reconnecting to endpoint 'win53.company.com' via host '10.10.104.20' and port '5665'
[2019-12-03 10:22:22 +0100] information/ApiListener: Reconnecting to endpoint 'win53.company.com' via host '10.10.104.20' and port '5665'
[2019-12-03 10:22:32 +0100] information/ApiListener: Reconnecting to endpoint 'win53.company.com' via host '10.10.104.20' and port '5665'
[2019-12-03 10:22:42 +0100] information/ApiListener: Reconnecting to endpoint 'win53.company.com' via host '10.10.104.20' and port '5665'
[2019-12-03 10:22:52 +0100] information/ApiListener: Reconnecting to endpoint 'win53.company.com' via host '10.10.104.20' and port '5665'
[2019-12-03 10:23:02 +0100] information/ApiListener: Reconnecting to endpoint 'win53.company.com' via host '10.10.104.20' and port '5665'

Your Environment


  • Version used (icinga2 --version): 2.11.2-1
  • Operating System and version: SLES 12.4
@lazyfrosch
Contributor Author

[2019-12-03 10:57:24 +0100] critical/ApiListener: Cannot accept new connection: Too many open files
[2019-12-03 10:57:24 +0100] critical/ApiListener: Cannot accept new connection: Too many open files
[2019-12-03 10:57:24 +0100] critical/ApiListener: Cannot accept new connection: Too many open files
[2019-12-03 10:57:24 +0100] critical/ApiListener: Cannot accept new connection: Too many open files

@lazyfrosch lazyfrosch added area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working labels Dec 3, 2019
@lazyfrosch
Contributor Author

@lippserd This could be very deeply related to #7532

@dnsmichi
Contributor

dnsmichi commented Dec 3, 2019

I think that's related to #7654 and #7650 - can you confirm @mcktr?

mcktr added a commit that referenced this issue Dec 3, 2019
This closes the agent connection when the certificate sign requests
waits for CA approval.

refs #7680
@mcktr
Member

mcktr commented Dec 3, 2019

Unfortunately this is not related to the TLS context issue.

This issue is about connections that are held open while the agent/client is waiting for its certificate signing request to be approved. Every 10 seconds the reconnect timer kicks in and opens a new connection, but never closes it.
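
As a rough back-of-the-envelope sketch of why this exhausts file descriptors, the function below estimates how long it takes to hit a limit. It assumes one leaked descriptor per affected endpoint per reconnect interval and ignores descriptors used for anything else. The function name and the endpoint count in the example are hypothetical; only the 10-second interval and the 16384 limit come from this issue.

```shell
# Estimate the seconds until the open file limit is exhausted.
# Assumption: each affected endpoint leaks one descriptor per reconnect interval.
# usage: seconds_until_fd_limit LIMIT ENDPOINTS INTERVAL_SECONDS
seconds_until_fd_limit() {
    limit=$1 endpoints=$2 interval=$3
    # Whole reconnect rounds that fit below the limit.
    rounds=$((limit / endpoints))
    echo $((rounds * interval))
}

# Hypothetical example: 200 unsigned agents, limit and interval as reported here.
seconds_until_fd_limit 16384 200 10
```

With these assumed numbers the limit is reached after 81 rounds, which is in the same range as the roughly 84 connections per endpoint seen in the lsof output.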

You can reproduce this with the following setup:

Master:

sudo docker run -ti -h deb10i2m1 -p 5665:5665 debian:buster /bin/bash

apt-get update && apt-get upgrade -y && apt-get install wget gnupg2 ca-certificates vim apt-transport-https -y && echo "deb https://packages.icinga.com/debian/ icinga-buster main" > /etc/apt/sources.list.d/icinga.list && wget -O - https://packages.icinga.com/icinga.key | apt-key add - && apt-get update && apt-get install icinga2 monitoring-plugins -y && /usr/lib/icinga2/prepare-dirs

apt-get install lsof

icinga2 node setup --master --disable-confd

vim /etc/icinga2/zones.conf

object Endpoint "deb10i2m1" {
}

object Zone "master" {
	endpoints = [ "deb10i2m1" ]
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}

object Endpoint "deb10i2a1" {
	host = "172.17.0.3"
}

object Zone "deb10i2a1" {
	parent = "master"
	endpoints = [ "deb10i2a1" ]
}

service icinga2 start

Agent:

sudo docker run -ti -h deb10i2a1 debian:buster /bin/bash

apt-get update && apt-get upgrade -y && apt-get install wget gnupg2 ca-certificates vim apt-transport-https -y && echo "deb https://packages.icinga.com/debian/ icinga-buster main" > /etc/apt/sources.list.d/icinga.list && wget -O - https://packages.icinga.com/icinga.key | apt-key add - && apt-get update && apt-get install icinga2 monitoring-plugins -y && /usr/lib/icinga2/prepare-dirs

icinga2 node wizard

Welcome to the Icinga 2 Setup Wizard!

We will guide you through all required configuration details.

Please specify if this is an agent/satellite setup ('n' installs a master setup) [Y/n]:  

Starting the Agent/Satellite setup routine...

Please specify the common name (CN) [deb10i2a1]: 

Please specify the parent endpoint(s) (master or satellite) where this node should connect to:
Master/Satellite Common Name (CN from your master/satellite node): deb10i2m1

Do you want to establish a connection to the parent node from this node? [Y/n]:
Please specify the master/satellite connection information:
Master/Satellite endpoint host (IP address or FQDN): 172.17.0.2
Master/Satellite endpoint port [5665]: 

Add more master/satellite endpoints? [y/N]: 
Parent certificate information:

 Subject:     CN = deb10i2m1
 Issuer:      CN = Icinga CA
 Valid From:  Dec  3 19:58:26 2019 GMT
 Valid Until: Nov 29 19:58:26 2034 GMT
 Fingerprint: A6 2F 9A 3D F1 45 B7 4D DE EA 41 29 52 3E 3D B4 B3 8B 83 0B 

Is this information correct? [y/N]: y

Please specify the request ticket generated on your Icinga 2 master (optional).
 (Hint: # icinga2 pki ticket --cn 'deb10i2a1'): 

No ticket was specified. Please approve the certificate signing request manually
on the master (see 'icinga2 ca list' and 'icinga2 ca sign --help' for details).
Please specify the API bind host/port (optional):
Bind Host []: 
Bind Port []: 

Accept config from parent node? [y/N]: y
Accept commands from parent node? [y/N]: y

Reconfiguring Icinga...
Disabling feature notification. Make sure to restart Icinga 2 for these changes to take effect.
Enabling feature api. Make sure to restart Icinga 2 for these changes to take effect.

Local zone name [deb10i2a1]: 
Parent zone name [master]: 

Default global zones: global-templates director-global
Do you want to specify additional global zones? [y/N]: 

Do you want to disable the inclusion of the conf.d directory [Y/n]: 
Disabling the inclusion of the conf.d directory...

Done.

Now restart your Icinga 2 daemon to finish the installation!

icinga2 daemon

On the master we started the Icinga 2 daemon in the background so we can monitor the open files/connections:

(for pid in $(pidof icinga2); do lsof -p $pid; done) | wc -l

You will notice that the number of open files increases over time (by one roughly every 10 seconds).
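
To watch the growth instead of re-running the command by hand, a small sampling loop can help. This is a sketch, not part of the original reproduction: the helper names `count_fds` and `watch_fds` are made up, and it counts entries in `/proc/<pid>/fd` (Linux only) rather than lsof lines. The absolute numbers will therefore be lower than the lsof count, because lsof also lists memory-mapped files and similar entries, but the increase per interval should match.

```shell
# Count open file descriptors of one process via /proc (Linux only).
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print a timestamped descriptor total for all processes with a given name.
# usage: watch_fds PROCESS_NAME SAMPLES INTERVAL_SECONDS
watch_fds() {
    name=$1 samples=$2 interval=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        total=0
        for pid in $(pidof "$name"); do
            total=$((total + $(count_fds "$pid")))
        done
        echo "$(date '+%H:%M:%S') $total"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
    done
}

# Example: six samples, ten seconds apart, matching the reconnect timer.
# watch_fds icinga2 6 10
```

Run it as root or as the user that owns the icinga2 processes, since `/proc/<pid>/fd` is not readable otherwise.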

When you stop the Icinga 2 daemon on the agent, you will also see the connections on the master being closed:

[...]
[2019-12-03 20:10:53 +0000] warning/JsonRpcConnection: API client disconnected for identity 'deb10i2a1'
[2019-12-03 20:10:53 +0000] warning/JsonRpcConnection: API client disconnected for identity 'deb10i2a1'
[2019-12-03 20:10:53 +0000] warning/JsonRpcConnection: API client disconnected for identity 'deb10i2a1'
[2019-12-03 20:10:53 +0000] warning/JsonRpcConnection: API client disconnected for identity 'deb10i2a1'
[2019-12-03 20:10:53 +0000] warning/JsonRpcConnection: API client disconnected for identity 'deb10i2a1'

(If you wait longer there will be more closed connections).

@lazyfrosch lazyfrosch added this to the 2.12.0 milestone Apr 7, 2020
Al2Klimov pushed a commit that referenced this issue Sep 11, 2020
This closes the agent connection when the certificate sign requests
waits for CA approval.

refs #7680
@Al2Klimov Al2Klimov modified the milestones: 2.12.0, 2.11.6 Sep 11, 2020
N-o-X pushed a commit that referenced this issue Oct 13, 2020
This closes the agent connection when the certificate sign requests
waits for CA approval.

refs #7680