[Fleet] Agents are intermittently showing as off-line in Kibana Fleet #21025
Pinging @elastic/ingest-management (Team:Ingest Management)
@nchaulet Could it be linked to the performance changes we did?
It could be linked, yes. It looks like the same events are sent again and again; maybe the change I made to have a timeout of 5 minutes is not working here. I am doing some tests to check what is happening.
Just did a test against https://kibana.endpoint.elastic.dev/ and my agent is correctly reported as online. @EricDavisX Do you know how those agents are run? Somewhere on a server, running as a service? And do we have logs from these agents?
I do know! Our wiki page at /display/DEV/Endpoint+and+Ingest+Nightly+Dev+Demo+Server has details. It will show something like this in Ansible:

```yaml
- name: Create install directory
  file:
    path: "{{ install_dir_linux }}"
    mode: "0755"
    state: directory
- name: Set download url
  set_fact:
    agent_url: "{{ snapshots.json | json_query('packages.\"' + agent_handle_linux + '.tar.gz\".url') }}"
- name: Download and Extract Agent zip
  unarchive:
    remote_src: yes
    src: "{{ agent_url }}"
    dest: "{{ install_dir_linux }}"
- name: Enroll the agent
  become: yes
  shell: "{{ install_dir_linux }}/{{ agent_handle_linux }}/elastic-agent enroll -f https://{{ kibana_username }}:{{ kibana_password }}@kibana.{{ domain_name }}:443 {{ enroll_token }}"
- name: Create the service file
  template:
    dest: /etc/systemd/system/fleet-agent.service
    src: fleet-agent.service.j2
    mode: '0644'
  register: service_file
- name: reload systemd configs to pickup changes
  systemd:
    daemon_reload: yes
  when: service_file.changed
- name: restart fleet-agent service
  systemd:
    name: fleet-agent.service
    state: restarted
    enabled: yes
```

It has worked prior, and seems to still work, to start the agent (I'm just not sure how long they'll stay up?)
Also @nchaulet, if you used a 7.8 Agent it's totally cheating. :) Can we confirm again and keep researching with a full 8.0 env? It's helpful to know, though, that the older Agent works; it suggests the problem may indeed be on the Agent side.
Got a repro locally with a timeout. My bad, I did not check with @michalpristas or @blakerouse what the timeout is for the check-in request. @michalpristas, how complicated is it to modify the timeout for the check-in request? (It's set to 5 minutes on the Kibana side.)
Until we have a proper fix, this can be worked around by adding this to
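To illustrate the failure mode being discussed (this is a hypothetical sketch, not elastic-agent or Kibana code): Fleet's check-in endpoint long-polls, holding the request open for up to 5 minutes before replying. If the agent's HTTP client gives up before the server responds, every check-in aborts, the agent retries, and the same pending events get re-sent, while Kibana may mark the agent off-line. The function names and timeout values below are assumptions for illustration only.

```python
def simulate_checkins(server_hold_s, client_timeout_s, duration_s):
    """Hypothetical model of long-poll check-ins: count how many attempts
    are aborted because the client times out before the server replies.
    Returns (attempts, timeouts)."""
    attempts = timeouts = 0
    elapsed = 0
    while elapsed < duration_s:
        attempts += 1
        if client_timeout_s < server_hold_s:
            # Client aborts before the server's long-poll window closes;
            # the agent retries and re-sends the same pending events.
            timeouts += 1
            elapsed += client_timeout_s
        else:
            # Client outlasts the server's hold time; check-in succeeds.
            elapsed += server_hold_s
    return attempts, timeouts

# With a 5-minute (300 s) server-side hold and a client timeout of 60 s,
# every check-in in a 10-minute window is aborted and retried:
print(simulate_checkins(server_hold_s=300, client_timeout_s=60, duration_s=600))
# A client timeout longer than the hold window never aborts:
print(simulate_checkins(server_hold_s=300, client_timeout_s=360, duration_s=600))
```

This matches the symptom in the report: repeated identical events and agents flapping off-line, fixed by raising the agent-side check-in timeout above the server's long-poll window.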
@nchaulet Not complicated at all; will prepare a PR.
@nchaulet If I understand correctly, this issue should be assigned to @michalpristas?
[Fleet] Agents are reported as going off-line, as seen in Kibana Fleet. I'm also seeing lots of timestamps in the Agent Activity log with the same time. Is that somehow expected? It seems strange. Some screenshots are below...
tested on:
https://kibana.endpoint.elastic.dev/app/ingestManager#/fleet/agents/946f0178-31e6-4e2d-be4b-556320bb55e0
As of now, it's running code from Sept 3 (today is Sept 8th):

```
edavis-mbp:kibana_elastic edavis$ git show -s 60986d4f8202016c98409c2926ccf29d9d2ee7e0
commit 60986d4f8202016c98409c2926ccf29d9d2ee7e0
Author: Yuliia Naumenko jo.naumenko@gmail.com
Date:   Thu Sep 3 13:07:23 2020 -0700
```
Maybe related to the bug cited by the e2e tests (logged against stand-alone mode, but maybe it's bigger than we knew): #20992
The 'type ahead' to get more/better info from the logs through ingest is not working currently; that's logged separately. I can dig in to get more logs from the agent hosts later if help is needed, but there's no need for this to sit idle waiting on me, so I'm dropping it into the system.
It seems to impact both Agents that have Endpoint and those that don't. But all Endpoint integrations seem up and alive in the Security app, so the Agent must be basically OK!?
Screenshots:
Timestamps are repeated...
Impacting both endpoint- and non-endpoint-enabled hosts: