
[Ingest Management] Can't update the system package #82580

Closed · mtojek opened this issue Nov 4, 2020 · 22 comments
Labels: Team:Fleet (Team label for Observability Data Collection Fleet team)

Comments

mtojek (Contributor) commented Nov 4, 2020

Hi,

It looks like the following command doesn't work as intended (or at least not under this specific condition):

$ curl -u elastic:${PASSWORD} -k -X POST https://<host>:443/api/fleet/epm/packages/system-0.9.0 -H 'kibana-xsf: blah' -H 'kbn-xsrf: blah' -H 'Content-Type: application/json' -d '{ "force": true }'

Result:

{"statusCode":502,"error":"Bad Gateway","message":"'404 Not Found' error response from package registry at https://epr-staging.elastic.co/package/system/0.7.0/"}

Expected result: the system package "0.7.0" may not exist anymore, but Kibana should install the selected version, which is available.
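
For anyone reproducing this, one way to double-check which versions the staging registry actually serves for the system package is to query its search endpoint directly. This is only a sketch: the /search endpoint and the all=true parameter are assumptions about the package registry API, not something taken from this issue.

# List the system package versions known to the staging registry (all=true is assumed to return every version, not only the latest)
$ curl -s "https://epr-staging.elastic.co/search?package=system&all=true"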

mtojek added the Team:Fleet label on Nov 4, 2020
elasticmachine (Contributor)

Pinging @elastic/ingest-management (Team:Ingest Management)

ruflin (Contributor) commented Nov 4, 2020

@neptunian Could you have a look at the above and comment on the expected behaviour?

skh (Contributor) commented Nov 4, 2020

My current theory is that the update failed and the call to the registry for the old package happened during rollback.

This is a known issue and will be fixed with #81110

Were there other errors in the Kibana log before the 404?
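
To make the suspected flow concrete, here is a minimal TypeScript sketch of an upgrade-with-rollback path that would produce exactly this symptom. The type and function names are hypothetical, not Fleet's actual code:

// Sketch only: shows how a failed install of system-0.9.0 can surface a
// registry 404 for the previously installed system-0.7.0 during rollback.
type PackageArchive = { name: string; version: string; body: ArrayBuffer };

// Placeholder for a registry call such as GET https://epr-staging.elastic.co/package/system/0.7.0/
async function fetchFromRegistry(name: string, version: string): Promise<PackageArchive> {
  const res = await fetch(`https://epr-staging.elastic.co/package/${name}/${version}/`);
  if (!res.ok) {
    // This is the kind of error that ends up in the HTTP 502 response above.
    throw new Error(`'${res.status} ${res.statusText}' error response from package registry`);
  }
  return { name, version, body: await res.arrayBuffer() };
}

// Placeholder for installing the package assets into Kibana / Elasticsearch.
async function installAssets(pkg: PackageArchive): Promise<void> {
  // ...
}

async function upgradePackage(name: string, newVersion: string, previousVersion: string): Promise<void> {
  try {
    await installAssets(await fetchFromRegistry(name, newVersion)); // e.g. system-0.9.0
  } catch (installError) {
    // Rollback: re-fetch the previously installed version. If that version
    // (e.g. system-0.7.0) has been removed from the registry, this call 404s,
    // and the 404 masks the original installation error in the API response.
    await installAssets(await fetchFromRegistry(name, previousVersion));
  }
}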

mtojek (Contributor, Author) commented Nov 4, 2020

Not sure how to check logs in this cloud environment; maybe @kuisathaverat can help here. AFAIK nothing was found during yesterday's investigation.

kuisathaverat (Contributor)

I'll send you the instructions on Slack.

skh (Contributor) commented Nov 4, 2020

There are errors of this type before the 404s that look related:

09:44:18.000
kibana.log
{ Error: Saved object [dashboard/system-0d3f2380-fa78-11e6-ae9b-81e5311e8cab] not found
    at Function.createGenericNotFoundError (/usr/share/kibana/src/core/server/saved_objects/service/lib/errors.js:136:37)
    at SavedObjectsRepository.delete (/usr/share/kibana/src/core/server/saved_objects/service/lib/repository.js:574:46)
    at process._tickCallback (internal/process/next_tick.js:68:7)
  data: null,
  isBoom: true,
  isServer: false,
  output:
   { statusCode: 404,
     payload:
      { statusCode: 404,
        error: 'Not Found',
        message:
         'Saved object [dashboard/system-0d3f2380-fa78-11e6-ae9b-81e5311e8cab] not found' },
     headers: {} },
  reformat: [Function],
  typeof: [Function: notFound],
  [Symbol(SavedObjectsClientErrorCode)]: 'SavedObjectsClient/notFound' }

Also for search/system-eb0039f0-fa7f-11e6-a1df-a78bd7504d38 and dashboard/system-277876d0-fa2c-11e6-bbd3-29c986c96e5a

kuisathaverat (Contributor)

I see tons of logs like this one:

{"type":"log","@timestamp":"2020-11-04T11:55:25+00:00","tags":["info","plugins","ingestManager"],"pid":6,"message":"Custom registry url is an experimental feature and is unsupported."}

and when I access Fleet, these two:

{"type":"log","@timestamp":"2020-11-04T11:57:36+00:00","tags":["error","plugins","ingestManager"],"pid":6,"message":"[cluster_block_exception] index [.transform-internal-005] blocked by: [FORBIDDEN/8/index write (api)]; response from /_transform/endpoint.metadata_current-default-0.16.1: {\"error\":{\"root_cause\":[{\"type\":\"cluster_block_exception\",\"reason\":\"index [.transform-internal-005] blocked by: [FORBIDDEN/8/index write (api)];\"}],\"type\":\"runtime_exception\",\"reason\":\"runtime_exception: Failed to persist transform configuration\",\"caused_by\":{\"type\":\"cluster_block_exception\",\"reason\":\"index [.transform-internal-005] blocked by: [FORBIDDEN/8/index write (api)];\"}},\"status\":500}"}

{"type":"error","@timestamp":"2020-11-04T11:56:50+00:00","tags":[],"pid":6,"level":"error","error":{"message":"Internal Server Error","name":"Error","stack":"Error: Internal Server Error\n at HapiResponseAdapter.toError (/usr/share/kibana/src/core/server/http/router/response_adapter.js:132:19)\n at HapiResponseAdapter.toHapiResponse (/usr/share/kibana/src/core/server/http/router/response_adapter.js:86:19)\n at HapiResponseAdapter.handle (/usr/share/kibana/src/core/server/http/router/response_adapter.js:81:17)\n at Router.handle (/usr/share/kibana/src/core/server/http/router/router.js:164:34)\n at process._tickCallback (internal/process/next_tick.js:68:7)"},"url":"https://dev-next-oblt.elastic.dev/api/fleet/setup","message":"Internal Server Error"}

kuisathaverat (Contributor) commented Nov 4, 2020

I have checked the Elasticsearch cluster status: it is red, which can explain why it cannot write to that index.

{
  "cluster_name" : "XXXXXXXXXX",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 12,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 768,
  "active_shards" : 1168,
  "relocating_shards" : 0,
  "initializing_shards" : 11,
  "unassigned_shards" : 238,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 26,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 773,
  "active_shards_percent_as_number" : 82.42766407904023
}
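
For reference, output like the above comes from the Elasticsearch cluster health API; with the same kind of placeholders as the curl in the description:

$ curl -u elastic:${PASSWORD} -k "https://<es-host>:443/_cluster/health?pretty"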

kuisathaverat (Contributor)

With the cluster in yellow, we get the same error:

07:34:35.000
kibana.log
[cluster_block_exception] index [.transform-internal-005] blocked by: [FORBIDDEN/8/index write (api)]; response from /_transform/endpoint.metadata_current-default-0.16.1: {"error":{"root_cause":[{"type":"cluster_block_exception","reason":"index [.transform-internal-005] blocked by: [FORBIDDEN/8/index write (api)];"}],"type":"runtime_exception","reason":"runtime_exception: Failed to persist transform configuration","caused_by":{"type":"cluster_block_exception","reason":"index [.transform-internal-005] blocked by: [FORBIDDEN/8/index write (api)];"}},"status":500}

kuisathaverat (Contributor) commented Nov 4, 2020

The index is frozen. I think we reported this before; it has neither an alias nor an ILM policy.

[Screenshot: 2020-11-04 at 13:36:57]

After unfreezing the index, the issue is gone.
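
For reference, on a 7.x cluster the block and frozen settings on that index can be inspected, and the index unfrozen, with something like the following (host and credentials are placeholders; the index name is the one from the logs above):

# Inspect the index settings to confirm the write block / frozen flag
$ curl -u elastic:${PASSWORD} -k "https://<es-host>:443/.transform-internal-005/_settings?pretty"

# Unfreeze the index (the _unfreeze API available on 7.x)
$ curl -u elastic:${PASSWORD} -k -X POST "https://<es-host>:443/.transform-internal-005/_unfreeze"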

skh (Contributor) commented Nov 4, 2020

As the symptom in the original description will be addressed by #81110, can we close this one?

mtojek (Contributor, Author) commented Nov 4, 2020

Actually, it's up to the team. I didn't dive deeper into the Kibana issue, but I assume that the package won't disappear once installed, right? It isn't a temporary cache?

If you decide to close this issue, I suggest prioritizing the other one, because replacing staged packages with new versions is a relatively common use case (e.g. we accumulate many snapshots until we have a major version; snapshots will disappear once the package is promoted).

EDIT:

I'm not quite sure about the root cause of #82580 (comment); it looks like it's a bit different and related to Endpoint. /cc @jonathan-buttner

nnamdifrankie (Contributor)

@kuisathaverat

{"type":"log","@timestamp":"2020-11-04T11:57:36+00:00","tags":["error","plugins","ingestManager"],"pid":6,"message":"[cluster_block_exception] index [.transform-internal-005] blocked by: [FORBIDDEN/8/index write (api)]; response from /_transform/endpoint.metadata_current-default-0.16.1: {\"error\":{\"root_cause\":[{\"type\":\"cluster_block_exception\",\"reason\":\"index [.transform-internal-005] blocked by: [FORBIDDEN/8/index write (api)];\"}],\"type\":\"runtime_exception\",\"reason\":\"runtime_exception: Failed to persist transform configuration\",\"caused_by\":{\"type\":\"cluster_block_exception\",\"reason\":\"index [.transform-internal-005] blocked by: [FORBIDDEN/8/index write (api)];\"}},\"status\":500}"}

{"type":"error","@timestamp":"2020-11-04T11:56:50+00:00","tags":[],"pid":6,"level":"error","error":{"message":"Internal Server Error","name":"Error","stack":"Error: Internal Server Error\n at HapiResponseAdapter.toError (/usr/share/kibana/src/core/server/http/router/response_adapter.js:132:19)\n at HapiResponseAdapter.toHapiResponse (/usr/share/kibana/src/core/server/http/router/response_adapter.js:86:19)\n at HapiResponseAdapter.handle (/usr/share/kibana/src/core/server/http/router/response_adapter.js:81:17)\n at Router.handle (/usr/share/kibana/src/core/server/http/router/router.js:164:34)\n at process._tickCallback (internal/process/next_tick.js:68:7)"},"url":"https://dev-next-oblt.elastic.dev/api/fleet/setup","message":"Internal Server Error"}

Regarding this error, I reached out to the ML team; here is their response on the likely cause:

afaik this happens when the disk gets full. Similar issues happen for .kibana. Does the problem persist? I think in former versions of ES you had to manually unblock an index after running out of disk space, but they introduced a fix for that which automatically makes indices writable again once disk space is available.

I also recall from the channel that there are a lot of documents on the server; perhaps we should try resizing the host and disk, or clearing up disk space on the machine.
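
As a side note, a disk-full condition usually shows up as a FORBIDDEN/12/index read-only / allow delete block rather than the FORBIDDEN/8/index write (api) block seen in the logs here. If it were the flood-stage block, it could be cleared manually with something like the following (host and credentials are placeholders):

$ curl -u elastic:${PASSWORD} -k -X PUT "https://<es-host>:443/.transform-internal-005/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index.blocks.read_only_allow_delete": null }'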

ph (Contributor) commented Nov 4, 2020

This issue is starting to get confusing. The initial description seems to be caused by the package being gone from the registry; as @skh mentioned, this will be addressed by #81110.

So I think I would close this one. @neptunian Can you double-check?

The installation problem mentioned by @nnamdifrankie should be tracked in a separate issue.

nnamdifrankie (Contributor)

@kuisathaverat

Can you please create a ticket for the frozen index issue? In that ticket we should evaluate the physical health of the server, e.g. disk space etc. I did not set up that server, so I will defer to you on how to proceed.

kuisathaverat (Contributor)

Regarding the disk-full theory: this cluster has about 5 TB of disk space and we are using about 3 TB, so I think we are far from filling the disk. Also, if we ran out of disk space, everything would blow up (I know from experience). Something else has to have triggered this index freeze.
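
Per-node disk usage can be confirmed with the cat allocation API (host and credentials are placeholders):

$ curl -u elastic:${PASSWORD} -k "https://<es-host>:443/_cat/allocation?v"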

kuisathaverat (Contributor) commented Nov 4, 2020

@nnamdifrankie
On which repo should I create the issue? Do I have to add any labels to the issue?

kevinlog (Contributor) commented Nov 4, 2020

@kuisathaverat

On which repo should I create the issue? Do I have to add any labels to the issue?

You can create it in the Kibana public repo and add the label "Team:Onboarding and Lifecycle Mgt".

neptunian (Contributor) commented Nov 5, 2020

Agreed. If there is an error installing some version of a package, it will try to roll back. There should have been log messages describing the problem, that a rollback was being attempted, and that it failed.

mtojek (Contributor, Author) commented Nov 5, 2020

I understand that there should be an error stored in the logs, but on the other hand, the one reported in the REST response is really confusing. The user tries to install 0.9.0 and receives a response reporting a problem with a non-existent 0.7.0. Do you think we can improve the error message?
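
As an illustration only (not a proposal for specific Fleet code; the names below are hypothetical), a rollback failure could wrap both errors so the API response points at the real failure:

// Hypothetical error type that keeps both the original install failure and the rollback failure visible.
class PackageRollbackError extends Error {
  constructor(
    pkgName: string,
    failedVersion: string,
    rollbackVersion: string,
    installError: Error,
    rollbackError: Error
  ) {
    super(
      `installing ${pkgName}-${failedVersion} failed: ${installError.message}; ` +
        `rolling back to ${pkgName}-${rollbackVersion} also failed: ${rollbackError.message}`
    );
    this.name = 'PackageRollbackError';
  }
}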

neptunian (Contributor)

I tested the scenario out and the response I get is:

{
    "statusCode": 500,
    "error": "Internal Server Error",
    "message": "blee blah blah"
}

The response was the error that caused the rollback in the first place. I'm not sure this is the best message either, but I don't get any mention of the other package.

The logs looked like:

server    log   [08:53:11.536] [error][ingestManager][plugins] Error: blee blah blah
    at _installPackage (/Users/sandy/dev/elastic/kibana/x-pack/plugins/ingest_manager/server/services/epm/packages/_install_package.ts:105:37)
    at process._tickCallback (internal/process/next_tick.js:68:7)
server    log   [08:53:11.541] [error][ingestManager][plugins] rolling back to nginx-0.2.3 after error installing nginx-0.2.4
server    log   [09:06:34.146] [error][ingestManager][plugins] failed to uninstall or rollback package after installation error RegistryResponseError: '400 Bad Request' error response from package registry at https://epr-snapshot.elastic.co/package/nginx/0.2.3
server   error  [08:53:10.220]  Error: Internal Server Error
    at HapiResponseAdapter.toError (/Users/sandy/dev/elastic/kibana/src/core/server/http/router/response_adapter.ts:132:19)
    at HapiResponseAdapter.toHapiResponse (/Users/sandy/dev/elastic/kibana/src/core/server/http/router/response_adapter.ts:82:19)
    at HapiResponseAdapter.handle (/Users/sandy/dev/elastic/kibana/src/core/server/http/router/response_adapter.ts:77:17)
    at Router.handle (/Users/sandy/dev/elastic/kibana/src/core/server/http/router/router.ts:273:34)

mtojek (Contributor, Author) commented Nov 6, 2020

I think we're talking about different errors. Please look at the one I posted in the issue description (HTTP 502).

ph closed this as completed on Mar 18, 2021.