ConfigPackageUtility::CreatePackage("_api") broken #7173

Closed
Elias481 opened this issue May 9, 2019 · 8 comments · Fixed by #7178
Labels
area/api REST API blocker Blocks a release or needs immediate attention core/build-fix Follow-up fix, not released yet

@Elias481
Contributor

Elias481 commented May 9, 2019

Describe the bug

After a fresh installation, the _api package cannot be created.

To Reproduce

  1. do a fresh installation of the current snapshot version, or remove all content from Configuration::DataDir + "/api/packages/" while the icinga2 daemon is stopped
  2. ensure API setup is done
  3. observe the logfiles, try to schedule a downtime, etc.

Expected behavior

Initial stage is created

Your Environment

  • Version used (icinga2 --version): current snapshot
  • Operating System and version: different
  • Enabled features (icinga2 feature list): api checker mainlog notification

Additional context

Observe the logs (this also happens without external requests, for example with the example downtime):

[2019-05-09 12:30:38 +0200] critical/Downtime: Config package broken: Error: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.

[2019-05-09 12:30:38 +0200] critical/ThreadPool: Exception thrown in event handler:
Error: Could not create downtime.

        (0) icinga2: void boost::throw_exception<boost::exception_detail::error_info_injector<std::runtime_error> >(boost::exception_detail::error_info_injector<std::runtime_error> const&) (+0xb5) [0x922d05]
        (1) icinga2: void boost::exception_detail::throw_exception_<std::runtime_error>(std::runtime_error const&, char const*, char const*, int) (+0x4b) [0x922d9b]
        (2) icinga2: icinga::Downtime::AddDowntime(boost::intrusive_ptr<icinga::Checkable> const&, icinga::String const&, icinga::String const&, double, double, bool, icinga::String const&, double, icinga::String const&, icinga::String const&, icinga::String const&, boost::intrusive_ptr<icinga::MessageOrigin> const&) (+0x9a8) [0xbe7058]
        (3) icinga2: icinga::ScheduledDowntime::CreateNextDowntime() (+0x35e) [0xc4efae]
        (4) icinga2: icinga::ScheduledDowntime::TimerProc() (+0x15c) [0xc4fe2c]
        (5) icinga2: boost::signals2::detail::signal_impl<void (icinga::Timer const* const&), boost::signals2::optional_last_value<void>, int, std::less<int>, boost::function<void (icinga::Timer const* const&)>, boost::function<void (boost::signals2::connection const&, icinga::Timer const* const&)>, boost::signals2::mutex>::operator()(icinga::Timer const* const&) (+0x1ef) [0x9f3f1f]
        (6) icinga2: icinga::Timer::Call() (+0x1b) [0x9f186b]
        (7) icinga2: icinga::ThreadPool::Post<std::function<void ()> >(std::function<void ()>, icinga::SchedulerPolicy)::{lambda()#1}::operator()() const (+0x19) [0xa05a39]
        (8) icinga2: boost::asio::detail::executor_op<boost::asio::detail::work_dispatcher<icinga::ThreadPool::Post<std::function<void ()> >(std::function<void ()>, icinga::SchedulerPolicy)::{lambda()#1}>, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, std::allocator<void>*, boost::system::error_code const&, unsigned long) (+0x9d) [0xa05c9d]
        (9) icinga2: boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() (+0x3bb) [0x9ef88b]
        (10) icinga2: boost_asio_detail_posix_thread_function (+0xf) [0x9ef47f]
        (11) libpthread.so.0: <unknown function> (+0x82de) [0x7fe1039c32de]
        (12) libc.so.6: clone (+0x43) [0x7fe1036f3993]

Try to create downtime via API:

[eohm@rhel8b-vm icinga2]$ curl -k -s -u root:fcbe228074e7f84f -H 'Accept: application/json' -X POST 'https://localhost:5665/v1/actions/schedule-downtime' -d '{ "start_time": 1556279961, "end_time": 1556289961, "duration": 1000, "author": "icingaadmin", "comment": "maintenance", "pretty": true, "type": "Service", "filter": true }'|jq
{
  "results": [
    {
      "code": 500,
      "status": "Action execution failed: 'Error: Could not create downtime.\n'."
    },
    {
      "code": 500,
      "status": "Action execution failed: 'Error: Could not create downtime.\n'."
    },
    {
      "code": 500,
      "status": "Action execution failed: 'Error: Could not create downtime.\n'."
    },
    {
      "code": 500,
      "status": "Action execution failed: 'Error: Could not create downtime.\n'."
    },
    {
      "code": 500,
      "status": "Action execution failed: 'Error: Could not create downtime.\n'."
    },
    {
      "code": 500,
      "status": "Action execution failed: 'Error: Could not create downtime.\n'."
    },
    {
      "code": 500,
      "status": "Action execution failed: 'Error: Could not create downtime.\n'."
    },
    {
      "code": 500,
      "status": "Action execution failed: 'Error: Could not create downtime.\n'."
    },
    {
      "code": 500,
      "status": "Action execution failed: 'Error: Could not create downtime.\n'."
    },
    {
      "code": 500,
      "status": "Action execution failed: 'Error: Could not create downtime.\n'."
    },
    {
      "code": 500,
      "status": "Action execution failed: 'Error: Could not create downtime.\n'."
    }
  ]
}

This one is logged only as follows in the logs:

[2019-05-09 12:34:45 +0200] information/HttpServerConnection: Request: POST /v1/actions/schedule-downtime (from [127.0.0.1]:47056), user: root, agent: curl/7.61.1).
[2019-05-09 12:34:45 +0200] critical/Downtime: Config package broken: Error: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.

[2019-05-09 12:34:45 +0200] critical/Downtime: Config package broken: Error: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.

[2019-05-09 12:34:45 +0200] critical/Downtime: Config package broken: Error: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.

[2019-05-09 12:34:45 +0200] critical/Downtime: Config package broken: Error: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.

[2019-05-09 12:34:45 +0200] critical/Downtime: Config package broken: Error: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.

[2019-05-09 12:34:45 +0200] critical/Downtime: Config package broken: Error: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.

[2019-05-09 12:34:45 +0200] critical/Downtime: Config package broken: Error: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.

[2019-05-09 12:34:45 +0200] critical/Downtime: Config package broken: Error: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.

[2019-05-09 12:34:45 +0200] critical/Downtime: Config package broken: Error: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.

[2019-05-09 12:34:45 +0200] critical/Downtime: Config package broken: Error: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.

[2019-05-09 12:34:45 +0200] critical/Downtime: Config package broken: Error: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.

[2019-05-09 12:34:45 +0200] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:47056)

Cause

There is a kind of circular dependency (not really an infinite loop) in ConfigPackageUtility::CreatePackage, which is triggered for the _api package when the package does not exist yet:
ConfigPackageUtility::CreatePackage calls ConfigPackageUtility::WritePackageConfig, which in turn calls ConfigPackageUtility::GetActiveStage at the very beginning. That call is forced down to ConfigPackageUtility::GetActiveStageFromFile, which fails when no such file exists.

ApiListener::CheckApiPackageIntegrity cannot help here, of course.
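The call chain described above can be illustrated with a small standalone model. This is only a sketch, not the Icinga 2 source: the three functions are simplified stand-ins that borrow the names from this report, and the signatures and file names ("active-stage", "active.conf") are assumptions made for illustration.

```cpp
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Stand-in for GetActiveStageFromFile (assumed behaviour): throws when the
// package directory has no "active-stage" file yet.
std::string GetActiveStageFromFile(const fs::path& packageDir)
{
    std::ifstream fp(packageDir / "active-stage");
    if (!fp)
        throw std::runtime_error("Cannot detect active stage for package '"
            + packageDir.filename().string() + "'.");

    std::string stage;
    std::getline(fp, stage);
    return stage;
}

// Stand-in for WritePackageConfig: it needs the active stage before it can
// write anything, so it fails for a package that was just created.
void WritePackageConfig(const fs::path& packageDir)
{
    std::string stage = GetActiveStageFromFile(packageDir); // throws here

    std::ofstream fp(packageDir / "active.conf");
    fp << "active_stage = \"" << stage << "\"\n";
}

// Stand-in for CreatePackage: the directory is created, but the config
// write that follows depends on a stage that does not exist yet.
void CreatePackage(const fs::path& packageDir)
{
    fs::create_directories(packageDir);
    WritePackageConfig(packageDir);
}
```

In this model, calling CreatePackage on a path that does not exist yet leaves an empty package directory behind and throws, which matches the symptom in the logs above.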

I assume this issue was introduced in the context of #7119 / #7150.

@dnsmichi dnsmichi self-assigned this May 9, 2019
@dnsmichi dnsmichi added this to the 2.11.0 milestone May 9, 2019
@dnsmichi dnsmichi added the area/api REST API label May 9, 2019
@Elias481
Contributor Author

Elias481 commented May 9, 2019

What I forgot to mention in this context: without the API being enabled there are also backtraces in the logs, instead of only a complaint that the API is not available. (These also come from the downtimes.)
I don't know whether anyone runs Icinga2 without the API, but a single-line message instead of a backtrace would be better in that case. (Anyway, I don't know whether it is OK that downtimes do not work without the API enabled.)
So if you are working on that anyway, you could consider changing that behaviour (as this isn't really an exception but just an error).

Example:

[2019-05-09 12:06:53 +0200] critical/Downtime: Config package broken: Error: No ApiListener instance configured.

[2019-05-09 12:06:53 +0200] critical/ThreadPool: Exception thrown in event handler:
Error: Could not create downtime.

        (0) icinga2: icinga::Downtime::AddDowntime(boost::intrusive_ptr<icinga::Checkable> const&, icinga::String const&, icinga::String const&, double, double, bool, icinga::String const&, double, icinga::String const&, icinga::String const&, icinga::String const&, boost::intrusive_ptr<icinga::MessageOrigin> const&) (+0xafc) [0x55d3d1b1177c]
        (1) icinga2: icinga::ScheduledDowntime::CreateNextDowntime() (+0x530) [0x55d3d1b583a0]
        (2) icinga2: icinga::ScheduledDowntime::TimerProc() (+0x148) [0x55d3d1b58e48]
        (3) icinga2: <unknown function> (+0x2ddc11) [0x55d3d19b5c11]
        (4) icinga2: icinga::Timer::Call() (+0x2d) [0x55d3d1965fed]
        (5) icinga2: <unknown function> (+0x2a3cda) [0x55d3d197bcda]
        (6) icinga2: boost::asio::detail::executor_op<boost::asio::detail::work_dispatcher<icinga::ThreadPool::Post<std::function<void ()> >(std::function<void ()>, icinga::SchedulerPolicy)::{lambda()#1}>, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, std::allocator<void>*, boost::system::error_code const&, unsigned long) (+0xb1) [0x55d3d19979f1]
        (7) icinga2: <unknown function> (+0x251b2b) [0x55d3d1929b2b]
        (8) icinga2: <unknown function> (+0x2e53a5) [0x55d3d19bd3a5]
        (9) icinga2: boost_asio_detail_posix_thread_function (+0xf) [0x55d3d19200ef]
        (10) libpthread.so.0: <unknown function> (+0x76db) [0x7f3aeb2476db]
        (11) libc.so.6: clone (+0x3f) [0x7f3aec56988f]

@dnsmichi
Contributor

dnsmichi commented May 9, 2019

Hmmm, the API becomes a hard requirement in the future, since we will remove the external command pipe. Still, adding exceptions in places where they have never been tested for years turns out to be a bad idea. Thanks for finding this, I will tackle this soon.

@dnsmichi
Contributor

dnsmichi commented May 9, 2019

Orrrrrrrr. I totally forgot that SDs use the config package too. Crap.

@Elias481
Contributor Author

Elias481 commented May 9, 2019

Yes, it would be better if they worked on a box without API configuration and with only static configuration, as long as they are part of the static configuration. (For the uncommon use case of using Icinga2 as a standalone monitoring agent.)

@dnsmichi
Contributor

dnsmichi commented May 9, 2019

Hmmm, the config packages are a bit broken by design. Reading the package's active stage from disk must not throw exceptions since it is a valid use case, e.g. when creating a package for the first time. The later steps then involve creating a new stage and activating it, essentially what's done with subsequent API requests. The _api package is special, and must be created on startup at the earliest possible point, with a special note to the user when this really is broken.
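A rough sketch of the behaviour described in this comment, assuming a plain "active-stage" file inside the package directory: reading returns an empty string instead of throwing, and the caller creates and activates a first stage when nothing is active. ReadActiveStage, SetActiveStage and EnsureActiveStage are hypothetical helper names, not functions from the Icinga 2 code base.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Non-throwing read: a missing file simply means "no active stage yet".
std::string ReadActiveStage(const fs::path& packageDir)
{
    std::ifstream fp(packageDir / "active-stage");
    std::string stage;
    if (fp)
        std::getline(fp, stage);
    return stage;
}

// Record the given stage name as the active one.
void SetActiveStage(const fs::path& packageDir, const std::string& stage)
{
    std::ofstream fp(packageDir / "active-stage", std::ios::trunc);
    fp << stage;
}

// Create the package on first use: if no stage is active, create the
// initial stage directory and activate it; otherwise keep the current one.
std::string EnsureActiveStage(const fs::path& packageDir, const std::string& initialStage)
{
    fs::create_directories(packageDir);

    std::string active = ReadActiveStage(packageDir);
    if (!active.empty())
        return active;

    fs::create_directories(packageDir / initialStage);
    SetActiveStage(packageDir, initialStage);
    return initialStage;
}
```

With this shape, an empty result is an ordinary state that the caller handles, so creating a package for the first time no longer has to go through an exception path.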

@dnsmichi
Contributor

dnsmichi commented May 9, 2019

Will continue tomorrow, pushed a branch.

@dnsmichi dnsmichi added core/build-fix Follow-up fix, not released yet blocker Blocks a release or needs immediate attention labels May 10, 2019
dnsmichi pushed a commit that referenced this issue May 10, 2019
This partially reverts #7150 and avoids exceptions
inside the flow. Each time an empty active stage
is detected, Icinga tries to repair it from the
given directory tree.

Also, the code now takes into account that it should
create the package storage on startup, whether within
the API object, or if disabled, inside the application.

Caching the active stages for packages in memory
only is in effect with the API feature being enabled.
This is useful for other deployed config packages,
not only the internal one.

fixes #7173
refs #7150
refs #7119
fixes #6959
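The two ideas in this commit message, repairing an empty active stage from the directory tree and caching active stages in memory, could look roughly like the sketch below. Everything here is an assumption for illustration: StageCache is a hypothetical class, the "active-stage" file name is assumed, and picking the lexicographically last stage directory is a placeholder rule, since the commit message does not say how the stage is chosen.

```cpp
#include <filesystem>
#include <fstream>
#include <map>
#include <mutex>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical in-memory cache of active stages, keyed by package directory.
class StageCache
{
public:
    std::string GetActiveStage(const fs::path& packageDir)
    {
        std::lock_guard<std::mutex> lock(m_Mutex);

        auto it = m_Stages.find(packageDir.string());
        if (it != m_Stages.end())
            return it->second; // served from memory, no disk access

        std::string stage = ReadFromFile(packageDir);
        if (stage.empty())
            stage = Repair(packageDir); // empty stage detected: try to repair

        if (!stage.empty())
            m_Stages[packageDir.string()] = stage;

        return stage; // still empty if nothing could be repaired
    }

private:
    static std::string ReadFromFile(const fs::path& packageDir)
    {
        std::ifstream fp(packageDir / "active-stage");
        std::string stage;
        if (fp)
            std::getline(fp, stage);
        return stage;
    }

    // Placeholder repair rule (assumption): activate the stage directory
    // whose name sorts last. A missing package directory yields "".
    static std::string Repair(const fs::path& packageDir)
    {
        std::string candidate;
        std::error_code ec;
        for (const auto& entry : fs::directory_iterator(packageDir, ec)) {
            if (!entry.is_directory())
                continue;
            std::string name = entry.path().filename().string();
            if (name > candidate)
                candidate = name;
        }

        if (!candidate.empty()) {
            std::ofstream fp(packageDir / "active-stage", std::ios::trunc);
            fp << candidate;
        }
        return candidate;
    }

    std::mutex m_Mutex;
    std::map<std::string, std::string> m_Stages;
};
```

Note that in this sketch a package directory without any stage subdirectories cannot be repaired and keeps returning an empty stage.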
@Elias481
Contributor Author

Fine, thanks. But one issue remains in that context.
It does not affect fresh setups, but if the "icinga2/api/packages/_api" folder exists but is empty, it is not initialized/fixed automatically.
That can be fixed easily by removing the empty folder, but it should probably go into the troubleshooting docs (where you introduced a typo that could be fixed anyway: "It also tries to repair the broken package, and lots a new message". Or does it mean that there are now lots of new messages because of the repair function and such?)

@dnsmichi
Contributor

I thought of purging an empty _api directory, but this thing cannot happen without manual interaction from the user. So if that's really happening, a manual rmdir and restart will fix it anyways.

"lots" should read as "logs", I'll fix that, thanks. Rough week.

2 participants