Ensure that runtime created API objects survive a restart #7150

Merged
merged 5 commits into from
Apr 30, 2019
89 changes: 88 additions & 1 deletion doc/15-troubleshooting.md
@@ -747,7 +747,7 @@ $ curl -k -s -u root:icinga -H 'Accept: application/json' -X DELETE 'https://loc
}
```

## REST API Troubleshooting: No Objects Found <a id="troubleshooting-api-no-objects-found"></a>
### REST API Troubleshooting: No Objects Found <a id="troubleshooting-api-no-objects-found"></a>

Please note that the `404` status with no objects being found can also originate
from missing or too strict object permissions for the authenticated user.
@@ -761,6 +761,93 @@ In order to analyse and fix the problem, please check the following:
- use an administrative account with full permissions to check whether the objects are actually there.
- verify the permissions on the affected ApiUser object and fix them.

### Missing Runtime Objects (Hosts, Downtimes, etc.) <a id="troubleshooting-api-missing-runtime-objects"></a>

Runtime objects use the same internal config package mechanism as
the REST API config packages. Each host, service, downtime, comment, etc. created
via the REST API is stored in the `_api` package.

This includes downtimes and comments, which were sometimes stored in the wrong
directory path because the active-stage file was empty, truncated or unreadable
at that point.

Wrong:

```
/var/lib/icinga2/api/packages/_api//conf.d/downtimes/1234-5678-9012-3456.conf
```

Correct:

```
/var/lib/icinga2/api/packages/_api/abcd-ef12-3456-7890/conf.d/downtimes/1234-5678-9012-3456.conf
```

At creation time, the object lives in memory, but it is written to the wrong path on disk.
After a restart the object is missing; a missing downtime, for example, re-enables unwanted notifications.

`abcd-ef12-3456-7890` is the active stage name which wasn't correctly
read by the Icinga daemon. This information is stored in `/var/lib/icinga2/api/packages/_api/active-stage`.
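
A read-only way to check this is to compare the content of `active-stage` with the
stage directories that actually exist. The following is a minimal sketch and not part
of Icinga 2: the function name `check_active_stage` and its status words are hypothetical,
and it assumes the file holds just the stage name, as described above.

```shell
# Hypothetical helper, not shipped with Icinga 2: report whether a config
# package directory has a usable active-stage file. It only reads, never writes.
check_active_stage() {
    pkg_dir=$1
    stage_file="$pkg_dir/active-stage"

    if [ ! -r "$stage_file" ]; then
        echo "missing"    # file is absent or not readable by this user
        return 1
    fi

    # Strip whitespace so a trailing newline does not break the comparison.
    stage=$(tr -d '[:space:]' < "$stage_file")

    if [ -z "$stage" ]; then
        echo "empty"      # file exists but is empty or truncated
        return 1
    fi

    if [ ! -d "$pkg_dir/$stage" ]; then
        echo "dangling"   # names a stage directory that does not exist
        return 1
    fi

    echo "ok: $stage"
}
```

It could be called as `check_active_stage /var/lib/icinga2/api/packages/_api`. Run it as a user
that may read the package directory, otherwise a readable file is reported as `missing`.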

Icinga 2.11 limits direct access to the `active-stage` file (this is hidden from the user)
and caches the active stage of each package in memory.

In addition, Icinga now logs a critical message on startup and during config validation
when a deployed config package is broken.

```
icinga2 daemon -C

[2019-04-26 12:58:14 +0200] critical/ApiListener: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.
```

In order to fix the broken config package and mark a deployed stage as active
again, create a backup first and then carefully perform the following steps.

Navigate into the API package prefix.

```
cd /var/lib/icinga2/api/packages
```

Change into the broken package directory and list all directories and files
ordered by latest changes.

```
cd _api
ls -lahtr

drwx------ 4 michi wheel 128B Mar 27 14:39 ..
-rw-r--r-- 1 michi wheel 25B Mar 27 14:39 include.conf
-rw-r--r-- 1 michi wheel 405B Mar 27 14:39 active.conf
drwx------ 7 michi wheel 224B Mar 27 15:01 abcd-ef12-3456-7890
drwx------ 5 michi wheel 160B Apr 26 12:47 .
```

As you can see, the `active-stage` file is missing. If it exists, verify that its content
is the name of the stage directory, as described next.

If you have more than one stage directory here, pick the most recently modified
directory. Copy the directory name `abcd-ef12-3456-7890` and
write it into a new file named `active-stage`. This can be done like this:

```
echo "abcd-ef12-3456-7890" > active-stage
```
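
When several stage directories exist, the most recently modified one can also be
determined by a command instead of reading it off the listing. This is a minimal sketch;
the function name `latest_stage_dir` is hypothetical and it assumes stage directory
names contain no newlines.

```shell
# Hypothetical helper: print the name of the most recently modified
# subdirectory of a package directory, or nothing if there is none.
latest_stage_dir() {
    pkg_dir=$1
    # "ls -t" sorts newest first; the trailing slash in the glob restricts
    # the matches to directories, and "tr" removes that slash again.
    ( cd "$pkg_dir" && ls -td -- */ 2>/dev/null | head -n 1 | tr -d '/' )
}
```

For example, `latest_stage_dir /var/lib/icinga2/api/packages/_api` prints the candidate name.
Review the printed name before writing it into `active-stage`.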

Re-run config validation.

```
icinga2 daemon -C
```

The validation should not show an error.
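
To check this without reading the whole output, the validation log can be searched for
the critical message shown earlier. A minimal sketch, assuming the message text stays
as quoted in this section; the function name `reports_broken_package` is hypothetical.

```shell
# Hypothetical helper, not part of Icinga 2: read validation output on stdin
# and succeed (exit 0) only if it still reports a broken config package.
reports_broken_package() {
    grep -q "Cannot detect active stage for package"
}
```

A possible call is `icinga2 daemon -C 2>&1 | reports_broken_package && echo "still broken"`.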

> **Note**
>
> The internal `_api` config package structure may change in the future. Do not modify
> things in there manually or with scripts unless guided here or asked by a developer.


## Certificate Troubleshooting <a id="troubleshooting-certificate"></a>

22 changes: 22 additions & 0 deletions doc/16-upgrading-icinga-2.md
@@ -102,6 +102,28 @@ The deprecated `concurrent_checks` attribute in the [checker feature](09-object-
has no effect anymore if set. Please use the [MaxConcurrentChecks](17-language-reference.md#icinga-constants-global-config)
constant in [constants.conf](04-configuring-icinga-2.md#constants-conf) instead.

### REST API <a id="upgrading-to-2-11-api"></a>

#### Config Packages <a id="upgrading-to-2-11-api-config-packages"></a>

Each deployed configuration package requires an active stage; many previous
stages may exist alongside it. This mechanism is used by the Icinga Director as
an external consumer, and by Icinga itself for storing runtime-created objects
inside the `_api` package.

This includes downtimes and comments, which were sometimes stored in the wrong
directory path because the active-stage file was empty, truncated or unreadable
at that point.

2.11 makes this mechanism more stable and detects broken config packages.

```
[2019-04-26 12:58:14 +0200] critical/ApiListener: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.
```

In order to fix this, please follow [this troubleshooting entry](15-troubleshooting.md#troubleshooting-api-missing-runtime-objects).


## Upgrading to v2.10 <a id="upgrading-to-2-10"></a>

### Path Constant Changes <a id="upgrading-to-2-10-path-constant-changes"></a>

8 changes: 7 additions & 1 deletion lib/cli/daemoncommand.cpp
@@ -289,7 +289,13 @@ int DaemonCommand::Run(const po::variables_map& vm, const std::vector<std::strin
}

/* Remove ignored Downtime/Comment objects. */
ConfigItem::RemoveIgnoredItems(ConfigObjectUtility::GetConfigDir());
try {
String configDir = ConfigObjectUtility::GetConfigDir();
ConfigItem::RemoveIgnoredItems(configDir);
} catch (const std::exception& ex) {
Log(LogNotice, "cli")
<< "Cannot clean ignored downtimes/comments: " << ex.what();
}

#ifndef _WIN32
struct sigaction sa;

10 changes: 9 additions & 1 deletion lib/remote/apilistener-configsync.cpp
@@ -282,7 +282,15 @@ void ApiListener::UpdateConfigObject(const ConfigObject::Ptr& object, const Mess
params->Set("version", object->GetVersion());

if (object->GetPackage() == "_api") {
String file = ConfigObjectUtility::GetObjectConfigPath(object->GetReflectionType(), object->GetName());
String file;

try {
file = ConfigObjectUtility::GetObjectConfigPath(object->GetReflectionType(), object->GetName());
} catch (const std::exception& ex) {
Log(LogNotice, "ApiListener")
<< "Cannot sync object '" << object->GetName() << "': " << ex.what();
return;
}

std::ifstream fp(file.CStr(), std::ifstream::binary);
if (!fp)

86 changes: 86 additions & 0 deletions lib/remote/apilistener.cpp
@@ -6,6 +6,7 @@
#include "remote/endpoint.hpp"
#include "remote/jsonrpc.hpp"
#include "remote/apifunction.hpp"
#include "remote/configpackageutility.hpp"
#include "base/convert.hpp"
#include "base/defer.hpp"
#include "base/io-engine.hpp"
@@ -134,6 +135,9 @@ void ApiListener::OnConfigLoaded()
Log(LogWarning, "ApiListener", "Please read the upgrading documentation for v2.8: https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/");
}

/* Cache API packages and their active stage name. */
UpdateActivePackageStagesCache();

/* set up SSL context */
std::shared_ptr<X509> cert;
try {
@@ -267,6 +271,11 @@ void ApiListener::Start(bool runtimeCreated)
m_CleanupCertificateRequestsTimer->Start();
m_CleanupCertificateRequestsTimer->Reschedule(0);

m_ApiPackageIntegrityTimer = new Timer();
m_ApiPackageIntegrityTimer->OnTimerExpired.connect(std::bind(&ApiListener::CheckApiPackageIntegrity, this));
m_ApiPackageIntegrityTimer->SetInterval(300);
m_ApiPackageIntegrityTimer->Start();

OnMasterChanged(true);
}

@@ -1537,6 +1546,83 @@ Endpoint::Ptr ApiListener::GetLocalEndpoint() const
return m_LocalEndpoint;
}

void ApiListener::UpdateActivePackageStagesCache()
{
boost::mutex::scoped_lock lock(m_ActivePackageStagesLock);

for (auto package : ConfigPackageUtility::GetPackages()) {
String activeStage;

try {
activeStage = ConfigPackageUtility::GetActiveStageFromFile(package);
} catch (const std::exception& ex) {
Log(LogCritical, "ApiListener")
<< ex.what();
continue;
}

Log(LogNotice, "ApiListener")
<< "Updating cache: Config package '" << package << "' has active stage '" << activeStage << "'.";

m_ActivePackageStages[package] = activeStage;
}
}

void ApiListener::CheckApiPackageIntegrity()
{
boost::mutex::scoped_lock lock(m_ActivePackageStagesLock);

for (auto package : ConfigPackageUtility::GetPackages()) {
String activeStage;
try {
activeStage = ConfigPackageUtility::GetActiveStageFromFile(package);
} catch (const std::exception& ex) {
/* An error means that the stage is broken, try to repair it. */
auto it = m_ActivePackageStages.find(package);

if (it == m_ActivePackageStages.end())
continue;

String activeStageCached = it->second;

Log(LogInformation, "ApiListener")
<< "Repairing broken API config package '" << package
<< "', setting active stage '" << activeStageCached << "'.";

ConfigPackageUtility::SetActiveStageToFile(package, activeStageCached);
}
}
}

void ApiListener::SetActivePackageStage(const String& package, const String& stage)
{
boost::mutex::scoped_lock lock(m_ActivePackageStagesLock);
m_ActivePackageStages[package] = stage;
}

String ApiListener::GetActivePackageStage(const String& package)
{
boost::mutex::scoped_lock lock(m_ActivePackageStagesLock);

if (m_ActivePackageStages.find(package) == m_ActivePackageStages.end())
BOOST_THROW_EXCEPTION(ScriptError("Package " + package + " has no active stage."));

return m_ActivePackageStages[package];
}

void ApiListener::RemoveActivePackageStage(const String& package)
{
/* This is the rare occasion when a package has been deleted. */
boost::mutex::scoped_lock lock(m_ActivePackageStagesLock);

auto it = m_ActivePackageStages.find(package);

if (it == m_ActivePackageStages.end())
return;

m_ActivePackageStages.erase(it);
}

void ApiListener::ValidateTlsProtocolmin(const Lazy<String>& lvalue, const ValidationUtils& utils)
{
ObjectImpl<ApiListener>::ValidateTlsProtocolmin(lvalue, utils);

14 changes: 14 additions & 0 deletions lib/remote/apilistener.hpp
@@ -84,6 +84,11 @@ class ApiListener final : public ObjectImpl<ApiListener>
static Value ConfigUpdateObjectAPIHandler(const MessageOrigin::Ptr& origin, const Dictionary::Ptr& params);
static Value ConfigDeleteObjectAPIHandler(const MessageOrigin::Ptr& origin, const Dictionary::Ptr& params);

/* API config packages */
void SetActivePackageStage(const String& package, const String& stage);
String GetActivePackageStage(const String& package);
void RemoveActivePackageStage(const String& package);

static Value HelloAPIHandler(const MessageOrigin::Ptr& origin, const Dictionary::Ptr& params);

static void UpdateObjectAuthority();
@@ -119,13 +124,16 @@ class ApiListener final : public ObjectImpl<ApiListener>
Timer::Ptr m_ReconnectTimer;
Timer::Ptr m_AuthorityTimer;
Timer::Ptr m_CleanupCertificateRequestsTimer;
Timer::Ptr m_ApiPackageIntegrityTimer;

Endpoint::Ptr m_LocalEndpoint;

static ApiListener::Ptr m_Instance;

void ApiTimerHandler();
void ApiReconnectTimerHandler();
void CleanupCertificateRequestsTimerHandler();
void CheckApiPackageIntegrity();

bool AddListener(const String& node, const String& service);
void AddConnection(const Endpoint::Ptr& endpoint);
@@ -175,6 +183,12 @@ class ApiListener final : public ObjectImpl<ApiListener>
void SendRuntimeConfigObjects(const JsonRpcConnection::Ptr& aclient);

void SyncClient(const JsonRpcConnection::Ptr& aclient, const Endpoint::Ptr& endpoint, bool needSync);

/* API Config Packages */
mutable boost::mutex m_ActivePackageStagesLock;
std::map<String, String> m_ActivePackageStages;

void UpdateActivePackageStagesCache();
};

}