Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't generate a mutex for each Locked<T>, share one per object #10117

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Al2Klimov
Copy link
Member

@Al2Klimov Al2Klimov commented Aug 20, 2024

This reduces RAM usage per object by sizeof(mutex)*(FIELDS-1).

The atomic_load(const std::shared_ptr<T>*) idea would, without doubt, also consume less RAM. However, it would scale worse, the more objects you have, with a fixed number of mutexes. Neither one SpinMutex per property, nor one std::mutex per object suffer from this limitation.

@cla-bot cla-bot bot added the cla/signed label Aug 20, 2024
@icinga-probot icinga-probot bot added the core/evaluate Analyse/Evaluate features and problems label Aug 20, 2024
@julianbrost
Copy link
Contributor

This misses part of what #10113 asked for:

  1. Figure out how much of an effect this has on the total memory use of Icinga 2.

We have a vague "Icinga 2 uses more memory in newer versions", so how much does this affect the total memory use of the whole process? Might this be the answer or is it just a small puzzle piece?

Also, what made you choose adding that m_FieldsMutex and passing it around over what I suggested in #10113? I'm not a big fan of how random code (see the changes to lib/icingadb/ now has to access some internal attribute generated by mkclass. A nice aspect of the current implementation is that it's very easy to reason that it won't deadlock, all the magic happens here:

template<typename T>
class Locked
{
public:
inline T load() const
{
std::unique_lock<std::mutex> lock(m_Mutex);
return m_Value;
}
inline void store(T desired)
{
std::unique_lock<std::mutex> lock(m_Mutex);
m_Value = std::move(desired);
}
private:
mutable std::mutex m_Mutex;
T m_Value;
};

Whereas with this PR, exposing the underlying mutex, this gets more spread out in the code base.

@Al2Klimov
Copy link
Member Author

Also, what made you choose adding that m_FieldsMutex and passing it around over what I suggested in #10113?

sharing the mutex between objects, this would reduce the memory requirements

The more fields, especially objects, share one mutex, the slower they can proceed. I prefer 1+ mutex per separate object.

I'm not a big fan of how random code (see the changes to lib/icingadb/ now has to access some internal attribute generated by mkclass. A nice aspect of the current implementation is that it's very easy to reason that it won't deadlock, all the magic happens here:

template<typename T>
class Locked
{
public:
inline T load() const
{
std::unique_lock<std::mutex> lock(m_Mutex);
return m_Value;
}
inline void store(T desired)
{
std::unique_lock<std::mutex> lock(m_Mutex);
m_Value = std::move(desired);
}
private:
mutable std::mutex m_Mutex;
T m_Value;
};

Whereas with this PR, exposing the underlying mutex, this gets more spread out in the code base.

You're right. One wrong move on the mutex and... 🙈

@Al2Klimov Al2Klimov changed the title Don't generate a mutex for each Locked<T>, share one per object Locked<T>: optimistically use SpinMutex to consume less RAM Aug 26, 2024
Comment on lines 44 to 68
/**
* Classic spinlock mutex based on std::atomic_flag.
*
* @ingroup base
*/
class SpinMutex
{
public:
SpinMutex() = default;
SpinMutex(const SpinMutex&) = delete;
SpinMutex(SpinMutex&&) = delete;
SpinMutex& operator=(const SpinMutex&) = delete;
SpinMutex& operator=(SpinMutex&&) = delete;

void lock() noexcept
{
while (m_State.test_and_set(std::memory_order_acquire)) {
}
}

void unlock() noexcept
{
m_State.clear(std::memory_order_release);
}

private:
std::atomic_flag m_State = ATOMIC_FLAG_INIT;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think a spinlock is a good idea here?

Let me quote cppreference.com for std::atomic_flag:

A spinlock mutex demo can be implemented in userspace using an atomic_flag. Do note that spinlock mutexes are extremely dubious in practice.

And we even had problems in Icinga 2 in the past due to a spinlock, see #8507/#8488. So I have doubts that would be a good idea this time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think a spinlock is a good idea here?

std::mutex is 5 pointers large on x64, whereas std::atomic_flag at most one.

spinlock mutexes are extremely dubious in practice.

But the lock scope here is very small, isn't it? Not to mention how unlikely you'll hit the slow path at all in our case...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the lock scope here is very small, isn't it?

Depends on what's inside. If it's an intrusive pointer, it just needs to increment the reference count, if it's a string though, it has to copy it, including allocating memory for it.

Not to mention how unlikely you'll hit the slow path at all in our case...

How did you conclude it's unlikely? #9364 was done for a reason, there were random crashes because. Crash means concurrent access, so that will go into the slow path. Also note that not every concurrent access resulted in a crash, so there will be more "hitting the slow path" than there were crashes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change hidden in a big rebase push related to this conversation? If so, please comment why you think this resolves it. Otherwise, what was the motivation for that change and how will you address this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the lock scope here is very small, isn't it?

Depends on what's inside. If it's an intrusive pointer, it just needs to increment the reference count, if it's a string though, it has to copy it, including allocating memory for it.

You're right. Especially the latter case can take a few more CPU cycles. So instead of wasting them on pure spinning in a waiting thread, the latter now says "after you" while something's locked. This shall bring the mutex holder in foreground and let it finish and unlock.

@Al2Klimov Al2Klimov changed the title Locked<T>: optimistically use SpinMutex to consume less RAM Locked<T>: optimistically use SpinMutex for intrusive_ptr to consume less RAM Sep 24, 2024
@julianbrost
Copy link
Contributor

Please use a search engine of your choice to search for the keywords "userspace" and "spinlock". Seems like the majority isn't too positive about that idea. And as already said, Icinga 2 already had a negative experience with spinlocks itself. What makes you believe that now is a good time to use a spinlock in userspace?

By the way, Linus Torvalds' rant post (the one linked from cppreference.com) even has a follow-up from him on yield.

Apart from that, the current version of the PR seems to restrict the use of that suggested spinlock to just intrusive_ptr, so each string attribute would still use its own std::mutex. So the memory use will be somewhere in the middle (I haven't looked into the distribution of attribute types), but why do it that way if it can be done better?

The atomic_load(const std::shared_ptr<T>*) idea would, without doubt, also consume less RAM. However, it would scale worse, the more objects you have, with a fixed number of mutexes.

However, while there may be a large number of objects, the number of CPU cores and thereby threads we start is typically much smaller. And that actually limits the total contention. There won't be a thread for each single object trying to lock it.

This reduces RAM usage per object by sizeof(mutex)*(FIELDS-1).
@Al2Klimov Al2Klimov changed the title Locked<T>: optimistically use SpinMutex for intrusive_ptr to consume less RAM Don't generate a mutex for each Locked<T>, share one per object Sep 26, 2024
Copy link
Member Author

@Al2Klimov Al2Klimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, while there may be a large number of objects, the number of CPU cores and thereby threads we start is typically much smaller. And that actually limits the total contention. There won't be a thread for each single object trying to lock it.

This sounds nice in theory, but I'd prefer a solution much more predicable, especially for large setups. I mean my one mutex per object idea.

Comment on lines +65 to +79
/**
* Wraps std::mutex, so that only Locked<T> can (un)lock it.
*
* The latter tiny lock scope is enforced this way to prevent deadlocks while passing around mutexes.
*
* @ingroup base
*/
class LockedMutex
{
template<class T>
friend class Locked;

private:
std::mutex m_Mutex;
};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of how random code (see the changes to lib/icingadb/ now has to access some internal attribute generated by mkclass. A nice aspect of the current implementation is that it's very easy to reason that it won't deadlock, all the magic happens here:

template<typename T>
class Locked
{
public:
inline T load() const
{
std::unique_lock<std::mutex> lock(m_Mutex);
return m_Value;
}
inline void store(T desired)
{
std::unique_lock<std::mutex> lock(m_Mutex);
m_Value = std::move(desired);
}
private:
mutable std::mutex m_Mutex;
T m_Value;
};

Whereas with this PR, exposing the underlying mutex, this gets more spread out in the code base.

I've fixed it. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla/signed core/evaluate Analyse/Evaluate features and problems
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants