Push local pool memory to a global list at thread termination #1621

Merged · 1 commit · Mar 4, 2017
src/common/pony/detail/atomics.h: 19 additions & 16 deletions
@@ -78,39 +78,41 @@ using std::atomic_flag;
 namespace ponyint_atomics
 {
   template <typename T>
-  struct aba_protected_t
+  struct aba_protected_ptr_t
   {
-    static_assert(sizeof(T) <= sizeof(void*), "");
-    T object;
-    uintptr_t counter;
+    // Nested struct for uniform initialisation with GCC/Clang.
+    struct
+    {
+      T* object;
+      uintptr_t counter;
+    };
   };
 }
-# define PONY_ABA_PROTECTED_DECLARE(T)
-# define PONY_ABA_PROTECTED(T) ponyint_atomics::aba_protected_t<T>
+# define PONY_ABA_PROTECTED_PTR_DECLARE(T)
+# define PONY_ABA_PROTECTED_PTR(T) ponyint_atomics::aba_protected_ptr_t<T>
 #else
 # if defined(__LP64__) || defined(_WIN64)
 #   define PONY_DOUBLEWORD __int128_t
 # else
 #   define PONY_DOUBLEWORD int64_t
 # endif
-# define PONY_ABA_PROTECTED_DECLARE(T) \
+# define PONY_ABA_PROTECTED_PTR_DECLARE(T) \
   typedef union \
   { \
     struct \
     { \
-      _Static_assert(sizeof(T) <= sizeof(void*), ""); \
-      T object; \
+      T* object; \
       uintptr_t counter; \
     }; \
     PONY_DOUBLEWORD raw; \
-  } aba_protected_T;
-# define PONY_ABA_PROTECTED(T) aba_protected_T
+  } aba_protected_##T;
+# define PONY_ABA_PROTECTED_PTR(T) aba_protected_##T
 #endif

 // Big atomic objects (larger than machine word size) aren't consistently
 // implemented on the compilers we support. We add our own implementation to
 // make sure the objects are correctly defined and aligned.
-#define PONY_ATOMIC_ABA_PROTECTED(T) alignas(16) PONY_ABA_PROTECTED(T)
+#define PONY_ATOMIC_ABA_PROTECTED_PTR(T) alignas(16) PONY_ABA_PROTECTED_PTR(T)

 #ifdef PONY_WANT_ATOMIC_DEFS
 # ifdef _MSC_VER
@@ -119,17 +121,18 @@ namespace ponyint_atomics
 namespace ponyint_atomics
 {
   template <typename T>
-  inline PONY_ABA_PROTECTED(T) big_load(PONY_ABA_PROTECTED(T)* ptr)
+  inline PONY_ABA_PROTECTED_PTR(T) big_load(PONY_ABA_PROTECTED_PTR(T)* ptr)
   {
-    PONY_ABA_PROTECTED(T) ret = {NULL, 0};
+    PONY_ABA_PROTECTED_PTR(T) ret = {NULL, 0};
Contributor:
The interlocked exchange seems wrong. Such a heavyweight operation for a load? I can't see anywhere that it would be necessary.

Contributor:
Similarly, big_store uses a CAS where a store should be sufficient.

Member Author:
This is because x86 doesn't have plain load/store operations on 16 bytes. The only instruction that can work on an operand of that size is cmpxchg16b; if we used two plain movs instead, there would be a time window where one thread could modify part of the value while another thread is still reading the other part. For reference, both GCC and Clang also generate cmpxchg16b for atomic loads/stores of 16-byte objects.

I've tried to reduce the overhead introduced by these operations by using non-atomic loads/stores wherever possible, e.g. pool.c:465. Also, the fast bailout path of both pool_pull and pool_block_pull doesn't use any big atomics or hardware synchronisation.
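For reference, a minimal sketch of the GCC/Clang counterpart of the MSVC big_load/big_store in this diff, using the generic __atomic builtins (the aba_pair_t name and the memory orderings are illustrative, not code from the PR). With -mcx16 both compilers lower these 16-byte operations to lock cmpxchg16b, either inline or via a libatomic helper, which is why even the load behaves as a read-modify-write:

```c
// Sketch only: GCC/Clang-style 16-byte atomic load/store.
// aba_pair_t is an illustrative stand-in for PONY_ABA_PROTECTED_PTR(T).
#include <stdint.h>

typedef struct
{
  void* object;
  uintptr_t counter;
} __attribute__((aligned(16))) aba_pair_t;

static inline aba_pair_t big_load_sketch(aba_pair_t* ptr)
{
  aba_pair_t ret;
  // 16-byte atomic load: compiled as lock cmpxchg16b (or a libatomic call
  // that uses it), not as two plain 8-byte movs.
  __atomic_load(ptr, &ret, __ATOMIC_RELAXED);
  return ret;
}

static inline void big_store_sketch(aba_pair_t* ptr, aba_pair_t val)
{
  // Likewise a cmpxchg16b loop under the hood.
  __atomic_store(ptr, &val, __ATOMIC_RELAXED);
}
```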

Contributor:
So what I'm thinking is that we use bigatomic_load to "initialise" a 16-byte ABA-protected pointer before a CAS operation. If we load the two 8-byte chunks independently (cheaply!), we will either get a matched pair (which may succeed during the CAS, and everything is fine) or an unmatched pair.

If we get an unmatched pair, the loads could have happened in either order. If we read a stale ABA counter and a current pointer, we will do a second loop on the CAS, which is no more expensive than the initial atomic read. If we read a current ABA counter and a stale pointer, same result.

The only way to provoke an error would be if 2^64 writes happened between the ABA read and the pointer read, and the 2^64th write wrote back the old pointer.

Is there another error condition that I'm missing?
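A minimal sketch of that scheme, with illustrative names rather than the pool's actual identifiers: the expected value for the 16-byte CAS is seeded with two plain 8-byte loads, and a torn pair can only make the first CAS fail, which atomically refreshes the expected value for the retry.

```c
// Sketch only: seeding a 16-byte CAS with two plain 8-byte loads.
// Relies on 8-byte loads being atomic on x86-64 in practice; requires
// cmpxchg16b (-mcx16 with GCC/Clang).
#include <stdint.h>

typedef struct node_t { struct node_t* next; } node_t;

typedef union
{
  struct
  {
    node_t* object;
    uintptr_t counter;
  };
  __int128 raw;
} aba_protected_node_t;

static aba_protected_node_t stack __attribute__((aligned(16)));

static void push(node_t* n)
{
  aba_protected_node_t cmp, xchg;

  // Two independent 8-byte loads: cheap, but the pair may be inconsistent.
  cmp.object = stack.object;
  cmp.counter = stack.counter;

  do
  {
    n->next = cmp.object;
    xchg.object = n;
    xchg.counter = cmp.counter + 1;
    // On failure, cmp.raw is refreshed with the current 16-byte value, so
    // an inconsistent initial pair costs at most one extra iteration.
  } while(!__atomic_compare_exchange_n(&stack.raw, &cmp.raw, xchg.raw,
    false, __ATOMIC_RELEASE, __ATOMIC_RELAXED));
}
```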

Member Author:
That would work, yes. I think it will only be possible in pool_push/pull and not in pool_block_push/pull, because in the latter the CAS can be on any element of the list and, if we fail, we always retry from the start of the list. I'll update the PR.

Member Author:
It turns out it is also possible in pool_block_push/pull, but with an additional branch to avoid the CAS when we don't need it. Would that be an interesting tradeoff?

Member Author:
After thinking about it, these functions won't be used very often, so it probably won't make a difference. I'll include it in the change.
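A rough sketch of that extra branch, with hypothetical names rather than pool.c's own: a plain read decides whether the expensive double-word CAS is attempted at all, so the common nothing-to-do case issues no big atomic operation.

```c
// Sketch only: bail out on a plain read before committing to the CAS loop.
// 'global_head' is an illustrative name, not the pool's real variable.
#include <stdbool.h>
#include <stddef.h>

typedef struct block_t { struct block_t* next; } block_t;

static block_t* global_head;  // updated elsewhere with the big-atomic CAS

static bool pull_worth_trying(void)
{
  // Plain 8-byte load; possibly stale, but a stale view only means we skip
  // (or retry) the slow path, never that the list is corrupted.
  return global_head != NULL;
}
```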

     _InterlockedCompareExchange128((LONGLONG*)ptr, 0, 0, (LONGLONG*)&ret);
     return ret;
   }

   template <typename T>
-  inline void big_store(PONY_ABA_PROTECTED(T)* ptr, PONY_ABA_PROTECTED(T) val)
+  inline void big_store(PONY_ABA_PROTECTED_PTR(T)* ptr,
+    PONY_ABA_PROTECTED_PTR(T) val)
   {
-    PONY_ABA_PROTECTED(T) tmp;
+    PONY_ABA_PROTECTED_PTR(T) tmp;
     tmp.object = ptr->object;
     tmp.counter = ptr->counter;
     while(!_InterlockedCompareExchange128((LONGLONG*)ptr,
src/libponyrt/asio/epoll.c: 1 addition & 0 deletions
@@ -239,6 +239,7 @@ DECLARE_THREAD_FN(ponyint_asio_backend_dispatch)
   close(b->wakeup);
   ponyint_messageq_destroy(&b->q);
   POOL_FREE(asio_backend_t, b);
+  pony_unregister_thread();
Contributor:
I like the explicit unregister approach.

   return NULL;
 }

src/libponyrt/asio/iocp.c: 1 addition & 0 deletions
@@ -229,6 +229,7 @@ DECLARE_THREAD_FN(ponyint_asio_backend_dispatch)
   CloseHandle(b->wakeup);
   ponyint_messageq_destroy(&b->q);
   POOL_FREE(asio_backend_t, b);
+  pony_unregister_thread();
   return NULL;
 }

src/libponyrt/asio/kqueue.c: 1 addition & 0 deletions
@@ -187,6 +187,7 @@ DECLARE_THREAD_FN(ponyint_asio_backend_dispatch)

   ponyint_messageq_destroy(&b->q);
   POOL_FREE(asio_backend_t, b);
+  pony_unregister_thread();
   return NULL;
 }
