Skip to content

Commit

Permalink
[SYCL] Improve events cleanup for immediate command lists (#7715)
Browse files Browse the repository at this point in the history
Currently we call zeEventQueryStatus every time before removing an event
from an immediate command list.

There are several improvements which can be done:
1. Currently we synchronize immediate command lists in synchronize()
under queue lock. So we can just remove all events from all command
lists right there without checking the status.
2. If we know that some event is completed and queue is in-order then we
can cleanup all preceding events in the command list including that event
without checking the status. Currently we don't check for completed events
in different command lists, this is an area for improvement. Introduced new
CleanupEventsInImmCmdLists function to do that. Additionally made this
function autonomous so that it can cleanup events in the immediate command
lists of the queue for any case. Made resetCommandLists to call 
CleanupEventsInImmCmdLists to have single point of entry.
3. We can avoid calling zeEventQueryStatus by checking LastCommandEvent
for in-order queues and by checking Completed flag.
4. Also if we end up walking over command list and checking status then
break early as soon as first non-completed event is found.
  • Loading branch information
againull authored Dec 13, 2022
1 parent 4c68638 commit 179ffa1
Show file tree
Hide file tree
Showing 2 changed files with 187 additions and 55 deletions.
217 changes: 168 additions & 49 deletions sycl/plugins/level_zero/pi_level_zero.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -626,6 +626,7 @@ inline static pi_result createEventAndAssociateQueue(
(*Event)->Queue = Queue;
(*Event)->CommandType = CommandType;
(*Event)->IsDiscarded = IsInternal;
(*Event)->CommandList = CommandList;
// Discarded event doesn't own ze_event, it is used by multiple pi_event
// objects. We destroy corresponding ze_event by releasing events from the
// events cache at queue destruction. Event in the cache owns the Level Zero
Expand Down Expand Up @@ -1016,10 +1017,10 @@ bool _pi_queue::isPriorityHigh() const {
return ((this->Properties & PI_EXT_ONEAPI_QUEUE_PRIORITY_HIGH) != 0);
}

pi_result
_pi_queue::resetCommandList(pi_command_list_ptr_t CommandList,
bool MakeAvailable,
std::vector<pi_event> &EventListToCleanup) {
pi_result _pi_queue::resetCommandList(pi_command_list_ptr_t CommandList,
bool MakeAvailable,
std::vector<pi_event> &EventListToCleanup,
bool CheckStatus) {
bool UseCopyEngine = CommandList->second.isCopy(this);

// Immediate commandlists do not have an associated fence.
Expand All @@ -1033,25 +1034,37 @@ _pi_queue::resetCommandList(pi_command_list_ptr_t CommandList,
}

auto &EventList = CommandList->second.EventList;
// Check if standard commandlist
if (CommandList->second.ZeFence != nullptr) {
// Check if standard commandlist or fully synced in-order queue.
// If one of those conditions is met then we are sure that all events are
// completed so we don't need to check event status.
if (!CheckStatus || CommandList->second.ZeFence != nullptr ||
(isInOrderQueue() && !LastCommandEvent)) {
// Remember all the events in this command list which needs to be
// released/cleaned up and clear event list associated with command list.
std::move(std::begin(EventList), std::end(EventList),
std::back_inserter(EventListToCleanup));
EventList.clear();
} else {
} else if (!isDiscardEvents()) {
// For immediate commandlist reset only those events that have signalled.
// If events in the queue are discarded then we can't check their status.
for (auto it = EventList.begin(); it != EventList.end();) {
std::scoped_lock<pi_shared_mutex> EventLock((*it)->Mutex);
ze_result_t ZeResult =
ZE_CALL_NOCHECK(zeEventQueryStatus, ((*it)->ZeEvent));
if (ZeResult == ZE_RESULT_SUCCESS) {
EventListToCleanup.push_back(std::move((*it)));
it = EventList.erase(it);
} else {
it++;
}
(*it)->Completed ? ZE_RESULT_SUCCESS : ZE_CALL_NOCHECK(zeEventQueryStatus, ((*it)->ZeEvent));
// Break early as soon as we found first incomplete event because next
// events are submitted even later. We are not trying to find all
// completed events here because it may be costly. I.e. we are checking
// only elements which are most likely completed because they were
// submitted earlier. It is guaranteed that all events will be eventually
// cleaned up at queue sync/release.
if (ZeResult == ZE_RESULT_NOT_READY)
break;

if (ZeResult != ZE_RESULT_SUCCESS)
return mapError(ZeResult);

EventListToCleanup.push_back(std::move((*it)));
it = EventList.erase(it);
}
}

Expand Down Expand Up @@ -1287,9 +1300,93 @@ CleanupEventListFromResetCmdList(std::vector<pi_event> &EventListToCleanup,
return PI_SUCCESS;
}

// Reset signalled command lists in the queue and put them to the cache of
// command lists. A caller must not lock the queue mutex.
pi_result resetCommandLists(pi_queue Queue) {
/// @brief Cleanup events in the immediate lists of the queue.
/// @param Queue Queue where events need to be cleaned up.
/// @param QueueLocked Indicates if the queue mutex is locked by caller.
/// @param QueueSynced 'true' if queue was synchronized before the
/// call and no other commands were submitted after synchronization, 'false'
/// otherwise.
/// @param CompletedEvent Hint providing an event which was synchronized before
/// the call, in case of in-order queue it allows to cleanup all preceding
/// events.
/// @return PI_SUCCESS if successful, PI error code otherwise.
static pi_result CleanupEventsInImmCmdLists(pi_queue Queue,
bool QueueLocked = false,
bool QueueSynced = false,
pi_event CompletedEvent = nullptr) {
// Handle only immediate command lists here.
if (!Queue || !Queue->Device->useImmediateCommandLists())
return PI_SUCCESS;

std::vector<pi_event> EventListToCleanup;
{
std::unique_lock<pi_shared_mutex> QueueLock(Queue->Mutex, std::defer_lock);
if (!QueueLocked)
QueueLock.lock();
// If queue is locked and fully synchronized then cleanup all events.
// If queue is not locked then by this time there may be new submitted
// commands so we can't do full cleanup.
if (QueueLocked &&
(QueueSynced || (Queue->isInOrderQueue() &&
(CompletedEvent == Queue->LastCommandEvent ||
!Queue->LastCommandEvent)))) {
Queue->LastCommandEvent = nullptr;
for (auto &&It = Queue->CommandListMap.begin();
It != Queue->CommandListMap.end(); ++It) {
PI_CALL(Queue->resetCommandList(It, true, EventListToCleanup,
/* CheckStatus */ false));
}
} else if (Queue->isInOrderQueue() && CompletedEvent) {
// If the queue is in-order and we have information about completed event
// then cleanup all events in the command list preceding to CompletedEvent
// including itself.

// Check that the comleted event has associated command list.
if (!(CompletedEvent->CommandList &&
CompletedEvent->CommandList.value() != Queue->CommandListMap.end()))
return PI_SUCCESS;

auto &CmdListEvents =
CompletedEvent->CommandList.value()->second.EventList;
auto CompletedEventIt =
std::find(CmdListEvents.begin(), CmdListEvents.end(), CompletedEvent);
if (CompletedEventIt != CmdListEvents.end()) {
// We can cleanup all events prior to the completed event in this
// command list and completed event itself.
// TODO: we can potentially cleanup more events here by finding
// completed events on another command lists, but it is currently not
// implemented.
std::move(std::begin(CmdListEvents), CompletedEventIt + 1,
std::back_inserter(EventListToCleanup));
CmdListEvents.erase(CmdListEvents.begin(), CompletedEventIt + 1);
}
} else {
// Fallback to resetCommandList over all command lists.
for (auto &&It = Queue->CommandListMap.begin();
It != Queue->CommandListMap.end(); ++It) {
PI_CALL(Queue->resetCommandList(It, true, EventListToCleanup,
/* CheckStatus */ true));
}
}
}
PI_CALL(CleanupEventListFromResetCmdList(EventListToCleanup, QueueLocked));
return PI_SUCCESS;
}

/// @brief Reset signalled command lists in the queue and put them to the cache
/// of command lists. Also cleanup events associated with signalled command
/// lists. Queue must not be locked by the caller.
/// @param Queue Queue where we look for signalled command lists and cleanup
/// events.
/// @return PI_SUCCESS if successful, PI error code otherwise.
static pi_result resetCommandLists(pi_queue Queue) {
// Handle immediate command lists here, they don't need to be reset and we
// only need to cleanup events.
if (Queue->Device->useImmediateCommandLists()) {
PI_CALL(CleanupEventsInImmCmdLists(Queue));
return PI_SUCCESS;
}

// We need events to be cleaned up out of scope where queue is locked to avoid
// nested locks, because event cleanup requires event to be locked. Nested
// locks are hard to control and can cause deadlocks if mutexes are locked in
Expand All @@ -1303,21 +1400,19 @@ pi_result resetCommandLists(pi_queue Queue) {
// signalled, then the command list & fence are reset and command list is
// returned to the command list cache. All events associated with command
// list are cleaned up if command list was reset.
std::scoped_lock<pi_shared_mutex> Lock(Queue->Mutex);
std::unique_lock<pi_shared_mutex> QueueLock(Queue->Mutex);
for (auto &&it = Queue->CommandListMap.begin();
it != Queue->CommandListMap.end(); ++it) {
// Immediate commandlists don't use a fence but still need reset.
if (it->second.ZeFence == nullptr) {
PI_CALL(Queue->resetCommandList(it, true, EventListToCleanup));
} else {
// It is possible that the fence was already noted as signalled and
// reset. In that case the ZeFenceInUse flag will be false.
if (it->second.ZeFenceInUse) {
ze_result_t ZeResult =
ZE_CALL_NOCHECK(zeFenceQueryStatus, (it->second.ZeFence));
if (ZeResult == ZE_RESULT_SUCCESS)
PI_CALL(Queue->resetCommandList(it, true, EventListToCleanup));
}
// Immediate commandlists don't use a fence and are handled separately
// above.
assert(it->second.ZeFence != nullptr);
// It is possible that the fence was already noted as signalled and
// reset. In that case the ZeFenceInUse flag will be false.
if (it->second.ZeFenceInUse) {
ze_result_t ZeResult =
ZE_CALL_NOCHECK(zeFenceQueryStatus, (it->second.ZeFence));
if (ZeResult == ZE_RESULT_SUCCESS)
PI_CALL(Queue->resetCommandList(it, true, EventListToCleanup));
}
}
}
Expand Down Expand Up @@ -3969,8 +4064,10 @@ pi_result piQueueFinish(pi_queue Queue) {
}
}
// Reset signalled command lists and return them back to the cache of
// available command lists.
resetCommandLists(Queue);
// available command lists. Events in the immediate command lists are cleaned
// up in synchronize().
if (!Queue->Device->useImmediateCommandLists())
resetCommandLists(Queue);
return PI_SUCCESS;
}

Expand Down Expand Up @@ -5771,6 +5868,7 @@ pi_result _pi_event::reset() {
WaitList = {};
RefCountExternal = 0;
RefCount.reset();
CommandList = std::nullopt;

if (!isHostVisible())
HostVisibleEvent = nullptr;
Expand Down Expand Up @@ -6103,6 +6201,7 @@ static pi_result CleanupCompletedEvent(pi_event Event, bool QueueLocked) {
// code that preceded this implementation.
while (!EventsToBeReleased.empty()) {
pi_event DepEvent = EventsToBeReleased.front();
DepEvent->Completed = true;
EventsToBeReleased.pop_front();

pi_kernel DepEventKernel = nullptr;
Expand Down Expand Up @@ -6169,25 +6268,40 @@ pi_result piEventsWait(pi_uint32 NumEvents, const pi_event *EventList) {
std::unordered_set<pi_queue> Queues;
for (uint32_t I = 0; I < NumEvents; I++) {
{
std::shared_lock<pi_shared_mutex> EventLock(EventList[I]->Mutex);
if (!EventList[I]->hasExternalRefs())
die("piEventsWait must not be called for an internal event");
{
std::shared_lock<pi_shared_mutex> EventLock(EventList[I]->Mutex);
if (!EventList[I]->hasExternalRefs())
die("piEventsWait must not be called for an internal event");

if (!EventList[I]->Completed) {
auto HostVisibleEvent = EventList[I]->HostVisibleEvent;
if (!HostVisibleEvent)
die("The host-visible proxy event missing");
if (!EventList[I]->Completed) {
auto HostVisibleEvent = EventList[I]->HostVisibleEvent;
if (!HostVisibleEvent)
die("The host-visible proxy event missing");

ze_event_handle_t ZeEvent = HostVisibleEvent->ZeEvent;
zePrint("ZeEvent = %#lx\n", pi_cast<std::uintptr_t>(ZeEvent));
ZE_CALL(zeHostSynchronize, (ZeEvent));
ze_event_handle_t ZeEvent = HostVisibleEvent->ZeEvent;
zePrint("ZeEvent = %#lx\n", pi_cast<std::uintptr_t>(ZeEvent));
ZE_CALL(zeHostSynchronize, (ZeEvent));
EventList[I]->Completed = true;
}
}
if (auto Q = EventList[I]->Queue) {
if (Q->Device->useImmediateCommandLists() && Q->isInOrderQueue())
// Use information about waited event to cleanup completed events in
// the in-order queue.
CleanupEventsInImmCmdLists(EventList[I]->Queue,
/* QueueLocked */ false,
/* QueueSynced */ false, EventList[I]);
else {
// NOTE: we are cleaning up after the event here to free resources
// sooner in case run-time is not calling piEventRelease soon enough.
CleanupCompletedEvent(EventList[I]);
// For the case when we have out-of-order queue or regular command
// lists its more efficient to check fences so put the queue in the
// set to cleanup later.
Queues.insert(Q);
}
}
if (auto Q = EventList[I]->Queue)
Queues.insert(Q);
}
// NOTE: we are cleaning up after the event here to free resources
// sooner in case run-time is not calling piEventRelease soon enough.
CleanupCompletedEvent(EventList[I]);
}

// We waited some events above, check queue for signaled command lists and
Expand Down Expand Up @@ -6579,7 +6693,8 @@ pi_result piEnqueueEventsWait(pi_queue Queue, pi_uint32 NumEventsInWaitList,
}
}

resetCommandLists(Queue);
if (!Queue->Device->useImmediateCommandLists())
resetCommandLists(Queue);

return PI_SUCCESS;
}
Expand Down Expand Up @@ -6849,6 +6964,11 @@ pi_result _pi_queue::synchronize() {
ZE_CALL(zeHostSynchronize, (zeEvent));
Event->Completed = true;
PI_CALL(piEventRelease(Event));

// Cleanup all events from the synced command list.
auto EventListToCleanup = std::move(ImmCmdList->second.EventList);
ImmCmdList->second.EventList.clear();
CleanupEventListFromResetCmdList(EventListToCleanup, true);
return PI_SUCCESS;
};

Expand All @@ -6865,7 +6985,6 @@ pi_result _pi_queue::synchronize() {
if (ZeQueue)
ZE_CALL(zeHostSynchronize, (ZeQueue));
}

LastCommandEvent = nullptr;

// With the entire queue synchronized, the active barriers must be done so we
Expand Down
25 changes: 19 additions & 6 deletions sycl/plugins/level_zero/pi_level_zero.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
#include <map>
#include <memory>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <sycl/detail/pi.h>
Expand Down Expand Up @@ -820,14 +821,23 @@ struct _pi_queue : _pi_object {
createCommandList(bool UseCopyEngine, pi_command_list_ptr_t &CommandList,
ze_command_queue_handle_t *ForcedCmdQueue = nullptr);

// Resets the Command List and Associated fence in the ZeCommandListFenceMap.
// If the reset command list should be made available, then MakeAvailable
// needs to be set to true. The caller must verify that this command list and
// fence have been signalled. The EventListToCleanup contains a list of events
// from the command list which need to be cleaned up.
/// @brief Resets the command list and associated fence in the map and removes
/// events from the command list.
/// @param CommandList The caller must verify that this command list and fence
/// have been signalled.
/// @param MakeAvailable If the reset command list should be made available,
/// then MakeAvailable needs to be set to true.
/// @param EventListToCleanup The EventListToCleanup contains a list of
/// events from the command list which need to be cleaned up.
/// @param CheckStatus Hint informing whether we need to check status of the
/// events before removing them from the immediate command list. This is
/// needed because immediate command lists are not associated with fences and
/// in general status of the event needs to be checked.
/// @return PI_SUCCESS if successful, PI error code otherwise.
pi_result resetCommandList(pi_command_list_ptr_t CommandList,
bool MakeAvailable,
std::vector<_pi_event *> &EventListToCleanup);
std::vector<pi_event> &EventListToCleanup,
bool CheckStatus = true);

// Returns true if an OpenCommandList has commands that need to be submitted.
// If IsCopy is 'true', then the OpenCommandList containing copy commands is
Expand Down Expand Up @@ -1286,6 +1296,9 @@ struct _pi_event : _pi_object {
// This list must be destroyed once the event has signalled.
_pi_ze_event_list_t WaitList;

// Command list associated with the pi_event.
std::optional<pi_command_list_ptr_t> CommandList;

// Tracks if the needed cleanup was already performed for
// a completed event. This allows to control that some cleanup
// actions are performed only once.
Expand Down

0 comments on commit 179ffa1

Please sign in to comment.