[DML EP] Add BFC allocator #16634

Status: Open. Wants to merge 90 commits into base: main.
Changes shown are from 85 of the 90 commits.

Commits
f5a87a4  WIP (PatriceVignola, Jan 18, 2023)
707c1c9  WIP (PatriceVignola, Jan 18, 2023)
0619fa3  WIP (PatriceVignola, Jan 18, 2023)
6b62b72  WIP (PatriceVignola, Jan 19, 2023)
3f2910b  WIP (PatriceVignola, Jan 19, 2023)
25bb52d  WIP (PatriceVignola, Jan 20, 2023)
92f51a3  Remove sub allocator (PatriceVignola, Jan 23, 2023)
c0cbcae  WIP (PatriceVignola, Jan 24, 2023)
76328be  WIP (PatriceVignola, Jan 25, 2023)
7bd0983  WIP (PatriceVignola, Jan 25, 2023)
0c35fc2  WIP (PatriceVignola, Jan 25, 2023)
43c47b9  WIP (PatriceVignola, Jan 25, 2023)
d0eb5da  WIP (PatriceVignola, Jan 25, 2023)
3385d20  Add buffer region size alignment (PatriceVignola, Jan 26, 2023)
4e36efd  Merge branch 'main' of github.com:microsoft/onnxruntime into user/pav… (PatriceVignola, Jan 26, 2023)
7e5622d  WIP (PatriceVignola, Jan 26, 2023)
e6897c5  WIP (PatriceVignola, Jan 26, 2023)
2064baa  WIP (PatriceVignola, Jan 27, 2023)
b71a5ff  WIP (PatriceVignola, Jan 27, 2023)
06caff8  WIP (PatriceVignola, Jan 28, 2023)
e7667f1  WIP (PatriceVignola, Jan 28, 2023)
a95d434  WIP (PatriceVignola, Jan 28, 2023)
0729ea2  Fix (PatriceVignola, Jan 30, 2023)
544637f  Fix (PatriceVignola, Jan 30, 2023)
ea26855  WIP (PatriceVignola, Jan 31, 2023)
61dce2e  WIP (PatriceVignola, Jan 31, 2023)
b9b3fb8  WIP (PatriceVignola, Jan 31, 2023)
3854807  WIP (PatriceVignola, Feb 1, 2023)
93d931b  WIP (PatriceVignola, Feb 1, 2023)
96be36c  Merge branch 'main' of github.com:microsoft/onnxruntime into user/pav… (PatriceVignola, Feb 2, 2023)
f1cf166  Merge branch 'main' of github.com:microsoft/onnxruntime into user/pav… (PatriceVignola, Feb 16, 2023)
ef40991  Merge branch 'main' of github.com:microsoft/onnxruntime into user/pav… (PatriceVignola, Apr 23, 2023)
4e14147  Merge branch 'main' of github.com:microsoft/onnxruntime into user/pav… (PatriceVignola, Apr 26, 2023)
9c03955  Add hack to work around OOM errors with upload heaps (PatriceVignola, Apr 26, 2023)
a069e60  Merge branch 'main' of https://github.com/microsoft/onnxruntime into … (PatriceVignola, Jun 29, 2023)
e34abaf  Fix DFT (PatriceVignola, Jul 6, 2023)
00708a6  Register external allocator (PatriceVignola, Jul 6, 2023)
9927336  Fix DFT and STFT (PatriceVignola, Jul 6, 2023)
c20690f  Grid sample (PatriceVignola, Jul 6, 2023)
0bb5124  Fix WinML API (PatriceVignola, Jul 7, 2023)
6bc5049  Fix ImageTests.SynchronizeGPUWorkloads test failure (PatriceVignola, Jul 7, 2023)
14d1c96  Fix ConcurrencyTests.MultiThreadSingleSessionGpu (PatriceVignola, Jul 8, 2023)
a2809af  Add print statements for CopyBufferRegion (PatriceVignola, Jul 11, 2023)
2f8bff8  Add print statements for CopyBufferRegion (PatriceVignola, Jul 11, 2023)
c024d0a  Use Identity for the copy operator (PatriceVignola, Jul 11, 2023)
fef7df2  Add intermediate buffer for copying (PatriceVignola, Jul 11, 2023)
e0569c5  Remove aliasing (PatriceVignola, Jul 12, 2023)
568e550  Revert "Remove aliasing" (PatriceVignola, Jul 12, 2023)
943ac58  Re-add "Remove aliasing" (PatriceVignola, Jul 12, 2023)
587489d  Revert "Re-add "Remove aliasing"" (PatriceVignola, Jul 12, 2023)
57d2f46  Remove aliasing (PatriceVignola, Jul 12, 2023)
7440e74  Fix mish test failure (PatriceVignola, Jul 12, 2023)
b2e65fc  Remove rest of Aliasing (PatriceVignola, Jul 12, 2023)
d5be4f1  Add BFC allocator (PatriceVignola, Jul 13, 2023)
5772339  Add BFC allocator API (PatriceVignola, Jul 13, 2023)
b06678a  Fix crash (PatriceVignola, Jul 13, 2023)
bf177f6  Fix prefast error (PatriceVignola, Jul 13, 2023)
cb2e420  Fix Bucketized allocator crash (PatriceVignola, Jul 13, 2023)
9c79b1b  Address prefast errors (PatriceVignola, Jul 13, 2023)
8f37e38  Fix destructors (PatriceVignola, Jul 13, 2023)
a67641c  Fix typo (PatriceVignola, Jul 14, 2023)
aa3a207  Merge branch 'main' of https://github.com/microsoft/onnxruntime into … (PatriceVignola, Jul 29, 2023)
e4e34e0  Fix build break (PatriceVignola, Jul 29, 2023)
0b4cee0  Fix lint errors (PatriceVignola, Jul 30, 2023)
e14797c  Fix iobinding crash (PatriceVignola, Jul 30, 2023)
c658755  Merge branch 'main' of https://github.com/microsoft/onnxruntime into … (PatriceVignola, Jul 31, 2023)
16c9524  Add aliasing support to DmlOperatorCopy (PatriceVignola, Aug 1, 2023)
a95505f  Use identity instead of 2 copies (PatriceVignola, Aug 1, 2023)
a28358e  Merge branch 'main' of https://github.com/microsoft/onnxruntime into … (PatriceVignola, Aug 1, 2023)
759442b  Enable copy-less I/O binding (PatriceVignola, Aug 2, 2023)
9f3e430  Fix nonzero coordinates operator (PatriceVignola, Aug 2, 2023)
2da8999  Fix If test crash (PatriceVignola, Aug 3, 2023)
26a94e1  Fix output binding crash (PatriceVignola, Aug 3, 2023)
31270e6  Fix test failures (PatriceVignola, Aug 4, 2023)
738efb7  Fix upload heap regression (PatriceVignola, Aug 5, 2023)
ac9e57e  Merge branch 'main' of https://github.com/microsoft/onnxruntime into … (PatriceVignola, Aug 6, 2023)
f64ed2b  Address PR comments (PatriceVignola, Aug 6, 2023)
216fc39  Fix indentation (PatriceVignola, Aug 6, 2023)
a25b40c  WIP (PatriceVignola, Aug 8, 2023)
1a0eaa6  WIP (PatriceVignola, Aug 8, 2023)
f98f2af  WIP (PatriceVignola, Aug 8, 2023)
c54b295  Address PR comments (PatriceVignola, Aug 8, 2023)
26b4e7e  Move allocation free outside of loop (PatriceVignola, Aug 9, 2023)
163fe5b  Fix linting errors (PatriceVignola, Aug 9, 2023)
2e4eb2c  Merge branch 'main' of https://github.com/microsoft/onnxruntime into … (PatriceVignola, Aug 11, 2023)
184940a  Merge branch 'main' of https://github.com/microsoft/onnxruntime into … (PatriceVignola, Aug 15, 2023)
e6ae058  Address PR comments (PatriceVignola, Aug 16, 2023)
b7e40e8  Merge branch 'main' of https://github.com/microsoft/onnxruntime into … (Aug 16, 2023)
01d9bd2  Fix lint issues (PatriceVignola, Aug 16, 2023)
a774228  Merge branch 'main' of https://github.com/microsoft/onnxruntime into … (PatriceVignola, Sep 19, 2024)
Files changed
7 changes: 7 additions & 0 deletions include/onnxruntime/core/framework/execution_provider.h
@@ -320,6 +320,13 @@ class IExecutionProvider {
return default_device_;
};

/**
* Returns the OrtDevice for the given OrtMemType that external callers can use directly.
*/
virtual OrtDevice GetExternalOrtDeviceByMemType(OrtMemType mem_type) const {
return GetOrtDeviceByMemType(mem_type);
};

/**
* Create Preferred allocators for the current Execution Provider
* This function is a stateless function which creates new instances of Allocator, without storing them in EP.
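For orientation, here is a sketch of how a provider might override the new hook; the class name is hypothetical, and only IExecutionProvider, OrtDevice, and OrtDevice::MemType::DML_EXTERNAL come from this PR.

// Hypothetical override: report default GPU memory to external callers as DML_EXTERNAL so
// their allocations are tagged differently from the EP's internally pooled memory.
class MyDmlLikeExecutionProvider : public onnxruntime::IExecutionProvider {
 public:
  OrtDevice GetExternalOrtDeviceByMemType(OrtMemType mem_type) const override {
    if (mem_type == OrtMemTypeDefault) {
      return OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DML_EXTERNAL, /*device_id*/ 0);
    }
    return GetOrtDeviceByMemType(mem_type);  // other memory types keep the regular mapping
  }
  // Constructor and the remaining IExecutionProvider overrides are omitted for brevity.
};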
1 change: 1 addition & 0 deletions include/onnxruntime/core/framework/ortdevice.h
@@ -24,6 +24,7 @@ struct OrtDevice {
static const MemoryType CUDA_PINNED = 1;
static const MemoryType HIP_PINNED = 2;
static const MemoryType CANN_PINNED = 3;
static const MemoryType DML_EXTERNAL = 4;
};

constexpr OrtDevice(DeviceType device_type_, MemoryType memory_type_, DeviceId device_id_)
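A minimal illustration of what the new constant expresses (variable names are illustrative):

// Externally owned DML allocations are tagged DML_EXTERNAL, while arena-owned memory keeps
// MemType::DEFAULT, so downstream code can tell the two apart even on the same GPU device.
OrtDevice external_dml(OrtDevice::GPU, OrtDevice::MemType::DML_EXTERNAL, /*device_id*/ 0);
OrtDevice internal_dml(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, /*device_id*/ 0);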
6 changes: 4 additions & 2 deletions onnxruntime/core/framework/allocator.cc
@@ -150,9 +150,11 @@ ORT_API_STATUS_IMPL(OrtApis::CreateMemoryInfo, _In_ const char* name1, enum OrtA
onnxruntime::OpenVINO_GPU, type, OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, static_cast<OrtDevice::DeviceId>(id1)),
id1, mem_type1);
} else if (strcmp(name1, onnxruntime::DML) == 0) {
// Since EPs cannot have 2 allocators with the same OrtMemType and Memory ID,
// we use -1 as the memory ID to represent external allocations that don't have any allocator.
*out = new OrtMemoryInfo(
onnxruntime::DML, type, OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, static_cast<OrtDevice::DeviceId>(id1)),
id1, mem_type1);
onnxruntime::DML, type, OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DML_EXTERNAL, static_cast<OrtDevice::DeviceId>(id1)),
-1, mem_type1);
} else if (strcmp(name1, onnxruntime::HIP) == 0) {
*out = new OrtMemoryInfo(
onnxruntime::HIP, type, OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, static_cast<OrtDevice::DeviceId>(id1)), id1,
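From the caller's side nothing changes in how the memory info is created; a sketch using the public C++ wrapper, assuming only the standard Ort::MemoryInfo constructor:

#include <onnxruntime_cxx_api.h>

// Creating a "DML" memory info as before. With this change the resulting OrtMemoryInfo
// carries OrtDevice::MemType::DML_EXTERNAL and a memory id of -1, marking the memory as an
// external allocation that is not owned by any ORT allocator.
Ort::MemoryInfo dml_memory_info("DML", OrtDeviceAllocator, /*device_id*/ 0, OrtMemTypeDefault);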
39 changes: 24 additions & 15 deletions onnxruntime/core/framework/bfc_arena.cc
@@ -42,22 +42,8 @@ BFCArena::BFCArena(std::unique_ptr<IAllocator> resource_allocator,
stats_.bytes_limit = static_cast<int64_t>(total_memory);

arena_extend_strategy_ = arena_extend_strategy;
UpdateFirstAllocationShrinkageLogic();

// We never want to shrink the initial allocation if the arena extend strategy is kNextPowerOfTwo.
// This could seem confusingly arbitrary but the rationale is as follows:
// The user selected initial allocation chunk is only valid for the arena extend strategy kNextPowerOfTwo
// and the user has likely chosen this initial value so that any ad-hoc arena extensions/shrinkages could potentially
// be avoided. So we do not consider the initial allocation for shrinkage whatever its usage status.
// On the other hand, if the arena extension strategy is kSameAsRequested, any initial chunk set by the user or otherwise,
// is moot and the arena will only extend based on the request size. In these cases, we consider any allocation for shrinkage
// if it is left unused (even if it is the first allocation).
if (arena_extend_strategy_ == ArenaExtendStrategy::kSameAsRequested) {
// Consider all allocation regions (including first allocation region) for shrinkage
consider_first_allocation_region_for_shrinkage_ = true;
} else { // arena_extend_strategy_ == kNextPowerOfTwo
// Do not consider the first allocation region for shrinkage
consider_first_allocation_region_for_shrinkage_ = false;
}
// Create a bunch of bins of various good sizes.

// We create bins to fit all possible ranges that cover the
@@ -91,6 +77,29 @@ BFCArena::~BFCArena() {
}
}

void BFCArena::UpdateFirstAllocationShrinkageLogic() {
// We never want to shrink the initial allocation if the arena extend strategy is kNextPowerOfTwo.
// This could seem confusingly arbitrary but the rationale is as follows:
// The user selected initial allocation chunk is only valid for the arena extend strategy kNextPowerOfTwo
// and the user has likely chosen this initial value so that any ad-hoc arena extensions/shrinkages could potentially
// be avoided. So we do not consider the initial allocation for shrinkage whatever its usage status.
// On the other hand, if the arena extension strategy is kSameAsRequested, any initial chunk set by the user or otherwise,
// is moot and the arena will only extend based on the request size. In these cases, we consider any allocation for shrinkage
// if it is left unused (even if it is the first allocation).
if (arena_extend_strategy_ == ArenaExtendStrategy::kSameAsRequested) {
// Consider all allocation regions (including first allocation region) for shrinkage
consider_first_allocation_region_for_shrinkage_ = true;
} else { // arena_extend_strategy_ == kNextPowerOfTwo
// Do not consider the first allocation region for shrinkage
consider_first_allocation_region_for_shrinkage_ = false;
}
}

void BFCArena::SetArenaExtendStrategy(ArenaExtendStrategy arena_extend_strategy) {
arena_extend_strategy_ = arena_extend_strategy;
UpdateFirstAllocationShrinkageLogic();
}

BFCArena::Chunk* BFCArena::ChunkFromHandle(ChunkHandle h) {
ORT_ENFORCE(h < chunks_.size());
return &(chunks_[h]);
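To make the distinction above concrete, here is a rough sketch of the two growth policies; it is simplified and does not reproduce BFCArena's exact rounding.

#include <algorithm>
#include <cstddef>

#include "core/framework/bfc_arena.h"  // for onnxruntime::ArenaExtendStrategy

// kNextPowerOfTwo grows the next region geometrically from the user-chosen initial chunk,
// which is why that first region is never shrunk. kSameAsRequested extends by exactly the
// request, so the first region is just another allocation and may be shrunk when unused.
size_t NextRegionSize(onnxruntime::ArenaExtendStrategy strategy,
                      size_t current_region_bytes, size_t requested_bytes) {
  if (strategy == onnxruntime::ArenaExtendStrategy::kNextPowerOfTwo) {
    size_t next = std::max<size_t>(size_t{1}, current_region_bytes * 2);  // grow geometrically
    while (next < requested_bytes) {
      next *= 2;  // keep doubling until the request fits
    }
    return next;
  }
  return requested_bytes;  // kSameAsRequested: grow by exactly what was asked for
}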
8 changes: 8 additions & 0 deletions onnxruntime/core/framework/bfc_arena.h
@@ -77,6 +77,11 @@ class BFCArena : public IAllocator {

~BFCArena() override;

// Allows the caller to change the arena extend strategy after the allocator is done initializing.
// For example, kSameAsRequested may be desirable in certain situations and kNextPowerOfTwo may be
// desirable in others.
void SetArenaExtendStrategy(ArenaExtendStrategy arena_extend_strategy);

// If size is 0, then this function returns either NULL,
// or a unique pointer value that can later be successfully
// passed to free(). Whatever, do not dereference that pointer
@@ -123,6 +128,9 @@
private:
void DeallocateRawInternal(void* ptr);

// Updates whether the first allocation should be considered for shrinkage depending on the strategy type.
void UpdateFirstAllocationShrinkageLogic();

// A ChunkHandle is an index into the chunks_ vector in BFCAllocator
// kInvalidChunkHandle means an invalid chunk
using ChunkHandle = size_t;
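A short usage sketch of the new setter; the tuning policy shown is an assumption, not something this PR implements.

#include "core/framework/bfc_arena.h"

// One conceivable use: start with kNextPowerOfTwo for fast warm-up, then switch to
// kSameAsRequested once steady-state sizes are known. The setter also re-evaluates whether
// the first allocation region may be considered for shrinkage.
void SwitchToExactGrowth(onnxruntime::BFCArena& arena) {
  arena.SetArenaExtendStrategy(onnxruntime::ArenaExtendStrategy::kSameAsRequested);
}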
13 changes: 13 additions & 0 deletions onnxruntime/core/framework/utils.cc
@@ -161,6 +161,19 @@ static Status BatchOrCopyMLValue(const SessionState& session_state,
return Status::OK();
}

#ifdef USE_DML
const bool bothValuesOnGPU = copy_info.source_device.Type() == OrtDevice::GPU && copy_info.target_device.Type() == OrtDevice::GPU;
const bool sourceIsDmlAlloc = copy_info.source_device.MemType() == OrtDevice::MemType::DEFAULT || copy_info.source_device.MemType() == OrtDevice::MemType::DML_EXTERNAL;
const bool targetIsInternalAlloc = copy_info.target_device.MemType() == OrtDevice::MemType::DEFAULT;
Reviewer comment (Contributor): target_is_internal_alloc since this is ORT code 🐫🐍 rather than the DML EP 🐫🐪.
const bool bothValuesOnSameDevice = copy_info.source_device.Id() == copy_info.target_device.Id();

// The DML EP supports binding external allocations directly, even if the memory types don't match, as long as they are on the same D3D12 device
if (bothValuesOnGPU && sourceIsDmlAlloc && targetIsInternalAlloc && bothValuesOnSameDevice) {
target_mlvalue = source_mlvalue;
return Status::OK();
}
#endif

auto allocator = session_state.GetAllocator(copy_info.target_device);
if (!target_mlvalue.IsAllocated()) {
ORT_ENFORCE(allocator != nullptr, "Failed to find allocator for device ", copy_info.target_device.ToString());
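Restated as a standalone predicate for clarity (the helper name is illustrative and does not exist in the PR):

#include "core/framework/ortdevice.h"

// The copy above is elided only when both values live on the same D3D12 device, the source
// is DML-managed memory (arena DEFAULT or external DML_EXTERNAL), and the target expects
// regular DEFAULT device memory.
static bool CanAliasDmlAllocation(const OrtDevice& source, const OrtDevice& target) {
  const bool both_on_gpu = source.Type() == OrtDevice::GPU && target.Type() == OrtDevice::GPU;
  const bool source_is_dml_alloc = source.MemType() == OrtDevice::MemType::DEFAULT ||
                                   source.MemType() == OrtDevice::MemType::DML_EXTERNAL;
  const bool target_is_internal_alloc = target.MemType() == OrtDevice::MemType::DEFAULT;
  return both_on_gpu && source_is_dml_alloc && target_is_internal_alloc &&
         source.Id() == target.Id();
}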
@@ -7,6 +7,7 @@ interface IMLOperatorRegistry;
#include "core/common/status.h"
#include "core/framework/data_transfer.h"
#include "IWinmlExecutionProvider.h"
#include "core/providers/dml/DmlExecutionProvider/src/DmlBufferRegion.h"

namespace onnxruntime
{
@@ -17,20 +18,14 @@ namespace onnxruntime
class KernelRegistry;
}

enum class AllocatorRoundingMode
{
Disabled = 0,
Enabled = 1,
};

namespace Dml
{
std::unique_ptr<onnxruntime::IExecutionProvider> CreateExecutionProvider(
IDMLDevice* dmlDevice,
ID3D12CommandQueue* commandQueue,
bool enableMetacommands = true);
bool enableMetacommands,
bool enableBfcAllocator);

ID3D12Resource* GetD3D12ResourceFromAllocation(onnxruntime::IAllocator* allocator, void* ptr);
void FlushContext(onnxruntime::IExecutionProvider* provider);
void ReleaseCompletedReferences(onnxruntime::IExecutionProvider* provider);

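A sketch of a caller updated for the new factory signature; the wrapper function and its arguments are hypothetical, and the provider-factory header shown in this hunk is assumed to be included.

// The enableMetacommands default argument was removed, and the new enableBfcAllocator flag
// must now be passed explicitly when creating the DML execution provider.
std::unique_ptr<onnxruntime::IExecutionProvider> MakeDmlProvider(
    IDMLDevice* dml_device, ID3D12CommandQueue* command_queue) {
  return Dml::CreateExecutionProvider(dml_device, command_queue,
                                      /*enableMetacommands*/ true,
                                      /*enableBfcAllocator*/ true);
}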
@@ -9,6 +9,7 @@
#include <optional>

#include "core/framework/op_kernel.h"
#include "core/providers/dml/DmlExecutionProvider/src/DmlBufferRegion.h"

struct AbstractOperatorDesc;
interface IMLOperatorTensor;
@@ -22,6 +23,11 @@ namespace onnxruntime
class Node;
}

namespace Dml
{
struct TaggedPointer;
}

namespace Windows::AI::MachineLearning::Adapter
{
interface __declspec(uuid("5b19a18a-5ed5-4df2-a363-21b89380a698"))
@@ -34,19 +40,9 @@ namespace Windows::AI::MachineLearning::Adapter
// the provider's underlying queues.
virtual void QueueReference(IUnknown *object) = 0;

virtual void GetShadowCopyIfRequired(
bool isInternalOperator,
IUnknown* data,
IUnknown** dataCopy) const = 0;

virtual void GetABIDataInterface(
bool isInternalOperator,
IUnknown* data,
IUnknown** abiData) const = 0;
virtual Dml::D3D12BufferRegion GetBufferRegion(void* opaquePointer, uint64_t size) const = 0;

virtual uint64_t TryGetPooledAllocationId(
IUnknown* data,
bool isInternalOperator) = 0;
virtual uint64_t GetUniqueId(void* opaquePointer) = 0;

virtual void GetABIExecutionInterfaceAndInvalidateState(
bool isInternalOperator,
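A hypothetical kernel-side sketch of the replacement path; apart from GetBufferRegion, GetUniqueId, and Dml::D3D12BufferRegion, the names below are made up.

// Instead of requesting a shadow copy or an ABI data interface, an internal kernel now
// resolves the opaque allocation pointer directly to the D3D12 buffer region backing it.
template <typename WinmlProviderT>
Dml::D3D12BufferRegion ResolveTensorBuffer(WinmlProviderT* provider,
                                           void* opaque_data, uint64_t size_in_bytes) {
  // Stable per-allocation id, e.g. usable as a cache key for persistent bindings.
  const uint64_t allocation_id = provider->GetUniqueId(opaque_data);
  (void)allocation_id;
  return provider->GetBufferRegion(opaque_data, size_in_bytes);
}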
@@ -561,11 +561,17 @@ HRESULT STDMETHODCALLTYPE AbiCustomRegistry::RegisterOperatorKernel(
//
// For backward compatibility, this does not propagate errors for external operators
static_cast<void>(m_kernelRegistry->RegisterCustomKernel(create_info)); // ignore result
m_hasExternalOperators = true;
}

return S_OK;
}
ORT_CATCH_RETURN
}

bool STDMETHODCALLTYPE AbiCustomRegistry::HasExternalOperators() const noexcept
{
return m_hasExternalOperators;
}

}
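A hypothetical caller-side use of the new query; the rationale is assumed here, not stated in the hunk.

#include <vector>

using Windows::AI::MachineLearning::Adapter::AbiCustomRegistry;

// Returns true if any custom registry pulled in externally registered operator kernels,
// which a session might treat more conservatively (for example when pooling allocations).
bool AnyExternalOperators(const std::vector<AbiCustomRegistry*>& registries) {
  for (const AbiCustomRegistry* registry : registries) {
    if (registry->HasExternalOperators()) {
      return true;
    }
  }
  return false;
}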
@@ -15,7 +15,7 @@ namespace WRL
}

namespace Windows::AI::MachineLearning::Adapter
{

using namespace Microsoft::WRL;

@@ -49,14 +49,16 @@ class AbiCustomRegistry : public WRL::Base<IMLOperatorRegistry, IMLOperatorRegis
IMLOperatorKernelFactory* operatorKernelFactory,
_In_opt_ IMLOperatorShapeInferrer* shapeInferrer) const noexcept override;

bool STDMETHODCALLTYPE HasExternalOperators() const noexcept override;

std::list<std::shared_ptr<onnxruntime::CustomRegistry>> GetRegistries()
{
std::list<std::shared_ptr<onnxruntime::CustomRegistry>> registries;
for (auto& registry : m_customRegistryOpsetVerMap)
{
registries.push_back(registry.second);
}

registries.push_back(m_kernelRegistry);

return registries;
@@ -86,15 +88,15 @@ class AbiCustomRegistry : public WRL::Base<IMLOperatorRegistry, IMLOperatorRegis

private:
static onnx::OpSchema ConvertOpSchema(
_In_z_ const char* domain,
const MLOperatorSchemaDescription& abiSchema,
IMLOperatorTypeInferrer* typeInferrer,
IMLOperatorShapeInferrer* shapeInferrer);

static std::string ConvertFormalParameterType(const MLOperatorSchemaEdgeDescription& formalParameter);
static onnx::OpSchema::FormalParameterOption ConvertFormalParameterOption(MLOperatorParameterOptions options);
static void SetAttributesAndDefaults(onnx::OpSchema& schema, const MLOperatorSchemaDescription& abiSchema);

static AttributeMap GetDefaultAttributes(const MLOperatorKernelDescription* opKernel);

std::shared_ptr<onnxruntime::CustomRegistry> m_kernelRegistry;
@@ -107,6 +109,8 @@ class AbiCustomRegistry : public WRL::Base<IMLOperatorRegistry, IMLOperatorRegis
// Map between Lotus KernelDefs and extended data used during partitioning
mutable std::shared_ptr<InternalRegistrationInfoMap> m_internalRegInfoMap;

mutable bool m_hasExternalOperators = false;

};

} // namespace Windows::AI::MachineLearning::Adapter