Qualcomm AI Engine Direct - add program validation #4297
Changes from 2 commits
@@ -58,6 +58,7 @@
from executorch.exir import ExirExportedProgram
from executorch.exir.backend.compile_spec_schema import CompileSpec
from executorch.exir.lowered_backend_module import LoweredBackendModule
from executorch.exir.program._program import _get_updated_graph_signature
from torch._decomp import core_aten_decompositions as torch_core_aten_decompositions
from torch.export.exported_program import ExportedProgram
from torch.fx import passes
@@ -223,7 +224,12 @@ def capture_program(
    core_ep.transform(ConvertBinaryOpsWithScalar())
    edge_ep = core_ep.to_edge(qnn_edge_config())
    _transform(edge_ep.exported_program)

    # Since QDQ nodes are stripped, update graph signature again to validate program
    edge_ep.exported_program._graph_signature = _get_updated_graph_signature(
        edge_ep.exported_program.graph_signature,
        edge_ep.exported_program.graph_module,
    )
    edge_ep.exported_program._validate()
    return edge_ep

Review discussion on this hunk:

Reviewer: Not related to this diff, but is it possible to run a mix of fp ops and quantized ops, and is that well supported? I'm asking because we remove all q/dq ops and then insert them back at the i/o, which may limit mixed dtypes.

Author: Currently, mixed precision is only supported for quantized ops, since the compiler spec for HTP precision (quantized or fp16) works at graph-level granularity.

Author: To do so, it would be great if the framework interface could provide a runtime option (like having an argument in […]

Reviewer: Yeah, we've been discussing how to pass the runtime option at the interface, ideally through the backend context. Question: do you need it to be […]
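For context, a minimal sketch of how the validated edge program returned by capture_program is typically consumed downstream, together with the graph-level HTP precision choice mentioned in the discussion above. The helper names and import paths (generate_htp_compiler_spec, generate_qnn_executorch_compiler_spec, QnnPartitioner, QcomChipset) are assumptions about the QNN backend's utilities and may differ between versions; this is not part of the diff.

    # Hedged sketch, not part of this PR: helper names/paths below are assumptions
    # and may not match the exact executorch version.
    import torch

    from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner  # assumed path
    from executorch.backends.qualcomm.serialization.qnn_compile_spec_schema import QcomChipset  # assumed path
    from executorch.backends.qualcomm.utils.utils import (
        capture_program,  # the function modified in this hunk
        generate_htp_compiler_spec,  # assumed helper for graph-level HTP precision
        generate_qnn_executorch_compiler_spec,  # assumed helper producing a CompileSpec list
    )
    from executorch.exir.backend.backend_api import to_backend

    module = torch.nn.Linear(16, 8).eval()
    sample_inputs = (torch.randn(1, 16),)

    # capture_program now refreshes the graph signature after QDQ stripping and
    # runs _validate(), so the returned edge program is internally consistent.
    edge_ep = capture_program(module, sample_inputs)

    # HTP precision (fp16 vs. quantized) is selected once for the whole graph,
    # which is why mixed fp/quantized dtypes are not supported yet (see discussion above).
    compiler_specs = generate_qnn_executorch_compiler_spec(
        soc_model=QcomChipset.SM8550,
        backend_options=generate_htp_compiler_spec(use_fp16=True),
    )
    delegated_program = to_backend(edge_ep.exported_program, QnnPartitioner(compiler_specs))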
@@ -352,14 +352,7 @@ def to_backend(
        ExportedProgram: The input program, with some portions targeted for delegation.
    """
    # Use fake program, with FakeTensors in the state dict, to avoid copying large constant values.
    # Fall back to deepcopy if no fake mode is found. TODO(T182910699): Remove this fallback.
    try:
        fake_edge_program = get_fake_program(edge_program)
    except Exception as e:
        logging.warning(
            f"Error in get_fake_program for graph {edge_program.graph_module}, fallback to deepcopy: {e}"
        )
        fake_edge_program = copy.deepcopy(edge_program)
    fake_edge_program = get_fake_program(edge_program)

    partitioner_result = partitioner_instance(fake_edge_program)
    tagged_exported_program = partitioner_result.tagged_exported_program

Review discussion on this hunk:

Reviewer: This change could be split out; we may need broader testing in case some paths still rely on the fallback. Glad to see the QNN path can work with the fake edge program! I believe RAM usage will go down quite a bit now.
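As background for the comment above about RAM usage: get_fake_program swaps the real constants in the program's state dict for FakeTensors, so the partitioner can inspect shapes and dtypes without duplicating weight data. Below is a minimal sketch of the underlying FakeTensor idea (illustrative only; this is not the actual get_fake_program implementation).

    # Illustrative sketch of why fake tensors are cheap to "copy".
    import torch
    from torch._subclasses.fake_tensor import FakeTensorMode

    real_weight = torch.randn(4096, 4096)  # ~64 MiB of real storage

    fake_mode = FakeTensorMode()
    fake_weight = fake_mode.from_tensor(real_weight)

    # The fake tensor keeps shape/dtype/device metadata but allocates no real storage,
    # so a program whose state dict holds fake tensors can be duplicated cheaply.
    print(fake_weight.shape, fake_weight.dtype)   # torch.Size([4096, 4096]) torch.float32
    print(isinstance(fake_weight, torch.Tensor))  # True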
@@ -84,25 +84,16 @@ option(EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT
)

if(EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT)
  # Add the host project. We build this separately so that we can generate
  # headers on the host during the build, even if we're cross-compiling the
  # flatcc runtime to a different architecture.

  # lint_cmake: -readability/wonkycase
  ExternalProject_Add(
    flatcc_project
    PREFIX ${CMAKE_BINARY_DIR}/_host_build
    SOURCE_DIR ${_flatcc_source_dir}
    BINARY_DIR ${CMAKE_BINARY_DIR}/_host_build
    CMAKE_CACHE_ARGS
      -DFLATCC_TEST:BOOL=OFF -DFLATCC_REFLECTION:BOOL=OFF
      # See above comment about POSITION_INDEPENDENT_CODE.
      -DCMAKE_POSITION_INDEPENDENT_CODE:BOOL=ON
    INSTALL_COMMAND "" # Prevent the install step
  execute_process(
    COMMAND ${CMAKE_COMMAND} ${_flatcc_source_dir}
      -DFLATCC_TEST=OFF -DFLATCC_REFLECTION=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
      -B${CMAKE_BINARY_DIR}/_host_build
  )
  set(_etdump_schema_gen_dep flatcc_project)
  execute_process(
    COMMAND ${CMAKE_COMMAND} --build ${CMAKE_BINARY_DIR}/_host_build
  )
  set(_etdump_schema_gen_dep)
else()
  # If we're not cross-compiling, we can just use the plain commandline target.
  set(_etdump_schema_gen_dep flatcc_cli)
endif()

Review discussion on this hunk:

Reviewer: Also, thanks for fixing the sdk build! cc: @Olivia-liu @tarun292
@@ -134,42 +125,11 @@ add_library(
  bundled_program_schema INTERFACE ${_bundled_program_schema__outputs}
)

# Ensure the host tool is built before the main project
add_dependencies(etdump_schema flatcc_cli)

file(MAKE_DIRECTORY ${_program_schema__include_dir}/executorch/sdk/etdump)
file(MAKE_DIRECTORY
  ${_program_schema__include_dir}/executorch/sdk/bundled_program
)

if(EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT)
  # If we're cross-compiling, we need to use the version of the commandline tool
  # built for the host.
  set(_etdump_schema_gen_dep flatcc_project)

  # TODO(dbort): flatcc installs its files directly in its source directory
  # instead of under CMAKE_BINARY_DIR, and it has no options to avoid doing
  # this. We build flatcc twice in the executorch build: once to get the
  # `flatcc` host commandline tool, and once to get the (potentially
  # cross-compiled) target runtime library. The host build will put its outputs
  # in the source tree, making the cross-compiling target build think that the
  # outputs have already been built. It will then try to link against the
  # host-architecture libraries, failing when cross-compiling. To work around
  # this, delete the host outputs after running this command (which only runs
  # when setting up the cmake files, not when actually building). This leaves
  # room for the target build to put its own files in the source tree. We should
  # try to remove this hack, ideally by submitting an upstream PR that adds an
  # option to change the installation location.
  set(_etdump_schema_cleanup_paths ${_flatcc_source_dir}/bin/*
    ${_flatcc_source_dir}/lib/*
  )
else()
  # If we're not cross-compiling we can use the plain commandline target, and we
  # don't need to delete any files.
  set(_etdump_schema_gen_dep flatcc_cli)
  set(_etdump_schema_cleanup_paths "")
endif()

add_custom_command(
  OUTPUT ${_etdump_schema__outputs}
  COMMAND
@@ -179,13 +139,19 @@ add_custom_command(
    ${_flatcc_source_dir}/bin/flatcc -cwr -o
    ${_program_schema__include_dir}/executorch/sdk/etdump
    ${_etdump_schema__srcs}
  COMMAND rm -f ${_etdump_schema_cleanup_paths}
  WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/sdk
  DEPENDS ${_etdump_schema_gen_dep}
  WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/sdk
  COMMENT "Generating etdump headers"
  VERBATIM
)

add_custom_target(
  etdump_schema_generated
  DEPENDS ${_etdump_schema__outputs}
)

add_dependencies(etdump_schema etdump_schema_generated)

add_library(
  etdump ${CMAKE_CURRENT_SOURCE_DIR}/etdump/etdump_flatcc.cpp
  ${CMAKE_CURRENT_SOURCE_DIR}/etdump/emitter.cpp
Review discussion (on another part of this diff):

Reviewer: Is coeff inserted into the graph, or is it just an intermediate tensor? If it's inserted, do we need to lift it to the i/o?

Author: It's inserted, but we make it static inside the graph so it can be identified when building the operator. This can reduce extra memory copies, I think.
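A minimal sketch of the "static inside the graph" idea described above, using plain torch.fx (the module and tensor names here are illustrative, not taken from this PR): registering the coefficient as a buffer makes it appear as a get_attr constant in the graph rather than an extra input that would have to be fed, and copied, at call time.

    # Illustrative torch.fx sketch; `CoeffModule` and `coeff` are made-up names.
    import torch
    import torch.fx as fx

    class CoeffModule(torch.nn.Module):
        def __init__(self):
            super().__init__()
            # A buffer is graph-owned state, not a runtime input.
            self.register_buffer("coeff", torch.tensor([0.5, 0.25]))

        def forward(self, x):
            return x * self.coeff

    gm = fx.symbolic_trace(CoeffModule())
    # The coefficient shows up as a get_attr node (a static constant of the graph),
    # not as a placeholder that would require lifting it to the graph i/o.
    print([node.op for node in gm.graph.nodes])
    # ['placeholder', 'get_attr', 'call_function', 'output']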