Qualcomm AI Engine Direct - add program validation #4297
Conversation
Summary:
- update graph signature for get_fake_program to work properly
- make sure the program is valid after capture_program
- retire capture_pre_autograd_graph
- fix release build error & make cross-compile flatcc deterministic
- some minor fixes
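As a rough illustration of the validation this summary describes, here is a minimal usage sketch. The import path and call signature of `capture_program` are assumed from the Qualcomm backend utilities and are not quoted from this PR:

```python
import torch
from executorch.backends.qualcomm.utils.utils import capture_program  # assumed path


class Add(torch.nn.Module):
    def forward(self, x, y):
        return x + y


# After this change, the edge program returned here is expected to pass
# ExportedProgram validation even though QDQ nodes have been stripped.
edge_prog = capture_program(Add(), (torch.randn(2, 3), torch.randn(2, 3)))
print(edge_prog.exported_program.graph_signature)
```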
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4297
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (2 Unrelated Failures) As of commit 77e4060 with merge base 4a88318: FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @cccclai, this PR relates to #3860. Thank you.
Thank you for these changes, @haowhsu-quic. To help focus our reviews, please create separate PRs for:
Thank you for putting up the PR!
exir/backend/backend_api.py
Outdated
f"Error in get_fake_program for graph {edge_program.graph_module}, fallback to deepcopy: {e}" | ||
) | ||
fake_edge_program = copy.deepcopy(edge_program) | ||
fake_edge_program = get_fake_program(edge_program) |
This change can be separate - we may need to test it with broader tests in case some paths still rely on the fallback. Glad to see qnn path can work with the fake edge program! I believe the RAM usage will go down quite a bit now.
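For context, a minimal sketch of the before/after pattern discussed here (the `get_fake_program` import path is assumed, and error handling is simplified); the old path fell back to a full deepcopy of the edge program, while the new path relies on the fake program alone, which is where the memory saving comes from:

```python
import copy

from torch.export import ExportedProgram
from executorch.exir.program._fake_program import get_fake_program  # assumed path


def make_fake_with_fallback(edge_program: ExportedProgram) -> ExportedProgram:
    """Old behaviour (simplified): deepcopy the whole program if faking fails."""
    try:
        return get_fake_program(edge_program)
    except Exception as e:
        print(f"Error in get_fake_program, fallback to deepcopy: {e}")
        return copy.deepcopy(edge_program)


def make_fake(edge_program: ExportedProgram) -> ExportedProgram:
    """New behaviour: use the fake program directly, avoiding the deepcopy cost."""
    return get_fake_program(edge_program)
```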
sdk/CMakeLists.txt
Outdated
@@ -84,25 +84,16 @@ option(EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT
)

if(EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT)
# Add the host project. We build this separately so that we can generate
Also, thanks for fixing the SDK build! cc: @Olivia-liu @tarun292
# param nodes will be FakeTensor when doing partition
# fill in random numeric for validation
if isinstance(coeff, torch._subclasses.fake_tensor.FakeTensor):
    coeff = torch.ones(coeff.shape)
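For readers unfamiliar with fake tensors, a small standalone illustration of why the substitution above is needed (assuming standard `torch._subclasses` behaviour): a FakeTensor carries only shape and dtype metadata, so any step that needs concrete values has to swap in a real tensor first.

```python
import torch
from torch._subclasses.fake_tensor import FakeTensor, FakeTensorMode

with FakeTensorMode():
    coeff = torch.empty(3, 4)  # a FakeTensor: shape/dtype only, no real storage

if isinstance(coeff, FakeTensor):
    # Placeholder values are fine here; only the shape matters for validation.
    coeff = torch.ones(coeff.shape)

print(coeff.sum())  # works now because coeff holds real data
```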
Is `coeff` inserted into the graph or just an intermediate tensor? If it's inserted, we may need to lift it to the I/O?
It's inserted, but we make it static inside the graph so it can be identified when building the operator. This can reduce extra memory copies, I think.
Thank you, sorry for the inconvenience:
Thank you so much for splitting up the PR! This is very helpful for us.
@@ -223,7 +224,12 @@ def capture_program(
core_ep.transform(ConvertBinaryOpsWithScalar())
edge_ep = core_ep.to_edge(qnn_edge_config())
_transform(edge_ep.exported_program)

# Since QDQ nodes are stripped, update graph signature again to validate program
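To make "update graph signature again" concrete, here is a rough sketch of rebuilding a signature from whatever placeholders and outputs remain after the QDQ nodes are stripped. It is simplified (user inputs/outputs only; parameters, buffers, and mutations are ignored), and the class names are assumed from `torch.export.graph_signature`:

```python
import torch
from torch.export.graph_signature import (
    ExportGraphSignature,
    InputKind,
    InputSpec,
    OutputKind,
    OutputSpec,
    TensorArgument,
)


def rebuild_signature(gm: torch.fx.GraphModule) -> ExportGraphSignature:
    # Re-derive input specs from the placeholders that survived the pass.
    input_specs = [
        InputSpec(kind=InputKind.USER_INPUT, arg=TensorArgument(name=node.name), target=None)
        for node in gm.graph.nodes
        if node.op == "placeholder"
    ]
    # Re-derive output specs from the single output node; assumes all outputs are tensors.
    output_node = next(node for node in gm.graph.nodes if node.op == "output")
    output_specs = [
        OutputSpec(kind=OutputKind.USER_OUTPUT, arg=TensorArgument(name=arg.name), target=None)
        for arg in output_node.args[0]
    ]
    return ExportGraphSignature(input_specs=input_specs, output_specs=output_specs)
```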
Not related to this diff - is it possible to run a mix of fp ops and quantized ops? Is that well supported? The reason I'm asking is that we're removing all q/dq ops and then inserting them back at the I/O, which may limit our ability to mix dtypes.
Currently, mixed precision is only supported among quantized ops, since the compiler spec for HTP precision (quantized or fp16) has graph-level granularity.
We'll have a multi-graph change in the near future; hopefully mechanisms like weight sharing / fp mixed precision can be well supported at that time.
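To illustrate the graph-level granularity mentioned above, a short sketch (the helper name and `use_fp16` argument are assumed from the Qualcomm backend utilities): precision is chosen once per compiled graph when building the HTP compiler spec, not per operator.

```python
from executorch.backends.qualcomm.utils.utils import generate_htp_compiler_spec  # assumed path

# One setting applies to the whole delegated graph:
quantized_graph_options = generate_htp_compiler_spec(use_fp16=False)  # graph runs quantized
fp16_graph_options = generate_htp_compiler_spec(use_fp16=True)        # graph runs in fp16
```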
To do so, it would be great if the framework interface could provide a runtime option (like an argument in `method::execute()`) for the backend to react to: e.g. change the performance config, or select which graph in the context to execute.
Yeah, we've been discussing how to pass the runtime option at the interface - ideally via the backend context. Question: do you need it at `method::init` time or `method::execute` time?
`method::execute` looks more flexible on QNN. Looking forward to the change!
Looks great! Thank you for adding the validation and reducing the deep copy.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: As mentioned in #4297, the original flow makes the host / cross builds happen concurrently. This change moves the host build process into the CMake configure stage and refines the related dependencies.

Pull Request resolved: #4312

Test Plan:
- cross-compile
> Running `backends/qualcomm/script/build.sh --release`, we can check whether the compile process finishes successfully.
- native-compile
> Run the following to check:
```shell
cmake \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DQNN_SDK_ROOT=${QNN_SDK_ROOT} \
  -DEXECUTORCH_BUILD_QNN=ON \
  -DEXECUTORCH_BUILD_SDK=ON \
  -DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
  -S $EXECUTORCH_ROOT \
  -B $EXECUTORCH_ROOT/build_x86_64
```

Reviewed By: tarun292

Differential Revision: D60243701

Pulled By: dbort

fbshipit-source-id: ff8d8cb06f0cc296c7ef465596e7e3df367dd059
Summary: