[PPDiffusers] Add CycleDiffusion based on FastDeploy #4945
Conversation
Thanks for your contribution!
Codecov Report
@@            Coverage Diff             @@
##           develop    #4945      +/-   ##
===========================================
+ Coverage    46.35%   48.96%     +2.60%
===========================================
  Files          448      455         +7
  Lines        64646    66517      +1871
===========================================
+ Hits         29965    32567      +2602
+ Misses       34681    33950       -731
untruncated_ids = self.tokenizer(prompt, padding="longest", return_tensors="np").input_ids

if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not paddle.equal_all(
    text_input_ids, untruncated_ids
text_input_ids and untruncated_ids are NumPy arrays here; should paddle.equal_all be replaced with its NumPy counterpart?
Thanks for the review, this has been updated~
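For reference, a minimal sketch of the NumPy-based check being suggested; np.array_equal is the NumPy counterpart of paddle.equal_all, and the variable names follow the snippet above:

import numpy as np

# Both arrays come from the tokenizer with return_tensors="np", so the
# equality check can stay in NumPy rather than going through paddle:
if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not np.array_equal(
    text_input_ids, untruncated_ids
):
    ...  # warn about the truncated prompt, as in the original pipeline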
).prev_sample
if i == len(timesteps) - 1:
    # sync for an accurate it/s measurement
    paddle.device.cuda.synchronize()
Is this required here? Looking at the other pipelines, including the FastDeploy pipelines, none of them have this.
This is mainly to guarantee the final result is correct when kernels are launched asynchronously on multiple streams, so synchronization is needed.
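To illustrate the point, a minimal sketch of the pattern, assuming a host-side timing loop like this PR's benchmark (the loop variables and the elided UNet call are placeholders for the pipeline's own):

import time
import paddle

start = time.time()
for i, t in enumerate(timesteps):
    # ... UNet forward pass producing noise_pred elided ...
    latents = scheduler.step(noise_pred, t, latents).prev_sample
    if i == len(timesteps) - 1:
        # Kernels launch asynchronously; without a device-wide synchronize the
        # host timer would stop before the last kernels actually finish.
        paddle.device.cuda.synchronize()
elapsed = time.time() - start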
if use_fp16:
    option.trt_option.enable_fp16 = True
cache_file = os.path.join(model_dir, model_prefix, "inference.trt")
option.set_trt_cache_file(cache_file)
I see the code above has already started using the 1.0.4 API (option.paddle_infer_option), so let's switch everything over to the new API for consistency. Change this to option.trt_option.serialize_file = cache_file.
Done
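Putting the suggestion together with the snippet above, the migrated lines would read roughly as follows (a sketch using the FastDeploy 1.0.4 property-style API named in this thread):

if use_fp16:
    option.trt_option.enable_fp16 = True
# New-style API: assign the serialized-engine path directly instead of
# calling the older option.set_trt_cache_file(...).
cache_file = os.path.join(model_dir, model_prefix, "inference.trt")
option.trt_option.serialize_file = cache_file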
option.set_trt_cache_file(cache_file)
# Need to enable collect shape for ernie
if dynamic_shape is not None:
    option.enable_paddle_trt_collect_shape()
option.paddle_infer_option.collect_trt_shape = True
Done
option.enable_paddle_trt_collect_shape()
for key, shape_dict in dynamic_shape.items():
    option.set_trt_input_shape(
        key,
option.trt_option.set_shape(name, min, opt, max)
done
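For reference, the loop above rewritten against the set_shape signature quoted by the reviewer; the shape_dict keys here are hypothetical and should match however dynamic_shape is actually structured in this file:

for name, shape_dict in dynamic_shape.items():
    option.trt_option.set_shape(
        name,
        shape_dict["min_shape"],  # hypothetical key names
        shape_dict["opt_shape"],
        shape_dict["max_shape"],
    )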
option.use_lite_backend()
if device == "huawei_ascend_npu":
    option.use_ascend()
    option.set_lite_device_names(["huawei_ascend_npu"])
This line shouldn't need to be called explicitly; it can simply be deleted.
Done
option.set_lite_device_names(["huawei_ascend_npu"])
option.set_lite_model_cache_dir(os.path.join(model_dir, model_prefix))
option.set_lite_context_properties(
    "HUAWEI_ASCEND_NPU_SELECTED_DEVICE_IDS={};HUAWEI_ASCEND_NPU_PRECISION_MODE=allow_mix_precision".format(
option.paddle_lite_option.nnadapter_model_cache_dir = os.path.join(model_dir, model_prefix)
option.paddle_lite_option.nnadapter_context_properties = "HUAWEI_ASCEND_NPU_SELECTED_DEVICE_IDS={};HUAWEI_ASCEND_NPU_PRECISION_MODE=allow_mix_precision".format(device_id)
Done
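Assembled, the Ascend NPU branch with these property-style options would look roughly like this (a sketch; device_id comes from the surrounding function):

option.use_lite_backend()
if device == "huawei_ascend_npu":
    option.use_ascend()
    # FastDeploy 1.0.4 property-style Paddle Lite options:
    option.paddle_lite_option.nnadapter_model_cache_dir = os.path.join(model_dir, model_prefix)
    option.paddle_lite_option.nnadapter_context_properties = (
        "HUAWEI_ASCEND_NPU_SELECTED_DEVICE_IDS={};"
        "HUAWEI_ASCEND_NPU_PRECISION_MODE=allow_mix_precision".format(device_id)
    )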
option = fd.RuntimeOption()
option.use_trt_backend()
option.use_gpu(device_id)
option.enable_trt_fp16()
option.trt_option.enable_fp16 = True
Done
option.use_trt_backend()
option.use_gpu(device_id)
option.enable_trt_fp16()
option.set_trt_max_workspace_size(workspace)
option.trt_option.max_workspace_size = workspace
Done
option.set_trt_max_workspace_size(workspace)
if dynamic_shape is not None:
    for key, shape_dict in dynamic_shape.items():
        option.set_trt_input_shape(
option.set_shape
Done
onnx_file = os.path.join(model_dir, model_prefix, "inference.onnx")
option.set_model_path(onnx_file, model_format=ModelFormat.ONNX)
cache_file = os.path.join(model_dir, model_prefix, "inference.trt")
option.set_trt_cache_file(cache_file)
option.trt_option.serialize_file = cache_file
Done
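Taken together, the ONNX + TensorRT branch after all of the migrations requested in this thread would look roughly like this (a sketch consolidating the reviewer's suggestions; variable names follow the snippets above, and the shape_dict keys are hypothetical):

option = fd.RuntimeOption()
option.use_trt_backend()
option.use_gpu(device_id)
option.trt_option.enable_fp16 = True
option.trt_option.max_workspace_size = workspace
if dynamic_shape is not None:
    for name, shape_dict in dynamic_shape.items():
        option.trt_option.set_shape(
            name, shape_dict["min_shape"], shape_dict["opt_shape"], shape_dict["max_shape"]
        )
onnx_file = os.path.join(model_dir, model_prefix, "inference.onnx")
option.set_model_path(onnx_file, model_format=ModelFormat.ONNX)
# Cache the built TensorRT engine so later runs skip the (slow) build step.
option.trt_option.serialize_file = os.path.join(model_dir, model_prefix, "inference.trt")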
PR types
New features
PR changes
Models
Description
Add CycleDiffusion based on FastDeploy.
Benchmark
Run CycleDiffusionPipeline 10 times and take the average latency to compare the performance of the FastDeploy version against the PyTorch version.
Average latency
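A minimal sketch of how such an average-latency measurement could be taken (assuming the standard CycleDiffusion call signature with prompt, source_prompt and an init image; this is illustrative, not the PR's exact benchmark script):

import time
import paddle

latencies = []
for _ in range(10):
    start = time.time()
    image = pipe(prompt=prompt, source_prompt=source_prompt, image=init_image).images[0]
    # Wait for all queued GPU work to finish before stopping the host timer.
    paddle.device.cuda.synchronize()
    latencies.append(time.time() - start)
print("Average latency: {:.3f} s".format(sum(latencies) / len(latencies)))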