forked from PaddlePaddle/Paddle
Add flash attn for af2 #8
Open: Xreki wants to merge 428 commits into develop from add_flash_attn_for_af2
Xreki force-pushed the add_flash_attn_for_af2 branch 2 times, most recently from de37e2f to cf4a1c8 (April 24, 2023 15:16)
* test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop
…dle#50915)" (PaddlePaddle#53527) This reverts commit 9c40653.
* move UniformRawKernel to legacy * Update uniform_kernel.cc * Update uniform_kernel.cu * Update uniform_kernel.cc * Update uniform_kernel.cu * Update uniform_kernel.h * Update uniform_kernel.cc * Empty Commit to setup deployments
* rem npu in test * restore some code
* Add trt pow converter. * update to use AddConstantLayer * add dims=0 ut
* Rename randint_raw and move it to legacy * Update fetch_v2_op.cc * Update randint_kernel.cc * Update randint_kernel.cu * Empty Commit to setup deployments
* polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish
* use int64 to calc dim for c softmax * fix compile bug
* Add fused_gate_attention API. * Implement FusedDropout API. * Fix doc and add unittest. * Skip for non-gpu device. * Add unittest.
* add OpTrait OpInterface ValueIterator TypeList * refine code * refine code * refine code * add opinfo * add typeid copy constructor * add trait interface construct method for opinfo_impl * add trait interface construct method for opinfo_impl * add trait interface construct method for opinfo_impl * add trait interface construct method for opinfo_impl * add trait interface construct method for opinfo_impl * add create * add member func for opinfo * fix compile bug * add op interface in ircontext * fix compile bug * fix compile bug * refine code * fix compile bug * add ut * refine ut * refine code of opinfo_impl * delete unused code * add dyncast for operation * refine comment * refine opinfo_impl * delete unused code * refine code by comment * refine code * refine code * refine code for registerOp * refine opinfo create * refine code of search method of ircontext * refine op attribute * change opinfo_map key from type_id to string
* add mul double grad * add sub_double_grad * add add sub high test * add multiply test * modify other unsqueeze * delete api.yaml * only for make ci run * modify unsqueeze * modify unsqueeze * tmp * modify operants gen * review modify * modify review * debug * debug * modify ci cross boundary * delete log
* fix strided_slice ut * remove check_dygraph
…yer (PaddlePaddle#53554) * add lookup_table op trt converter * update
…addle#53744) * optimize logsumexp in small data scale * fix * fix * add #pragma once * compile protobuf offline * add submodule gflags * check_submodules * check_submodules * add_submodule protobuf * add_submodule_protobuf * add_submodule * add .gitmodules * add_submodules * fix_compiler error * support offline compile * support offline compile * support offline_compile * remove cub * remove brpc * support offline compile * support offline compile * canning patching on cryptopp * modify .gitignore of cryptopp * test * offline compile * add_submodule zlib * modify .gitmodules * modify .gitmodules * fix setup.py bug * delete submodule cryptopp * fix windows compile bug * fix xxhash compile problem --------- Co-authored-by: Asthestarsfalll <1186454801@qq.com> Co-authored-by: Asthestarsfalll <72954905+Asthestarsfalll@users.noreply.github.com>
* add master gradients on static graph * add unit test for bf16 master grad static graph * use float16 as v100 test dtype * only skip GPU which do not support bf16 * use linear layer to test master grad * 1.push master grad creation before all optimizer ops; 2.remove useless unittest; 3.use a function to create master grad states
* rm cmake npu * Update generic.cmake * Update generic.cmake
* rm tools npu * Update get_pr_ut.py * Update get_pr_ut.py
…53862) * [XPU] do not call check_nccl_version_for_p2p under xpu * refine code.
* simplify layer_norm_op.cc * support auto generate for op layer_norm * update unittest for composite_layer_norm * remove layer_norm_op.cc from scripts * replace layer_norm_op with generated_op * add get_expected_kernel for layer_norm * update cmake kernel register function for layer_norm_mkldnn_op
…ddle#52006) * [Dy2static-Fallback] add set_eval_frame function in pybind. 1. add set_eval_frame function in pybind. * add unittest for eval frame hooker. * [support py38] * fix-GeneratorExit error in eval frame hooker * support python == 3.9 * support 3.10 * fix some comments
* move sequence_mask op InferShape func * add dtype infer
* Fused elementwises kernels and ops * change fuse pass name * adjust .pbtxt files * adjust quantization attributes * add missing arguments and fix others, review fixed * simplify fused kernel registration * fix elementwise unit tests * reuse one fused elementwise op * adjust proto * Add supported datatypes * Change 'Scale' to 'scale' in tests, change some tests to onednn * Revert breaking changes * Fix unit tests * Delete obsolete test cases * Delete commented out code * Fix codestyle * delete temporary condition * fix conflicts and delete duplicate fusing * Fix code after merge * Move tests to new directory * fix tests volatility * Rename test_elementwise_add_onednn_op.py to test_elementwise_add_mkldnn_op.py * Update CMakeLists.txt add mkldnn op test --------- Co-authored-by: Silv3S <slawomir.siwek@intel.com>
PR types
Performance optimization
PR changes
OPs
Description
RT