
pull from qingshui #8

Merged: 21 commits, Jan 16, 2024

Commits on Dec 7, 2023

  1. add WeightOnlyLinear2Kernel support M, N, K

    humingqing authored and committed Dec 7, 2023 (5c6d034)
  2. opt fused moe

    humingqing authored and committed Dec 7, 2023 (6448833)
  3. add moe weight only op

    humingqing authored and committed Dec 7, 2023 (11995cb)
  4. rollback paddle/fluid/operators/fused/fused_attention_op.cu

    humingqing authored and committed Dec 7, 2023 (dcd67aa)
  5. reuse workspace GPU memory

    humingqing authored and committed Dec 7, 2023 (fa9e8e8)

Commits on Dec 11, 2023

  1. Merge pull request #102 from laipaang/qingshui-2.4.2

    beam support 20/30 and fused_multi_transformer_int8 keep fp32
    laipaang authored Dec 11, 2023 (5535c6f)

Commits on Dec 12, 2023

  1. Optimize MOE communication and remove unnecessary calculations

    humingqing authored and committed Dec 12, 2023 (d619f40)

Commits on Dec 13, 2023

  1. Merge pull request #103 from laipaang/qingshui-2.4.2

    share_external_data_paddle_tensor
    laipaang authored Dec 13, 2023 (e784b88)

Commits on Dec 14, 2023

  1. Merge pull request #104 from laipaang/qingshui-2.4.2

    fix import paddle
    laipaang authored Dec 14, 2023 (9f3ffa5)

Commits on Dec 15, 2023

  1. fix weightonly int4 quant; fp16 scale improves performance by 5%

    humingqing authored and committed Dec 15, 2023 (6044151)
  2. fix weightonly int4 quant; fp16 scale improves performance by 5%

    humingqing authored and committed Dec 15, 2023 (6ebb837)
  3. fix weightonly int8 and format code

    humingqing authored and committed Dec 15, 2023 (ce831e8)
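
The Dec 15 commits above move the weight-only dequantization scale to fp16. As a rough illustration only (not Paddle's actual kernel code, which fuses this into a CUDA GEMM), here is a minimal numpy sketch of per-channel weight-only int8 quantization with fp16 scales:

```python
import numpy as np

def weight_only_quant_int8(w):
    """Per-output-channel symmetric int8 quantization.

    w: weight matrix of shape [in_features, out_features].
    Returns (int8 weights, fp16 per-channel scales).
    """
    # One scale per output channel, mapping onto the int8 range [-127, 127].
    scales = np.abs(w).max(axis=0) / 127.0
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    # Keeping the scales in fp16 (rather than fp32) is what the commits above
    # report as the ~5% speedup; numerically it is usually safe for weights.
    return q, scales.astype(np.float16)

def weight_only_matmul(x, q, scales):
    # Dequantize on the fly: y = x @ (q * scale). Real weight-only kernels
    # fuse this dequantization into the GEMM main loop instead.
    return x @ (q.astype(np.float16) * scales)

x = np.random.randn(4, 64).astype(np.float16)
w = np.random.randn(64, 128).astype(np.float16)
q, s = weight_only_quant_int8(w)
assert np.allclose(weight_only_matmul(x, q, s), x @ w, atol=0.5)
```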

Commits on Dec 29, 2023

  1. fix cuda11.8, add tensor shard buffer

    humingqing authored and committed Dec 29, 2023 (de67186)
  2. fix flash sm90

    humingqing authored and committed Dec 29, 2023 (31e7cff)
  3. fix cutlass

    humingqing authored and committed Dec 29, 2023 (ef35202)

Commits on Jan 2, 2024

  1. cutlass3.0

    humingqing authored and committed Jan 2, 2024 (9293d34)

Commits on Jan 3, 2024

  1. fix weight only moe

    humingqing authored and committed Jan 3, 2024 (97e5554)

Commits on Jan 5, 2024

  1. add cutlass3.0, support moe expert aggregate gemm

    humingqing authored and committed Jan 5, 2024 (18d403e)
  2. add cutlass3.0, support moe expert aggregate gemm

    humingqing authored and committed Jan 5, 2024 (4b57358)
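
The "moe expert aggregate gemm" in the Jan 5 commits refers to batching the per-expert GEMMs of a mixture-of-experts layer into a single grouped launch (here via CUTLASS 3.0). Below is a hypothetical numpy sketch of the computation being aggregated; the real implementation is a grouped GEMM kernel, not a Python loop:

```python
import numpy as np

num_experts, d_model, d_ff, num_tokens = 4, 32, 64, 16
rng = np.random.default_rng(0)

# One weight matrix per expert, plus a routing decision per token.
expert_w = rng.standard_normal((num_experts, d_model, d_ff), dtype=np.float32)
tokens = rng.standard_normal((num_tokens, d_model), dtype=np.float32)
routing = rng.integers(0, num_experts, size=num_tokens)

# Sort tokens by expert so each expert's tokens are contiguous; an aggregate
# (grouped) GEMM then runs one variable-sized problem per expert in one launch.
order = np.argsort(routing)
sorted_tokens, sorted_routing = tokens[order], routing[order]

out = np.empty((num_tokens, d_ff), dtype=np.float32)
start = 0
for e in range(num_experts):
    count = int((sorted_routing == e).sum())
    # Problem e: [count, d_model] x [d_model, d_ff]. The grouped kernel
    # executes all such problems concurrently instead of serially.
    out[start:start + count] = sorted_tokens[start:start + count] @ expert_w[e]
    start += count

# Scatter the results back to the original token order.
final = np.empty_like(out)
final[order] = out
```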

Commits on Jan 9, 2024

  1. add cusparseLt 0.4 to speed up ffn1, ffn2, qkvo multiplication; speeds up 22% (#107)

    Co-authored-by: yangjunchao <yangjunchao@baidu.com>
    chao9527 and yangjunchao authored Jan 9, 2024 (80933a8)
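
cuSPARSELt accelerates GEMMs whose weight operand has been pruned to 2:4 semi-structured sparsity (at most two nonzeros in every group of four values), which is presumably what the commit above applies to the ffn1/ffn2/qkvo projections. A hedged numpy sketch of just the 2:4 pruning pattern involved; the library calls themselves are C/CUDA and not shown here:

```python
import numpy as np

def prune_2_4(w):
    """Zero the two smallest-magnitude values in every group of four along
    the last axis, producing the 2:4 pattern cuSPARSELt requires."""
    assert w.shape[-1] % 4 == 0
    groups = w.reshape(-1, 4)
    # Indices of the two smallest |w| entries within each group of four.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(w.shape)

w = np.random.randn(8, 16).astype(np.float16)
w24 = prune_2_4(w)
# Each group of four now holds at least two zeros, so the sparse kernel stores
# and multiplies only half of the weights plus small position metadata.
assert ((w24.reshape(-1, 4) == 0).sum(axis=1) >= 2).all()
```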

Commits on Jan 10, 2024

  1. add fuse mt weight only quant (#105)

    miaoli06 authored Jan 10, 2024 (ae56e86)