Half Precision not detected for RTX 3090 #104

Open
BA8F0D39 opened this issue Apr 14, 2023 · 3 comments

@BA8F0D39

clpeak version: 1.1.2

Platform: NVIDIA CUDA
  Device: NVIDIA GeForce RTX 3090
    Driver version  : 525.89.02 (Linux x64)
    Compute units   : 82
    Clock frequency : 1725 MHz

    Global memory bandwidth (GBPS)
      float   : 816.91
      float2  : 841.68
      float4  : 856.31
      float8  : 785.62
      float16 : 844.80

    Single-precision compute (GFLOPS)
      float   : 35976.15
      float2  : 35279.88
      float4  : 35448.44
      float8  : 35229.30
      float16 : 34781.18

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 635.40
      double2  : 634.58
      double4  : 633.12
      double8  : 630.11
      double16 : 624.10

    Integer compute (GIOPS)
      int   : 19650.09
      int2  : 19531.53
      int4  : 19486.43
      int8  : 19548.59
      int16 : 19539.19

    Integer compute Fast 24bit (GIOPS)
      int   : 19452.70
      int2  : 18920.43
      int4  : 19145.33
      int8  : 19143.94
      int16 : 19075.51

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 9.96
      enqueueReadBuffer               : 10.48
      enqueueWriteBuffer non-blocking : 5.47
      enqueueReadBuffer non-blocking  : 5.55
      enqueueMapBuffer(for read)      : 10.76
        memcpy from mapped ptr        : 15.20
      enqueueUnmap(after write)       : 13.04
        memcpy to mapped ptr          : 15.20

    Kernel launch latency : 3.56 us
@moyang

moyang commented May 20, 2023

There is no native half-precision support on NVIDIA Ampere (except for A100) or Ada GPUs. Their half-precision performance is the same as their single-precision performance.

@BA8F0D39
Author

BA8F0D39 commented May 31, 2023

@moyang
RTX 3090 has native FP16 support in tensor cores
https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf

512 FP16 FMA per clock per SM
128 FP16 FMA per clock per Tensor core

RTX 3090 has 82 SMs and 328 Tensor cores
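
To put those figures in perspective, here is a rough back-of-the-envelope sketch of the peak FP16 tensor throughput they imply. It assumes the FMA rates are per clock, counts one FMA as two floating-point operations, and uses the 1725 MHz clock from the clpeak output above rather than any official boost clock. The function and constant names are made up for illustration.

```python
# Figures quoted in this thread. Treating the FMA rates as "per clock" is an assumption.
SM_COUNT = 82
TENSOR_CORE_COUNT = 328
FP16_FMA_PER_CLOCK_PER_SM = 512
FP16_FMA_PER_CLOCK_PER_TENSOR_CORE = 128
CLOCK_MHZ = 1725  # clock frequency reported by clpeak above


def peak_tflops(units, fma_per_clock_per_unit, clock_mhz):
    """Theoretical peak in TFLOPS, counting one FMA as two floating-point operations."""
    flops_per_second = units * fma_per_clock_per_unit * 2 * clock_mhz * 1e6
    return flops_per_second / 1e12


# Both ways of counting (per SM, per Tensor core) should agree.
tflops_by_sm = peak_tflops(SM_COUNT, FP16_FMA_PER_CLOCK_PER_SM, CLOCK_MHZ)
tflops_by_tensor_core = peak_tflops(
    TENSOR_CORE_COUNT, FP16_FMA_PER_CLOCK_PER_TENSOR_CORE, CLOCK_MHZ
)

print(f"by SM:          {tflops_by_sm:.1f} TFLOPS")
print(f"by Tensor core: {tflops_by_tensor_core:.1f} TFLOPS")
```

Under those assumptions both counts come out at about 144.8 TFLOPS, roughly four times the ~36 TFLOPS single-precision result clpeak measured. This is a theoretical ceiling for tensor-core math only, not something an OpenCL kernel would necessarily reach.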

@moyang

moyang commented Jun 6, 2023

@BA8F0D39 This seems to be a problem with NVIDIA's OpenCL implementation. When apps such as clpeak query the device capabilities, it reports "no half-precision support". I observed the same issue with other benchmarks, like SiSoftware Sandra.
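
For context, OpenCL advertises half-precision arithmetic through the `cl_khr_fp16` extension, which appears in the space-separated string returned for `CL_DEVICE_EXTENSIONS`. A minimal sketch of that kind of check follows. Whether clpeak uses exactly this test is an assumption, `has_fp16_extension` is a hypothetical helper, and the extension strings are illustrative rather than captured from a real device.

```python
def has_fp16_extension(extensions: str) -> bool:
    """Return True if the space-separated extension string lists cl_khr_fp16."""
    # Split on whitespace so that only an exact extension name matches.
    return "cl_khr_fp16" in extensions.split()


# Hypothetical extension strings for illustration only.
without_fp16 = "cl_khr_global_int32_base_atomics cl_khr_fp64 cl_khr_byte_addressable_store"
with_fp16 = without_fp16 + " cl_khr_fp16"

print(has_fp16_extension(without_fp16))  # False
print(has_fp16_extension(with_fp16))     # True
```

If the driver leaves `cl_khr_fp16` out of that string, a benchmark relying on this kind of check would skip the half-precision test regardless of what the hardware can do.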
