
New tuning results #1

CNugteren opened this issue May 30, 2015 · 142 comments

@CNugteren
Owner

CNugteren commented May 30, 2015

(See the README for details)

This is the place to post new tuning results. If you compiled with -DTUNERS=ON, ran one of the tuners on your device (or all perhaps?), and feel that these results should be included in the next release of CLBlast, please post them here.

You can do this by attaching the JSON files to this issue (archived in a .ZIP file).
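For reference, the workflow above can be sketched as follows. The build steps (`-DTUNERS=ON`, `make alltuners`) and the `clblast_*.json` output naming are as described in this thread; the placeholder JSON file below is purely illustrative, since the real files are written by the tuner binaries. GitHub rejects raw `.json` attachments, so the results are bundled into an archive (`.zip` and `.tar.gz` have both been attached successfully in this thread):

```shell
# Build and run all tuners (shown as comments; requires a CLBlast checkout):
#   cmake -DTUNERS=ON .. && make alltuners

# Stand-in for real tuner output, for illustration only:
echo '{}' > clblast_xgemm_1_example.json

# Bundle the JSON results into an archive that GitHub accepts as an attachment:
tar -czf tuning_results.tar.gz clblast_*.json
tar -tzf tuning_results.tar.gz
```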

@tremmelg

tremmelg commented Apr 8, 2016

Here are some tuning results from an NVIDIA Titan Black, AMD Radeon HD 7970 and an ARM Mali T-628.

Just to let you know about JSON files, GitHub says "Unfortunately, we don’t support that file type. Choose Files Try again with a PNG, GIF, JPG, DOCX, PPTX, XLSX, TXT, PDF, or ZIP."
Archive.zip

@CNugteren
Owner Author

Thanks for the tuning results! However, they seem to have been run with non-default settings (using specific values for alpha and beta). Could you perhaps run them again with the default settings?

By the way, the latest version already includes results for Tahiti (the HD 7970) and the ARM Mali T-628, so perhaps those are superfluous.

(I've updated the post regarding JSON-files and GitHub)

@blueberry

Here are the results for AMD's Pitcairn (R9 270X). I'll also upload the results for Hawaii (R9 290X), but I am getting an error during Xgemm. I'll open another issue for that.
pitcairn.zip

@CNugteren
Owner Author

Thanks! The results for Pitcairn are added to the development branch.

@blueberry

Hawaii (AMD R9 290X):
hawaii.zip

@blueberry

And i7 4790k:
i7-4790k.zip

@CNugteren
Owner Author

The results for Hawaii will be added. As for the i7 results: the zip archive seems to include only a Makefile?

@blueberry

blueberry commented May 2, 2016

Sorry, I messed up that zip. As I do not have those files any more, I'll send them when I manage to do that tuning.

@fonghou

fonghou commented May 31, 2016

nvidia-grid-k520-aws-g2.zip

See #61 for details.

@CNugteren
Owner Author

@fonghou Thanks! The tuning results are added to the database. They are currently in the development branch but will be automatically included in the next release.

@OursDesCavernes

Here are the results for the Intel i5-4210U iGPU:
Device name: 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (OpenCL 1.2 beignet 1.2 (git-1b076ec))
i5-4210U_GPU.zip

@CNugteren
Owner Author

@OursDesCavernes Added, thanks!

@gcp
Contributor

gcp commented Jul 1, 2016

GTX 670, GTX 750 (non-Ti), and GTX 1070 tunings attached. One of the GEMV tunings took ages (or hung) on the latter two, but curiously enough not on the (older) first card. Luckily, it looks like GEMV is the last one to be tuned so these are fairly complete anyway.

gtx670.tar.gz
gtx1070.tar.gz
gtx750.tar.gz

@CNugteren
Owner Author

@gcp Thanks for running all the tuners on those devices! The results are added to CLBlast, currently in the development branch but they will be automatically included in the next release. Indeed, I saw long compilation times for GEMV kernels on NVIDIA as well - it is the last one to be tuned for exactly this reason. NVIDIA promises to reduce compilation times significantly with CUDA 8.0, so hopefully that also fixes these kernels.

@gcp
Contributor

gcp commented Jul 5, 2016

Intel HD530 (desktop Skylake iGPU)
IntelHD530.zip

@CNugteren
Owner Author

@gcp Thanks, they are added.

@CNugteren
Owner Author

Issue #83 caused a complete re-write of the third GEMV kernel (XgemvFastRot), so I had to throw away the corresponding tuning results. If it's not too much effort, I welcome updated clblast_xgemv_fast_rot_*.json tuning results based on the development branch. The other GEMV tuning results are still valid and included in CLBlast. Thanks!

@OursDesCavernes

Intel(R) HD Graphics 5500 BroadWell U-Processor GT2:
hd5500.zip
Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile:
hd4400.zip

@CNugteren
Owner Author

@OursDesCavernes Thanks, HD5500 is added and HD4400 is updated.

@yingted

yingted commented Oct 11, 2016

Intel(R) HD Graphics 4000
intel-hd4000.zip

@CNugteren
Owner Author

@yingted Thanks! The tuning results for the IvyBridge GPU are added.

@MigMuc

MigMuc commented Oct 22, 2016

Radeon R9 380 (Tonga) tuning results:
Tobago_TuningResults.zip

@MigMuc

MigMuc commented Oct 22, 2016

Of course, the device is called Tonga; that's just a typo in the zip-file name.

@CNugteren
Owner Author

@MigMuc The results for Tonga are added, thanks!

@matze
Contributor

matze commented Oct 24, 2016

Here are the results for the GTX Titan Black. Unfortunately, I had the same problem as @gcp on the last run. But again, should be fairly complete.

gtx-titan-black.tar.gz

@CNugteren
Owner Author

@matze Thanks a lot for your contribution. The tuning results are added.

@A2va

A2va commented Sep 24, 2023

Here are the files from my GTX 1060
GTX 1060 6GB.zip

@tangjinchuan

7800XT.zip

@infinit-luffy

4060-Daoyuan Zhu@GZU.zip

@RAN1027

RAN1027 commented Nov 7, 2023

AMD 5600G.zip

@WaToI

WaToI commented Dec 13, 2023

AMD 7840U Radeon 780M.zip
GPD WIN Max 2 2023

@pjuhasz

pjuhasz commented Dec 23, 2023

Nvidia Quadro M2000M
clblast_tuning_nvidia_quadro_m2000m.tar.gz

@pjuhasz

pjuhasz commented Dec 23, 2023

Anecdotal benchmark for the previous result: running clblast-enabled whisper.cpp with the medium model (./main -m models/ggml-medium.bin ./samples/jfk.wav)

without tuning: 14 s
with tuning: 8.8 s

@gpokat

gpokat commented Feb 5, 2024

Fp16 only for Helio G99 (ARM MALI G57 GPU)
helioG99_fp16_only.tar.gz
Stage 4 of the clblast_tuner_xgemm tuner gets stuck in an infinite loop every time, so I can't provide appropriate output.

@TomTheHand

TomTheHand commented Feb 7, 2024

I have an Intel A750 and an i5-13400F (not sure if the processor matters, but running the tuners certainly occupies one of my cores). I know someone provided results for an A770 already, but hopefully these are still worthwhile. At the very least, the Intel Arc cards have more mature drivers at this point, which might make a difference.

I also wanted to ask if you would consider pushing out another release; I don't know how many changes have been made to the rest of the code, but there are a bunch of new tuning results since the last one (including the A770, so the current release has no Intel Arc results in it). I ask because I use some software that utilizes your release versions rather than compiling their own, and I'm sure others are in the same boat.

IntelA750+13400F.zip

Edit: Thank you so much for the new release! I compiled 1.6.1 and 1.6.2 with -DCLIENTS=ON and ran a few random benchmarks (I'm not sure which ones are the most important and/or most used by the software I use), and saw huge performance improvements: roughly 3x the GFLOPS for xgemm, for example, if I'm understanding correctly. Looking forward to 1.6.2 being incorporated into more software so my A750 is less terrible 🤣

@gspr
Contributor

gspr commented Feb 7, 2024

I believe these tuning results haven't yet been submitted: NVIDIA RTX A6000 (GA102GL).
A6000-tuning.tar.gz

Update: Sorry, I forgot to lock the GPU and memory clocks. New results will come shortly.

Here's the updated NVIDIA RTX A6000 GA102GL tuning (which also includes a broader set of floating point widths than the last one):
A6000-tuning-2.tar.gz

I'm not sure the GPU clock was set correctly, nvtop reported a lower value than what I set it to.

@tangjinchuan

A770.zip
The latest Intel Arc A770 tuning results, based on driver 31.0.101.5330 (2024/2/14).

@tangjinchuan

More results are coming from my students in Artificial Intelligence.
易婉婷-2100170332-4050 Laptop 2.zip

刘杨杨-2100170317-4050 Laptop 1.zip

@SomePerson1111

intel_i7_12700H.zip

@gitbearflying

Does it make sense to add JSON files here which are based on a pocl device? https://github.com/pocl/pocl
I have an old "Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz" where clinfo tells me that I have no OpenCL device. After installing pocl I have one ;-)
While running "make alltuners", should the machine have no load from other running applications to get accurate JSON files?
make alltuners has now been running for 9 hours and is not yet finished.

@CNugteren
Owner Author

Does it make sense to add JSON files here which are based on a pocl device?

Yes, why not?

While running "make alltuners", should the machine have no load from other running applications to get accurate JSON files?

Ideally yes. But since the tuner tests many combinations, and often several of them are close to optimal, it doesn't harm the end result if something else happens in between once in a while.

make alltuners has now been running for 9 hours and is not yet finished.

Most likely most of the time is taken by the xgemm tuner? It consists of 4 parts; you could skip parts 2 and 4 and only run parts 1 and 3. You can do this by commenting out the lines that start with StartVariation<2> and StartVariation<12> in src/tuning/kernels/xgemm.cpp.
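That edit can be sketched as follows. The stand-in file below only mimics the relevant lines (only `StartVariation<2>` and `StartVariation<12>` are named above; the other template ids and the surrounding code are assumptions, and the real src/tuning/kernels/xgemm.cpp differs):

```shell
# Stand-in for the relevant lines of src/tuning/kernels/xgemm.cpp
# (illustrative only; template ids 1 and 11 are assumed for parts 1 and 3):
cat > xgemm_snippet.cpp <<'EOF'
StartVariation<1>(argc, argv);
StartVariation<2>(argc, argv);
StartVariation<11>(argc, argv);
StartVariation<12>(argc, argv);
EOF

# Comment out parts 2 and 4 so only parts 1 and 3 are tuned:
sed -i -e 's|^StartVariation<2>|// &|' \
       -e 's|^StartVariation<12>|// &|' xgemm_snippet.cpp
cat xgemm_snippet.cpp
```

After the edit, rebuilding and re-running the xgemm tuner would execute only the uncommented parts.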

@gitbearflying

Does it make sense to add JSON files here which are based on a pocl device?

Yes, why not?

Because I guess an emulated OpenCL device (pocl on the CPU) can't be faster than e.g. OpenBLAS, which also uses the CPU.
Or can CLBlast make use of the built-in "Mesa DRI Mobile Intel® GM45 Express Chipset"?

make alltuners has now been running for 9 hours and is not yet finished.

Most likely most of the time is taken by the xgemm tuner? It consists of 4 parts; you could skip parts 2 and 4 and only run parts 1 and 3. You can do this by commenting out the lines that start with StartVariation<2> and StartVariation<12> in src/tuning/kernels/xgemm.cpp.

I'll try that, thanks!

@hajokirchhoff

hajokirchhoff commented Jun 13, 2024 via email

@gitbearflying

Does it make sense to add JSON files here which are based on a pocl device?

Yes, why not?

Here it is:
Device Name cpu-penryn-Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz

intel-core-2-duo-T6670-pocl.zip

@tangjinchuan

New results from my students in Electronic Information Engineering at Guizhou University. I have kept their names in the filenames to acknowledge their contributions.
石军4060 Laptop GPU.zip
2200860109 陈雍 Iris(R) Xe.zip
付子毅-2200860253 4060 Laptop GPU.zip

@tangjinchuan

tangjinchuan commented Oct 21, 2024
