- June 18:
  - some changes under-the-hood:
    - migrated from pthreads to C++11 threads: C++11 threads are easier to use, more standard, and hopefully portable-ish
    - migrated from the bash script `cocl` to a new python script `cocl_py` as the main compilation entry-point
      - you can continue to use `cocl` for now, if you wish, but it seems likely to go unmaintained, even if it isn't physically deleted
      - this does mean that python 2.7 is now a runtime dependency, but I think python 2.7 is relatively ubiquitous?
    - the Coriander library and executables now build ok on Windows, which isn't to say they will run on Windows, but baby steps...
  - created plugin architecture
    - see coriander-dnn for a proof of concept of creating plugins :-)
      - it uses the `pluggable` branch of Coriander. Update: this branch is merged to `master` now
    - the idea is that you can pick some cool functionality that doesn't exist yet, and create your own project to implement it
    - to install a plugin, simply do eg `cocl_plugins.py install --repo-url https://github.com/hughperkins/coriander-dnn`
    - from then on, `cocl_py` will automatically add its includes and libraries when building :-) (see the sketch below)
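
  As a rough sketch of what this enables: assuming the coriander-dnn plugin provides the standard cudnn handle-management entry points (`cudnnCreate`/`cudnnDestroy`), a file like the following (name illustrative) should build with plain `cocl_py dnn_check.cu`, with no extra include or link flags:

  ```cpp
  // dnn_check.cu - illustrative sketch, not from the Coriander test suite
  #include <cudnn.h>
  #include <cstdio>

  int main() {
      cudnnHandle_t handle;
      cudnnCreate(&handle);    // resolved via the installed coriander-dnn plugin
      printf("dnn plugin linked ok\n");
      cudnnDestroy(handle);
      return 0;
  }
  ```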
- June 11:
  - provisionally set up Jenkins against Coriander, with commit status updates: https://github.com/hughperkins/coriander/commits/master . Not every commit has a build, but for those that do, and succeed, the build log and build artifacts are available
- June 4:
  - added cmake macros `cocl_add_executable` and `cocl_add_library`
    - these replace the previous `add_cocl_executable`, and have the advantage that they are standard targets, which you can use with `target_link_libraries` and so on - see cmake usage, and the sketch below
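
  A minimal sketch of the new macros in use (target and file names are illustrative; see the cmake usage docs for how to make the macros available in your project):

  ```cmake
  # CMakeLists.txt fragment
  cocl_add_executable(my_app my_app.cu)
  # my_app is a standard cmake target, so the usual commands apply to it:
  target_link_libraries(my_app m)
  ```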
- May 31:
  - added a developer debugging option `COCL_DUMP_CONFIG`, to allow easy inspection of buffers returned by kernel calls; see options
- May 28:
  - revamped how we choose the type of buffer offsets passed into the kernels:
    - it's always done at runtime now, never at compile time
    - when you run an already-built app, simply set the environment variable `COCL_OFFSETS_32BIT` to the string `1` to use 32-bit offsets; otherwise it will default to 64-bit offsets (which means it can address more memory) - see the sketch below
    - basically, unless you're using beignet, you can ignore this, and stop having to think about the 32-bit offsets variables any more :-)
  - if you build with `BUILD_TESTS` set to `OFF`, you can still build the tests, eg by doing `make cocl_unittests`, and you can still run them, eg by doing `make run-tests`; it's just that `make` no longer builds them by default
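
  Illustrative usage of both of the above (the app name `my_app` is a placeholder for anything built with `cocl_py`):

  ```bash
  # run an already-built app with 32-bit buffer offsets (eg for beignet)
  COCL_OFFSETS_32BIT=1 ./my_app

  # with BUILD_TESTS set to OFF, build and run the tests on demand
  make cocl_unittests
  make run-tests
  ```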
- May 27:
  - updated to LLVM 4.0. Thank you to @iame6162013 for inspiring me to do this
  - Tensorflow `random_op_gpu.cc` compiles and runs ok now :-). There were a few hoops to jump through, see #24
- May 20:
  - renamed to Coriander
- May 18:
- May 5:
  - Eigen unit tests at https://bitbucket.org/hughperkins/eigen/src/75842846799e15f1c26ef6885565d64c3d0a67b2/unsupported/test/Coriander/?at=eigen-cl pass on MacBook Pro 4th Generation with both:
    - Intel HD Graphics 530, and
    - Radeon Pro 450 (using env var `CL_GPUOFFSET=1` to select)
  - I suspect this may have broken some other stuff, since one of the unit tests fails now, but I think it's a gentle step forward
- May 1:
  - dnn tests pass on Radeon Pro 450, on Mac Sierra now
  - fixed crash bugs in pooling forward/backward, on Mac Sierra
  - thanks to my employer ASAPP for giving me use of a nice MacBook Pro 4th Generation, with Radeon Pro 450, unit tests now pass on said hardware :-)
- April 29:
  - updated to latest EasyCL. This lets you use the environment variable `CL_GPUOFFSET` to choose different gpus, eg set it to `1` to use the second gpu, to `2` to use the third gpu, etc - see the sketch below
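
  Illustrative usage (the app name `my_app` is a placeholder):

  ```bash
  CL_GPUOFFSET=1 ./my_app   # run on the second gpu
  CL_GPUOFFSET=2 ./my_app   # run on the third gpu
  ```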
- April 15:
  - added max pooling
  - added ReLU, sigmoid and tanh activations
  - added softmax forward
  - it's now possible, by and large, to compile Tal Ben-Nun's cudnn-training. It needs some additions to the CMakeLists.txt; see my fork at https://github.com/hughperkins/cudnn-training , differences here
- April 14:
  - added backwards implementation for convolution, including data, filters, and bias
- April 13:
  - added CLBlast wrappers for: sgemv, sscal, saxpy
- April 4:
  - merged in the current `dnn` branch, which provides a forward convolutional implementation for the cudnn API, using `im2col` over Cedric Nugteren's CLBlast
  - Coriander got accepted for a technical presentation at this year's IWOCL conference :-) Conference sessions here: IWOCL 2017 Conference program
- Nov 25:
  - created release 4.0.4:
    - all current Eigen tests, https://bitbucket.org/hughperkins/eigen/src/eigen-cl/unsupported/test/Coriander/?at=eigen-cl , pass for me currently, using this release, on both beignet 1.2.1, on hd5500, and on NVIDIA 940M, using driver 367.57
    - fixed some issues with walk-dependency order during cl walk/generation
    - fixed an issue with un-initialized pointers in structs passed as kernel parameters into GPU kernels
- Nov 24:
  - merged from branch `clwriter`:
    - lots of refactoring under-the-hood
    - can handle determining the address-space of functions returning pointers
    - opencl generation happens at runtime now => this facilitates determining address-space; and, counter-intuitively, it is actually faster, because there is less OpenCL for the GPU driver to compile
- Nov 18:
  - Mac build working :-) https://travis-ci.org/hughperkins/Coriander/builds/176997220#L1356
- Nov 17:
  - merged `runtime-compile` branch into `master` branch. This brings a few changes:
    - opencl generation is now at runtime, rather than at compile time
      - this lets us build only the one specific kernel we need
      - more information is available at generation time, facilitating the generation process
    - build on Mac OS X is more or less working, eg https://travis-ci.org/hughperkins/Coriander/builds/176580716
    - code radically refactored underneath
    - removed `--run_branch_transforms` and `--branches_as_switch`, for now
- Nov 8:
  - exposed generation options as `cocl` options, eg `--run_branching_transforms`, `--branches_as_switch`, and the `--devicell-opt [opt]` options
- Nov 6:
  - created dockerfiles for Beignet and NVIDIA docker
- Nov 5:
  - switched from `Makefile` to `CMakeLists.txt` => build/install instructions have changed, see above
  - added a `cmake` file, so you can easily add `cocl` to your cmakelists file, eg see https://bitbucket.org/hughperkins/eigen/src/d84b9f44f924e36a8527e66a46a189395f046d21/unsupported/test/Coriander/CMakeLists.txt?at=eigen-cl&fileviewer=file-view-default for an example
- Nov 4:
  - merged in changes that remove `label`s and `goto`s, and replace them with `if`s, `while`s, and `for`s. This is a bit flaky/beta/duct-tape, but the unit tests do all pass...
- Nov 1:
  - turned on rpath, switched from static to shared compilation
- Oct 29:
  - negative infinity float constants are handled correctly now (a pre-requisite for `reduce_min` working in tensorflow)
  - properties now return correct device name, total memory, and a few other device parameters
  - added callbacks
  - now remember to cache the kernels between calls :-P (this should make things run quite a lot faster now...)
- Oct 28:
  - denormalized generated OpenCL out of SSA form, to make it more human-readable
  - added support to pass null pointers into kernels
- Oct 26:
  - fixed a bug where BLAS results were empty on HD5500, using beignet 1.2
  - added `__shfl_down` shim - see the sketch below
  - moved Eigen tests into a new Eigen fork, https://bitbucket.org/hughperkins/eigen/commits/branch/eigen-cl
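
  A sketch of the kind of code the `__shfl_down` shim supports: a standard warp-level sum, using the pre-CUDA-9 signature (kernel name illustrative):

  ```cpp
  // sums the 32 values held by one warp; lane 0 ends up with the total
  __global__ void warpSum(const float *in, float *out) {
      float val = in[threadIdx.x];
      for (int offset = 16; offset > 0; offset /= 2) {
          val += __shfl_down(val, offset);   // provided by the shim
      }
      if (threadIdx.x == 0) {
          *out = val;
      }
  }
  ```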
- Oct 25:
  - BLAS wrapper handles memory offsets correctly now
- Oct 24:
  - fixed `pow`, `min`, `max` (beta)
- Oct 23:
  - fixed `float4`s. This is a critical bug-fix, without which Eigen componentwise works less well in Tensorflow :-P
  - added BLAS, using Cedric Nugteren's CLBlast
- Oct 22:
  - arrays of structs can be passed to kernels again, as long as they contain no pointers
    - (structs containing pointers can be passed only by-value) - see the sketch below
  - possible to call kernels with offsets added now, as in eg test/cocl/offsetkernelargs.cu
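
  A sketch of these rules (type and kernel names are illustrative):

  ```cpp
  struct Params {      // no pointers => arrays of these can be passed to kernels
      float scale;
      int n;
  };

  struct WithBuffer {  // contains a pointer => can only be passed by value
      float *data;
      int n;
  };

  __global__ void useArray(Params *params) {        // ok: array of pointer-free structs
      params[threadIdx.x].scale *= 2.0f;
  }

  __global__ void useByValue(WithBuffer config) {   // ok: struct with a pointer, by value
      config.data[threadIdx.x] += 1.0f;
  }
  ```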
- Oct 20:
  - fixed a bug where `threadIdx.x` was being incorrectly written as `get_global_id`, instead of `get_local_id`...
    - magically, the `test_cuda_elementwise` kernel works much better now :-)
- Oct 18:
  - installs to `/usr/local` now
  - `libcocl.a` contains `libEasyCL.a` now, so there is no need for `libEasyCL.so` at runtime
  - fixed a bug with linking multiple compiled `.cu` files causing an error about 'multiple definitions of __opencl_source'
- Oct 16:
  - added streams, including kernel launch on a non-default stream - see the sketch below
  - removed pinned memory: `cuMemHostAlloc` now just calls `malloc`; see design.md for analysis and thoughts on this. Let me know if you have any ideas (eg via an issue)
  - added ability to copy to/from device memory, with an offset added
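
  A sketch of the stream support (standard CUDA stream API; the kernel and sizes are illustrative):

  ```cpp
  #include <cuda_runtime.h>

  __global__ void scale(float *data, float factor, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) data[i] *= factor;
  }

  int main() {
      const int N = 1024;
      float *d_data = 0;
      cudaMalloc((void **)&d_data, N * sizeof(float));

      cudaStream_t stream;
      cudaStreamCreate(&stream);

      // the fourth launch parameter selects the (non-default) stream
      scale<<<(N + 255) / 256, 256, 0, stream>>>(d_data, 2.0f, N);

      cudaStreamSynchronize(stream);
      cudaStreamDestroy(stream);
      cudaFree(d_data);
      return 0;
  }
  ```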
- Oct 15:
  - fixed a critical bug where `return;` wasn't being written out. That didn't matter when it was at the end of a kernel, but it mattered more when it was the only exit condition for a kernel :-P
  - added event handling - see the sketch below
  - added pinned memory handling
  - added a bunch of api call implementations for getting information about the driver (mostly stubbed out for now...)
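
  A sketch of what the event handling covers, using the standard CUDA event API (exactly which calls are fully implemented, as opposed to stubbed, is per the entry above):

  ```cpp
  #include <cuda_runtime.h>

  __global__ void noop() {}

  int main() {
      cudaEvent_t start, stop;
      cudaEventCreate(&start);
      cudaEventCreate(&stop);

      cudaEventRecord(start);
      noop<<<1, 32>>>();
      cudaEventRecord(stop);

      cudaEventSynchronize(stop);   // wait for the kernel and the stop event
      float ms = 0.0f;
      cudaEventElapsedTime(&ms, start, stop);

      cudaEventDestroy(start);
      cudaEventDestroy(stop);
      return 0;
  }
  ```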
- Oct 10:
  - test/eigen/test_cuda_elementwise_small.cu builds and runs ok now
- Oct 8:
  - https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/core/kernels/cwise_op_gpu_add.cu.cc compiles completely into compileable OpenCL now: https://github.com/hughperkins/Coriander/blob/d491aca1b5123781ac59486d38b09fbecd049f45/tensorflow/generated/cwise_op_gpu_add-deviceside.cl
  - implemented `cudaMalloc`, `cudaMemcpy`, `cudaFree` (using opencl) - see the sketch below
  - hostside object now contains the generated OpenCL sourcecode
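
  A minimal sketch exercising the three calls (standard CUDA runtime signatures, here backed by OpenCL buffers):

  ```cpp
  #include <cuda_runtime.h>

  int main() {
      const int N = 16;
      float host[N] = {0};
      float *dev = 0;

      cudaMalloc((void **)&dev, N * sizeof(float));
      cudaMemcpy(dev, host, N * sizeof(float), cudaMemcpyHostToDevice);
      cudaMemcpy(host, dev, N * sizeof(float), cudaMemcpyDeviceToHost);
      cudaFree(dev);
      return 0;
  }
  ```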
- Oct 5:
  - fixed float constants to correctly have `.0f` at the end
  - added `extractvalue`
  - conversion of https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/core/kernels/cwise_op_gpu_add.cu.cc to OpenCL runs to completion now, though the generated OpenCL is not yet compileable
- Oct 4:
  - added `llvm.memcpy`
  - added `insertvalue`
  - added dump of `inttoptr`, `trunc`, `srem` (beta)
- Oct 3:
  - added `float4` (beta)
  - added `local` memory (beta)
- Oct 2:
  - added structs
- Oct 1:
  - first working end-to-end kernel launch, using both host-side and device-side code :-)
- Sept 30:
  - added initial unit tests, that use pyopencl to compile the generated OpenCL code, and run tests against it
- Sept 27:
  - first created