diff --git a/TODO.md b/TODO.md
index 457f32ab4abd..c0d99402af8a 100644
--- a/TODO.md
+++ b/TODO.md
@@ -2,10 +2,27 @@
 
 ## cleanups
 
+* Hard-coded alignment; we don't know why it exists or why it is that number.
+https://github.com/openai/triton-hopper/blob/1ada046fdaef13f94dc7e2f6e6d0966e5d1efe92/lib/Analysis/Allocation.cpp#L165
+https://github.com/openai/triton-hopper/blob/1ada046fdaef13f94dc7e2f6e6d0966e5d1efe92/lib/Analysis/Allocation.cpp#L176
+* Shared memory buffer sizes should be determined before graph coloring.
+https://github.com/openai/triton-hopper/blob/1ada046fdaef13f94dc7e2f6e6d0966e5d1efe92/lib/Analysis/Allocation.cpp#L493
+* Don't use any LLVM dialects in TritonGPU conversions.
+https://github.com/openai/triton-hopper/blob/1ada046fdaef13f94dc7e2f6e6d0966e5d1efe92/lib/Analysis/Membar.cpp#L110
+* We should have a more general barrier op.
+https://github.com/openai/triton-hopper/blob/1ada046fdaef13f94dc7e2f6e6d0966e5d1efe92/lib/Analysis/Membar.cpp#L180
+* tt.load's verifier has the `allowTensorPointerType` flag, which skips checking the encoding if `ptr` is a tensor pointer. This is a hack and should be improved: it is unclear, and violating the encoding is not a good idea.
+https://github.com/openai/triton-hopper/blob/1ada046fdaef13f94dc7e2f6e6d0966e5d1efe92/lib/Dialect/Triton/IR/Traits.cpp#L10
 * Don't call arith to LLVM conversion from MLIR
 * linearize/delinearize helper have been duplicated (most likely due to layering problems). This should be merged
 
 ## bug fixes
 
 * Currently lowering of InsertSliceAsyncV2Op doesn't work if the mask is not a scalar even though the op semantic allows it.
-* smem base is being passed to the patterns. This seems broken when using function calls.
\ No newline at end of file
+* smem base is being passed to the patterns. This seems broken when using function calls.
+* Revisit the constraint for the RewriteTensorPointer pass.
+https://github.com/openai/triton-hopper/blob/1ada046fdaef13f94dc7e2f6e6d0966e5d1efe92/lib/Dialect/TritonGPU/Transforms/RewriteTensorPointer.cpp#L644
+* The IR output of `make_tensor_ptr` is wrong: `!tt.ptr` is ignored in the output.
+https://github.com/openai/triton-hopper/blob/1ada046fdaef13f94dc7e2f6e6d0966e5d1efe92/include/triton/Dialect/Triton/IR/TritonOps.td#L562
+* We currently rely on the `cuda-python` package, which prevents us from building Triton on any node without CUDA installed. We should instead invoke TMA-related functions through our thin CUDA wrapper.
+https://github.com/openai/triton-hopper/blob/b6a6b32b0ee79e93247d20c95f15fd75039a40b9/python/triton/compiler/utils.py#L3
\ No newline at end of file