Taichi CMake Overhaul #4832

Open · 12 of 13 tasks
qiao-bo opened this issue Apr 21, 2022 · 16 comments
Labels: c++ (C++ engineering related), discussion (Welcome discussion!), enhancement (Make existing things or codebases better)
qiao-bo (Contributor) commented Apr 21, 2022

We would like to share our proposal for modernizing Taichi's CMake-based build system. By embracing the target-based approach, we can enforce a good modular design in our code base. This brings benefits such as reduced (re-)compilation time. By being explicit and untangling the dependencies in the current code base, targets that do not depend on others can be isolated and built independently (target-level parallelism). Changes to one target would not incur rebuilds of others.

This is ongoing work. Some effort has already been spent on cleaning up the code base:

Related issues

Proposal

The proposed changes mainly consist of two parts. First, we would like to maintain a list of targets that Taichi uses and builds upon. Each of these targets should have its own CMakeLists.txt that specifies both the build requirements and the usage requirements of that target. This may require some shuffling in our code base. Second, we want to replace many of our current CMake functions that have a global scope with target-based APIs. For example, include_directories should be replaced with target_include_directories to reduce hidden dependencies on header files to a minimum. Here, we share the outcome of some preliminary discussions:
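
As a sketch, one such per-target CMakeLists.txt could look like the following. This is illustrative only: target names, source files, and paths are placeholders, not final decisions.

```cmake
# Sketch of a self-describing target with explicit build and usage
# requirements; names and paths are placeholders.
add_library(taichi_ir STATIC)
target_sources(taichi_ir PRIVATE ir.cpp transforms.cpp)
# Usage requirement: consumers of taichi_ir need these headers too.
target_include_directories(taichi_ir PUBLIC ${PROJECT_SOURCE_DIR})
# Build requirements: needed to build taichi_ir, invisible to consumers.
target_compile_options(taichi_ir PRIVATE -Wall)
target_link_libraries(taichi_ir PRIVATE taichi_common)
```

With this shape, a dependent target only needs target_link_libraries(consumer PRIVATE taichi_ir) and inherits the usage requirements automatically.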

Targets

The current Taichi C++ code base (https://github.com/taichi-dev/taichi/tree/master/taichi) could be split into the following build targets (names TBD):

  1. program: Core module that orchestrates the other targets. (https://github.com/taichi-dev/taichi/tree/master/taichi/program)
  2. ir: Includes ir, analysis, and transforms. May depend on type, snode, etc.
  3. codegen: Currently located at https://github.com/taichi-dev/taichi/tree/master/taichi/backends. We can further divide it into LlvmCodegen and SpirvCodegen. codegen should depend on runtime.
  4. runtime: Currently shares the same location as codegen; needs code shuffling to split out.
  5. artifact: This information is required by both codegen and runtime. One example in the context of AOT is KernelAttributes:
    struct KernelAttributes {
      std::string name;
      // Total number of threads to launch (i.e. threads per grid). Note that this
      // is only advisory, because eventually this number is also determined by the
      // runtime config. This works because grid-strided loops are supported.
      int advisory_total_num_threads;
      // Block size in CUDA's terminology. On Metal, it is called a threadgroup.
      int advisory_num_threads_per_group;
      OffloadedTaskType task_type;
      struct RangeForAttributes {
        // |begin| has different meanings depending on |const_begin|:
        // * true : It is the left boundary of the loop known at compile time.
        // * false: It is the offset of the begin in the global tmps buffer.
        //
        // Same applies to |end|.
        size_t begin;
        size_t end;
        bool const_begin{true};
        bool const_end{true};
        inline bool const_range() const {
          return (const_begin && const_end);
        }
        TI_IO_DEF(begin, end, const_begin, const_end);
      };
      struct RuntimeListOpAttributes {
        const SNode *snode = nullptr;
      };
      struct GcOpAttributes {
        const SNode *snode = nullptr;
      };
      std::vector<BufferDescriptor> buffers;
      // Only valid when |task_type| is `range_for`.
      std::optional<RangeForAttributes> range_for_attribs;
      // Only valid when |task_type| is `listgen`.
      std::optional<RuntimeListOpAttributes> runtime_list_op_attribs;
      // Only valid when |task_type| is `gc`.
      std::optional<GcOpAttributes> gc_op_attribs;
      std::string debug_string() const;
      TI_IO_DEF(name,
                advisory_total_num_threads,
                task_type,
                buffers,
                range_for_attribs);
    };

    As a first step, we can move this information from codegen to runtime such that runtime does not depend on codegen.
  6. common: Includes logging, macros, and similar utilities. This target is shared among all other targets.
  7. type: Many targets may depend on this.
  8. snode: Ideally we want one snode target that others can depend on. Currently, the code is spread across ir, struct, etc. We can have an snode builder so that an implemented snode no longer contains its construction information.
  9. gui: A peripheral target of taichi_core.
  10. python: Pybind-related code.
  11. system: Includes the platform code.

This dependency graph provides an overview of the mentioned targets:
[Dependency graph image]

API Changes

  1. Minimize the use of globally scoped commands and variables; be explicit by using targets. For example, include_directories -> target_include_directories, link_directories -> target_link_directories, etc. Targets in sub-directories can be added via add_subdirectory().
  2. Differentiate between the build and usage requirements of a target: use PRIVATE for build requirements and INTERFACE or PUBLIC for usage requirements. For example, -Wall is a build requirement, not a usage requirement.
  3. Replace these file glob calls with explicit target_sources() calls:
    file(GLOB TAICHI_CORE_SOURCE
    "taichi/*/*/*/*.cpp" "taichi/*/*/*.cpp" "taichi/*/*.cpp" "taichi/*.cpp"
    "taichi/*/*/*/*.h" "taichi/*/*/*.h" "taichi/*/*.h" "taichi/*.h" "tests/cpp/task/*.cpp")
    file(GLOB TAICHI_BACKEND_SOURCE "taichi/backends/**/*.cpp" "taichi/backends/**/*.h")
    file(GLOB TAICHI_CPU_SOURCE "taichi/backends/cpu/*.cpp" "taichi/backends/cpu/*.h")
    file(GLOB TAICHI_WASM_SOURCE "taichi/backends/wasm/*.cpp" "taichi/backends/wasm/*.h")
    file(GLOB TAICHI_CUDA_SOURCE "taichi/backends/cuda/*.cpp" "taichi/backends/cuda/*.h")
    file(GLOB TAICHI_METAL_SOURCE "taichi/backends/metal/*.h" "taichi/backends/metal/*.cpp" "taichi/backends/metal/shaders/*")
    file(GLOB TAICHI_OPENGL_SOURCE "taichi/backends/opengl/*.h" "taichi/backends/opengl/*.cpp" "taichi/backends/opengl/shaders/*")
    file(GLOB TAICHI_DX11_SOURCE "taichi/backends/dx/*.h" "taichi/backends/dx/*.cpp")
    file(GLOB TAICHI_CC_SOURCE "taichi/backends/cc/*.h" "taichi/backends/cc/*.cpp")
    file(GLOB TAICHI_VULKAN_SOURCE "taichi/backends/vulkan/*.h" "taichi/backends/vulkan/*.cpp" "external/SPIRV-Reflect/spirv_reflect.c")
    file(GLOB TAICHI_INTEROP_SOURCE "taichi/backends/interop/*.cpp" "taichi/backends/interop/*.h")

    File globbing is not recommended in modern CMake. More importantly, avoid using TI_WITH_XX macros to guard the inclusion of source files; this pollutes the Taichi core target with many TI_WITH_XX branches, such as:
    #ifdef TI_WITH_LLVM
      program_impl_ = std::make_unique<LlvmProgramImpl>(config, profiler.get());
    #else
      TI_ERROR("This taichi is not compiled with LLVM");
    #endif
    } else if (config.arch == Arch::metal) {
    #ifdef TI_WITH_METAL
      TI_ASSERT(metal::is_metal_api_available());
      program_impl_ = std::make_unique<MetalProgramImpl>(config);
    #else
      TI_ERROR("This taichi is not compiled with Metal");
    #endif
    } else if (config.arch == Arch::vulkan) {
    #ifdef TI_WITH_VULKAN
      TI_ASSERT(vulkan::is_vulkan_api_available());
      program_impl_ = std::make_unique<VulkanProgramImpl>(config);
    #else
      TI_ERROR("This taichi is not compiled with Vulkan");
    #endif
    } else if (config.arch == Arch::dx11) {
    #ifdef TI_WITH_DX11
      TI_ASSERT(directx11::is_dx_api_available());
      program_impl_ = std::make_unique<Dx11ProgramImpl>(config);
    #else
      TI_ERROR("This taichi is not compiled with DX11");
    #endif
    } else if (config.arch == Arch::opengl) {
      TI_ASSERT(opengl::initialize_opengl(config.use_gles));
      program_impl_ = std::make_unique<OpenglProgramImpl>(config);
    } else if (config.arch == Arch::cc) {
    #ifdef TI_WITH_CC
      program_impl_ = std::make_unique<CCProgramImpl>(config);
    #else
      TI_ERROR("No C backend detected.");
    #endif
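
One way to keep such backend dispatch out of the core target is a small registration scheme: each backend target registers a factory for its ProgramImpl, so the core never needs TI_WITH_XX guards, and unavailable backends are simply absent from the registry. The sketch below is illustrative only; Arch, ProgramImpl, register_backend, and make_program_impl are simplified stand-ins, not Taichi's actual API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified stand-ins for illustration; not Taichi's real types.
enum class Arch { llvm, metal, vulkan, dx11, opengl, cc };

struct ProgramImpl {
  virtual ~ProgramImpl() = default;
  virtual std::string name() const = 0;
};

using Factory = std::function<std::unique_ptr<ProgramImpl>()>;

// One process-wide registry; each backend target adds itself here.
std::map<Arch, Factory> &registry() {
  static std::map<Arch, Factory> r;
  return r;
}

void register_backend(Arch arch, Factory f) {
  registry()[arch] = std::move(f);
}

// The core dispatch no longer needs any #ifdef: a backend that was not
// compiled in simply never registered a factory.
std::unique_ptr<ProgramImpl> make_program_impl(Arch arch) {
  auto it = registry().find(arch);
  if (it == registry().end())
    throw std::runtime_error("This taichi is not compiled with this backend");
  return it->second();
}

// This part would live in the vulkan backend target's own source file.
struct VulkanProgramImpl : ProgramImpl {
  std::string name() const override { return "vulkan"; }
};
```

The registration call itself would sit in each backend's own translation unit (for example via a static initializer), so the CMake-level decision of which backends to build fully determines which branches exist at runtime.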

Implementation Roadmap

In phase one, we would like to divide the current core target, taichi_isolated_core, into a few major build targets, including program, codegen, runtime, ir, and python.

set(CORE_LIBRARY_NAME taichi_isolated_core)
add_library(${CORE_LIBRARY_NAME} OBJECT ${TAICHI_CORE_SOURCE})

  • Split runtime from the backends dir. (Define runtime targets)
    • cc
    • cpu
    • cuda
    • dx
    • interop
    • metal
    • opengl
    • vulkan
    • wasm
  • Split codegen from the backends dir. (Define codegen targets)
  • Define ir target. Resolve the dependencies on program, backends etc.
  • Isolate Pybind related code.
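
The phase-one split could be wired together roughly as follows. This is a sketch; target and directory names are placeholders, not final.

```cmake
# Sketch: the monolithic object library gives way to linked component
# targets; each subdirectory owns its own CMakeLists.txt.
add_subdirectory(taichi/ir)       # defines taichi_ir
add_subdirectory(taichi/codegen)  # defines taichi_codegen
add_subdirectory(taichi/runtime)  # defines taichi_runtime
add_subdirectory(taichi/python)   # defines taichi_python

add_library(taichi_core SHARED)
target_link_libraries(taichi_core PRIVATE
  taichi_ir taichi_codegen taichi_runtime taichi_python)
```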

In phase two, we can further split the shared and peripheral targets from e.g. program. This also includes artifact and snode. In addition, we can move header files into a separate taichi/include directory, which allows us to distribute libraries together with their headers.
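Once public headers live under taichi/include, distribution could be sketched like this (install destinations and target names are illustrative, not decided):

```cmake
# Sketch: install the built libraries together with the public headers.
install(TARGETS taichi_ir taichi_runtime
        EXPORT TaichiTargets
        ARCHIVE DESTINATION lib
        LIBRARY DESTINATION lib)
install(DIRECTORY taichi/include/ DESTINATION include)
install(EXPORT TaichiTargets DESTINATION lib/cmake/Taichi)
```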
Tasks here will be continuously updated.

@qiao-bo qiao-bo added enhancement Make existing things or codebases better discussion Welcome discussion! RFC labels Apr 21, 2022
@taichi-ci-bot taichi-ci-bot moved this to Untriaged in Taichi Lang Apr 21, 2022
@k-ye k-ye added c++ C++ engineering related and removed RFC labels Apr 21, 2022
@k-ye k-ye changed the title [RFC] Taichi CMake Overhaul Taichi CMake Overhaul Apr 21, 2022
@k-ye k-ye moved this from Untriaged to Todo in Taichi Lang Apr 21, 2022
k-ye (Member) commented Apr 25, 2022

Cool! Could you also provide a dependency graph? https://excalidraw.com/ could be your friend (@ailzhang recommended it).

k-ye (Member) commented Apr 25, 2022

Nice graph! IIUC, type and snode are sub-components under ir? Also, artifact is probably too generic; better to come up with a better name for it... (Context: this refers to the artifacts/outcome from codegen.)

qiao-bo (Contributor, Author) commented Apr 26, 2022

IIUC, type and snode are sub-components under ir?

At the moment, yes. Nevertheless, type should not be an ir-specific component. Other targets such as codegen can also depend on type (and codegen does). Same for snode. Ideally we want to have a type target as well as an snode target.

Also artifact is probably too generic, better come up with a better naming for it... (Context: this refers to the artifacts/outcome from codegen)

Agree, any suggestions?

qiao-bo (Contributor, Author) commented Apr 29, 2022

Currently in the taichi/backends folder, apart from the code for runtime and codegen, we also have code for our unified device API; maybe we can make a backend target for this part? The dependencies would then be something like:

runtime ->(depends on) backend
codegen -> backend
codegen -> runtime
program -> backend

WDYT? @k-ye @ailzhang

k-ye (Member) commented Apr 29, 2022

Would be great to separate out the unified device API! I'd call it rhi, though. @bobcao3

qiao-bo (Contributor, Author) commented Apr 29, 2022

Discussion: We should later distinguish between public and private headers (currently we don't). I think the recommended way is: for public headers, we always include by an absolute path (meaning a path relative to the Taichi project's root). For private headers, we can go with a path relative to the target's include folder, as long as a unique path per header file is guaranteed.

PENGUINLIONG (Member) commented:

Discussion: We should later distinguish between public and private headers (currently we don't). I think the recommended way is: for public headers, we always include by an absolute path (meaning a path relative to the Taichi project's root). For private headers, we can go with a path relative to the target's include folder, as long as a unique path per header file is guaranteed.

Agree. It's common for AOT glue code to mistakenly refer to unexported functions, leading to linking problems.

k-ye (Member) commented May 20, 2022

Note that right now everything is grouped into a two-level namespace: ::taichi::lang. We should also consider just using taichi, with only the components that are truly language-related going into taichi::lang, e.g. CHI IR. For backend stuff, it could be something like taichi::codegen, taichi::runtime, etc.

qiao-bo (Contributor, Author) commented Jul 4, 2022

Update:

  • API changes implemented as proposed, including file globs, directory APIs, scope specifiers, etc.
  • Newly defined targets include rhi, codegen, program_impls, runtime. The current dependency relationship is:

[Flowchart image]

Next steps:

  • Clean up header dependencies among rhi, codegen, and runtime.
  • Define common utilities targets to break from core source files.
  • Isolate Pybind source files
  • Split IR from core files

qiao-bo (Contributor, Author) commented Jul 14, 2022

Current dependency graph
[Dependency graph image]

qiao-bo (Contributor, Author) commented Jul 19, 2022

Ideally, TaichiCore.cmake should only deal with the taichi_core build, meaning it should be decoupled from language frontends such as Python and others. @AmesingFlank What is the status of TI_EMSCRIPTENED? WDYT about renaming this to taichi_javascript and maybe moving this part into a separate TaichiJavascript.cmake?

set(CORE_WITH_EMBIND_LIBRARY_NAME taichi)

AmesingFlank (Collaborator) commented:

@qiao-bo
For taichi.js, I have decided to move away from using emscripten to compile taichi, because of binary size and performance issues. Instead, I have re-implemented the functionalities that I require in Javascript. So I think for now it would be a good idea to remove all TI_EMSCRIPTENED related code from the C++ codebase.

qiao-bo (Contributor, Author) commented Jul 19, 2022

@qiao-bo For taichi.js, I have decided to move away from using emscripten to compile taichi, because of binary size and performance issues. Instead, I have re-implemented the functionalities that I require in Javascript. So I think for now it would be a good idea to remove all TI_EMSCRIPTENED related code from the C++ codebase.

OK, that makes it cleaner, thanks for the info.

qiao-bo (Contributor, Author) commented Jul 26, 2022

Update: Current dependency graph among targets extracted from CMake.
[Flowchart image]

qiao-bo (Contributor, Author) commented Aug 18, 2022

Update together with @ailzhang: In addition to taichi_core (which can be further split into ir, analysis, etc.), we can split an artifact target out of the current runtime targets. This information is required by both codegen and runtime. One example in the context of AOT is KernelAttributes:

struct KernelAttributes {
  std::string name;
  // Total number of threads to launch (i.e. threads per grid). Note that this
  // is only advisory, because eventually this number is also determined by the
  // runtime config. This works because grid-strided loops are supported.
  int advisory_total_num_threads;
  // Block size in CUDA's terminology. On Metal, it is called a threadgroup.
  int advisory_num_threads_per_group;
  OffloadedTaskType task_type;
  struct RangeForAttributes {
    // |begin| has different meanings depending on |const_begin|:
    // * true : It is the left boundary of the loop known at compile time.
    // * false: It is the offset of the begin in the global tmps buffer.
    //
    // Same applies to |end|.
    size_t begin;
    size_t end;
    bool const_begin{true};
    bool const_end{true};
    inline bool const_range() const {
      return (const_begin && const_end);
    }
    TI_IO_DEF(begin, end, const_begin, const_end);
  };
  struct RuntimeListOpAttributes {
    const SNode *snode = nullptr;
  };
  struct GcOpAttributes {
    const SNode *snode = nullptr;
  };
  std::vector<BufferDescriptor> buffers;
  // Only valid when |task_type| is `range_for`.
  std::optional<RangeForAttributes> range_for_attribs;
  // Only valid when |task_type| is `listgen`.
  std::optional<RuntimeListOpAttributes> runtime_list_op_attribs;
  // Only valid when |task_type| is `gc`.
  std::optional<GcOpAttributes> gc_op_attribs;
  std::string debug_string() const;
  TI_IO_DEF(name,
            advisory_total_num_threads,
            task_type,
            buffers,
            range_for_attribs);
};

  • Move this information from codegen to runtime so that runtime does not depend on codegen.
  • Split the artifact target!

ailzhang (Contributor) commented:

FYI @PGZXB #5889 introduced an extra dependency from gfx_runtime to spirv_codegen which is unexpected.
You can reproduce this with TAICHI_CMAKE_ARGS="-DTI_WITH_VULKAN:BOOL=ON -DTI_WITH_OPENGL:BOOL=OFF " python setup.py develop and then import taichi. Specifically, it was introduced by https://github.com/taichi-dev/taichi/pull/5889/files#diff-f99a0ddc44f29052309b4ae1983406e7c442e8f44b683937edd532a1e70f2269R3. It can be worked around by linking spirv_codegen to the gfx_runtime target, but that contradicts what we want. Ideally, if cache_manager has to depend on both gfx_runtime and spirv_codegen, it can be a separate target. WDYT?
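
For reference, the separate-target idea could look like this in CMake (a sketch only; the source file path is hypothetical):

```cmake
# Sketch: cache_manager as its own target depending on both
# gfx_runtime and spirv_codegen, so neither has to depend on the other.
add_library(cache_manager STATIC cache_manager.cpp)
target_link_libraries(cache_manager PRIVATE gfx_runtime spirv_codegen)
```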

Projects: Taichi Lang (Status: In Progress)
5 participants