Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CMake build directories with "unconventional" names are undiscovered #120

Open
bbannier opened this issue Apr 22, 2024 · 9 comments
Open

Comments

@bbannier
Copy link

bbannier commented Apr 22, 2024

Currently kondo hardcodes a list of possible CMake build directories. This often works well enough, but since the name of a CMake build directory is merely convention it can fail for directories with other names.

A more reliable way to discover CMake build directories would be to search for subdirectories with a generated file CMakeCache.txt, but this cannot be supported in the current impl. Most of the infrastructure around artifact dirs assumes a static list of directories where instead a possible approach to address this would be to e.g., compute a (set of) globs for artifact_dirs and then have consumers of that API dynamically check for glob matches in the file system (e.g., here). It might make sense to also centralize glob matching to reduce slow repeated and identical hits of the filesystem.

@tbillington
Copy link
Owner

Oh I didn't know about CMakeCache.txt.

I'm working on "v2" that greatly increases the flexibility of the internal logic, here is an example. I think it would be possible to support searching subdirectories for CMakeCache.txt.

Do you think this API would fully support that use-case? Ideally we could determine the directory from CMakeLists but it's not a common format that's easy to parse.

fn is_artifact(&self, path: &Path) -> bool {
path.is_dir()
&& path
.file_name()
.is_some_and(|f| PATHS.iter().any(|p| *p == f))
}
fn artifacts(&self, root_dir: &Path) -> Vec<PathBuf> {
filter_exists(root_dir, &PATHS).collect()
}

@bbannier
Copy link
Author

bbannier commented Apr 23, 2024

Do you think this API would fully support that use-case? Ideally we could determine the directory from CMakeLists but it's not a common format that's easy to parse.

A top-level CMakeLists.txt could be used to determine the project name by parsing the optional project command, see https://cmake.org/cmake/help/latest/command/project.html. The top-level file should have at most one of these (though it could also be absent).

As for artifacts, the build dir is not set in CMakeLists.txt, but instead determined by how the user invokes cmake. The typical workflow is to create an arbitrarily named build directory somewhere (could be as a subdirectory of the project, but might also be a directory outside of the project directory, or even the project directory itself), and then invoke cmake from that directory, e.g.,

# Minimal CMake config.
touch CMakeLists.txt

# All of these are valid ways to configure the project, even simultaneously.
(mkdir build && cd build && cmake ..)                    # 1. Currently supported.
(mkdir _ && cd _ && cmake ..)                            # 2. Unsupported.
(mkdir build/config1 && cd build/config1 && cmake ../..) # 3. Unsupported.
(mkdir /tmp/build && cd /tmp/build && cmake /project/)   # 4. Unsupported, assumes project tree in `/project/`.
cmake .                                                  # 5. Unsupported.

(1) and (2) can be supported by looking for a subdirectory with CMakeCache.txt, but to support (3) we'd need to crawl the full project tree.

This approach breaks down for (4) where CMakeLists.txt and build directory with CMakeCache.txt have no relationship in the file system (though we could go from build directory to source directory by inspecting CMAKE_HOME_DIRECTORY in CMakeCache.txt).

One can detect (5) (CMakeCache.txt right next to CMakeLists.txt), but in this case it is impossible to decide what was an input and what is an artifact. It would probably make sense to detect this, but not perform any cleanups in this case.

The current interface sketch for Project could in principle support this, but potentially makes e.g., detection of root_dir for e.g., (4) awkward. It is passed in as an argument to artifacts, but not available for is_artifact. If the API would guarantee that e.g., is_project would always be called first one could store it away and make use of it. More generally, nothing in the trait promises that is_artifact is only called for files under root_dir, and from an API user perspective I'd rather only implement artifacts (directory with CMakeCache.txt and all files in it).

For all these reasons a better implementation might be to instead in a first implementation treat the build directory (directory with CMakeCache.txt) as root_dir and only ever look for CMakeLists.txt for the project name or to avoid removing files in the actual CMake project directory for (5) (in treating this there might be some overlap with CMakeCache.txt in a different kind of project, say CMakeCache.txt next to Cargo.toml, e.g., by having two CMake impls for Project, one for source trees which performs no cleanup and another one for CMake build trees which cleans up; this way a "good impl" for #111 would take care of (5) automatically).

@tbillington
Copy link
Owner

Thank you for the detailed breakdown, I haven't personally user CMake before so this is helpful.

I've fixed the naming of is_artifact -> is_root_artifact and artifacts -> root_artifacts to represent what they were actually doing.


If the API would guarantee that e.g., is_project would always be called first one could store it away and make use of it.

I've added that assumption to the docstring of the methods, thanks for pointing that out.

There may be a way to enforce this at a type level so it's impossible to not have a found project before you check artifacts etc. For example is_project would be something that returns a "scoped" path/project, similar to Project in the current version.

kondo/kondo-lib/src/lib.rs

Lines 124 to 127 in 415cd33

pub struct Project {
pub project_type: ProjectType,
pub path: path::PathBuf,
}


I'm still thinking over your last couple paragraphs :)

@bbannier
Copy link
Author

Thanks! I am certainly open to helping with this (e.g., by looking over PRs), or even contributing an implementation myself once your new API has become clearer. Feel free to ping me.

@tbillington
Copy link
Owner

I noticed while reading the project docs that PROJECT_BINARY_DIR exists.

Is this commonly used? Perhaps the binary path could be retrieved

@tbillington
Copy link
Owner

tbillington commented May 2, 2024

Option 1 ✅

(mkdir build && cd build && cmake ..)                    # 1. Currently supported.

Already supported.

Option 2 ✅

(mkdir _ && cd _ && cmake ..)                            # 2. Unsupported.

This shouldn't be a problem to implement. Requires recursive searching for CMakeCache.txt.

Option 3 👍

(mkdir build/config1 && cd build/config1 && cmake ../..) # 3. Unsupported.

It seems based on your 3rd option that only supporting artifacts at the root of the project is insufficient. I'll have a think how to address that.

Perhaps Project::root_artifacts should be able to return the top level of the artifact directory even if it's not in the project root. In this example it would include build/config.

Option 4 ⚠️

(mkdir /tmp/build && cd /tmp/build && cmake /project/)   # 4. Unsupported, assumes project tree in `/project/`.

This is an interesting one, until now nothing in kondo has required "out of tree" knowledge/logic. The possible solutions I can think of off the top of my head are not attractive. I'm tempted to just say this isn't supported for now. Can you think of any other project software like cmake etc that might have behave similarly? I wonder how common this pattern is.

Option 5 ⚠️

cmake .                                                  # 5. Unsupported.

I had a look at the output of CMakeCache.txt, there is some indirection but eventually I found a list of the files the in the generated CMakeFiles/Project.dir/cmake_clean. So we could either shell out to cmake to run it's (Makefile's?) generated clean (if it exists) or try and parse that file to find the files to delete ourselves.

Do you think the file list defined here is reliable? Here's an example generated file from me completing the cmake tutorial.

file(REMOVE_RECURSE
  "CMakeFiles/Tutorial.dir/tutorial.cxx.o"
  "CMakeFiles/Tutorial.dir/tutorial.cxx.o.d"
  "Tutorial"
  "Tutorial.pdb"
)

# Per-language clean rules from dependency scanning.
foreach(lang CXX)
  include(CMakeFiles/Tutorial.dir/cmake_clean_${lang}.cmake OPTIONAL)
endforeach()

@tbillington
Copy link
Owner

Do you know if there is a preference/rough order of which options tends to be most common for cmake projects?

@tbillington
Copy link
Owner

tbillington commented May 2, 2024

I have added option 1 support for the rework in 9b48edb.

@bbannier
Copy link
Author

bbannier commented May 2, 2024

I noticed while reading the project docs that PROJECT_BINARY_DIR exists.

Is this commonly used? Perhaps the binary path could be retrieved

This is typically use to construct paths to artifacts (e.g., to reference generated sources or headers). A similar variable is CMAKE_BINARY_DIR. Variables with PROJECT_ prefix (instead of CMAKE_) are used of one wants to support nested projects.

I have never worked on a project which sets these as opposed to read the values set by CMake.

Do you know if there is a preference/rough order of which options tends to be most common for cmake projects?

I'd say that building in a subdirectory is pretty typical (variations of (1) or (2)), but I also know people who e.g., keep their build directories on a different partition, i.e., (4). For quick and dirty stuff people also use (3).

Re: (4) (external build directory):

The possible solutions I can think of off the top of my head are not attractive. I'm tempted to just say this isn't supported for now.

This would be really unfortunate. I am not exactly sure what you tried, but as far as kondo is concerned the directory with CMakeLists.txt isn't even interesting and it would be sufficient to only work on directories with CMakeCache.txt. This is what I meant in #120 (comment):

  • discover CMake build directories by the presence of a toplevel file CMakeCache.txt (i.e., CMake build directory is "project directory)
  • they can be cleaned up if that would not conflict with other project setups (Discovery Rework #111)
    • potential conflict with e.g., Rust setup if Cargo.toml is present next to CMakeCache.txt
    • to avoid running into (5) (CMake build in source directory): create another flavor of CMake project which triggers on presence of CMakeLists.txt

Re: (5) (build in source directory):

I had a look at the output of CMakeCache.txt, there is some indirection but eventually I found a list of the files the in the generated CMakeFiles/Project.dir/cmake_clean. So we could either #122 to cmake to run it's (Makefile's?) generated clean (if it exists) or try and parse that file to find the files to delete ourselves.

Do you think the file list defined here is reliable? Here's an example generated file from me completing the cmake tutorial

The presence of that file seems to depend on the build system used (CMake supports generating for an open-ended zoo of build systems), e.g., I do not see it for the Ninja generator. As far as I know there is also no standard CMake command to clean up the build directory including the build directory itself (equivalent of e.g., cargo clean). It would also leave the question of whether the build directory should be removed or not since it would never appear in any of these listings (it likely was created by the user in possibly conflicting ways). With that it seems impossible to have CMake take care of cleaning up the artifacts (which is what makes kondo useful).

Another issue with defaulting to the build tools for #122 is that it would require the build tool to be present on the system which might not be the case. CMake projects might reconfigure when running cmake again which could run into issues of e.g., the CMake version changed, or files are outdated; each of these are not unlikely to fail.

Can you think of any other project software like cmake etc that might have behave similarly? I wonder how common this pattern is.

Autotools would behave similarly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants