Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid indexing the workspace for single-file mode #13770

Merged
merged 3 commits into from
Oct 18, 2024

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Oct 16, 2024

Summary

This PR updates the language server to avoid indexing the workspace for single-file mode.

What's a single-file mode?

When a user opens the file directly in an editor, and not the folder that represents the workspace, the editor usually can't determine the workspace root. This means that during initializing the server, the workspaceFolders field will be empty / nil.

Now, in this case, the server defaults to using the current working directory which is a reasonable default assuming that the directory would point to the one where this open file is present. This would allow the server to index the directory itself for any config file, if present.

It turns out that in VS Code the current working directory in the above scenario is the system root directory / and so the server will try to index the entire root directory which would take a lot of time. This is the issue as described in astral-sh/ruff-vscode#627. To reproduce, refer astral-sh/ruff-vscode#627 (comment).

This PR updates the indexer to avoid traversing the workspace to read any config file that might be present. The first commit (8dd2a31) refactors the initialization and introduces two structs Workspaces and Workspace. The latter struct includes a field to determine whether it's the default workspace. The second commit (61fc39b) utilizes this field to avoid traversing.

Closes: #11366

Editor behavior

This is to document the behavior as seen in different editors. The test scenario used has the following directory tree structure:

.
├── nested
│   ├── nested.py
│   └── pyproject.toml
└── test.py

where, the contents of the files are:

test.py

import os

nested/nested.py

import os
import math

nested/pyproject.toml

[tool.ruff.lint]
select = ["I"]

Steps:

  1. Open test.py directly in the editor
  2. Validate that it raises the F401 violation
  3. Open nested/nested.py in the same editor instance
  4. This file would raise only I001 if the nested/pyproject.toml was indexed

VS Code

When (1) is done from above, the current working directory is / which means the server will try to index the entire system to build up the settings index. This will include the nested/pyproject.toml file as well. This leads to bad user experience because the user would need to wait for minutes for the server to finish indexing.

This PR avoids that by not traversing the workspace directory in single-file mode. But, in VS Code, this means that per (4), the file wouldn't raise I001 but only raise two F401 violations because the nested/pyproject.toml was never resolved.

One solution here would be to fix this in the extension itself where we would detect this scenario and pass in the workspace directory that is the one containing this open file in (1) above.

Neovim

tl;dr it works as expected because the client considers the presence of certain files (depending on the server) as the root of the workspace. For Ruff, they are pyproject.toml, ruff.toml, and .ruff.toml. This means that the client notifies us as the user moves between single-file mode and workspace mode.

#13770 (comment)

Helix

Same as Neovim, additional context in #13770 (comment)

Sublime Text

tl;dr It works similar to VS Code except that the current working directory of the current process is different and thus the config file is never read. So, the behavior remains unchanged with this PR.

#13770 (comment)

Zed

Zed seems to be starting a separate language server instance for each file when the editor is running in a single-file mode even though all files have been opened in a single editor instance.

(Separated the logs into sections separated by a single blank line indicating 3 different server instances that the editor started for 3 files.)

   0.000053375s  INFO main ruff_server::server: No workspace settings found for file:///Users/dhruv/projects/ruff-temp, using default settings
   0.009448792s  INFO main ruff_server::session::index: Registering workspace: /Users/dhruv/projects/ruff-temp
   0.009906334s DEBUG ruff:main ruff_server::resolve: Included path via `include`: /Users/dhruv/projects/ruff-temp/test.py
   0.011775917s  INFO ruff:main ruff_server::server: Configuration file watcher successfully registered

   0.000060583s  INFO main ruff_server::server: No workspace settings found for file:///Users/dhruv/projects/ruff-temp/nested, using default settings
   0.010387125s  INFO main ruff_server::session::index: Registering workspace: /Users/dhruv/projects/ruff-temp/nested
   0.011061875s DEBUG ruff:main ruff_server::resolve: Included path via `include`: /Users/dhruv/projects/ruff-temp/nested/nested.py
   0.011545208s  INFO ruff:main ruff_server::server: Configuration file watcher successfully registered

   0.000059125s  INFO main ruff_server::server: No workspace settings found for file:///Users/dhruv/projects/ruff-temp/nested, using default settings
   0.010857583s  INFO main ruff_server::session::index: Registering workspace: /Users/dhruv/projects/ruff-temp/nested
   0.011428958s DEBUG ruff:main ruff_server::resolve: Included path via `include`: /Users/dhruv/projects/ruff-temp/nested/other.py
   0.011893792s  INFO ruff:main ruff_server::server: Configuration file watcher successfully registered

Test Plan

When using the ruff server from this PR, we see that the server starts quickly as seen in the logs. Next, when I switch to the release binary, it starts indexing the root directory.

For more details, refer to the "Editor Behavior" section above.

@dhruvmanila dhruvmanila added the server Related to the LSP server label Oct 16, 2024
@dhruvmanila dhruvmanila force-pushed the dhruv/single-file-support branch 2 times, most recently from ea90647 to 2166588 Compare October 16, 2024 09:13
@dhruvmanila dhruvmanila marked this pull request as ready for review October 16, 2024 09:45
Copy link
Contributor

github-actions bot commented Oct 16, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new structs are great!

I'm not entirely sure that I understand the full extent of the change. Why are we skipping the current directory if it is the default workspace? Doesnt' that have an impact for non vs code users? Would you mind explaining in more detail how this works in today and how it affects different editors?

Comment on lines 113 to 114
// Add any settings from above the workspace root. The ones from the workspace root will be
// added only if it's not the default workspace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: What interests me is why we're skipping the root itself if it is the default workspace. I think that would also make the comment easier to understand if it explains why we sometimes skip the root vs when we consider it.

@dhruvmanila
Copy link
Member Author

I'm not entirely sure that I understand the full extent of the change. Why are we skipping the current directory if it is the default workspace?

We will consider any config file that's present in the current directory but not the ones that are present in any nested directories.

The client didn't provide any workspace during initialization therefore we use the current working directory as the default workspace.

Doesnt' that have an impact for non vs code users? Would you mind explaining in more detail how this works in today and how it affects different editors?

For VS Code, if the user opened only a single file in the editor then there won't be any workspace visible in the sidebar. The user can't open any files that are inside any of the nested directories. Here, the limitation would be that if there is any config file present in the directory containing the open file, that won't be considered. This is because the current working directory of the editor process is the system root directory.

For Neovim, if the user opened a single file directly via nvim test.py, then the client will try to find the root directory for it by looking for any of the relevant files like pyproject.toml, etc. For example, for ruff this is described at https://github.com/neovim/nvim-lspconfig/blob/541f3a2781de481bb84883889e4d9f0904250a56/lua/lspconfig/configs/ruff.lua#L3-L7. Now, if those files aren't present then the workspaceFolders will be empty during initialization where the server will assume the current working directory. In this case, the path would be same as that of the nvim process. The neat thing here is that if the user opened a file which is nested in the workspace and that workspace contains the root file, the Neovim client will send a request to add a new workspace folder that's pointing to the directory containing this new file. So, those settings will be considered by the server because they've been indexed.

@dhruvmanila
Copy link
Member Author

For Neovim,

With the following tree structure:

.
├── nested
│   ├── nested.py
│   └── pyproject.toml
└── test.py
   0.000071500s  INFO main ruff_server::server: No workspace(s) were provided during initialization. Using the current working directory as a default workspace: /Users/dhruv/projects/ruff-temp
   0.000151750s  INFO main ruff_server::server: No workspace settings found for file:///Users/dhruv/projects/ruff-temp, using default settings
   0.004234667s  INFO main ruff_server::session::index: Registering workspace: /Users/dhruv/projects/ruff-temp
   0.004946542s DEBUG ruff:worker:3 ruff_server::resolve: Included path via `include`: /Users/dhruv/projects/ruff-temp/test.py
   0.005483375s  INFO     ruff:main ruff_server::server: Configuration file watcher successfully registered
   1.204698542s  INFO     ruff:main ruff_server::session::index: Registering workspace: /Users/dhruv/projects/ruff-temp/nested
   1.246196250s DEBUG ruff:worker:4 ruff_server::resolve: Included path via `include`: /Users/dhruv/projects/ruff-temp/nested/nested.py

While, if we remove the nested/pyproject.toml file, then:

   0.000400084s  INFO main ruff_server::server: No workspace(s) were provided during initialization. Using the current working directory as a default workspace: /Users/dhruv/projects/ruff-temp
   0.000877750s  INFO main ruff_server::server: No workspace settings found for file:///Users/dhruv/projects/ruff-temp, using default settings
   0.012916500s  INFO main ruff_server::session::index: Registering workspace: /Users/dhruv/projects/ruff-temp
   0.014122625s  INFO ruff:main ruff_server::server: Configuration file watcher successfully registered
   0.014338834s DEBUG ruff:worker:0 ruff_server::resolve: Included path via `include`: /Users/dhruv/projects/ruff-temp/test.py
   1.608874959s DEBUG ruff:worker:2 ruff_server::resolve: Included path via `include`: /Users/dhruv/projects/ruff-temp/nested/nested.py

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think I understand now what's happening but the code in RuffSettingsIndex::new with the early return and the skip doesn't make it obvious what's happening.

I suggest that we add more extensive documentation why we don't skip the workspace directory if it is the default (because it won't take the recursive path).

We should also add a comment to the early return for the default workspace explaining why we exit early.

I'm still somewhat uncertain about the change itself because it means that we won't find any nested configurations for editors that don't pass a workspace root by default. The way I remember it is that VS Code is the exception and most other editors don't provide a workspace root. That would mean that this change is a breaking change for all those users, because their setup only continues to work if the ruff setting is in the workspace root (root of the cwd)

@dhruvmanila
Copy link
Member Author

I'm still somewhat uncertain about the change itself because it means that we won't find any nested configurations for editors that don't pass a workspace root by default. The way I remember it is that VS Code is the exception and most other editors don't provide a workspace root. That would mean that this change is a breaking change for all those users, because their setup only continues to work if the ruff setting is in the workspace root (root of the cwd)

Neovim provides workspace root, it uses files like pyproject.toml, ruff.toml, and .ruff.toml to find it. I can look into what other editors are doing.

@dhruvmanila
Copy link
Member Author

Sublime Text using the LSP implementation as https://github.com/sublimelsp/LSP provides the workspace folders during initialization. It seems to be similar to VS Code minus the multi-root workspace support. When I open a folder in the editor, it passes it as the workspace folder but if I just open a single file, then it doesn't provide any folder. But, unlike VS Code, the std::env::current_dir for Sublime Text when a single file is opened returns /Applications/Sublime Text.app/Contents/MacOS for me:

   0.000099083s  INFO main ruff_server::server: No workspace(s) were provided during initialization. Using the current working directory as a default workspace...
   0.009809375s  INFO main ruff_server::session::index: Registering workspace: /Applications/Sublime Text.app/Contents/MacOS

The conclusion is that it wouldn't be a breaking change in Sublime Text because it already behaves correctly and the server does unnecessary work by indexing.


Helix also behaves in a similar manner as Neovim as it determines the workspace root based on certain set of files: https://github.com/helix-editor/helix/blob/d1b8129491124ce6068e95ccc58a7fefb1c9db45/languages.toml#L861. And, it also sends a request when a nested folder is opened with a root file in it which Helix would consider as a workspace folder (similar to Neovim):

2024-10-16T22:07:44.889 helix_lsp::transport [INFO] ruff -> {"jsonrpc":"2.0","method":"workspace/didChangeWorkspaceFolders","params":{"event":{"added":[{"name":"nested","uri":"file:///Users/dhruv/projects/ruff-temp/nested"}],"removed":[]}}}

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for testing this. The case I'm concerned about is #10398 but what I understand from your comments is that neovim handles this correctly because it sets the cwd to the file's parent directory.

@dhruvmanila
Copy link
Member Author

Thanks for testing this. The case I'm concerned about is #10398 but what I understand from your comments is that neovim handles this correctly because it sets the cwd to the file's parent directory.

That issue seems unrelated to what's being changed here. Can you explain more on what your concern is about?

Regardless, I'll update the RuffSettingsIndex::new to be more clearer in what's happening using comments and possible code.

Comment on lines +492 to +496
tracing::info!(
"No workspace(s) were provided during initialization. \
Using the current working directory as a default workspace: {}",
current_dir.display()
);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also changed this from warn to info. I don't think it's a warning i.e., not something that the user should be concerned of.

@MichaReiser
Copy link
Member

Isn't it related because it introduced the fallback to the current working directory if no workspace path was provided and this PR changes the behavior for this exact use case?

@dhruvmanila
Copy link
Member Author

Isn't it related because it introduced the fallback to the current working directory if no workspace path was provided and this PR changes the behavior for this exact use case?

Yes.

So, for Neovim specifically, it might not pass a working directory which means that the server should be running in single-file mode where we don't need to traverse the current directory (the behavior as is in this PR). If a user opened any file that's nested inside the current directory and if it contained a config file, then Neovim will make sure to inform the server about it by adding a new workspace folder using DidChangeWorkspaceFolders notification.

What this means is that the server is doing extra work for single-file mode without this PR.

@dhruvmanila dhruvmanila changed the title Avoid indexing the workspace for single-file support Avoid indexing the workspace for single-file mode Oct 17, 2024
@dhruvmanila
Copy link
Member Author

My current plan for this PR is to test it out in Zed, update the PR description with the details on different editor behavior and then merge it.

@dhruvmanila dhruvmanila merged commit 040a591 into main Oct 18, 2024
20 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/single-file-support branch October 18, 2024 05:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
server Related to the LSP server
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Native server indexes the entire project when no local config found
2 participants