FEATURE PROPOSAL: Navigate massive C# codebases using symbol information provided by Azure DevOps Code Search #1383

Closed
dmgonch opened this issue Jan 13, 2019 · 9 comments

Comments

Contributor

dmgonch commented Jan 13, 2019

[Sorry for such a lengthy description, but there are a lot of things to unpack here]

USER EXPERIENCE:
This proposal makes it easier to navigate massive C# codebases where loading all projects takes too long or isn't even feasible. It builds on previous work that limits the number of symbols returned by "Go to Symbol in Workspace" (#1243) and loads projects on demand (#1316). The experience that all these improvements seek to enable is one where a developer can open the whole repo folder, with potentially hundreds or thousands of C# projects, and almost immediately navigate the whole codebase using the 'Go To Symbols in Workspace', 'Go To Definition' and 'Find References' operations. As the user locates and opens source files, the relevant C# projects are loaded on demand, allowing for an even richer development experience.

Azure DevOps Services offers a free Code Search extension (https://docs.microsoft.com/en-us/azure/devops/project/search/code-search?view=vsts&tabs=new-nav) that turns on indexing of repos and unlocks the ability to query for branch symbol information (API: https://docs.microsoft.com/en-us/rest/api/azure/devops/search/?view=azure-devops-rest-5.0; .NET client: https://www.nuget.org/packages/Microsoft.VisualStudio.Services.Search.Client).

Here are a few examples of Code Search requests, using a clone of the omnisharp-roslyn repo code:

At first glance it may appear that the symbol information provided by the Code Search service doesn't map very well to the code navigation concepts of OmniSharp. The latter are based on the assumption that the Roslyn compiler can locate the exact symbol given its position in a source file. The former operates more like a database where query results can be filtered using simple pattern matching on the symbol name and some basic metadata. On top of that, Code Search indexes at most 5 branches, which can lead to query results that don't match an old local clone of a repo.

But after some experimentation it turns out that the data returned from queries like the ones above is pretty close to what is needed to get a developer 'close enough' to the right parts of a big codebase very quickly. The first query above fetches information that can be used to generate a proper 'Go To Symbols in Workspace' response. The second one is pretty close to the data needed to serve 'Find References' requests, and the third query works quite well for 'Go To Definition' requests. And once a developer gets closer to the right place in the code, on-demand project loading will fill Roslyn's workspace with precise symbol information that replaces the 'approximate' data initially received from Code Search. The resulting experience is that as long as a developer has at least the slightest idea of what to search for, s/he will be able to get the list of matching symbols from Code Search right away and then navigate through references/definitions using Code Search data while projects are lazily loading in the background. The best part though is that the developer won't even be aware of where the data is coming from - it will feel like VSCode can search and navigate the whole massive codebase instantly!
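As a rough illustration of how the three operations could translate into Code Search query text, the sketch below uses the def: and ref: code-type filters from the Code Search documentation. The endpoint-to-filter pairing, the trailing * prefix match, and the buildSearchText helper are all assumptions made for this sketch, not the queries used by the prototype.

```typescript
// Hypothetical mapping from OmniSharp navigation endpoints to Code Search
// query text. Which filter the prototype really uses per endpoint is an
// assumption; only the existence of `def:` and `ref:` filters is documented.
type NavigationEndpoint = "/findsymbols" | "/findusages" | "/gotodefinition";

function buildSearchText(endpoint: NavigationEndpoint, symbolName: string): string {
  const name = symbolName.trim();
  if (name.length === 0) {
    throw new Error("symbol name must not be empty");
  }
  switch (endpoint) {
    case "/findsymbols":
      // Prefix match on definitions (assumed wildcard use), for symbol search.
      return `def:${name}*`;
    case "/findusages":
      // Name match on references.
      return `ref:${name}`;
    case "/gotodefinition":
      // Name match on definitions.
      return `def:${name}`;
  }
}
```

Because the match is by name only, results from such queries are approximate: two unrelated symbols with the same name are indistinguishable until Roslyn has loaded the relevant project.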

PROTOTYPE:
I hacked together a prototype (https://github.com/dmgonch/oms-r/tree/feature/CodeSearchIntergrationPrototype) that makes it possible to evaluate the proposed user experience. I've been using and evolving the prototype for a few months now with repos containing hundreds of C# projects and can attest that in the majority of cases it took me to the right places in the code. Note that even though this prototype provides a pretty close experience to what I'm looking to enable, the implementation is a complete hack. See the Design Considerations and Authentication sections below for more on this.

The prototype provides the following functionality:

  • The prototype currently only works on Windows (due to the way auth is implemented). More on Linux and macOS auth options below.
  • It is best to use the prototype with https://github.com/OmniSharp/omnisharp-vscode/releases/tag/v1.18.0-beta3 installed and the omnisharp.enableMsBuildLoadProjectsOnDemand setting set to true.
  • The prototype allows the following operations to work even when no projects have been loaded yet: 'Go To Symbols in Workspace', 'Go To Definition', 'Find References'.
  • The prototype includes OmniSharp.CodeSearch* projects that show how auth will work on all OSs (more on this below).

To activate the prototype, create %USERPROFILE%\.omnisharp\omnisharp.json with the following content. Omit HackAuthPat to be prompted to log in if you are using the prototype against your own repo with Code Search enabled. To use the prototype with https://productive-dev@dev.azure.com/productive-dev/public-projects/_git/CodeSearchEnabled, ping me (at https://omnisharp.slack.com, for example) and I can provide you with a temporary PAT.

{
  "MsBuild": {
    "HackEnabled": true,
    "HackAuthPat": "<Personal Access Token with Code-Read scope>"
  }
}

AUTHENTICATION:
On Windows, the .NET client library for Code Search supports interactive auth (https://docs.microsoft.com/en-us/azure/devops/integrate/concepts/dotnet-client-libraries?view=vsts). When prompted, you enter your Azure DevOps creds. The library caches the creds, so I only needed to re-enter them every few days.

At the moment the only authentication method supported by the Code Search .NET libraries that works across Windows, macOS and Linux is the one that uses Personal Access Tokens (PATs) (https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate?view=vsts). The good news though is that there is a wrapper tool, Git Credential Manager (https://docs.microsoft.com/en-us/azure/devops/repos/git/set-up-credential-managers?view=vsts), that abstracts OS-specific secret stores (Windows Credential Manager, Mac/Linux Keychain). Once git is configured to use the tool, OmniSharp can use the same command ("git credential-manager") to retrieve stored PATs on any OS. On Windows, Git Credential Manager is installed and configured as part of installing Git. On macOS and Linux, the steps described at https://github.com/Microsoft/Git-Credential-Manager-for-Mac-and-Linux/blob/master/Install.md eventually worked for me (to test on Linux I used Ubuntu 18.04.1 LTS and installed the RPM package of Git Credential Manager via Alien).

The user would need to request/create a PAT with Code-Read scope. A PAT can be issued to stay valid for up to a year. After Git Credential Manager is installed and Git is configured to use it, the user would need to store the PAT in the OS secret store by following these steps (which could potentially be wrapped into a helper tool later):

  • Create a file named store.txt with the following content:
host=dev.azure.com
path=productive-dev@productive-dev/public-projects/_git/CodeSearchEnabled    <= if repo url has the 'userinfo@' part before the host name it should be moved to the front of the path
username=PersonalAccessToken
password=<Personal Access Token with Code-Read scope>
<empty line here is important>
  • Run command (replace 'cat' with 'type' for Windows): cat store.txt | git credential-manager store
  • Delete store.txt

After this setup (which can last for up to a year), a user can just open the repo root folder in VSCode, and the OmniSharp Code Search integration logic will determine the repo's URL and retrieve the PAT by invoking the 'git credential-manager fill' command. Git must be in the PATH for this to work.
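The URL handling behind this lookup can be sketched as follows: split the repo URL into the host and path fields used in store.txt, moving any 'userinfo@' part to the front of the path. The function and type names are hypothetical, only https URLs are handled, and only the fields listed in store.txt are produced. Actually piping the text to the credential helper is left out.

```typescript
// Hypothetical helper types and functions; they mirror the store.txt layout.
interface CredentialQuery {
  host: string;
  path: string;
}

// Splits an https repo URL into host and path. A 'userinfo@' prefix on the
// host is moved to the front of the path, as the steps above describe.
function toCredentialQuery(repoUrl: string): CredentialQuery {
  const match = /^https:\/\/(?:([^@\/]+)@)?([^\/]+)\/(.+)$/.exec(repoUrl);
  if (match === null) {
    throw new Error(`not an https repo URL: ${repoUrl}`);
  }
  const userInfo: string | undefined = match[1];
  const host = match[2] ?? "";
  const path = match[3] ?? "";
  return { host, path: userInfo ? `${userInfo}@${path}` : path };
}

// Serializes the query as key=value lines for the credential helper's stdin.
function toCredentialInput(query: CredentialQuery): string {
  const lines = [`host=${query.host}`, `path=${query.path}`];
  // The terminating empty line marks the end of the input.
  return lines.join("\n") + "\n\n";
}
```

For the sample repo above, toCredentialQuery yields host dev.azure.com and a path that starts with productive-dev@, matching the store.txt example.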

DESIGN CONSIDERATIONS:

Release and deployment: users who are not using the Code Search integration should not incur any additional cost of downloading and loading binaries containing Code Search related logic. For this to work, such binaries should be packaged into separate zip/gz packages for each OS/architecture respectively. Only when a user turns on the setting that enables the integration will the VSCode C# extension download and unpack the package and pass the location of these additional binaries to OmniSharp.

Code Search service querying: On Linux and Mac, OmniSharp runs on Mono. The Code Search .NET client libraries at the moment can only target .NET Full or Core. Thus the following approach is proposed for handling Code Search service interactions:

  • Use a separate process that is spawned by OmniSharp and communicates with it via standard I/O.
  • OmniSharp.CodeSearch.Proxy.Interactive.exe (see the project with the same name in the prototype) will be used only on Windows, if the user chooses to enable interactive authentication on that OS. The EXE will target .NET Full / Mono.
  • OmniSharp.CodeSearch.Proxy.dll (see the project with the same name in the prototype) will target .NET Core and will be used on all platforms if the user chooses the PAT authentication method.
  • The exact communication format between OmniSharp and the OmniSharp.CodeSearch.Proxy* processes is TBD, but the existing schema of requests/responses for the related endpoints (i.e. /findsymbols, /gotodefinition and /findusages) should be a good start and will simplify the results merging logic.
  • The OmniSharp.CodeSearch.Proxy.Shared project in the prototype is where all platform-independent logic (which should be the majority of it) will be located.
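Since the wire format is still TBD, the following is only one possible shape for the standard I/O channel: one JSON message per line, with a sequence number to pair responses with requests. The envelope fields (seq, requestSeq, endpoint, success, body) and the LineDecoder class are hypothetical names chosen for this sketch.

```typescript
// Hypothetical envelope types; the proposal leaves the real format open.
interface ProxyRequest {
  seq: number;      // correlates a response with its request
  endpoint: string; // e.g. "/findsymbols"
  body: unknown;    // payload in the endpoint's existing request schema
}

interface ProxyResponse {
  requestSeq: number;
  success: boolean;
  body: unknown;
}

// JSON.stringify escapes newlines inside strings, so each message fits on
// exactly one line and "\n" can serve as the message delimiter.
function encodeMessage(message: ProxyRequest | ProxyResponse): string {
  return JSON.stringify(message) + "\n";
}

// Buffers chunks read from the proxy's stdout and returns only the messages
// that are complete; a partial trailing line is kept for the next chunk.
class LineDecoder {
  private buffer = "";

  push(chunk: string): ProxyResponse[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? "";
    return lines
      .filter(line => line.trim().length > 0)
      .map(line => JSON.parse(line) as ProxyResponse);
  }
}
```

Keeping the body in the existing endpoint schema means a response can be handed to the merging logic without a separate translation step.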

Roslyn/Code Search results merging: the prototype showed that symbol info coming from Roslyn and Code Search needs to be carefully merged. For example, for a /gotodefinition request, the Code Search service should not be called at all if the symbol info can be obtained directly from the Roslyn workspace. In the case of /findsymbols and /findusages, the approach that worked best in the prototype was to use Code Search results only for files for which Roslyn didn't provide any results (since Roslyn's info for a given file cannot be less correct than that provided by Code Search).
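A minimal sketch of these two rules, assuming a simplified QuickFix shape and hypothetical function names (the prototype's actual code is C# and may differ):

```typescript
// Simplified stand-in for an OmniSharp QuickFix; field names are assumptions.
interface QuickFix {
  fileName: string;
  line: number;
  column: number;
  text: string;
}

// Treats '\' and '/' as the same separator when comparing file names.
function normalizePath(path: string): string {
  return path.replace(/\\/g, "/");
}

// /findsymbols and /findusages: keep every Roslyn result, and add Code Search
// results only for files that Roslyn returned nothing for.
function mergeByFile(roslyn: QuickFix[], codeSearch: QuickFix[]): QuickFix[] {
  const roslynFiles = new Set(roslyn.map(fix => normalizePath(fix.fileName)));
  const extra = codeSearch.filter(fix => !roslynFiles.has(normalizePath(fix.fileName)));
  return [...roslyn, ...extra];
}

// /gotodefinition: the Code Search query is passed as a callback so that it
// is never invoked when Roslyn already located the symbol.
function resolveDefinition(
  fromRoslyn: QuickFix | undefined,
  queryCodeSearch: () => QuickFix | undefined
): QuickFix | undefined {
  return fromRoslyn ?? queryCodeSearch();
}
```

The per-file rule is what keeps stale index data out of the results: once Roslyn has analyzed a file, every Code Search hit for that file is dropped, even if Roslyn reported fewer locations.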

Given these requirements, a design that might be a good fit is one similar to how ICodeActionProvider implementations are discovered in the existing OmniSharp codebase. There, RoslynCodeActionProvider and ExternalCodeActionProvider represent, respectively, the well-known and the additional sets of provider implementations, and separate logic in the request handlers properly merges the results received from all providers.

The OmniSharp.CodeSearch.Services project in the prototype is where all Code Search related extensions will be located. These extensions will spawn the appropriate OmniSharp.CodeSearch.Proxy* process on initialization and will manage communication with it. The request handlers will then use these different providers to fetch QuickFixes and merge the results appropriately.

STAGING THE WORK:
To simplify code reviews and reduce the risk of regressions, the work can be staged as below (with a full set of UTs added at each stage):

  • Implement extensibility changes for merging Roslyn and Code Search results. Nothing will be using the logic after this stage yet.
  • Implement the Windows-only Code Search proxy (OmniSharp.CodeSearch.Proxy.Interactive).
  • Provide end-to-end Code Search integration in OmniSharp for handling /findsymbols requests.
  • Implement the Linux/Mac version of the Code Search proxy (OmniSharp.CodeSearch.Proxy) - the integrated FindSymbols scenario must work on all 3 OSs after this step.
  • Update release logic to push Code Search related packages into Azure blobs with each release.
  • Update the C# extension to allow enabling of the Code Search integration.
  • Provide end-to-end Code Search integration for handling /findusages requests on all platforms.
  • Provide end-to-end Code Search integration for handling /gotodefinition requests on all platforms.

Contributor Author

dmgonch commented Jan 13, 2019

@DustinCampbell @rchande @akshita31 - looking forward to your comments, questions and concerns!

rchande commented Jan 18, 2019

@dmgonch Thanks for the proposal; this is an interesting idea. However, I don't believe that this functionality belongs in the OmniSharp core.

Particularly, I don't believe that the scenario being addressed (user needs to search hundreds of projects in Azure DevOps) is common enough to justify building this into OmniSharp. It also sounds like there are some configuration concerns to work through that would work better if you could provide custom UI to do Azure DevOps authentication. This is an interesting feature that deserves to get marketed on its own, not as a sub-bullet of a feature of OmniSharp. Separately, from the perspective of the C# extension, we would prefer not to add more components that ship through the C# extension (or inside OmniSharp) itself.

Do note that OmniSharp has a "plugin" system that you can use to write assemblies that run in OmniSharp without contributing code to this repository (Razor support being a great example).

I think the best approach for you from a development/marketing/iteration speed perspective would be to write a standalone VS Code extension that implements this functionality and is loosely coupled with the C# extension and OmniSharp for getting semantic information when necessary. There are a couple of ways that could work, with the aforementioned plugin system being one.

Contributor Author

dmgonch commented Jan 18, 2019

@rchande: it looks like the Razor extension IS shipped with the C# extension - I see its files placed under .vscode\extensions\ms-vscode.csharp-1.18.0-beta3\.razor on my machine and razorPluginPath is explicitly recognized by the extension (https://github.com/OmniSharp/omnisharp-vscode/blob/b2581f68d8f754287640ec2050e1ada53a5594e6/src/omnisharp/server.ts#L306). Are you suggesting something like adding another setting, say omnisharp.pluginPaths, where a user is asked to manually list folders (say, comma separated) containing plugins provided by other extensions?

Regarding "Roslyn/Code Search results merging" - what do you think of adding extensibility to the 3 request handlers (as discussed in the proposal) to allow for "smart" results merging? Or would you rather introduce more metadata on request handlers so that, for example, the GoToDefinition handler from a plugin isn't called at all when a symbol has been found by Roslyn in the workspace? Should there also be something like a QuickFixResponseMerger extension introduced to allow results from Roslyn to be preferred over those coming from a plugin (for FindUsages, for example)?

Please advise.

Contributor Author

dmgonch commented Jan 27, 2019

@rchande Thank you Ravi for the feedback. I’ve been exploring what it will take to have the Code Search integration functionality in a separate VSCode extension. One key requirement described in this proposal is to allow proper merging of results received from Roslyn with those obtained from the Code Search service. Below is the link to a potential design to support this requirement. The change introduces the ability for a request handler to mark itself as “auxiliary”. When processing a request, OmniSharp will first invoke all primary handlers (which will typically be those exposed by the omnisharp-roslyn extension itself). Then it will invoke all auxiliary ones and allow them to merge their results accordingly. This functionality should enable the Code Search extension to deduplicate Roslyn results with its own and give preference to the former.

dmgonch@ba805a9
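The primary-then-auxiliary ordering can be illustrated with a rough sketch. This is not the code in the linked commit: the Handler interface, the auxiliary flag as a plain boolean, and the dispatch function are hypothetical simplifications in which every handler receives what earlier handlers produced.

```typescript
// Hypothetical handler contract for this sketch only.
interface Handler<TRequest, TResponse> {
  auxiliary: boolean;
  // `previous` is the result produced so far, or undefined for the first handler.
  handle(request: TRequest, previous: TResponse | undefined): TResponse;
}

// Invokes all primary handlers first, then all auxiliary ones, regardless of
// registration order, so auxiliary handlers can merge into primary results.
function dispatch<TRequest, TResponse>(
  handlers: Handler<TRequest, TResponse>[],
  request: TRequest
): TResponse | undefined {
  const primary = handlers.filter(handler => !handler.auxiliary);
  const auxiliary = handlers.filter(handler => handler.auxiliary);
  let result: TResponse | undefined;
  for (const handler of [...primary, ...auxiliary]) {
    result = handler.handle(request, result);
  }
  return result;
}
```

With this ordering, a Code Search handler marked as auxiliary always sees the Roslyn results before deciding which of its own results to add.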

Looking forward to your thoughts on this approach.

rchande commented Jan 28, 2019

@dmgonch Can you share a screenshot of what this looks like? If your VS Code extension provides a VS Code reference provider object, you could provide references that way and allow VS Code's extension handling to unify the results at the presentation level without having to involve OmniSharp at all.

rchande commented Jan 28, 2019

Note that at the VS Code layer, a "reference" is just a text span (https://code.visualstudio.com/api/references/vscode-api#ReferenceProvider) and that's all we return from the C# extension.

Contributor Author

dmgonch commented Jan 29, 2019

@rchande Unfortunately I don't believe that I can get the right user experience by simply relying on VS Code's unifying logic. As I explained in the "Roslyn/Code Search results merging" section of the original proposal, the results from Code Search should NOT be shown at all for files that were already analyzed by Roslyn. The only way I know to achieve this is by first asking OmniSharp to handle the request and then letting the Code Search extension "smartly" pick from the data it received from the service and amend the final result with it. Letting VS Code simply concatenate the results will cause them to be duplicated and could even show incorrect ones coming from Code Search for files that were changed locally. Please advise if you think it is still possible to achieve this level of control over results filtering without extending OmniSharp's results merging. Thanks.

Contributor Author

dmgonch commented Feb 27, 2019

@rchande After much consideration I agree with your assessment that the functionality proposed here is better suited for a separate VSCode extension that is only loosely coupled with the C# extension. Thanks again for your feedback.

gmkado commented Feb 22, 2023

@dmgonch is this the extension referenced in this issue? https://github.com/dmgonch/fastcodenav
