Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add hyperlink support #2610

Merged
merged 6 commits into from
Sep 25, 2023
Merged

add hyperlink support #2610

merged 6 commits into from
Sep 25, 2023

Conversation

BurntSushi
Copy link
Owner

@BurntSushi BurntSushi commented Sep 21, 2023

This is based on #2483 and keeps its general structure, but rejiggers some details. The biggest change is probably lifting the gethostname call from the grep-printer crate all the way up into ripgrep's core. We also add a --hostname-bin flag that allows the end user to provide an executable program that prints the hostname ripgrep should use.

Closes #665

@ltrzesniewski
Copy link
Contributor

I know this is a WIP but I took a quick glance and thought I'd mention it just in case: you removed the OnceCell which I put there so that only matching files would get their paths canonicalized.

@BurntSushi
Copy link
Owner Author

Oh, hmmm. The point about a non-match is a good one. I'll take a closer look. Thank you for catching that!

@BurntSushi
Copy link
Owner Author

Yes, I see, that's a good catch. Arguably other stuff in a PrinterPath should be behind OnceCell too. Good call.

@BurntSushi BurntSushi force-pushed the ag/hyperlinks branch 4 times, most recently from a432325 to 17bba5d Compare September 25, 2023 18:25
ltrzesniewski and others added 6 commits September 25, 2023 14:37
This commit represents the initial work to get hyperlinks working and
was submitted as part of PR #2483. Subsequent commits largely retain the
functionality and structure of the hyperlink support added here, but
rejigger some things around.
This does a variety of polishing.

1. Deprecate the tty methods in favor of std's IsTerminal trait.
2. Trim down un-needed dependencies.
3. Use bstr to implement escaping.
4. Various aesthetic polishing.

I'm doing this as prep work before adding more to this crate. And as
part of a general effort toward reducing ripgrep's dependencies.
This will enable us to query for the current system's hostname in both
Unix and Windows environments.

We could have pulled in the 'gethostname' crate for this, but:

1. I'm not a huge fan of micro-crates.
2. The 'gethostname' crate panics if an error occurs. (Which, to be
fair, an error should never occur, but it seems plausible on borked
systems? ripgrep runs in a lot of places, so I'd rather not take the
chance of a panic bringing down ripgrep for an optional convenience
feature.)
3. The 'gethostname' crate uses the 'windows-targets' crate from
Microsoft. This is arguably the "right" thing to do, but ripgrep
doesn't use them yet and they appear high-churn.

So I just added a safe wrapper to do this to winapi-util[1] and then
inlined the Unix version here. This brings in no extra dependencies and
the routine is fallible so that callers can recover from potentially
strange failures.

[1]: BurntSushi/winapi-util#14
Like a previous commit did for the grep-cli crate, this does some
polishing to the grep-printer crate. We aren't able to achieve as much
as we did with grep-cli, but we at least eliminate all rust-analyzer
lints and group imports in the way I've been doing recently.

Next we'll start doing some more invasive changes.
I originally did not put PathPrinter into grep-printer because I
considered it somewhat extraneous to what a "grep" program does, and
also that its implementation was rather simple. But now with hyperlink
support, its implementation has grown a smidge more complicated. And
more importantly, its existence required exposing a lot more of the
hyperlink guts. Without it, we can keep things like HyperlinkPath and
HyperlinkSpan completely private.

We can now also keep `PrinterPath` completely private as well. And this
is a breaking change.
This essentially takes the work done in #2483 and does a bit of a
facelift. A brief summary:

* We reduce the hyperlink API we expose to just the format, a
  configuration and an environment.
* We move buffer management into a hyperlink-specific interpolator.
* We expand the documentation on --hyperlink-format.
* We rewrite the hyperlink format parser to be a simple state machine
  with support for escaping '{{' and '}}'.
* We remove the 'gethostname' dependency and instead insist on the
  caller to provide the hostname. (So grep-printer doesn't get it
  itself, but the application will.) Similarly for the WSL prefix.
* Probably some other things.

Overall, the general structure of #2483 was kept. The biggest change is
probably requiring the caller to pass in things like a hostname instead
of having the crate do it. I did this for a couple reasons:

1. I feel uncomfortable with code deep inside the printing logic
   reaching out into the environment to assume responsibility for
   retrieving the hostname. This feels more like an application-level
   responsibility. Arguably, path canonicalization falls into this same
   bucket, but it is more difficult to rip that out. (And we can do it
   in the future in a backwards compatible fashion I think.)
2. I wanted to permit end users to tell ripgrep about their system's
   hostname in their own way, e.g., by running a custom executable. I
   want this because I know at least for my own use cases, I sometimes
   log into systems using an SSH hostname that is distinct from the
   system's actual hostname (usually because the system is shared in
   some way or changing its hostname is not allowed/practical).

I think that's about it.

Closes #665, Closes #2483
@BurntSushi BurntSushi merged commit f608d4d into master Sep 25, 2023
14 checks passed
@BurntSushi BurntSushi deleted the ag/hyperlinks branch September 25, 2023 18:39
@ltrzesniewski
Copy link
Contributor

May I bother you with a little question I was wondering about?

I initially used write! to format numbers into a buffer, (for instance: Part::Line => write!(output, "{}", values.line)). From what I could gather, this is the standard way of doing it, and it apparently doesn't allocate.

You replaced those with explicit to_string() calls, such as:

let line = values.line.unwrap_or(1).to_string();
dest.extend_from_slice(line.as_bytes());

I was wondering why? Is the optimizer able to remove the heap allocation, or does it simply not matter in this context and you prefer that code style?

@BurntSushi
Copy link
Owner Author

Great question! And feel free to ask more like it.

I replaced them because I have a faint recollection that write! goes through the formatting machinery. I can't remember whether it allocates or not, but my recollection is that the formatting machinery had more cost than just doing n.to_string(). I note, for example, that n.to_string() is how line numbers (and column numbers and byte offsets) have been written for a while. So I deferred to past experience and copied that technique.

It's funny you bring it up, because I actually have a TODO (before the next release) to try and more carefully benchmark this choice. I wanted to bake off four different choices:

  1. Status quo.
  2. write!("{}", n).
  3. The itoa crate.
  4. A simple hand-rolled version that doesn't allocate or use unsafe.

In particular, I want to run benchmarks both with the system allocator (glibc on my system) and with musl's allocator, which has been a bit slower in my experience.

@ltrzesniewski
Copy link
Contributor

Thanks! 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Option to print file paths as file URLs
2 participants