Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nix repl: Provide documentation from comment when evaluating to lambda #1652

Closed
wants to merge 1 commit into from

Conversation

roberth
Copy link
Member

@roberth roberth commented Oct 31, 2017

This provides limited support for python-like docstrings in the nix repl. When the user evaluates an expression to a lambda, nix repl will now print the contents of a documentation comment, as long the comment written right before the attribute, and the attribute value is an actual lambda.

Current limitations:

  • Overriding the documentation requires redundant lambdas (eta abstraction to be precise)
  • No docstring support for anything other than functions

Demo:

nix-repl> pkgs.lib.concatMapStringsSep ":"
«lambda @ /nix/store/ksi72626r14035xkndbn584pmb7l703r-nixos-17.09.1535.1fdca25ee8/nixos/lib/strings.nix:64:30»

| concatMapStringsSep
| -------------------
| 
| NOTE: This function has already been applied!
|       You should ignore the first 1 parameter(s) in this documentation,
|       because they have already been applied.
|
| First maps over the list and then concatenates it.
| 
| Example:
|    concatMapStringsSep "-" (x: toUpper x)  ["foo" "bar" "baz"]
|    => "FOO-BAR-BAZ"


nix-repl>

Changes:

  • A new test set, nix-repl.sh
  • A new module, comment.cc for the documentation retrieval logic
  • Additions to repl.cc

Todo:

  • choose a comment syntax
  • hide the documentation behind a hint that :doc displays the documentation for the function

@thufschmitt
Copy link
Member

Really nice improvement :)

It may be more natural to have this printed only with the :t command (or maybe a new :doc one) rather than when evaluating, what do you think of this?

@grahamc
Copy link
Member

grahamc commented Nov 1, 2017 via email

@gilligan
Copy link
Contributor

gilligan commented Nov 1, 2017

Oops.. i didn't realize the PR was already open and commented on your fork instead.. well you can see it anyway ;)


std::string rawComment = matches[1];
std::string name = matches[2];
int timesApplied = countLambdas(matches[3]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think matches could be of length 3 here and you are accessing the 4th element? Since you check for < 3 in line 126..

}

// SLOW, probably O(n^2)
std::string stripPrefix(std::string prefix, std::string s) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like this:

std::string stripPrefix(std::string prefix, std::string s) {
  std::string::size_type index = s.find(prefix);
  return (index == 0) ? s.erase(0, prefix.length()) : s;
}

Untested but I guess that should work?! ;-)

regex_search(sourcePrefix, matches, e);

std::stringstream buffer;
if (matches.length() < 3) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't know what kind of standards the nix code base has in general.. i usually try to avoid "magic numbers" and would introduce some #define here or so.. same for the indices to access matches below. Just a thought.

@gilligan
Copy link
Contributor

gilligan commented Nov 1, 2017

@roberth I wonder if this should perhaps restricted to not just any kind of /* .. */ delimited block but that we instead introduce some marker ?

{
  /*nixdoc
  Proofs that P = NP
  */
  proofThingy = p: np: 42;
}

Otherwise we might end up with arbitrary things like TODO: rewrite this or FIXME or whatever else. Since we (NixOsDocs folks) are starting off with improving nix functions documentation we could pick this up from the get go.

This would definitely also help for automatically extracting docstrings like these from nixpkgs. In fact - maybe we could go even further and use nixdoc:<function> (So nixdoc:proofThingy for the example above). That would make it really easy to create things like ctags files for navigating nixpkgs. I would love that.

@roberth roberth force-pushed the lambda-docstring branch 2 times, most recently from aa530fb to d05552b Compare November 7, 2017 16:56
@roberth
Copy link
Member Author

roberth commented Nov 7, 2017

@gilligan , thank you for your review. I have implemented all of your suggestions except the nixdoc:.

I'm also a bit concerned about the ugly comments that might show up. I'm not sure what the special syntax should be. nixdoc:myfunc will inevitably lead to copy paste errors and bitrot. A 'small syntax' like /** */ might be better than nixdoc, but that leaves no room for metadata in the comment that isn't normally shown...

@gilligan
Copy link
Contributor

gilligan commented Nov 7, 2017

@roberth right, having the function name duplicated in the comment might easily go out of sync. Then again: doesn’t that also apply to any kind of comment written for any function anyway?

So the function documentation can always be wrong but if we have some tag we could maybe limit the output to the “right” kind of comment blocks?

In short: IMHO looking for “/** nixdoc” might be preferable.

@roberth
Copy link
Member Author

roberth commented Dec 2, 2017

@gilligan I agree that we need some kind of marker to indicate that it's a documentation comment.
I think /** or ## should be sufficient, but it's also nice to be forward compatible, so we can switch to RST/Markdown/whatever when we want to generate HTML docs. My suggestion:

 * `/**` followed by plain text, to be shown verbatim in the repl
 * `/**sometoken ` followed by text in some format

We can remove the sometoken in the repl, for forward compatibility. When we're going to generate HTML docs, we can decide on a format without causing breakage when switching. So in the future, docs may look like

`/**md
 # Introduction

 This function bla bla
` */

What I like about this is that it is very clear that we're using markdown here, it will be shown in the repl without the md part, and can be processed in a future repl, knowing that it's the 'Markdown+Nix' format.

I'm not saying that we should do Markdown in the future, though. There's no standard for it. At least Restructured Text has a better specification (from what I've heard). Markdown is rather popular though.
Let's not go for docbook at least :)

@dtzWill
Copy link
Member

dtzWill commented Dec 20, 2017

Sounds good to me! Can't wait! ^_^

@roberth
Copy link
Member Author

roberth commented Jan 10, 2018

Before the holidays, I put out a twitter poll about the syntax for documentation comments. The results:

It got 24 votes:

33% /** always plain text / (8 people)
33% /
* always markdown */
17% /**md to pick md format (4 people)
17% ## let's discuss\n=======
One person mentioned doxygen, but didn't gain support from others.
On IRC, grahamc mentioned docbook

So, there is no clear winner, but a combination of /** plain text by default and /**md to 'upgrade' to markdown can be argued to have 50% of the votes (ignoring the possibility that some plain text voters hate markup languages at all cost)

If anyone is going to decide the color of this bike shed I think it should be @edolstra because I don't want to ruin his awesome creation with ugly comments.

@edolstra
Copy link
Member

I'm not in favor of Markdown.

@roberth
Copy link
Member Author

roberth commented Jan 10, 2018

@edolstra good.

What about an extensible syntax marked by a keyword right after /**?
Or do you want to 'disallow' plain text (most comments now) and standardize on a single format? If so, which one? Some possibilities are restructured text (has a specification; python doc format), docbook and doxygen.


return parseDoc(buffer.str());
} catch (std::exception e) {
std::cout << "Caught exception: " << e.what() << std::endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=> ignoreException().

i++) {
buffer << line << "\n";
}
buffer << line.substr(0, pos.column-1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified to something like for (auto & line : tokenizeString(readFile(pos.file), "\n") { ... }.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tokenizeString didn't work because it doesn't give back the empty lines. I didn't find another suitable function in util either, so I have factored the thing out instead. The other review items are now solved.

//
// Will return empty values if nothing can be found.
// For its limitations, see the docs of the implementation.
struct Doc lookupDoc(Pos & pos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be const Pos & pos.

#include "comment.hh"
#include "util.hh"

// This module looks for documentation comments in the source code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use // in the Nix code base except for single-line comments.

@roberth
Copy link
Member Author

roberth commented Jan 11, 2018

Before this can be merged we have to make decisions about syntax, because this code will return FIXME comments and such.

@stale
Copy link

stale bot commented Feb 12, 2021

I marked this as stale due to inactivity. → More info

@edolstra
Copy link
Member

I'm not really fond of abusing comments for documentation. That's what you do when you don't control the language and don't have a way to add documentation in a more structured way (e.g. C++ and doxygen). But we have the freedom to add documentation in a "proper" way (e.g. as annotations).

@roberth
Copy link
Member Author

roberth commented Oct 11, 2022

What makes use abuse? Aren't comments equally controllable?

By turning comments into annotations, you enable static introspection, so that you don't have to burden the interpreter with it. Evaluation performance is already a problem and __functor has an overhead.

Adding documentation at the value level is what you do when you don't control the language tooling and don't have a way to add documentation in a way that works without having to evaluate code. After all, documentation is not a runtime concept but one that relates directly to the source code.

@piegamesde
Copy link
Member

I'm really in favor of this change (yay for more static and less evaluation, yay for less functors), but I really think we should have a dedicated team of people that discuss the syntax, independently of the implementation.

@edolstra
Copy link
Member

edolstra commented Oct 11, 2022

Shoving documentation into comments and then trying to correlate the comment with some nearby identifier is the sort of hack you do when you can't add documentation as first-class syntax (e.g. Python annotations). At the least, documentation comments should have some distinguishing syntax (like doxygen comments), since it's not obvious in

  # Just a function.
  f = x: x;

that the comment # Just a function is intended as documentation. (E.g. it could also be # FIXME: bla bla.)

In Nix, documentation can only be done at the value level (as in the NixOS module system) because we don't have a proper module system. So we can't actually statically extract from a source file how the functions in that file are supposed to be called. E.g. we have no way to figure out from an attribute like

  /* Bla */
  concatMapStringsSep = ...;

that this defines a function that can be called as lib.concatMapStringsSep.

@piegamesde
Copy link
Member

Even if you want to do an implementation that is more value/run time based, there are alternative proposals that would not make it a functor-exclusive feature: #5527 (comment)

@blaggacao
Copy link
Contributor

blaggacao commented Oct 11, 2022

We need this (native doc support). A diversity of solutions isn't adding ecosystem (research) value any more at this point in time. To the contrary.

@Ericson2314
Copy link
Member

Ericson2314 commented Oct 12, 2022

We should do this. We were also talking in the documentation team about need for better API docs in the code so they do not get out of sync CC @fricklerhandwerk @infinisil.

@edolstra I understand your concerns about hijacking comments being ugly, but I think it is a good place for iteration. Consider these examples:

  1. In Rust, a /// variation on comments is common place, even though there are other annotation options
  2. Typescript, mypy, and other post-hoc type checkers evolved from special comments to dedicated syntax.

Basically, I think it will take a few rounds of experimentation to figure out exactly the system we want. The fact that we are so expression-oriented means that we cannot easily steal ideas from other languages with a "static top-level" as-is. If we go for dedicated syntax immediately, we tie our hands with comparability concerns. Conversely, if we "steal" some comment syntax we can be sure that we are backwards compatible.

In this manner we can have a low-impact unstable feature will both allow us to start written extremely important API documentation immediately, and also give us plenty of flexibility to experiment with different variations. Once we decide what we want, we can introduce dedicated syntax, and only stabilize that --- not the interim comment syntax. Existing API documentation (which is hopefully much more comprehensive at that point than it is today!) can then be transitioned over to using the new system.

This is the lowest risk plan which allows us to accelerate documentation contribution the soonest. Let's do it!

@Ericson2314
Copy link
Member

CC @mightyiam who was looking to work on this very problem!

@mightyiam
Copy link
Member

Thank you, @Ericson2314 .

Tracking issue seems to be #228 (comment). Will provide updates there. If anyone is willing to answer some questions we may have, please subscribe to that issue.

@shlevy shlevy removed their assignment Oct 22, 2022
@fricklerhandwerk fricklerhandwerk added feature Feature request or proposal language The Nix expression language; parser, interpreter, primops, evaluation, etc UX The way in which users interact with Nix. Higher level than UI. labels Jan 10, 2023
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-generate-documentation-from-arbitrary-nix-code/22292/9

@fricklerhandwerk
Copy link
Contributor

Discussed in the Nix team meeting 2023-01-20:

  • @fricklerhandwerk: there are multiple approaches to the broad problem of documenting Nix language code
  • @thufschmitt: while the implementation is small, it's a substantial change to the language
  • @roberth: don't have time to work on this
  • agreement: this should be an RFC
    • @NixOS/documentation-team should coordinate this

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-01-20-nix-team-meeting-minutes-25/25432/1

@roberth roberth added significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. repl The Read Eval Print Loop, "nix repl" command and debugger labels Jun 2, 2023
@thufschmitt thufschmitt marked this pull request as draft June 23, 2023 11:47
@Ericson2314 Ericson2314 added RFC Related to an accepted RFC and removed RFC Related to an accepted RFC labels Jun 23, 2023
// this is crucial information.
int timesApplied;

Doc(std::string rawComment, std::string comment, std::string name, int timesApplied) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would like to prefer the C++ way writing ctors

Suggested change
Doc(std::string rawComment, std::string comment, std::string name, int timesApplied) {
Doc(std::string rawComment, std::string comment, std::string name, int timesApplied) : rawComment(std::move(rawComment), comment(std::move(comment)), name(std::move(name)), timesApplied(timesApplied) {

and remove this-> ... = ... assignments.

We are using copy constructors for std::string, this is inefficient because C++ string are really "value-like" strings, not references, the buffer will be copied each time if we do

this->name = name


/* Try to recover a Doc by looking at the text that leads up to a term
definition */
static struct Doc parseDoc(std::string sourcePrefix) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather like to preserve comment information in our lexer, not using regexes here. That is, we store pointers to comments here, and construct & strip indentation after parsing state.

nix/src/libexpr/lexer.l

Lines 305 to 306 in 9428d7d

\#[^\r\n]* /* single-line comments */
\/\*([^*]|\*+[^*/])*\*+\/ /* long comments */

This has no overhead but we must carefully deal with the lifetime of these references/pointers.

I think we should do this in the lexer because it will be flexible for further changes & keep consistent with the lexer.

This module does not support tab ('\t') characters. In some places
they are treated as single spaces. They should be avoided.
*/
namespace nix::Comment {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
namespace nix::Comment {
namespace nix::comment {

nit pick (NFC)

}

/* See lambdas in parseDoc */
static int countLambdas(std::string piece) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static int countLambdas(std::string piece) {
static int countLambdas(const std::string &piece) {

std::ostream & printValue(std::ostream & str, Value & v, unsigned int maxDepth, ValuesSeen & seen);

// Only prints if a comment is found preceding the position.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Only prints if a comment is found preceding the position.
/// Only prints if a comment is found preceding the position.


struct Doc {

// Name that the term is assigned to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Name that the term is assigned to
/// Name that the term is assigned to


static std::string readFileUpToPos(const Pos & pos) {

std::ifstream ifs(static_cast<const std::string>(pos.file));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we read from the buffer instead? Because for static analysis tooling, actually there is no such file on the filesystem, this is a problem we deal with #6530

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-07-08-nix-team-meeting-minutes/49099/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Feature request or proposal language The Nix expression language; parser, interpreter, primops, evaluation, etc repl The Read Eval Print Loop, "nix repl" command and debugger significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. UX The way in which users interact with Nix. Higher level than UI.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.