Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add (syntax-only) modules for functions #145

Open
JeanMertz opened this issue Dec 11, 2020 · 5 comments
Open

add (syntax-only) modules for functions #145

JeanMertz opened this issue Dec 11, 2020 · 5 comments
Labels
type: enhancement A value-adding code change that enhances its existing functionality vrl: syntax Changes to the syntax

Comments

@JeanMertz
Copy link
Contributor

JeanMertz commented Dec 11, 2020

Now that we're cranking out more and more functions, I'm worried it'll become more difficult to know the distinction between all of them.

Take for example the newly proposed to_level and to_severity. On their own, they make total sense, but when you put them next to to_string, to_int and the other to_* functions that convert to a concrete value, they can be confusing.

A similar situation arises with functions such as parse_aws_vpc_flow and parse_aws_elb.

We've already started categorizing the functions in our Cue documentation precisely because the list of functions is becoming longer still, and you need some way to more easily find what you're looking for.

I propose we extend this categorization to the language itself, by introducing function modules.

The concept is simple:

  1. We define a new Module enum that has variants such as Root, Aws, Syslog, and more.
  2. We update our Function trait to include a fn module(&self) -> Module function.
  3. All function implementations are updated to define their module.
  4. We update the parser to read function modules and convert them to syntax
  5. Users now write syslog::to_level(.foo), or aws::parse_vpc_flow(.bar) instead.
  6. Functions assigned the Root module will stay as-is, so you'd still use to_int, parse_json, and other often-used functions.

I'll leave it as an exercise to others to come up with the correct module names (I think the Cue files are a good starting point, and I'm sure @lucperkins has some good ideas for this list as well), but I do believe this would only add a small amount of syntactic noise to the language (especially since the most commonly used functions won't have a module prefix), while gaining a bit more structure in the ever-growing list of functions.

(note I used <module>::<ident> since it doesn't collide with other syntax we currently have, and it's also what I'm used to as a Rustacean (and it's not uncommon in other languages either), but we can entertain other forms of syntax for this)

cc: @binarylogic @FungusHumungus

@JeanMertz JeanMertz added the type: enhancement A value-adding code change that enhances its existing functionality label Dec 11, 2020
@lucperkins
Copy link
Contributor

Big +1 to this and also to the :: choice, which is, in my opinion, gentle on the eyes. But more importantly, colons currently have no meaning in the language, which makes them the best candidate among the usual syntax suspects (like dot).

@StephenWakely
Copy link
Contributor

Yes, good idea. It happens to every language (apart from emacs lisp) sooner or later!

You make no mention of adding a use command, which would allow the code to drop the <module>:: prefix when using a function in that module. If we don't think that is necessary, then maybe we don't even need a Module enum, we can just rely on naming the functions accordingly (eg. syslog::to_level). The only change required will be a slight adjustment to the grammar.pest file.

If we do add a use command, then we will need to decide how to handle name conflicts should they occur.

@JeanMertz
Copy link
Contributor Author

You make no mention of adding a use command, which would allow the code to drop the <module>:: prefix when using a function in that module.

I should have mentioned this in the issue. I purposefully left this out of scope, as I think it's 1) something we could add later, but also 2) something that might just not be worth doing at all (it's that fine balance between making the language convenient and simple, vs. making it complex and powerful).

we can just rely on naming the functions accordingly (eg. syslog::to_level).

I would definitely not be favouring this solution. Leaving aside the potential future changes a separate Module enum unlocks, the extra type-safety the module enum brings is well worth the relatively small extra step of updating the Function trait and adding the Module enum.

The biggest advantage at the start is that we don't introduce accidental typo's, and also that any new modules that are added are easily visible to PR reviewers (and thus potentially worth a discussion).

@binarylogic
Copy link
Contributor

Noting, we are going to finish vectordotdev/vector#3740 before doing this. I'd like to see how far good naming and guidelines get us. My concern with this change is the complexity and adding to the learning curve. The simplicity of a list of functions is very nice.

@JeanMertz JeanMertz added the vrl: syntax Changes to the syntax label Jun 7, 2022
@JeanMertz
Copy link
Contributor Author

JeanMertz commented Aug 12, 2022

We haven't done anything with this proposal so far, and we might never, but the one place where we do start to see some desire for a feature like this, is with functions that can act on both strings and bytes, and behave differently depending on the input.

For example, we have length("foo") and strlen("foo"), the first counting bytes, the second counting UTF-8 characters. We could potentially resolve some confusion by doing string::length("foo") and bytes::length("foo").

Other than that though, the lack of modules hasn't been a major issue thus far, so it's unlikely this one example will make this worthwhile, but it is a data point for a potential future decision on this feature.

(there's also the possibility that we'll differentiate between valid UTF-8 strings and bytes at the Value type level, which would allow us to continue using a single function length, that returns the correct length, depending on the type (string or bytes) of the passed in value)

@fuchsnj fuchsnj changed the title remap-lang: add (syntax-only) modules for functions add (syntax-only) modules for functions Mar 28, 2023
@fuchsnj fuchsnj transferred this issue from vectordotdev/vector Mar 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A value-adding code change that enhances its existing functionality vrl: syntax Changes to the syntax
Projects
None yet
Development

No branches or pull requests

4 participants