Add Remap specification #3740

Closed
34 tasks
binarylogic opened this issue Sep 7, 2020 · 3 comments · Fixed by #8735
Labels
domain: internal docs Anything related to Vector's internal documentation domain: vrl Anything related to the Vector Remap Language type: task Generic non-code related tasks

Comments

@binarylogic
Contributor

binarylogic commented Sep 7, 2020

Now that we're expanding the Remap language, we should materialize all of the rules and guidelines into a spec. This spec can live in the Remap crate/folder in markdown format.

Requirements

Format

Language

  • Principles (performance, safety, self-documenting).
  • How do we enforce these principles? Create a feature/principle matrix (ex: not introducing methods, avoiding loops, state management, preserving self-documentation, type safety, etc.).
  • Language execution contexts. Assignment & mutability rules. (ex: the del and merge functions)

Limits

Things we have explicitly decided not to do:

  • Network calls
  • Modules
  • Classes
  • Custom functions
  • Loops
  • Lexical scope

Types

  • List the types.
  • Types should align with JSON as much as possible, with the exception of Timestamps. (ex: we will never introduce sets).
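As a rough illustration of "JSON plus timestamps", here is a hypothetical Rust sketch of a value type. The variant names, the `kind` helper, the Integer/Float split, and storing timestamps as nanoseconds since the Unix epoch are all assumptions for illustration, not Vector's actual implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical value type: one variant per JSON type, plus Timestamp.
/// Note the deliberate absence of non-JSON collections such as sets.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Boolean(bool),
    // JSON has a single number type; splitting it into Integer and Float
    // is an assumption of this sketch.
    Integer(i64),
    Float(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
    /// The one non-JSON type. Stored here (by assumption) as
    /// nanoseconds since the Unix epoch.
    Timestamp(i64),
}

impl Value {
    /// Returns a human-readable name for the value's type.
    fn kind(&self) -> &'static str {
        match self {
            Value::Null => "null",
            Value::Boolean(_) => "boolean",
            Value::Integer(_) => "integer",
            Value::Float(_) => "float",
            Value::String(_) => "string",
            Value::Array(_) => "array",
            Value::Object(_) => "object",
            Value::Timestamp(_) => "timestamp",
        }
    }
}
```

Keeping every variant except `Timestamp` mappable onto a JSON type is what makes a rule like "we will never introduce sets" easy to check: a new variant would have to justify itself the same way timestamps do.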

Syntax

  • Just cover the basics.

Strictness

  • Require that all errors be handled at compile time?
  • Type checking at compile time?
    • How does this work with schemas, both when they are known and when they are not?
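One possible reading of "require that all errors be handled at compile time" is sketched below in Rust: each function call carries a static type definition saying whether it can fail, and a checker rejects any fallible call whose error is not explicitly handled. `TypeDef`, `Expr`, and `check` are hypothetical names, and the expression tree is reduced to the bare minimum; this does not reflect Remap's actual compiler.

```rust
/// Hypothetical compile-time summary of what a function call can do.
struct TypeDef {
    fallible: bool,
}

/// Hypothetical, heavily reduced expression tree.
enum Expr {
    /// A function call with its statically known type definition.
    Call { name: &'static str, type_def: TypeDef },
    /// An expression whose error is explicitly handled (for example, by a
    /// fallback value), so it can no longer fail at runtime.
    Handled(Box<Expr>),
}

/// Rejects the program at compile time if a fallible call is left unhandled.
fn check(expr: &Expr) -> Result<(), String> {
    match expr {
        Expr::Call { name, type_def } if type_def.fallible => {
            Err(format!("unhandled error from fallible function `{}`", name))
        }
        // Infallible calls, and any expression wrapped in Handled, pass.
        Expr::Call { .. } | Expr::Handled(_) => Ok(()),
    }
}
```

Under this reading, schemas matter because knowing a field's type up front can turn a call that would otherwise be fallible (e.g. a type mismatch) into an infallible one.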

Functions

  • Naming
  • Signature
    • When does a function take a value or operate on the entire event? (ex: del vs append)
    • If a function operates on a value, it must take it as the first argument.
    • If a function traverses multiple values what is the return value?
  • Errors
    • When to error and when to not. (ex: should del error on missing keys?)
  • Observability
    • Should functions implement instrumentation? If so, should they follow the event-driven pattern?
    • When should a function emit an event?
  • Rules for including/excluding a function
    • Network calls?
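To make the value-versus-event distinction concrete, here is a hypothetical Rust sketch: `downcase` takes a value as its first argument and returns a new value, while `del` operates on the event itself, mutating it and returning the removed value. Returning `None` for a missing key (rather than an error) is just one possible answer to the `del` question above, not a settled decision, and the flat string-to-string event is a simplification.

```rust
use std::collections::BTreeMap;

/// Hypothetical event: a flat map from field name to string value.
type Event = BTreeMap<String, String>;

/// Value-level function: the value is the first argument, and a new value
/// is returned; the input is left untouched.
fn downcase(value: &str) -> String {
    value.to_lowercase()
}

/// Event-level function: mutates the event in place and returns the removed
/// value, or None (not an error) when the field does not exist.
fn del(event: &mut Event, field: &str) -> Option<String> {
    event.remove(field)
}
```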

Errors

  • How are errors returned from functions?
  • How are errors handled?

@JeanMertz
Contributor

There's a lot to write down, but I'll just focus on a few key things that we want to get right before we ship v1:

Functions

general

  • Work towards composability over individual capabilities. For example, use built-in iteration solutions (once that lands) over allowing multiple arguments of the same type to apply a function more than once. Or, if we have a function that provides a certain task, don't incorporate that into another function as an optional argument, but instead make it possible to compose the two functions together. Using performance as a reason not to follow this rule should be backed by real-life examples and benchmarks.
  • Functions should rarely mutate their input; instead, they should create a new value and return it. Exceptions are functions that directly act on the object/event, such as del and merge.
  • When implementing functions, focus on performance, but not at the expense of usability.
  • Try to design your function to be infallible. If you can make it infallible by removing one specific, obscure part of it, that's usually the best choice; we can always introduce that specific behaviour later in a new, fallible (and less often used) function.
  • We have test_type_def and test_function macros; use them to validate your expectations!
  • We also have a bench_function macro to add Criterion benchmarks, add them if you think it'll help us understand the performance profile of complex functions.
  • Be sure to update function documentation when you update a function.
  • When first implementing a function, keep its scope narrow. We can always expand its scope later, but we can't take away features without breaking people's code.
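The composability and infallibility guidelines can be sketched as follows. The functions are hypothetical Rust stand-ins, not Remap's implementations: rather than giving `strip_whitespace` an optional "also downcase" argument, each function does one narrow thing and callers compose them, and the fallible parsing step lives in its own function instead of making the infallible ones fallible.

```rust
/// Narrow and infallible: trims surrounding whitespace into a new string.
fn strip_whitespace(value: &str) -> String {
    value.trim().to_string()
}

/// Narrow and infallible: returns a new, lowercased string.
fn downcase(value: &str) -> String {
    value.to_lowercase()
}

/// Fallible work kept in its own function, so the infallible functions
/// above never have to return errors.
fn parse_int(value: &str) -> Result<i64, String> {
    value
        .parse::<i64>()
        .map_err(|e| format!("unable to parse {:?} as an integer: {}", value, e))
}
```

Composing `downcase(&strip_whitespace(...))` keeps each function small and infallible; only the caller that chooses to parse has to deal with an error.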

naming

  • When adding a function, make sure it doesn't conflict with existing ones (meaning it shouldn't be too similar to existing function names or capabilities).
  • In terms of naming, I'd say keep the most often used functions short and simple, and use more descriptive names for more obscure functions that are unlikely to be used as often.
  • If a function is specific to a provider/source/sink, name it accordingly (e.g. aws_...).
  • use is_* only for functions that return a boolean (but of course there are plenty of functions that don't use that pattern and still return a boolean, such as contains, which is fine)
  • use parse_* when parsing a string into a specific type (e.g. parse_timestamp, parse_json, and parse_url)
  • use format_* to go from any type to a string, where the string can be formatted in different ways
  • use to_* to convert between specific value types (e.g. to_string)
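A hypothetical Rust sketch of what each naming prefix implies about a function's signature: `is_*` returns a boolean, `parse_*` turns a string into one specific type and can fail, `format_*` renders a value as a string with caller-chosen formatting, and `to_*` converts between specific types. The function names and signatures below are illustrative assumptions only.

```rust
/// is_*: answers a yes/no question and returns a boolean.
fn is_blank(value: &str) -> bool {
    value.trim().is_empty()
}

/// parse_*: parses a string into one specific type and may fail.
fn parse_bool(value: &str) -> Result<bool, String> {
    match value {
        "true" => Ok(true),
        "false" => Ok(false),
        other => Err(format!("unable to parse {:?} as a boolean", other)),
    }
}

/// format_*: renders a value as a string; the caller chooses the format
/// (here, the number of decimal places).
fn format_number(value: f64, decimals: usize) -> String {
    format!("{:.*}", decimals, value)
}

/// to_*: converts between specific types.
fn to_string(value: i64) -> String {
    value.to_string()
}
```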

parameters

  • Keep the number of function parameters small; avoid adding functions that take too many parameters.
  • By convention, put the value you're acting on as the first parameter, and name it value.
  • Try not to restrict input types too much. For example, don't limit the input to a string literal if you don't have to; instead, accept any expression and resolve it to a string at runtime. We already have runtime checks that return an error if the expression doesn't return a string, and we're working towards more compile-time checking, so at some point we'll tell users at boot time to make sure their expression returns a string at runtime, e.g. by wrapping it in to_string.
  • Avoid parameters such as default. There are (and will be) other language constructs for falling back to default values, so we don't have to implement that for every function individually.
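The parameter guidelines can be sketched in Rust as below; all names are hypothetical. `value` is the first parameter, the function accepts any resolved value and checks its type at runtime instead of requiring a string literal, users can wrap an expression in an infallible `to_string` to guarantee a string, and there is no `default` parameter because falling back is left to the caller.

```rust
/// Hypothetical runtime value, reduced to two variants for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bytes(String),
    Integer(i64),
}

/// `value` comes first. Any expression is accepted; its type is checked at
/// runtime and a non-string yields an error rather than a panic.
fn upcase(value: &Value) -> Result<String, String> {
    match value {
        Value::Bytes(s) => Ok(s.to_uppercase()),
        other => Err(format!("expected a string, got {:?}", other)),
    }
}

/// Infallible conversion a user can wrap around an expression to guarantee
/// it resolves to a string.
fn to_string(value: &Value) -> Value {
    match value {
        Value::Bytes(s) => Value::Bytes(s.clone()),
        Value::Integer(i) => Value::Bytes(i.to_string()),
    }
}
```

In Rust, `upcase(&v).unwrap_or_else(|_| "N/A".to_string())` stands in for the language-level fallback construct; the point is that the fallback lives outside `upcase`, so no function needs its own `default` parameter.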

@binarylogic
Contributor Author

@JeanMertz I updated the requirements to cover problems and questions that I've seen come up.

@binarylogic
Contributor Author

Reopening since additional design questions are still popping up. Ex: should the new unnest function in #7038 take string or path arguments?
