RFC: surface warnings and/or errors for unknown vindex params #13041

maxenglander · 2023-05-08T16:36:59Z

Introduction

Vindexes allow Vindex-specific parameters to be passed via the params field. For example:

{
  "type": "lookup",
  "params": {
    "autocommit": "true",
    "read_lock": "none"
  }
}

Currently it is the responsibility of each Vindex to validate its own params. Some Vindexes do strict validation of params, while other Vindexes do very little param validation. All Vindexes happily accept unknown params.

Problem

For the most part, there aren't any problems with this. If a user forgets to pass in a required param, the Vindex will complain or fail, and the user will know about it.

However, if the user passes in an optional param, but makes a typo in the param name, most Vindexes will happily accept the typo param, and do nothing with it.

In some cases that silent ignore can be a problem. For example, the lookup Vindex family supports a number of performance-impacting parameters, such as read_lock. If a user tries to set raed_lock (notice typo) to none, expecting that this will reduce lock contention, they will be in for an unpleasant surprise during the next traffic spike.

Proposal

This RFC proposes updating Vitess to warn users when they pass in unknown parameters to the Vindexes. The warnings can be surfaced to users via logs and metrics.

Importantly, warnings will be backwards-compatible, and it will be the responsibility of users to monitor logs and metrics for these warnings, and take additional corrective actions.

Optionally, a user can opt in to stricter param validation via a flag such as --vschema-strict, which will escalate these warnings to errors. A user might choose to opt in to --vschema-strict in development and staging environments, but leave the default behavior on in production.

Options

Some ways this could be implemented.

Have vindexes optionally report warnings

Introduce a ParamValidating interface like:

+ type ParamValidating interface {
+   UnknownParams() []string
+ }

Then update the logic of individual Vindexes to return non-empty UnknownParams() when any parameters passed to the Vindex during Vindex creation was unknown.

Update various parts of Vitess codebase to look for unknown params in Vindexes, and emit logs and metrics. For example:

Update vtgate.(*VschemaStats) to export warnings in its HTTP output.
Log warnings and increment a vtgate_vschema_vindex_unknown_param_count stat from vtgate.(*VSchemaManager).
Update ApplyVSchema to return warnings in response, while continuing to accept the input if there aren't any errors.

Pros

Is not a breaking change.
Does not require any future deprecations or removals.
Does not add any complexity to upgrades.
Does not require users to change their VSchemas.
Gives users soft warnings (metrics, logs) by default with the option of opting in to hard errors.

Cons

Requires more code changes compared to the other options.
Potentially difficult to maintain: any time we add or remove a supported param from a Vindex, we need to update both the validation code as well as the code that actually uses that param.

Add per-Vindex VSchema params

Currently all Vindexes that accept parameters do so via the params field, which is not really strongly typed.

We can introduce stronger typing by having each Vindex which declares params register strongly typed params in the VSchema protobuf definition.

How this might look in the protobuf definition.

// Vindex is the vindex info for a Keyspace.
message Vindex {
+  message LookupVindexParams {
+    enum ReadLock {
+      READ_LOCK_EXCLUSIVE = 0;
+      READ_LOCK_SHARED = 1;
+      READ_LOCK_NONE = 2;
+    }
+  } 
+
+   bool autocommit = 1;
+   ReadLock read_lock = 2;
+ }
  // The type must match one of the predefined
  // (or plugged in) vindex names.
  string type = 1;
  // params is a map of attribute value pairs
  // that must be defined as required by the
  // vindex constructors. The values can only
  // be strings.
  map<string, string> params = 2;
  // A lookup vindex can have an owner table defined.
  // If so, rows in the lookup table are created or
  // deleted in sync with corresponding rows in the
  // owner table.
  string owner = 3;

+  // Params specific to "lookup" Vindex. 
+  LookupVindexParams lookup_params;
}

A user could opt in to stronger Vindex param validation by specifying lookup_params in the VSchema:

{
  "type": "lookup",
  "lookup_params": {
     "autocommit": true,
     "read_lock": "none" 
  }
}

Notice how in the VSchema above autocommit is a boolean, instead of a string with the word "true".

Initially we would support both params as well as lookup_params. lookup_params would take precedence when present, and fall back to params when null.

We could choose to support both params as well as {type}_params indefinitely, or else eventually deprecate and remove params. The former would give users to option of choosing between loose param validation or strict param validation by using either params or {type}_params.

Pros

Does not require a lot of code relative to the other options. Protobuf parsing will do all the heavy lifting of parsing and validating params.
Allows users to opt into strict validation at their own pace, one Vindex at a time.

Cons

If we eventually deprecate and remove params in favor of {type}_params, that would be a big change to force on the community.
If we support both params and {type}_params indefinitely, that is a burden for maintainers to support.
Is probably a non-starter for out-of-tree Vindex plugins; we would need to support the loosely typed params field indefinitely for those out-of-tree plugins.
Could make upgrades more difficult to execute. For example, if we ever removed the read_lock field from the protobuf definition, in order to upgrade users would need to first remove this field from their VSchema, and then upgrade their VT* components.

Recommendation

I really like that option 2 would add strong typing for Vindex params, but I think the main challenge is that out-of-tree Vindex plugins won't be able to take advantage of that.

I recommend option 1 because:

It does not require users to change their VSchemas in order to opt in to soft warnings.
Users can choose to ignore warnings, monitor them, or treat them as errors via --vschema-strict.
It does not create any breaking changes in code.
It gives out-of-tree Vindex plugins the ability to issue their own warnings.

Decision

I like the RAPID model for decision making.

Recommend: @maxenglander
Agree: @deepthi @systay
Perform: @maxenglander
Input: community
Decide: @harshit-gangal: "I think Option 1 is good as it does not change user behaviour and gives the option to log warnings to indicate the issue with the vindex setup."

Tasks

Create and implement vindexes.ParamValidating interface, via vindexes: return unknown params #12951 .
Expose unknown params via logs and metrics.
- Expose VTGate metric, via vtctl,vindexes: logs warnings and export stat for unknown vindex params #13322.
- Report warnings from vtctl ApplyVSchema CLI command, via vtctl,vindexes: logs warnings and export stat for unknown vindex params #13322.
- Report warnings from ApplyVSchemaResponse gRPC response.
Opt into strict validation via flag.

The text was updated successfully, but these errors were encountered:

harshit-gangal · 2023-05-09T11:10:10Z

Regarding Vindex Warnings.
Till now any addition is done using an interface so that it does not change the vindex creation syntax and implementing the interface is opt-in.
This way any custom implementation does not have to change anything.

type VindexWarning interface {
    func VindexWarnings() []string
}

If I have understood correctly, regarding supporting feature #12413. I think we can change the params from map[string]string to map[string]any and the type checking will be performed by the vindex itself with json unmarshal.

maxenglander · 2023-05-09T14:10:26Z

Hey @harshit-gangal

Till now any addition is done using an interface so that it does not change the vindex creation syntax and implementing the interface is opt-in.

OK that makes sense! I'll rework option 1 to do that.

If I have understood correctly, regarding supporting feature #12413.

Yeah using map[string]any is another way to solve #12413. Overall I think I am muddying the RFC by talking about that feature request, so I'm going to remove discussion of that from option 2.

maxenglander · 2023-05-09T21:09:36Z

@harshit-gangal made those updates

harshit-gangal · 2023-05-18T09:07:57Z

I think Option 1 is good as it does not change user behaviour and gives the option to log warnings to indicate the issue with the vindex setup.

maxenglander · 2023-05-18T14:03:49Z

@harshit-gangal great, I'll get started on updating the PR to match option 1.

One thing about option 1 is that it will require Vindexes to store their either their input params or their warnings, e.g.:

type binary struct {
  warnings []error
}

type (b *binary) Warnings() []error {
  return b.warnings
}

In theory this will increase the amount of memory used by Vindexes compared to if they just returned warnings during Vindex creation phase.

I assume you already thought of this and are not concerned, but just adding this comment in case you or anyone else would have concerns about that.

harshit-gangal · 2023-05-18T15:35:19Z

You can check and return at that point in time as this is not something will be used repeatedly

maxenglander · 2023-05-18T15:41:19Z

@harshit-gangal even if this is not used repeatedly, Vindexes will be carrying extra memory from holding param errors:

type binary struct {
  warnings []error
}

type (b *binary) Warnings() []error {
  return b.warnings
}

func newBinary(name string, params map[string]string) (Vindex, error) {
  return &binary{name: name, warnings: validateParams(params)}
}

We can't check and return an error in newBinary that would be a breaking change. We have to store any warnings that were found directly in the Vindex, so that they can be checked by callers. If a caller chooses not to check the warnings, that means the warnings will be held in memory for the lifetime of the VTGate.

harshit-gangal · 2023-05-18T16:35:35Z

I say we do not need to store the warnings , we already have the passed in params.
When vtgate ask for vindex warnings using Warnings() []error at that point, vindex will analyze the passed in params and return the [] error

maxenglander · 2023-05-18T17:02:30Z

Hey @harshit-gangal I can do that but I think that is likely to use more memory than storing the warnings. If we pass in 10 params and they are all valid, then no warnings are generated and storing the params on the Vindex is just wasted memory. If we store the warnings, if all the passed in params are valid then we will use less memory.

harshit-gangal · 2023-05-18T17:27:39Z

We are already storing the parameters. The only way to make vindex know what to do with the passed in params.

harshit-gangal · 2023-05-18T17:32:49Z

I see what you mean, we use the parameters and store transformed values.

Today we store any issue with a keyspace in

type KeyspaceSchema struct {
	Keyspace *Keyspace
	Tables   map[string]*Table
	Vindexes map[string]Vindex
	Views    map[string]sqlparser.SelectStatement
	>>> Error    error <<<
}

should we continue to use this and add warnings here?

harshit-gangal · 2023-05-18T17:34:28Z

Also, do we really need []error store or a single aggregated error is enough?

maxenglander · 2023-05-18T18:10:33Z

Hey @harshit-gangal even before we think about storing values on KeyspaceSchema we need to store new values on the Vindex itself. Otherwise the Vindex won't be able to report whether params passed to it during creation were invalid or not. Those new values will take up memory, and the only question is do we want to take up more (if we store map[string]string or less (if we store []VindexWarning) memory.

In terms of what we store on KeyspaceSchem, I don't think it makes sense to re-use Error, since that will potentially cause unexpected errors for users. The idea behind having warnings is that we can emit logs and metrics, and maybe in a future PR let users opt in to stricter validation with --vschema-strict or something like that.

If you think that it makes more sense to store errors in KeyspaceSchema.Error, then I think we can forget about the WarningReporter interface entirely and just return warnings in the error return value of CreateVindex. But it will potentially break things for users who are currently passing in unknown params into their VSchemas.

Also, do we really need []error store or a single aggregated error is enough?

I think it would be more useful to store these as separate []error, that way we can increment a warning metric by len(errors). However I don't feel too strongly about it.

maxenglander · 2023-05-19T02:39:24Z

Also, do we really need []error store or a single aggregated error is enough?

Alternately the interface could be something like:

type ParamValidating interface {
    // UnknownParams returns a slice of params that were passed to the Vindex
    // at creation time but ignored because they were not known.
    UnknownParams() []string
}

And let callers decide whether they qualify as errors or not.

harshit-gangal · 2023-05-19T04:52:19Z

This sounds reasonable. We can store the unknown parameters at the vindex level. The memory will increase only if there are unknown parameters passed so this is acceptable.

maxenglander · 2023-05-19T13:02:45Z

@harshit-gangal OK have updated the RFC to incorporate the convo up to this point and updated #12951 to match.

brendar · 2023-07-20T19:51:47Z

@maxenglander It's great to see these changes. I noticed that #13322 added warnings to the Vtctl ApplyVSchema output. Are you planning to add something similar to the ApplyVSchema gRPC? (I'm not sure what that would look like. Maybe warning messages in the ApplyVSchemaResponse?)

maxenglander · 2023-07-20T19:55:22Z

@brendar good idea! Added to the task list in the issue description. I'll take a crack at it if someone else doesn't beat me to it.

maxenglander · 2023-12-21T13:25:59Z

Oops, didn't mean to close this, one item left.

Opt into strict validation via flag.

@brendar the ApplyVSchema gRPC response now includes warnings for unknown params, both with and without --dry-run.

maxenglander self-assigned this May 8, 2023

maxenglander mentioned this issue May 8, 2023

vindexes: return unknown params #12951

Merged

4 tasks

harshit-gangal added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving Type: RFC Request For Comment labels May 9, 2023

maxenglander mentioned this issue Jun 14, 2023

Feature Request: emit a metric and log when VSchema contains unrecognized parameters #12476

Closed

maxenglander mentioned this issue Dec 4, 2023

vtctldclient,grpcvtctldserver ApplySchema: return unknown params from grpcvtctldserver.ApplySchema, log them in vtctldclient.ApplySchema #14672

Merged

4 tasks

rohit-nayak-ps closed this as completed in #14672 Dec 21, 2023

maxenglander reopened this Dec 21, 2023

maxenglander mentioned this issue Dec 26, 2023

vtctldclient: --strict rejects unknown vindex params in ApplyVSchema #14862

Merged

5 tasks

rohit-nayak-ps closed this as completed in #14862 Dec 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: surface warnings and/or errors for unknown vindex params #13041

RFC: surface warnings and/or errors for unknown vindex params #13041

maxenglander commented May 8, 2023 •

edited

Loading

harshit-gangal commented May 9, 2023

maxenglander commented May 9, 2023 •

edited

Loading

maxenglander commented May 9, 2023

harshit-gangal commented May 18, 2023 •

edited

Loading

maxenglander commented May 18, 2023

harshit-gangal commented May 18, 2023

maxenglander commented May 18, 2023

harshit-gangal commented May 18, 2023

maxenglander commented May 18, 2023

harshit-gangal commented May 18, 2023

harshit-gangal commented May 18, 2023

harshit-gangal commented May 18, 2023

maxenglander commented May 18, 2023

maxenglander commented May 19, 2023

harshit-gangal commented May 19, 2023

maxenglander commented May 19, 2023

brendar commented Jul 20, 2023

maxenglander commented Jul 20, 2023

maxenglander commented Dec 21, 2023

RFC: surface warnings and/or errors for unknown vindex params #13041

RFC: surface warnings and/or errors for unknown vindex params #13041

Comments

maxenglander commented May 8, 2023 • edited Loading

Introduction

Problem

Proposal

Options

Have vindexes optionally report warnings

Pros

Cons

Add per-Vindex VSchema params

Pros

Cons

Recommendation

Decision

Tasks

harshit-gangal commented May 9, 2023

maxenglander commented May 9, 2023 • edited Loading

maxenglander commented May 9, 2023

harshit-gangal commented May 18, 2023 • edited Loading

maxenglander commented May 18, 2023

harshit-gangal commented May 18, 2023

maxenglander commented May 18, 2023

harshit-gangal commented May 18, 2023

maxenglander commented May 18, 2023

harshit-gangal commented May 18, 2023

harshit-gangal commented May 18, 2023

harshit-gangal commented May 18, 2023

maxenglander commented May 18, 2023

maxenglander commented May 19, 2023

harshit-gangal commented May 19, 2023

maxenglander commented May 19, 2023

brendar commented Jul 20, 2023

maxenglander commented Jul 20, 2023

maxenglander commented Dec 21, 2023

maxenglander commented May 8, 2023 •

edited

Loading

maxenglander commented May 9, 2023 •

edited

Loading

harshit-gangal commented May 18, 2023 •

edited

Loading