Skip to content

Commit

Permalink
Allow enabling PromQL experimental functions by tenant (#9798)
Browse files Browse the repository at this point in the history
* Allow enabling PromQL experimental functions by tenant

* Return name of experimental function

* Add comments and initial check

* Rename middleware

* Expand doc message for new config

* Don't rerun parsing for the query expression in the experimental functions middleware

* Only include the new middleware if experimental functions are enabled globally

* Add note

* Simplify containsExperimentalFunction

* Support configuring enabled experimental functions

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

* Fix build and update comment

* Add cautionary note

* Check experimental functions earlier

* Revise according to discussion

* Rerun parsing for the query expression in the experimental functions middleware and remove `GetQueryExpr`

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
  • Loading branch information
zenador and aknuds1 authored Nov 16, 2024
1 parent 810a7d9 commit 966f046
Show file tree
Hide file tree
Showing 11 changed files with 278 additions and 21 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
* [FEATURE] Querier: add experimental streaming PromQL engine, enabled with `-querier.query-engine=mimir`. #9367 #9368 #9398 #9399 #9403 #9417 #9418 #9419 #9420 #9482 #9504 #9505 #9507 #9518 #9531 #9532 #9533 #9553 #9558 #9588 #9589 #9639 #9641 #9642 #9651 #9664 #9681 #9717 #9719 #9724 #9874
* [FEATURE] Distributor: Add support for `lz4` OTLP compression. #9763
* [FEATURE] Query-frontend: added experimental configuration options `query-frontend.cache-errors` and `query-frontend.results-cache-ttl-for-errors` to allow non-transient responses to be cached. When set to `true` error responses from hitting limits or bad data are cached for a short TTL. #9028
* [FEATURE] Query-frontend: add middleware to control access to specific PromQL experimental functions on a per-tenant basis. #9798
* [FEATURE] gRPC: Support S2 compression. #9322
* `-alertmanager.alertmanager-client.grpc-compression=s2`
* `-ingester.client.grpc-compression=s2`
Expand Down
22 changes: 22 additions & 0 deletions cmd/mimir/config-descriptor.json
Original file line number Diff line number Diff line change
Expand Up @@ -4306,6 +4306,17 @@
"fieldFlag": "query-frontend.align-queries-with-step",
"fieldType": "boolean"
},
{
"kind": "field",
"name": "enabled_promql_experimental_functions",
"required": false,
"desc": "Enable certain experimental PromQL functions, which are subject to being changed or removed at any time, on a per-tenant basis. Defaults to empty which means all experimental functions are disabled. Set to 'all' to enable all experimental functions.",
"fieldValue": null,
"fieldDefaultValue": "",
"fieldFlag": "query-frontend.enabled-promql-experimental-functions",
"fieldType": "string",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "cardinality_analysis_enabled",
Expand Down Expand Up @@ -6413,6 +6424,17 @@
"fieldType": "boolean",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "block_promql_experimental_functions",
"required": false,
"desc": "True to control access to specific PromQL experimental functions per tenant.",
"fieldValue": null,
"fieldDefaultValue": false,
"fieldFlag": "query-frontend.block-promql-experimental-functions",
"fieldType": "boolean",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "query_sharding_target_series_per_shard",
Expand Down
4 changes: 4 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -2199,6 +2199,8 @@ Usage of ./cmd/mimir/mimir:
[experimental] Timeout for writing active series responses. 0 means the value from `-server.http-write-timeout` is used. (default 5m0s)
-query-frontend.align-queries-with-step
Mutate incoming queries to align their start and end with their step to improve result caching.
-query-frontend.block-promql-experimental-functions
[experimental] True to control access to specific PromQL experimental functions per tenant.
-query-frontend.cache-errors
[experimental] Cache non-transient errors from queries.
-query-frontend.cache-results
Expand All @@ -2207,6 +2209,8 @@ Usage of ./cmd/mimir/mimir:
Cache requests that are not step-aligned.
-query-frontend.downstream-url string
URL of downstream Prometheus.
-query-frontend.enabled-promql-experimental-functions comma-separated-list-of-strings
[experimental] Enable certain experimental PromQL functions, which are subject to being changed or removed at any time, on a per-tenant basis. Defaults to empty which means all experimental functions are disabled. Set to 'all' to enable all experimental functions.
-query-frontend.grpc-client-config.backoff-max-period duration
Maximum delay when backing off. (default 10s)
-query-frontend.grpc-client-config.backoff-min-period duration
Expand Down
12 changes: 12 additions & 0 deletions docs/sources/mimir/configure/configuration-parameters/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -1681,6 +1681,11 @@ results_cache:
# CLI flag: -query-frontend.prune-queries
[prune_queries: <boolean> | default = false]

# (experimental) True to control access to specific PromQL experimental
# functions per tenant.
# CLI flag: -query-frontend.block-promql-experimental-functions
[block_promql_experimental_functions: <boolean> | default = false]

# (advanced) How many series a single sharded partial query should load at most.
# This is not a strict requirement guaranteed to be honoured by query sharding,
# but a hint given to the query sharding when the query execution is initially
Expand Down Expand Up @@ -3506,6 +3511,13 @@ The `limits` block configures default and per-tenant limits imposed by component
# CLI flag: -query-frontend.align-queries-with-step
[align_queries_with_step: <boolean> | default = false]

# (experimental) Enable certain experimental PromQL functions, which are subject
# to being changed or removed at any time, on a per-tenant basis. Defaults to
# empty which means all experimental functions are disabled. Set to 'all' to
# enable all experimental functions.
# CLI flag: -query-frontend.enabled-promql-experimental-functions
[enabled_promql_experimental_functions: <string> | default = ""]

# Enables endpoints used for cardinality analysis.
# CLI flag: -querier.cardinality-analysis-enabled
[cardinality_analysis_enabled: <boolean> | default = false]
Expand Down
113 changes: 113 additions & 0 deletions pkg/frontend/querymiddleware/experimental_functions.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
// SPDX-License-Identifier: AGPL-3.0-only

package querymiddleware

import (
"context"
"fmt"

"github.com/go-kit/log"
"github.com/grafana/dskit/tenant"
"github.com/prometheus/prometheus/promql/parser"
"golang.org/x/exp/slices"

apierror "github.com/grafana/mimir/pkg/api/error"
)

const (
allExperimentalFunctions = "all"
)

type experimentalFunctionsMiddleware struct {
next MetricsQueryHandler
limits Limits
logger log.Logger
}

// newExperimentalFunctionsMiddleware creates a middleware that blocks queries that contain PromQL experimental functions
// that are not enabled for the active tenant(s), allowing us to enable specific functions only for selected tenants.
func newExperimentalFunctionsMiddleware(limits Limits, logger log.Logger) MetricsQueryMiddleware {
return MetricsQueryMiddlewareFunc(func(next MetricsQueryHandler) MetricsQueryHandler {
return &experimentalFunctionsMiddleware{
next: next,
limits: limits,
logger: logger,
}
})
}

func (m *experimentalFunctionsMiddleware) Do(ctx context.Context, req MetricsQueryRequest) (Response, error) {
tenantIDs, err := tenant.TenantIDs(ctx)
if err != nil {
return nil, apierror.New(apierror.TypeBadData, err.Error())
}

enabledExperimentalFunctions := make(map[string][]string, len(tenantIDs))
allExperimentalFunctionsEnabled := true
for _, tenantID := range tenantIDs {
enabled := m.limits.EnabledPromQLExperimentalFunctions(tenantID)
enabledExperimentalFunctions[tenantID] = enabled
if len(enabled) == 0 || enabled[0] != allExperimentalFunctions {
allExperimentalFunctionsEnabled = false
}
}

if allExperimentalFunctionsEnabled {
// If all experimental functions are enabled for all tenants here, we don't need to check the query
// for those functions and can skip this middleware.
return m.next.Do(ctx, req)
}

expr, err := parser.ParseExpr(req.GetQuery())
if err != nil {
return nil, apierror.New(apierror.TypeBadData, DecorateWithParamName(err, "query").Error())
}
funcs := containedExperimentalFunctions(expr)
if len(funcs) == 0 {
// This query does not contain any experimental functions, so we can continue to the next middleware.
return m.next.Do(ctx, req)
}

// Make sure that every used experimental function is enabled for all the tenants here.
for name := range funcs {
for tenantID, enabled := range enabledExperimentalFunctions {
if len(enabled) > 0 && enabled[0] == allExperimentalFunctions {
// If the first item matches the const value of allExperimentalFunctions, then all experimental
// functions are enabled for this tenant.
continue
}
if !slices.Contains(enabled, name) {
err := fmt.Errorf("function %q is not enabled for tenant %s", name, tenantID)
return nil, apierror.New(apierror.TypeBadData, DecorateWithParamName(err, "query").Error())
}
}
}

// Every used experimental function is enabled for the tenant(s).
return m.next.Do(ctx, req)
}

// containedExperimentalFunctions returns any PromQL experimental functions used in the query.
func containedExperimentalFunctions(expr parser.Expr) map[string]struct{} {
expFuncNames := map[string]struct{}{}
parser.Inspect(expr, func(node parser.Node, _ []parser.Node) error {
call, ok := node.(*parser.Call)
if ok {
if parser.Functions[call.Func.Name].Experimental {
expFuncNames[call.Func.Name] = struct{}{}
}
return nil
}
agg, ok := node.(*parser.AggregateExpr)
if ok {
// Note that unlike most PromQL functions, the experimental nature of the aggregation functions are manually
// defined and enforced, so they have to be hardcoded here and updated along with changes in Prometheus.
switch agg.Op {
case parser.LIMITK, parser.LIMIT_RATIO:
expFuncNames[agg.Op.String()] = struct{}{}
}
}
return nil
})
return expFuncNames
}
64 changes: 64 additions & 0 deletions pkg/frontend/querymiddleware/experimental_functions_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
// SPDX-License-Identifier: AGPL-3.0-only

package querymiddleware

import (
"testing"

"github.com/prometheus/prometheus/promql/parser"
"github.com/stretchr/testify/require"
)

func TestContainedExperimentalFunctions(t *testing.T) {
t.Cleanup(func() { parser.EnableExperimentalFunctions = false })
parser.EnableExperimentalFunctions = true

testCases := map[string]struct {
query string
expect []string
}{
"sum by": {
query: `sum(up) by (namespace)`,
},
"mad_over_time": {
query: `mad_over_time(up[5m])`,
expect: []string{"mad_over_time"},
},
"mad_over_time with sum and by": {
query: `sum(mad_over_time(up[5m])) by (namespace)`,
expect: []string{"mad_over_time"},
},
"sort_by_label": {
query: `sort_by_label({__name__=~".+"}, "__name__")`,
expect: []string{"sort_by_label"},
},
"sort_by_label_desc": {
query: `sort_by_label_desc({__name__=~".+"}, "__name__")`,
expect: []string{"sort_by_label_desc"},
},
"limitk": {
query: `limitk by (group) (0, up)`,
expect: []string{"limitk"},
},
"limit_ratio": {
query: `limit_ratio(0.5, up)`,
expect: []string{"limit_ratio"},
},
"limit_ratio with mad_over_time": {
query: `limit_ratio(0.5, mad_over_time(up[5m]))`,
expect: []string{"limit_ratio", "mad_over_time"},
},
}

for name, tc := range testCases {
t.Run(name, func(t *testing.T) {
expr, err := parser.ParseExpr(tc.query)
require.NoError(t, err)
var enabled []string
for key := range containedExperimentalFunctions(expr) {
enabled = append(enabled, key)
}
require.ElementsMatch(t, tc.expect, enabled)
})
}
}
3 changes: 3 additions & 0 deletions pkg/frontend/querymiddleware/limits.go
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,9 @@ type Limits interface {
// ResultsCacheForUnalignedQueryEnabled returns whether to cache results for queries that are not step-aligned
ResultsCacheForUnalignedQueryEnabled(userID string) bool

// EnabledPromQLExperimentalFunctions returns the names of PromQL experimental functions allowed for the tenant.
EnabledPromQLExperimentalFunctions(userID string) []string

// BlockedQueries returns the blocked queries.
BlockedQueries(userID string) []*validation.BlockedQuery

Expand Down
9 changes: 9 additions & 0 deletions pkg/frontend/querymiddleware/limits_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -569,6 +569,10 @@ func (m multiTenantMockLimits) ResultsCacheForUnalignedQueryEnabled(userID strin
return m.byTenant[userID].resultsCacheForUnalignedQueryEnabled
}

func (m multiTenantMockLimits) EnabledPromQLExperimentalFunctions(userID string) []string {
return m.byTenant[userID].enabledPromQLExperimentalFunctions
}

func (m multiTenantMockLimits) BlockedQueries(userID string) []*validation.BlockedQuery {
return m.byTenant[userID].blockedQueries
}
Expand Down Expand Up @@ -615,6 +619,7 @@ type mockLimits struct {
resultsCacheTTLForLabelsQuery time.Duration
resultsCacheTTLForErrors time.Duration
resultsCacheForUnalignedQueryEnabled bool
enabledPromQLExperimentalFunctions []string
blockedQueries []*validation.BlockedQuery
alignQueriesWithStep bool
queryIngestersWithin time.Duration
Expand Down Expand Up @@ -703,6 +708,10 @@ func (m mockLimits) ResultsCacheForUnalignedQueryEnabled(string) bool {
return m.resultsCacheForUnalignedQueryEnabled
}

func (m mockLimits) EnabledPromQLExperimentalFunctions(string) []string {
return m.enabledPromQLExperimentalFunctions
}

func (m mockLimits) CreationGracePeriod(string) time.Duration {
return m.creationGracePeriod
}
Expand Down
42 changes: 31 additions & 11 deletions pkg/frontend/querymiddleware/roundtrip.go
Original file line number Diff line number Diff line change
Expand Up @@ -56,17 +56,18 @@ var (

// Config for query_range middleware chain.
type Config struct {
SplitQueriesByInterval time.Duration `yaml:"split_queries_by_interval" category:"advanced"`
ResultsCacheConfig `yaml:"results_cache"`
CacheResults bool `yaml:"cache_results"`
CacheErrors bool `yaml:"cache_errors" category:"experimental"`
MaxRetries int `yaml:"max_retries" category:"advanced"`
NotRunningTimeout time.Duration `yaml:"not_running_timeout" category:"advanced"`
ShardedQueries bool `yaml:"parallelize_shardable_queries"`
PrunedQueries bool `yaml:"prune_queries" category:"experimental"`
TargetSeriesPerShard uint64 `yaml:"query_sharding_target_series_per_shard" category:"advanced"`
ShardActiveSeriesQueries bool `yaml:"shard_active_series_queries" category:"experimental"`
UseActiveSeriesDecoder bool `yaml:"use_active_series_decoder" category:"experimental"`
SplitQueriesByInterval time.Duration `yaml:"split_queries_by_interval" category:"advanced"`
ResultsCacheConfig `yaml:"results_cache"`
CacheResults bool `yaml:"cache_results"`
CacheErrors bool `yaml:"cache_errors" category:"experimental"`
MaxRetries int `yaml:"max_retries" category:"advanced"`
NotRunningTimeout time.Duration `yaml:"not_running_timeout" category:"advanced"`
ShardedQueries bool `yaml:"parallelize_shardable_queries"`
PrunedQueries bool `yaml:"prune_queries" category:"experimental"`
BlockPromQLExperimentalFunctions bool `yaml:"block_promql_experimental_functions" category:"experimental"`
TargetSeriesPerShard uint64 `yaml:"query_sharding_target_series_per_shard" category:"advanced"`
ShardActiveSeriesQueries bool `yaml:"shard_active_series_queries" category:"experimental"`
UseActiveSeriesDecoder bool `yaml:"use_active_series_decoder" category:"experimental"`

// CacheKeyGenerator allows to inject a CacheKeyGenerator to use for generating cache keys.
// If nil, the querymiddleware package uses a DefaultCacheKeyGenerator with SplitQueriesByInterval.
Expand All @@ -92,6 +93,7 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.BoolVar(&cfg.CacheErrors, "query-frontend.cache-errors", false, "Cache non-transient errors from queries.")
f.BoolVar(&cfg.ShardedQueries, "query-frontend.parallelize-shardable-queries", false, "True to enable query sharding.")
f.BoolVar(&cfg.PrunedQueries, "query-frontend.prune-queries", false, "True to enable pruning dead code (eg. expressions that cannot produce any results) and simplifying expressions (eg. expressions that can be evaluated immediately) in queries.")
f.BoolVar(&cfg.BlockPromQLExperimentalFunctions, "query-frontend.block-promql-experimental-functions", false, "True to control access to specific PromQL experimental functions per tenant.")
f.Uint64Var(&cfg.TargetSeriesPerShard, "query-frontend.query-sharding-target-series-per-shard", 0, "How many series a single sharded partial query should load at most. This is not a strict requirement guaranteed to be honoured by query sharding, but a hint given to the query sharding when the query execution is initially planned. 0 to disable cardinality-based hints.")
f.StringVar(&cfg.QueryResultResponseFormat, "query-frontend.query-result-response-format", formatProtobuf, fmt.Sprintf("Format to use when retrieving query results from queriers. Supported values: %s", strings.Join(allFormats, ", ")))
f.BoolVar(&cfg.ShardActiveSeriesQueries, "query-frontend.shard-active-series-queries", false, "True to enable sharding of active series queries.")
Expand Down Expand Up @@ -453,6 +455,24 @@ func newQueryMiddlewares(
queryInstantMiddleware = append(queryInstantMiddleware, newInstrumentMiddleware("retry", metrics), newRetryMiddleware(log, cfg.MaxRetries, retryMiddlewareMetrics))
}

if parser.EnableExperimentalFunctions && cfg.BlockPromQLExperimentalFunctions {
// We only need to check for tenant-specific settings if experimental functions are enabled globally
// and if we want to control access to them per tenant.
// Does not apply to remote read as those are executed remotely and the enabling of PromQL experimental
// functions for those are not controlled here.
experimentalFunctionsMiddleware := newExperimentalFunctionsMiddleware(limits, log)
queryRangeMiddleware = append(
queryRangeMiddleware,
newInstrumentMiddleware("experimental_functions", metrics),
experimentalFunctionsMiddleware,
)
queryInstantMiddleware = append(
queryInstantMiddleware,
newInstrumentMiddleware("experimental_functions", metrics),
experimentalFunctionsMiddleware,
)
}

return
}

Expand Down
3 changes: 3 additions & 0 deletions pkg/frontend/querymiddleware/roundtrip_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -541,11 +541,13 @@ func TestMiddlewaresConsistency(t *testing.T) {
cfg.CacheResults = true
cfg.ShardedQueries = true
cfg.PrunedQueries = true
cfg.BlockPromQLExperimentalFunctions = true

// Ensure all features are enabled, so that we assert on all middlewares.
require.NotZero(t, cfg.CacheResults)
require.NotZero(t, cfg.ShardedQueries)
require.NotZero(t, cfg.PrunedQueries)
require.NotZero(t, cfg.BlockPromQLExperimentalFunctions)
require.NotZero(t, cfg.SplitQueriesByInterval)
require.NotZero(t, cfg.MaxRetries)

Expand Down Expand Up @@ -585,6 +587,7 @@ func TestMiddlewaresConsistency(t *testing.T) {
"splitInstantQueryByIntervalMiddleware", // Not applicable because specific to instant queries.
"stepAlignMiddleware", // Not applicable because remote read requests don't take step in account when running in Mimir.
"pruneMiddleware", // No query pruning support.
"experimentalFunctionsMiddleware", // No blocking for PromQL experimental functions as it is executed remotely.
},
},
}
Expand Down
Loading

0 comments on commit 966f046

Please sign in to comment.