Query Frontend: new service for queuing and retrying queries. #910

Merged
merged 10 commits into from
Aug 16, 2018
2 changes: 2 additions & 0 deletions .gitignore
@@ -3,12 +3,14 @@ cmd/configs/configs
cmd/distributor/distributor
cmd/ingester/ingester
cmd/querier/querier
cmd/query-frontend/query-frontend
cmd/ruler/ruler
cmd/table-manager/table-manager
cmd/lite/lite
.uptodate
.pkg
.cache
pkg/ingester/client/cortex.pb.go
pkg/querier/frontend/frontend.pb.go
pkg/ring/ring.pb.go
images/
2 changes: 1 addition & 1 deletion Makefile
@@ -31,7 +31,7 @@ images:
@echo > /dev/null

# Generating proto code is automated.
PROTO_DEFS := $(shell find . $(DONT_FIND) -type f -name '*.proto' -print)
PROTO_DEFS := $(shell find . $(DONT_FIND) -type f -name '*.proto' -print) vendor/github.com/weaveworks/common/httpgrpc/httpgrpc.proto
PROTO_GOS := $(patsubst %.proto,%.pb.go,$(PROTO_DEFS))

# Building binaries is now automated. The convention is to build a binary
13 changes: 12 additions & 1 deletion cmd/querier/main.go
@@ -14,13 +14,15 @@ import (
"github.com/prometheus/prometheus/web/api/v1"
"github.com/prometheus/tsdb"

httpgrpc_server "github.com/weaveworks/common/httpgrpc/server"
"github.com/weaveworks/common/middleware"
"github.com/weaveworks/common/server"
"github.com/weaveworks/common/tracing"
"github.com/weaveworks/cortex/pkg/chunk"
"github.com/weaveworks/cortex/pkg/chunk/storage"
"github.com/weaveworks/cortex/pkg/distributor"
"github.com/weaveworks/cortex/pkg/querier"
"github.com/weaveworks/cortex/pkg/querier/frontend"
"github.com/weaveworks/cortex/pkg/ring"
"github.com/weaveworks/cortex/pkg/util"
)
@@ -39,9 +41,10 @@ func main() {
chunkStoreConfig chunk.StoreConfig
schemaConfig chunk.SchemaConfig
storageConfig storage.Config
workerConfig frontend.WorkerConfig
)
util.RegisterFlags(&serverConfig, &ringConfig, &distributorConfig, &querierConfig,
&chunkStoreConfig, &schemaConfig, &storageConfig)
&chunkStoreConfig, &schemaConfig, &storageConfig, &workerConfig)
flag.Parse()

// Setting the environment variable JAEGER_AGENT_HOST enables tracing
@@ -86,6 +89,14 @@ func main() {
}
defer chunkStore.Stop()

// TODO this avoids our middleware for logging and latency collection.
worker, err := frontend.NewWorker(workerConfig, httpgrpc_server.NewServer(server.HTTP), util.Logger)
if err != nil {
level.Error(util.Logger).Log("err", err)
os.Exit(1)
}
defer worker.Stop()

queryable, engine := querier.Make(querierConfig, dist, chunkStore)
api := v1.NewAPI(
engine,
10 changes: 10 additions & 0 deletions cmd/query-frontend/Dockerfile
@@ -0,0 +1,10 @@
FROM alpine:3.8
RUN apk add --no-cache ca-certificates
COPY query-frontend /bin/query-frontend
EXPOSE 80
ENTRYPOINT [ "/bin/query-frontend" ]

ARG revision
LABEL org.opencontainers.image.title="query-frontend" \
org.opencontainers.image.source="https://github.com/weaveworks/cortex/tree/master/cmd/query-frontend" \
org.opencontainers.image.revision="${revision}"
53 changes: 53 additions & 0 deletions cmd/query-frontend/main.go
@@ -0,0 +1,53 @@
package main

import (
	"flag"
	"os"

	"github.com/go-kit/kit/log/level"
	"google.golang.org/grpc"

	"github.com/weaveworks/common/middleware"
	"github.com/weaveworks/common/server"
	"github.com/weaveworks/common/tracing"
	"github.com/weaveworks/cortex/pkg/querier/frontend"
	"github.com/weaveworks/cortex/pkg/util"
)

func main() {
	var (
		serverConfig = server.Config{
			MetricsNamespace: "cortex",
			GRPCMiddleware: []grpc.UnaryServerInterceptor{
				middleware.ServerUserHeaderInterceptor,
			},
		}
		frontendConfig frontend.Config
	)
	util.RegisterFlags(&serverConfig, &frontendConfig)
	flag.Parse()

	// Setting the environment variable JAEGER_AGENT_HOST enables tracing
	trace := tracing.NewFromEnv("query-frontend")
	defer trace.Close()

	util.InitLogger(&serverConfig)

	server, err := server.New(serverConfig)
	if err != nil {
		level.Error(util.Logger).Log("msg", "error initializing server", "err", err)
		os.Exit(1)
	}
	defer server.Shutdown()

	f, err := frontend.New(frontendConfig, util.Logger)
	if err != nil {
		level.Error(util.Logger).Log("msg", "error initializing frontend", "err", err)
		os.Exit(1)
	}
	defer f.Close()

	frontend.RegisterFrontendServer(server.GRPC, f)
	server.HTTP.PathPrefix("/api/prom").Handler(middleware.AuthenticateUser.Wrap(f))
	server.Run()
}
28 changes: 28 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,28 @@
# Cortex Architecture

*NB this document is a work-in-progress.*

The Cortex architecture consists of multiple, horizontally scalable microservices. Each microservice uses the most appropriate technique for horizontal scaling; most are stateless and can handle requests for any user, while some (the ingesters) are semi-stateful and depend on consistent hashing.

For more details on the Cortex architecture, you should read / watch:
- The original design doc "[Project Frankenstein: A multi tenant, scale out Prometheus](https://docs.google.com/document/d/1C7yhMnb1x2sfeoe45f4mnnKConvroWhJ8KQZwIHJOuw/edit#heading=h.nimsq29kl184)"
- PromCon 2016 Talk: "[Multitenant, Scale-Out Prometheus](https://promcon.io/2016-berlin/talks/multitenant-scale-out-prometheus/)"
- KubeCon Prometheus Day talk "Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service" [slides](http://www.slideshare.net/weaveworks/weave-cortex-multitenant-horizontally-scalable-prometheus-as-a-service) [video](https://www.youtube.com/watch?v=9Uctgnazfwk)
- PromCon 2017 Talk: "[Cortex: Prometheus as a Service, One Year On](https://promcon.io/2017-munich/talks/cortex-prometheus-as-a-service-one-year-on/)"
- CNCF TOC Presentation: "Horizontally Scalable, Multi-tenant Prometheus" [slides](https://docs.google.com/presentation/d/190oIFgujktVYxWZLhLYN4q8p9dtQYoe4sxHgn4deBSI/edit#slide=id.g3b8e2d6f7e_0_6)

## Query Path

### Query Frontend

The query frontend is an optional job which accepts HTTP requests, queues them by tenant ID, and retries them on errors. This accommodates the occasional large query which would otherwise cause a querier OOM, allowing us to over-provision querier parallelism. It also prevents multiple large requests from being convoyed on a single querier by distributing them FIFO across all queriers. Finally, it prevents a single tenant from DoSing other tenants by fairly scheduling queries between tenants.
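
As an illustration only, fair scheduling between tenants could be sketched as one FIFO queue per tenant, served round-robin. The names `fairQueue`, `request`, `enqueue` and `dequeue` are hypothetical and are not taken from the Cortex code; the real frontend queues HTTP requests and may schedule them differently.

```go
package main

import "fmt"

// request is a hypothetical queued query (an assumption for this sketch).
type request struct {
	tenant string
	query  string
}

// fairQueue keeps one FIFO queue per tenant and serves tenants round-robin,
// so a tenant with many pending queries cannot starve the others.
type fairQueue struct {
	queues map[string][]request // pending requests, keyed by tenant ID
	order  []string             // tenants with pending requests, in service order
}

func newFairQueue() *fairQueue {
	return &fairQueue{queues: map[string][]request{}}
}

// enqueue appends a request to the back of its tenant's queue.
func (q *fairQueue) enqueue(r request) {
	if len(q.queues[r.tenant]) == 0 {
		// First pending request for this tenant: add it to the rotation.
		q.order = append(q.order, r.tenant)
	}
	q.queues[r.tenant] = append(q.queues[r.tenant], r)
}

// dequeue returns the oldest request of the next tenant in the rotation.
func (q *fairQueue) dequeue() (request, bool) {
	if len(q.order) == 0 {
		return request{}, false
	}
	tenant := q.order[0]
	q.order = q.order[1:]
	r := q.queues[tenant][0]
	q.queues[tenant] = q.queues[tenant][1:]
	if len(q.queues[tenant]) > 0 {
		// The tenant still has work: move it to the back of the rotation.
		q.order = append(q.order, tenant)
	} else {
		delete(q.queues, tenant)
	}
	return r, true
}

func main() {
	q := newFairQueue()
	for i := 0; i < 3; i++ {
		q.enqueue(request{"tenant-a", fmt.Sprintf("a%d", i)})
	}
	q.enqueue(request{"tenant-b", "b0"})

	// tenant-b's single query is served second, not after all of tenant-a's.
	for r, ok := q.dequeue(); ok; r, ok = q.dequeue() {
		fmt.Println(r.tenant, r.query)
	}
}
```

In this sketch each tenant's own queries stay in FIFO order, while the rotation bounds how long any tenant waits behind another tenant's backlog.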

The query frontend job accepts gRPC streaming requests from the queriers, which then "pull" requests from the frontend. For HA it is recommended you run multiple frontends - the queriers will connect to (and pull requests from) all of them. To get the benefit of the fair scheduling, it is recommended you run fewer frontends than queriers - two should suffice.

See the document "[Cortex Query Woes](https://docs.google.com/document/d/1lsvSkv0tiAMPQv-V8vI2LZ8f4i9JuTRsuPI_i-XcAqY)" for a more detailed design discussion. In the future, query splitting, query alignment and query results caching will be added to the frontend.

The query frontend is completely optional - you can continue to use the queriers directly. If you want to use the query frontend, direct incoming authenticated traffic at the frontends and set the `-querier.frontend-address` flag on the queriers.

### Queriers

The queriers handle the actual PromQL evaluation. They embed the chunk store client code for fetching data from long-term storage, and communicate with the ingesters for more recent data.
2 changes: 1 addition & 1 deletion pkg/querier/config.go
@@ -18,7 +18,7 @@ type Config struct {
Iterators bool
}

// RegisterFlags adds the flags required to config this to the given FlagSet
// RegisterFlags adds the flags required to config this to the given FlagSet.
func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.IntVar(&cfg.MaxConcurrent, "querier.max-concurrent", 20, "The maximum number of concurrent queries.")
f.DurationVar(&cfg.Timeout, "querier.timeout", 2*time.Minute, "The timeout for a query.")