admission: change store write admission control to use byte tokens
This switch to byte tokens will result in better accounting for
large writes, including ingests based on whether their bytes land
in L0 or elsewhere. It is also a precursor to taking into account
flush capacity (in bytes).

The store write admission control path now uses a StoreWorkQueue
which wraps a WorkQueue and provides additional functionality:
- Work can specify WriteBytes and whether it is an IngestRequest.
  This is used to decide how many byte tokens to consume.
- Done work specifies how many bytes were ingested into L0, so
  token consumption can be adjusted.

The main framework change is that a single work item can consume
multiple (byte) tokens, which ripples through the various
interfaces, including requester and granter. There is associated
cleanup: kvGranter, which was handling both slots and tokens, is
eliminated, since in practice it only did one or the other.
Instead, for the slot case the slotGranter is reused. For the token
case there is a new kvStoreTokenGranter.
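A toy sketch of the interface change (assumed shapes, not the actual granter code): tryGet and returnGrant now carry a token count, so one admitted work item can consume many byte tokens instead of exactly one slot or token.

```go
package main

import "fmt"

// granter is a simplified version of the interface described above; the
// count parameter is the framework change that lets one work item consume
// multiple byte tokens.
type granter interface {
	tryGet(count int64) bool
	returnGrant(count int64)
}

// byteTokenGranter is a toy stand-in for kvStoreTokenGranter. It grants as
// long as the balance is positive, allowing the balance to go into debt,
// since refusing already-sized work outright would stall admission.
type byteTokenGranter struct {
	availableTokens int64
}

func (g *byteTokenGranter) tryGet(count int64) bool {
	if g.availableTokens <= 0 {
		return false
	}
	g.availableTokens -= count
	return true
}

func (g *byteTokenGranter) returnGrant(count int64) {
	g.availableTokens += count
}

func main() {
	var g granter = &byteTokenGranter{availableTokens: 1000}
	fmt.Println(g.tryGet(600)) // true
	fmt.Println(g.tryGet(600)) // true: balance was positive, now in debt
	fmt.Println(g.tryGet(1))   // false: tokens exhausted
}
```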

The main logic change is in ioLoadListener which computes byte
tokens and various estimates. The change is (mostly) neutral if
no write provides WriteBytes, since the usual estimation will
take over. The integration changes in this PR are superficial in
that the requests don't provide WriteBytes. Improvements to the
integration, along with experimental results, will happen in
future PRs.
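The fallback estimation can be illustrated with a sketch (the function name and smoothing constant are assumptions, not the actual ioLoadListener code): when requests do not declare WriteBytes, divide the bytes observed entering the LSM over an interval by the number of admitted work items, and smooth across intervals.

```go
package main

import "fmt"

// alpha is an assumed exponential-smoothing factor, purely illustrative.
const alpha = 0.5

// updateEstimate folds one interval's observation (bytes added to the LSM,
// work items admitted) into a smoothed per-work byte estimate.
func updateEstimate(prevEstimate float64, intervalBytes, admittedCount int64) float64 {
	if admittedCount == 0 {
		return prevEstimate // nothing admitted: keep the previous estimate
	}
	observed := float64(intervalBytes) / float64(admittedCount)
	return alpha*observed + (1-alpha)*prevEstimate
}

func main() {
	est := 1024.0 // start with a 1KiB-per-work guess
	// One interval: 100 admitted work items wrote 4KiB each on average.
	est = updateEstimate(est, 4096*100, 100)
	fmt.Println(est) // → 2560
}
```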

Informs cockroachdb#79092
Informs cockroachdb#77357

Release note: None
sumeerbhola committed Apr 28, 2022
1 parent 7092ff4 commit 81aa87d
Showing 10 changed files with 1,490 additions and 606 deletions.
13 changes: 9 additions & 4 deletions pkg/kv/kvserver/store.go
@@ -3673,7 +3673,8 @@ var _ KVAdmissionController = KVAdmissionControllerImpl{}
 type admissionHandle struct {
 	tenantID                           roachpb.TenantID
 	callAdmittedWorkDoneOnKVAdmissionQ bool
-	storeAdmissionQ                    *admission.WorkQueue
+	storeAdmissionQ                    *admission.StoreWorkQueue
+	storeWorkHandle                    admission.StoreWorkHandle
 }

// MakeKVAdmissionController returns a KVAdmissionController. Both parameters
@@ -3730,10 +3731,13 @@ func (n KVAdmissionControllerImpl) AdmitKVWork(
 	}
 	admissionEnabled := true
 	if ah.storeAdmissionQ != nil {
-		if admissionEnabled, err = ah.storeAdmissionQ.Admit(ctx, admissionInfo); err != nil {
+		// TODO(sumeer): Plumb WriteBytes for ingest requests.
+		ah.storeWorkHandle, err = ah.storeAdmissionQ.Admit(
+			ctx, admission.StoreWriteWorkInfo{WorkInfo: admissionInfo})
+		if err != nil {
 			return admissionHandle{}, err
 		}
-		if !admissionEnabled {
+		if !ah.storeWorkHandle.AdmissionEnabled() {
 			// Set storeAdmissionQ to nil so that we don't call AdmittedWorkDone
 			// on it. Additionally, the code below will not call
 			// kvAdmissionQ.Admit, and so callAdmittedWorkDoneOnKVAdmissionQ will
@@ -3758,7 +3762,8 @@ func (n KVAdmissionControllerImpl) AdmittedKVWorkDone(handle interface{}) {
 		n.kvAdmissionQ.AdmittedWorkDone(ah.tenantID)
 	}
 	if ah.storeAdmissionQ != nil {
-		ah.storeAdmissionQ.AdmittedWorkDone(ah.tenantID)
+		// TODO(sumeer): Plumb ingestedIntoL0Bytes and handle error return value.
+		_ = ah.storeAdmissionQ.AdmittedWorkDone(ah.storeWorkHandle, 0)
 	}
 }
}

38 changes: 22 additions & 16 deletions pkg/util/admission/doc.go
@@ -69,33 +69,35 @@
 // the admission order within a WorkKind based on tenant fairness,
 // importance of work etc.
 // - granter: the counterpart to requester which grants admission tokens or
-//   slots. The implementations are slotGranter, tokenGranter, kvGranter. The
-//   implementation of requester interacts with the granter interface.
+//   slots. The implementations are slotGranter, tokenGranter,
+//   kvStoreTokenGranter. The implementation of requester interacts with the
+//   granter interface.
 // - granterWithLockedCalls: this is an extension of granter that is used
 //   as part of the implementation of GrantCoordinator. This arrangement
 //   is partly to centralize locking in the GrantCoordinator (except for
 //   the lock in WorkQueue).
 // - cpuOverloadIndicator: this serves as an optional additional gate on
 //   granting, by providing an (ideally) instantaneous signal of cpu overload.
 //   The kvSlotAdjuster is the concrete implementation, except for SQL
-//   nodes, where this is implemented by sqlNodeCPUOverloadIndicator.
+//   nodes, where this will be implemented by sqlNodeCPUOverloadIndicator.
 //   CPULoadListener is also implemented by these structs, to listen to
 //   the latest CPU load information from the scheduler.
 //
-// Load observation and slot count or token burst adjustment: Currently the
-// only dynamic adjustment is performed by kvSlotAdjuster for KVWork slots.
-// This is because KVWork is expected to usually be CPU bound (due to good
-// caching), and unlike SQLKVResponseWork and SQLSQLResponseWork (which are
-// even more CPU bound), we have a completion indicator -- so we can expect to
-// have a somewhat stable KVWork slot count even if the work sizes are
-// extremely heterogeneous.
+// Load observation and slot count or token burst adjustment: Dynamic
+// adjustment is performed by kvSlotAdjuster for KVWork slots. This is because
+// KVWork is expected to usually be CPU bound (due to good caching), and
+// unlike SQLKVResponseWork and SQLSQLResponseWork (which are even more CPU
+// bound), we have a completion indicator -- so we can expect to have a
+// somewhat stable KVWork slot count even if the work sizes are extremely
+// heterogeneous.
 //
-// Since there isn't token burst adjustment, the burst limits should be chosen
-// to err on the side of fully saturating CPU, since we have the fallback of
-// the cpuOverloadIndicator to stop granting even if tokens are available.
-// If we figure out a way to dynamically tune the token burst count, or
-// (even more ambitious) figure out a way to come up with a token rate, it
-// should fit in the general framework that is setup here.
+// There isn't token burst adjustment (except for each store -- see below),
+// and the burst limits should be chosen to err on the side of fully
+// saturating CPU, since we have the fallback of the cpuOverloadIndicator to
+// stop granting even if tokens are available. If we figure out a way to
+// dynamically tune the token burst count, or (even more ambitious) figure out
+// a way to come up with a token rate, it should fit in the general framework
+// that is setup here.
 //

 // Partial usage example (regular cluster):
@@ -118,4 +120,8 @@
 // doWork()
 // if enabled { kvQueue.AdmittedWorkDone(tid) }

+// Additionally, each store has a single StoreWorkQueue and GrantCoordinator
+// for writes. See kvStoreTokenGranter and how its tokens are dynamically
+// adjusted based on Pebble metrics.
+
 package admission