Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blobstore: Add metrics_tag config field #205

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

minor-fixes
Copy link
Contributor

@minor-fixes minor-fixes commented May 11, 2024

This change adds a single string field to the base blobstore config message, the value of which gets propagated to a tag value on exported metrics for the corresponding backend instance.

In particular, this change allows for attaching different tags on separate uses of a single "label" backend in order to collect metrics specific to a particular instantiation. Without this change, all accesses to the referenced backend shared one combined set of metrics; with this change, it's possible to set separate metrics_tag values on each instantiation and collect separate metrics, as well as tagging the underlying backend and grabbing aggregate metrics as well.

TODO before marking non-draft:

  • pass presubmit checks
  • drop spurious bzlmod changes
  • format code
  • second pass on proto comments

Tested: see comment below

@minor-fixes
Copy link
Contributor Author

With the config attached below, I started a bb_storage instance, and issued 7 RPCs:

  • 1x "access through readCaching": grpc_cli call localhost:8980 ContentAddressableStorage.FindMissingBlobs 'instance_name: "" digest_function: 1 blob_digests { hash: "aed662c522d64bb9e73534f4cd9cd0d98c8407e737689f1a591e4409f4d9e0ce" size_bytes: 5 }'
  • 2x "access slow directly": grpc_cli call localhost:8980 ContentAddressableStorage.FindMissingBlobs 'instance_name: "slow" digest_function: 1 blob_digests { hash: "aed662c522d64bb9e73534f4cd9cd0d98c8407e737689f1a591e4409f4d9e0ce" size_bytes: 5 }'
  • 4x "access fast directly": grpc_cli call localhost:8980 ContentAddressableStorage.FindMissingBlobs 'instance_name: "fast" digest_function: 1 blob_digests { hash: "aed662c522d64bb9e73534f4cd9cd0d98c8407e737689f1a591e4409f4d9e0ce" size_bytes: 5 }'

The counts for the respective tags were:

  • top: 7 (sum of all RPCs)
  • read_cached: 1 (only one call to the readCaching backend)
  • slow_underlying: 3 (1 call to readCaching + 2 calls to 'slow' directly via instance name)
  • fast_underlying: 4 (0 calls to readCaching? + 4 calls to 'fast' directly via instance name)
  • slow_direct: 2 (counts only direct calls via instance name)
  • slow_cached: 1 (only one call to the readCaching backend)
  • fast_direct: 4 (counts only direct calls via instance name)
  • fast_cached: 0 (presumably; could not find metrics)

I wouldn't mind adding an appropriate test though; any ideas on a better testing approach?


Config:

local SmallCache = function(tag) {
  metricsTag: tag,
  'local': {
    keyLocationMapInMemory: { entries: 1000 },
    keyLocationMapMaximumGetAttempts: 8,
    keyLocationMapMaximumPutAttempts: 32,
    oldBlocks: 2,
    currentBlocks: 1,
    newBlocks: 1,
    blocksInMemory: {
      blockSizeBytes: 10 * 1000 * 1000,
    },
  },
};

{
  executeAuthorizer: { allow: {} },
  grpcServers: [
    {
      listenAddresses: ["127.0.0.1:8980"],
      authenticationPolicy: { allow: {} },
    },
  ],
  global: {
    diagnosticsHttpServer: {
      httpServers: [{ listen_addresses: ["127.0.0.1:9980"], authenticationPolicy: { allow: {} } }],
      enablePrometheus: true,
    },
  },
  contentAddressableStorage: {
    getAuthorizer: { allow: {} },
    putAuthorizer: { allow: {} },
    findMissingAuthorizer: { allow: {} },
    backend: {
      withLabels: {
        backend: {
          metricsTag: 'top',
          demultiplexing: {
            instanceNamePrefixes: {
              '': {
                backend: {
                  metricsTag: 'read_cached',
                  readCaching: {
                    slow: { label: 'slow', metricsTag: 'slow_cached' },
                    fast: { label: 'fast', metricsTag: 'fast_cached' },
                    replicator: { 'local': {} },
                  },
                },
              },
              'slow': {
                backend: {
                  label: 'slow',
                  metricsTag: 'slow_direct',
                },
              },
              'fast': {
                backend: {
                  label: 'fast',
                  metricsTag: 'fast_direct',
                },
              },
            },
          },
        },
        labels: {
          'fast': SmallCache('fast_underlying'),
          'slow': SmallCache('slow_underlying'),
        },
      },
    },
  },
}

@minor-fixes minor-fixes force-pushed the metrics_tag branch 3 times, most recently from fc91d76 to 2e717ee Compare May 12, 2024 23:11
WIP; more description to come
@EdSchouten
Copy link
Member

Instead of actually having a separate metrics_tag, what are your thoughts on reusing the existing with_label / label facility? As in, make it so that using with_label causes a label="..." to be added to any BlobAccess underneath.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants