Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PatrolKeyspaceAssignment got stuck in the middle of process for hours #6628

Closed
binshi-bing opened this issue Jun 16, 2023 · 6 comments · Fixed by #6629
Closed

PatrolKeyspaceAssignment got stuck in the middle of process for hours #6628

binshi-bing opened this issue Jun 16, 2023 · 6 comments · Fixed by #6629
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@binshi-bing
Copy link
Contributor

binshi-bing commented Jun 16, 2023

Enhancement Task

[2023/06/16 20:44:48.545 +00:00] [INFO] [tso_keyspace_group.go:172] ["enter into SplitKeyspaceGroupByID !!!!!!!!!!!!!!!!"]
[2023/06/16 20:44:48.545 +00:00] [INFO] [tso_keyspace_group.go:192] ["SplitKeyspaceGroupByID splitParams.Keyspaces !!!!!!!!!!!!!!!!"] [splitParams.Keyspaces="[4000]"]
[2023/06/16 20:44:48.545 +00:00] [INFO] [keyspace.go:673] ["SplitKeyspaceGroupByID after PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"]
[2023/06/16 20:44:48.545 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=0]
[2023/06/16 20:44:48.561 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=120]
[2023/06/16 20:44:48.572 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=240]
[2023/06/16 20:44:48.599 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=360]
...
[2023/06/16 20:44:48.843 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=3803]
[2023/06/16 20:44:48.853 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=4008]

There are more keyspaces to go, but it neither proceed nor exit from the assignment loop.

This might be related -- the keyspaces aren't consequtives.

Environment

dev env (dev-us-east-1-f01), namespace(tidb-serverless)

Root Cause Analysis

It's observed that it's blocked on manager.metaLock.Lock(keyspace.Id).

load 128 keyspaces (meta) from etcd (/pd/{cluster id}/keyspaces/meta/)
for _, keyspace := range keyspaces {
    ...
    manager.metaLock.Lock(keyspace.Id)
    ...

metaLock is created with syncutil.NewLockGroup(syncutil.WithHash(MaskKeyspaceID)),

// MaskKeyspaceID is used to hash the spaceID inside the lockGroup.
// A simple mask is applied to spaceID to use its last byte as map key,
// limiting the maximum map length to 256.
// Since keyspaceID is sequentially allocated, this can also reduce the chance
// of collision when comparing with random hashes.
func MaskKeyspaceID(id uint32) uint32 {
return id & 0xFF
}

Which means that if two keyspace ids in the loaded batch have the same last byte, it encounters deadlock. In our case, the keyspace ids loaded from keyspace meta storage isn't consecutive -- though the keyspace id is created sequentially, the stored keyspace meta might be expired or deleted as time goes.

There are holes in the keyspace range in keyspace meta space:

...
/pd/7187976276065784319/keyspaces/meta/00000862
/pd/7187976276065784319/keyspaces/meta/00000863
/pd/7187976276065784319/keyspaces/meta/00000864
/pd/7187976276065784319/keyspaces/meta/00000880
/pd/7187976276065784319/keyspaces/meta/00000899
/pd/7187976276065784319/keyspaces/meta/00000920
/pd/7187976276065784319/keyspaces/meta/00000921
...

In our case, the process stuck at MetaLock.Lock(ks-id=4520) which has the same last byte 0xA8 as the first keyspace id 4008 in this batch, because MetaLock.Lock(ks-id=4008) already acquired.

@binshi-bing binshi-bing added the type/enhancement The issue or PR belongs to an enhancement. label Jun 16, 2023
@binshi-bing
Copy link
Contributor Author

The problem seems to be that when it got stuck when trying to load keyspaces from the range [4008, 4127]

@binshi-bing binshi-bing changed the title PatrolKeyspaceAssignment got stuck in the middle of proces PatrolKeyspaceAssignment got stuck in the middle of proces for hours Jun 16, 2023
@binshi-bing binshi-bing changed the title PatrolKeyspaceAssignment got stuck in the middle of proces for hours PatrolKeyspaceAssignment got stuck in the middle of process for hours Jun 16, 2023
@binshi-bing
Copy link
Contributor Author

[2023/06/16 22:52:11.305 +00:00] [INFO] [keyspace.go:704] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment after LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [current-start-id=3116] [load-keyspaces-len=120] [keyspace-num=120] [current-start-id=3116] [next-start-id=3236] [more-to-patrol=true]
[2023/06/16 22:52:11.305 +00:00] [INFO] [keyspace.go:766] ["SplitKeyspaceGroupByID before SaveKeyspaceGroup !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3000] [assigned=false]
[2023/06/16 22:52:11.306 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before start transaction !!!!!!!!!!!!!!!!"] [next-start-id=3236]
[2023/06/16 22:52:11.308 +00:00] [INFO] [keyspace.go:689] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [next-start-id=3236]
[2023/06/16 22:52:11.313 +00:00] [INFO] [keyspace.go:704] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment after LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [current-start-id=3236] [load-keyspaces-len=120] [keyspace-num=120] [current-start-id=3236] [next-start-id=3356] [more-to-patrol=true]
[2023/06/16 22:52:11.313 +00:00] [INFO] [keyspace.go:766] ["SplitKeyspaceGroupByID before SaveKeyspaceGroup !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3120] [assigned=false]
[2023/06/16 22:52:11.314 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before start transaction !!!!!!!!!!!!!!!!"] [next-start-id=3356]
[2023/06/16 22:52:11.317 +00:00] [INFO] [keyspace.go:689] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [next-start-id=3356]
[2023/06/16 22:52:11.321 +00:00] [INFO] [keyspace.go:704] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment after LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [current-start-id=3356] [load-keyspaces-len=120] [keyspace-num=120] [current-start-id=3356] [next-start-id=3476] [more-to-patrol=true]
[2023/06/16 22:52:11.322 +00:00] [INFO] [keyspace.go:766] ["SplitKeyspaceGroupByID before SaveKeyspaceGroup !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3240] [assigned=false]
[2023/06/16 22:52:11.322 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before start transaction !!!!!!!!!!!!!!!!"] [next-start-id=3476]
[2023/06/16 22:52:11.325 +00:00] [INFO] [keyspace.go:689] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [next-start-id=3476]
[2023/06/16 22:52:11.330 +00:00] [INFO] [keyspace.go:704] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment after LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [current-start-id=3476] [load-keyspaces-len=120] [keyspace-num=120] [current-start-id=3476] [next-start-id=3683] [more-to-patrol=true]
[2023/06/16 22:52:11.330 +00:00] [INFO] [keyspace.go:766] ["SplitKeyspaceGroupByID before SaveKeyspaceGroup !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3360] [assigned=false]
[2023/06/16 22:52:11.331 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before start transaction !!!!!!!!!!!!!!!!"] [next-start-id=3683]
[2023/06/16 22:52:11.333 +00:00] [INFO] [keyspace.go:689] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [next-start-id=3683]
[2023/06/16 22:52:11.338 +00:00] [INFO] [keyspace.go:704] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment after LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [current-start-id=3683] [load-keyspaces-len=120] [keyspace-num=120] [current-start-id=3683] [next-start-id=3803] [more-to-patrol=true]
[2023/06/16 22:52:11.338 +00:00] [INFO] [keyspace.go:766] ["SplitKeyspaceGroupByID before SaveKeyspaceGroup !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3480] [assigned=false]
[2023/06/16 22:52:11.339 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before start transaction !!!!!!!!!!!!!!!!"] [next-start-id=3803]
[2023/06/16 22:52:11.342 +00:00] [INFO] [keyspace.go:689] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [next-start-id=3803]
[2023/06/16 22:52:11.346 +00:00] [INFO] [keyspace.go:704] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment after LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [current-start-id=3803] [load-keyspaces-len=120] [keyspace-num=120] [current-start-id=3803] [next-start-id=4008] [more-to-patrol=true]
[2023/06/16 22:52:11.346 +00:00] [INFO] [keyspace.go:766] ["SplitKeyspaceGroupByID before SaveKeyspaceGroup !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3600] [assigned=false]
[2023/06/16 22:52:11.347 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before start transaction !!!!!!!!!!!!!!!!"] [next-start-id=4008]
[2023/06/16 22:52:11.350 +00:00] [INFO] [keyspace.go:689] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [next-start-id=4008]
[2023/06/16 22:52:11.354 +00:00] [INFO] [keyspace.go:704] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment after LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [current-start-id=4008] [load-keyspaces-len=120] [keyspace-num=120] [current-start-id=4008] [next-start-id=4836] [more-to-patrol=true]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:737] ["SplitKeyspaceGroupByID before update defaultKeyspaceGroup.Keyspaces !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3610]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:747] ["SplitKeyspaceGroupByID before SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3610]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:753] ["SplitKeyspaceGroupByID after SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3610]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:737] ["SplitKeyspaceGroupByID before update defaultKeyspaceGroup.Keyspaces !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3620]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:747] ["SplitKeyspaceGroupByID before SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3620]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:753] ["SplitKeyspaceGroupByID after SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3620]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:737] ["SplitKeyspaceGroupByID before update defaultKeyspaceGroup.Keyspaces !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3630]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:747] ["SplitKeyspaceGroupByID before SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3630]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:753] ["SplitKeyspaceGroupByID after SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3630]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:737] ["SplitKeyspaceGroupByID before update defaultKeyspaceGroup.Keyspaces !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3640]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:747] ["SplitKeyspaceGroupByID before SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3640]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:753] ["SplitKeyspaceGroupByID after SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3640]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:737] ["SplitKeyspaceGroupByID before update defaultKeyspaceGroup.Keyspaces !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3650]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:747] ["SplitKeyspaceGroupByID before SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3650]
[2023/06/16 22:52:11.355 +00:00] [INFO] [keyspace.go:753] ["SplitKeyspaceGroupByID after SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3650]

@binshi-bing
Copy link
Contributor Author

binshi-bing commented Jun 16, 2023

goroutine 214735 [semacquire]:
sync.runtime_SemacquireMutex(0x400a7188b8?, 0x14?, 0x400a718908?)
/usr/local/go/src/runtime/sema.go:77 +0x24
sync.(*Mutex).lockSlow(0x4011a859c8)
/usr/local/go/src/sync/mutex.go:171 +0x178
sync.(*Mutex).Lock(...)
/usr/local/go/src/sync/mutex.go:90
github.com/tikv/pd/pkg/utils/syncutil.(*LockGroup).Lock(0x4004fa84f8, 0x38ec7d0?)
/go/src/github.com/tikv/pd/pkg/utils/syncutil/lock_group.go:63 +0x1c8
github.com/tikv/pd/pkg/keyspace.(*Manager).PatrolKeyspaceAssignment.func2({0x38ec7d0, 0x40199545f0})
/go/src/github.com/tikv/pd/pkg/keyspace/keyspace.go:726 +0x6d0
github.com/tikv/pd/pkg/storage/kv.(*etcdKVBase).RunInTxn(0x4004fa8168, {0x38ec1e8?, 0x4000522b40}, 0x401ccd8cc0)
/go/src/github.com/tikv/pd/pkg/storage/kv/etcd_kv.go:209 +0xac
github.com/tikv/pd/pkg/storage/endpoint.(*StorageEndpoint).RunInTxn(0x40?, {0x38ec1e8?, 0x4000522b40?}, 0x40001df800?)
/go/src/github.com/tikv/pd/pkg/storage/endpoint/keyspace.go:102 +0x30
github.com/tikv/pd/pkg/keyspace.(*Manager).PatrolKeyspaceAssignment(0x40004cca80)
/go/src/github.com/tikv/pd/pkg/keyspace/keyspace.go:677 +0x318

It's blocked on manager.metaLock.Lock(ks.Id) below.


// PatrolKeyspaceAssignment is used to patrol all keyspaces and assign them to the keyspace groups.
func (manager *Manager) PatrolKeyspaceAssignment() error {
	var (
		// Some statistics info.
		start                  = time.Now()
		patrolledKeyspaceCount uint64
		assignedKeyspaceCount  uint64
		// The current start ID of the patrol, used for logging.
		currentStartID = manager.nextPatrolStartID
		// The next start ID of the patrol, used for the next patrol.
		nextStartID  = currentStartID
		moreToPatrol = true
		err          error
	)
	defer func() {
		log.Info("[keyspace] patrol keyspace assignment finished !!!!!!!!!!!!!!!!",
			zap.Duration("cost", time.Since(start)),
			zap.Uint64("patrolled-keyspace-count", patrolledKeyspaceCount),
			zap.Uint64("assigned-keyspace-count", assignedKeyspaceCount),
			zap.Int("batch-size", keyspacePatrolBatchSize),
			zap.Uint32("current-start-id", currentStartID),
			zap.Uint32("next-start-id", nextStartID),
		)
	}()
	log.Info("SplitKeyspaceGroupByID after PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!")
	for moreToPatrol {
		var defaultKeyspaceGroup *endpoint.KeyspaceGroup
		log.Info("SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before start transaction !!!!!!!!!!!!!!!!", zap.Uint32("next-start-id", nextStartID))
		err = manager.store.RunInTxn(manager.ctx, func(txn kv.Txn) error {
			var err error
			defaultKeyspaceGroup, err = manager.kgm.store.LoadKeyspaceGroup(txn, utils.DefaultKeyspaceGroupID)
			if err != nil {
				return err
			}
			if defaultKeyspaceGroup == nil {
				return errors.Errorf("default keyspace group %d not found", utils.DefaultKeyspaceGroupID)
			}
			if defaultKeyspaceGroup.IsSplitting() {
				return ErrKeyspaceGroupInSplit
			}
			log.Info("SplitKeyspaceGroupByID in PatrolKeyspaceAssignment before LoadRangeKeyspace !!!!!!!!!!!!!!!!",
				zap.Uint32("next-start-id", manager.nextPatrolStartID))
			keyspaces, err := manager.store.LoadRangeKeyspace(txn, manager.nextPatrolStartID, keyspacePatrolBatchSize)
			if err != nil {
				return err
			}
			keyspaceNum := len(keyspaces)
			// If there are more than one keyspace, update the current and next start IDs.
			if keyspaceNum > 0 {
				currentStartID = keyspaces[0].GetId()
				nextStartID = keyspaces[keyspaceNum-1].GetId() + 1
			}
			// If there are less than `keyspacePatrolBatchSize` keyspaces,
			// we have reached the end of the keyspace list.
			moreToPatrol = keyspaceNum == keyspacePatrolBatchSize
			log.Info("SplitKeyspaceGroupByID in PatrolKeyspaceAssignment after LoadRangeKeyspace !!!!!!!!!!!!!!!!",
				zap.Uint32("current-start-id", manager.nextPatrolStartID),
				zap.Int("load-keyspaces-len", len(keyspaces)),
				zap.Int("keyspace-num", keyspaceNum),
				zap.Uint32("current-start-id", currentStartID),
				zap.Uint32("next-start-id", nextStartID),
				zap.Bool("more-to-patrol", moreToPatrol),
			)
			var (
				assigned            = false
				keyspaceIDsToUnlock = make([]uint32, 0, keyspaceNum)
			)
			defer func() {
				for _, id := range keyspaceIDsToUnlock {
					manager.metaLock.Unlock(id)
				}
			}()
			for _, ks := range keyspaces {
				if ks == nil {
					continue
				}
				patrolledKeyspaceCount++
				**manager.metaLock.Lock(ks.Id)**
				if ks.Config == nil {
					ks.Config = make(map[string]string, 1)
				} else if _, ok := ks.Config[TSOKeyspaceGroupIDKey]; ok {
					// If the keyspace already has a group ID, skip it.
					manager.metaLock.Unlock(ks.Id)
					continue
				}
				// Unlock the keyspace meta lock after the whole txn.
				keyspaceIDsToUnlock = append(keyspaceIDsToUnlock, ks.Id)
				if (patrolledKeyspaceCount % 10) == 0 {
					log.Info("SplitKeyspaceGroupByID before update defaultKeyspaceGroup.Keyspaces !!!!!!!!!!!!!!!!",
						zap.Uint64("patrolled-keyspace-count", patrolledKeyspaceCount))
				}
				// If the keyspace doesn't have a group ID, assign it to the default keyspace group.
				if !slice.Contains(defaultKeyspaceGroup.Keyspaces, ks.Id) {
					defaultKeyspaceGroup.Keyspaces = append(defaultKeyspaceGroup.Keyspaces, ks.Id)
					// Only save the keyspace group meta if any keyspace is assigned to it.
					assigned = true
				}
				if (patrolledKeyspaceCount % 10) == 0 {
					log.Info("SplitKeyspaceGroupByID before SaveKeyspaceMeta !!!!!!!!!!!!!!!!",
						zap.Uint64("patrolled-keyspace-count", patrolledKeyspaceCount))
				}
				ks.Config[TSOKeyspaceGroupIDKey] = strconv.FormatUint(uint64(utils.DefaultKeyspaceGroupID), 10)
				err = manager.store.SaveKeyspaceMeta(txn, ks)
				if (patrolledKeyspaceCount % 10) == 0 {
					log.Info("SplitKeyspaceGroupByID after SaveKeyspaceMeta !!!!!!!!!!!!!!!!",
						zap.Uint64("patrolled-keyspace-count", patrolledKeyspaceCount))
				}
				if err != nil {
					log.Error("[keyspace] failed to save keyspace meta during patrol",
						zap.Int("batch-size", keyspacePatrolBatchSize),
						zap.Uint32("current-start-id", currentStartID),
						zap.Uint32("next-start-id", nextStartID),
						zap.Uint32("keyspace-id", ks.Id), zap.Error(err))
					return err
				}
				assignedKeyspaceCount++
			}
			log.Info("SplitKeyspaceGroupByID before SaveKeyspaceGroup !!!!!!!!!!!!!!!!",
				zap.Uint64("patrolled-keyspace-count", patrolledKeyspaceCount),
				zap.Bool("assigned", assigned))
			if assigned {
				err = manager.kgm.store.SaveKeyspaceGroup(txn, defaultKeyspaceGroup)
				if err != nil {
					log.Error("[keyspace] failed to save default keyspace group meta during patrol",
						zap.Int("batch-size", keyspacePatrolBatchSize),
						zap.Uint32("current-start-id", currentStartID),
						zap.Uint32("next-start-id", nextStartID), zap.Error(err))
					return err
				}
			}
			return nil
		})
		if err != nil {
			return err
		}
		manager.kgm.groups[endpoint.StringUserKind(defaultKeyspaceGroup.UserKind)].Put(defaultKeyspaceGroup)
		// If all keyspaces in the current batch are assigned, update the next start ID.
		manager.nextPatrolStartID = nextStartID
	}
	log.Info("SplitKeyspaceGroupByID complete PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!", zap.Uint32("next-start-id", nextStartID))
	return nil
}

@binshi-bing
Copy link
Contributor Author

The keyspace ids aren't consecutive.
...
/pd/7187976276065784319/keyspaces/meta/00000516

/pd/7187976276065784319/keyspaces/meta/00000517

/pd/7187976276065784319/keyspaces/meta/00000601

/pd/7187976276065784319/keyspaces/meta/00000602
...

@binshi-bing
Copy link
Contributor Author

There are more holes in the keyspace range:
/pd/7187976276065784319/keyspaces/meta/00000862

/pd/7187976276065784319/keyspaces/meta/00000863

/pd/7187976276065784319/keyspaces/meta/00000864

/pd/7187976276065784319/keyspaces/meta/00000880

/pd/7187976276065784319/keyspaces/meta/00000899

/pd/7187976276065784319/keyspaces/meta/00000920

/pd/7187976276065784319/keyspaces/meta/00000921

@binshi-bing
Copy link
Contributor Author

The process stuck at MetaLock.Lock(ks-id=4520) which has the same last byte 0xA8 as the first keyspace id 4008 in this batch, because MetaLock.Lock(ks-id=4008) already acquired.

[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:704] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment after LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [current-start-id=4008] [load-keyspaces-len=120] [keyspace-num=120] [current-start-id=4008] [next-start-id=4836] [more-to-patrol=true]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4008]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4009]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4010]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4011]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4012]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4013]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4014]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4015]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4016]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4017]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:739] ["SplitKeyspaceGroupByID before update defaultKeyspaceGroup.Keyspaces !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3610]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:749] ["SplitKeyspaceGroupByID before SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3610]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:755] ["SplitKeyspaceGroupByID after SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3610]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4018]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4019]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4020]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4101]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4102]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4201]
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4202]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4203]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4204]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4205]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:739] ["SplitKeyspaceGroupByID before update defaultKeyspaceGroup.Keyspaces !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3620]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:749] ["SplitKeyspaceGroupByID before SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3620]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:755] ["SplitKeyspaceGroupByID after SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3620]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4206]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4301]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4401]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4402]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4403]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4404]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4405]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4406]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4407]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4408]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:739] ["SplitKeyspaceGroupByID before update defaultKeyspaceGroup.Keyspaces !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3630]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:749] ["SplitKeyspaceGroupByID before SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3630]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:755] ["SplitKeyspaceGroupByID after SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3630]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4409]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4410]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4411]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4412]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4501]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4502]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4503]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4504]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4505]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4506]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:739] ["SplitKeyspaceGroupByID before update defaultKeyspaceGroup.Keyspaces !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3640]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:749] ["SplitKeyspaceGroupByID before SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3640]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:755] ["SplitKeyspaceGroupByID after SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3640]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4507]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4508]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4509]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4510]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4511]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4512]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4513]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4514]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4515]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4516]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:739] ["SplitKeyspaceGroupByID before update defaultKeyspaceGroup.Keyspaces !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3650]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:749] ["SplitKeyspaceGroupByID before SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3650]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:755] ["SplitKeyspaceGroupByID after SaveKeyspaceMeta !!!!!!!!!!!!!!!!"] [patrolled-keyspace-count=3650]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4517]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4518]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4519]
[2023/06/16 23:54:08.241 +00:00] [INFO] [keyspace.go:726] ["SplitKeyspaceGroupByID before meta lock !!!!!!!!!!!!!!!!"] [ks-id=4520]

ti-chi-bot bot pushed a commit that referenced this issue Jun 19, 2023
close #6628

Enhance LockGroup with RemoveEntryOnUnlock.
Remove the lock of the given key from the lock group when unlock to keep minimal working set, which is suited for low qps, non-time-critical and non-consecutive large key space scenarios. One example of the last use case is that keyspace group split loads non-consecutive keyspace meta in batches and lock all loaded keyspace meta within a batch at the same time.

Signed-off-by: Bin Shi <binshi.bing@gmail.com>
rleungx pushed a commit to rleungx/pd that referenced this issue Aug 2, 2023
…#6629)

close tikv#6628

Enhance LockGroup with RemoveEntryOnUnlock.
Remove the lock of the given key from the lock group when unlock to keep minimal working set, which is suited for low qps, non-time-critical and non-consecutive large key space scenarios. One example of the last use case is that keyspace group split loads non-consecutive keyspace meta in batches and lock all loaded keyspace meta within a batch at the same time.

Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant