Integrate js-db and concurrency testing #419
Conversation
Force-pushed fb83c35 to 1974f1e
For better concurrency testing, as well as …
Force-pushed 1a872fe to 716bcd3
Rebased on master. This now includes jest-extended, fast-check, and DB 5.0.1.
We're using the last parameter as the optional transaction argument here... though it may be rolled into a more general parameter later. But this is the design I've found that works, as it also allows the domain objects to be used atomically: each method will create its own transaction context level.
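The optional trailing transaction parameter pattern can be sketched roughly like this (the `DB` and `DBTransaction` shapes here are simplified stand-ins for illustration, not the real js-db API, and `VaultManager.renameVault` is just an example method):

```typescript
// Simplified stand-ins for the js-db types, for illustration only
type DBTransaction = { ops: Array<string> };

class DB {
  // Runs `f` inside a fresh transaction context
  public async withTransactionF<T>(
    f: (tran: DBTransaction) => Promise<T>,
  ): Promise<T> {
    const tran: DBTransaction = { ops: [] };
    return await f(tran);
  }
}

class VaultManager {
  constructor(protected db: DB) {}
  // The optional transaction is the last parameter; when it is not
  // supplied, the method wraps itself in its own transaction, so each
  // public method call is atomic on its own
  public async renameVault(
    name: string,
    tran?: DBTransaction,
  ): Promise<string> {
    if (tran == null) {
      return this.db.withTransactionF((tran) => this.renameVault(name, tran));
    }
    tran.ops.push(`rename:${name}`);
    return name;
  }
}
```

Callers that already hold a transaction pass it down; callers that don't get atomicity for free.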
@tegefaulkes @emmacasolin this PR is an exploration of the concepts of queue, db and tracing. Several PRs are likely to be split off from this PR which will target staging. I'll be setting up a framework from this PR, and then there will be pieces of work sliced out.
Force-pushed 389e06a to 05ff8b0
We're doing this currently with the timestamped level for the tasks.
When presenting task ids to the end user, in a similar vein to claim ids, we have to show multibase encoded IDs, as that's the only way for users to be able to easily copy-paste them around. But they are decoded back to regular IDs internally.
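As a rough illustration of that round trip (this sketch uses the base16 multibase encoding with its real `'f'` prefix; actual code would use a multibase/multiformats library rather than hand-rolled hex):

```typescript
// Encode a raw ID for display: multibase base16 uses the 'f' prefix
function toMultibase(id: Uint8Array): string {
  return (
    'f' +
    Array.from(id)
      .map((b) => b.toString(16).padStart(2, '0'))
      .join('')
  );
}

// Decode a user-pasted string back into the raw ID bytes
function fromMultibase(s: string): Uint8Array {
  if (s[0] !== 'f') throw new Error('unsupported multibase prefix');
  const hex = s.slice(1);
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}
```

The point is that the encoded form exists only at the presentation boundary; everything internal operates on the raw ID bytes.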
This will be the PR where we do both the DB integration and the Queue/Scheduler experiments initially; then, depending on whichever one is ready, they can be separated out to be merged in.
As discussed, I'll separate the changes here into two PRs. The queue changes will move into a new branch.
Force-pushed 88c2ec7 to 3c1b8c8
You can already separate the changes here into several smaller commits too, like introducing jest-extended, and moving …
Don't forget to rebase on staging too. We are then making a stacked PR from this.
Summary of changes that need to be made for DB 5.x.x integration.
For all the places where you are using it: it's only needed when multiple concurrent calls are likely, as in the creation/destruction of non-singleton instances, and during method invocations likely triggered from multiple RPC handlers.
Also make sure that you are sharing the same instance.
Wherever it is relevant, we can always add locks to ensure serialisation. In fact one could say that all handlers are serialised in some special way, but that is often too complex, and something we want to minimise to where it is needed. Instead we are going to bubble the transaction conflict up through the handlers. Then we can come back later from the PK client and add auto-retry where appropriate.
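A client-side auto-retry wrapper might look something like this (`ErrorDBTransactionConflict` is the exception named in this PR; the `withRetry` helper itself is a hypothetical sketch, not existing code):

```typescript
// Stand-in for the conflict exception thrown by the DB layer
class ErrorDBTransactionConflict extends Error {}

// Retry the transactional closure a bounded number of times on conflict;
// any other error, or exhausting the retries, bubbles up to the caller
// (ultimately the CLI/GUI, where the user decides what to do)
async function withRetry<T>(
  f: () => Promise<T>,
  retries: number = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await f();
    } catch (e) {
      if (!(e instanceof ErrorDBTransactionConflict) || attempt >= retries) {
        throw e;
      }
    }
  }
}
```

Retrying is only safe where the closure is idempotent or re-derives its reads from the new snapshot, which is why it's added selectively rather than globally.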
So if I understand the usage correctly... The example above is for renaming vaults, where we update vaultId -> vaultName, vaultName -> vaultId, and delete oldVaultName. Another example is the ACL, where we have tight relationships between vaultId -> [nodeId, nodeId] -> permId -> permRef. So we'd want to use it there.
What's an example of an ACL write skew? But yes, that's the general idea. Have a read on write skew on Wikipedia again to refresh your memory. It also depends on whether there are any consistency constraints being violated here. This is kind of why I wanted SSI, to avoid this problem entirely, since it is a complicated edge case.
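For a refresher, here's a toy simulation of write skew (the classic on-call example, not ACL-specific): both transactions read from snapshots taken before either commits and write disjoint keys, so there is no write-write conflict, yet together they violate the "at least one on call" constraint:

```typescript
// Each key maps to whether that doctor is on call; constraint: >= 1 on call
const db = new Map<string, boolean>([
  ['alice', true],
  ['bob', true],
]);

// Under snapshot isolation, both transactions start from the same snapshot
const snapA = new Map(db);
const snapB = new Map(db);

// Transaction A: alice goes off call if her snapshot shows >1 on call
if ([...snapA.values()].filter(Boolean).length > 1) db.set('alice', false);
// Transaction B: bob does the same, reading his (now stale) snapshot
if ([...snapB.values()].filter(Boolean).length > 1) db.set('bob', false);

// Write sets were disjoint ({alice} vs {bob}), so neither commit conflicts,
// but now nobody is on call: the constraint is violated, i.e. write skew
```

SSI would detect the dangerous read-write dependency cycle here; plain SI does not, which is why `getForUpdate` or explicit locks are needed at these spots.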
For ACL there are two aspects.
Here is a code snippet:

```ts
/**
 * Gets the record of node ids to permission for a given vault id
 * The node ids in the record each represent a unique gestalt
 * If there are no permissions, then an empty record is returned
 */
@ready(new aclErrors.ErrorACLNotRunning())
public async getVaultPerm(
  vaultId: VaultId,
  tran?: DBTransaction,
): Promise<Record<NodeId, Permission>> {
  if (tran == null) {
    return this.db.withTransactionF((tran) =>
      this.getVaultPerm(vaultId, tran),
    );
  }
  const nodeIds = await tran.get<Record<NodeId, null>>(
    [...this.aclVaultsDbPath, vaultId.toBuffer()],
    false,
  );
  if (nodeIds == null) {
    return {};
  }
  const perms: Record<NodeId, Permission> = {};
  const nodeIdsGc: Set<NodeId> = new Set();
  for (const nodeIdString in nodeIds) {
    const nodeId: NodeId = IdInternal.fromString(nodeIdString);
    const permId = await tran.get(
      [...this.aclNodesDbPath, nodeId.toBuffer()],
      true,
    );
    if (permId == null) {
      // Invalid node id
      nodeIdsGc.add(nodeId);
      continue;
    }
    const permRef = (await tran.get(
      [...this.aclPermsDbPath, permId],
      false,
    )) as Ref<Permission>;
    if (!(vaultId in permRef.object.vaults)) {
      // Vault id is missing from the perm
      nodeIdsGc.add(nodeId);
      continue;
    }
    perms[nodeId] = permRef.object;
  }
  if (nodeIdsGc.size > 0) {
    // Remove invalid node ids
    for (const nodeId of nodeIdsGc) {
      delete nodeIds[nodeId];
    }
    await tran.put(
      [...this.aclVaultsDbPath, vaultId.toBuffer()],
      nodeIds,
      false,
    );
  }
  return perms;
}
```
Ok we have identified where write skews may exist:
If it is meant to be a 1:1 relationship, then this application constraint is being violated; it's not a transaction conflict. Solve it by adding a lock to force serialisation. This applies to assigning a vault name to a vault id, and also to creating a new claim on the sigchain, as the sigchain must be linear.
I think this is good now, but we'll have to see what the CI shows us. If it's good then I can start squashing and prepping to merge.
This includes using `getForUpdate` for any counter updates.
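A minimal model of why `getForUpdate` helps with counters (the per-key versioning below is a hand-rolled stand-in for what the rocksdb optimistic transaction does, not the js-db implementation; names are illustrative):

```typescript
class ErrorDBTransactionConflict extends Error {}

// Minimal per-key-versioned store standing in for the real DB
const store = new Map<string, { value: number; version: number }>();
store.set('counter', { value: 0, version: 0 });

class Tran {
  private reads = new Map<string, number>(); // key -> version observed
  private writes = new Map<string, number>();
  // getForUpdate: a read that is validated at commit time like a write,
  // materialising a read-write conflict into a write-write conflict
  public getForUpdate(key: string): number {
    const entry = store.get(key)!;
    this.reads.set(key, entry.version);
    return this.writes.get(key) ?? entry.value;
  }
  public put(key: string, value: number): void {
    this.writes.set(key, value);
  }
  public commit(): void {
    for (const [key, version] of this.reads) {
      if (store.get(key)!.version !== version) {
        throw new ErrorDBTransactionConflict('conflict on ' + key);
      }
    }
    for (const [key, value] of this.writes) {
      store.set(key, { value, version: store.get(key)!.version + 1 });
    }
  }
}
```

With a plain `get`, both increments would commit and one update would be silently lost; with `getForUpdate`, the second commit fails with a conflict that can then be retried.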
Force-pushed 32ecc60 to 1b109d9
Should be good to merge now.
CI failures: a timeout and a transaction conflict error.
Does this PR actually resolve #244?
For #244, the main idea is that …
Long term we should remove all the object map locks in favour of the transaction locks. This is because locks don't compose, and all the object maps are just representations of the source of truth on disk. Generally I reckon we shouldn't have any operations that mutate the in-memory state without also involving the underlying disk, so there shouldn't be a situation where they can't coordinate with the transaction locks. The transaction locks are far more useful generally because they are re-entrant, and they don't release until transaction destruction. Issue created: #443
I reckon we still need to fix the key paths used for locking, and also investigate why this is taking longer than 20s. Seems like a possible deadlock.
Actually I saw that other places also take about 20 seconds there, so I guess it's not a big deal, and we can merge without addressing this for now. We've got lots of things like queues, tasks, and cancellation coming in to help with managing our asynchronous timelines and resources.
Description
This integrates the new js-db 5.0.0, which brings in rocksdb. It now has a correct implementation of optimistic transactions using Snapshot Isolation.
In bringing this in, all of our concurrency issues should start with a correct foundation. This means any race conflict at the DB layer will throw an `ErrorDBTransactionConflict` exception. These exceptions will need to either be handled with an automatic retry where appropriate, or, when it is not appropriate, bubbled all the way up to the user interface (CLI and GUI) so the user can decide what to do.

Write skews can be resolved with the new `getForUpdate` method, which materialises the read-write conflict into a write-write conflict. Where these may occur, transaction locks should also be used.

The `DBTransaction` supports both optimistic and pessimistic transactions in the same transaction object. Locking within the transaction should only be done in order to guard against write-skew thrashing (like counter races).

Because the `DBTransaction` has its own lock, and most operations should be done with respect to the DB, most usages of in-memory locking can be replaced by reifying them to the database. This means some in-memory constructs will be replaced with in-DB structures instead.

Transaction mutual exclusion should start at the boundary of the program, that is, at the GRPC service request handler level. A transaction will be created there, and this context will be shared by the transitive closure of all nested functions and methods. This should be sufficient for most usages.
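Sketched with stand-in types, the handler-boundary pattern looks like this: one transaction is created in the handler and threaded through every nested domain call, so all their reads and writes commit (or conflict) together. All names below are illustrative, not the real handler or domain APIs:

```typescript
// Simplified stand-ins for the js-db types, for illustration only
type DBTransaction = { ops: Array<string> };

class DB {
  public async withTransactionF<T>(
    f: (tran: DBTransaction) => Promise<T>,
  ): Promise<T> {
    const tran: DBTransaction = { ops: [] };
    return await f(tran); // commit would happen here on success
  }
}

class ACL {
  public async setVaultPerm(
    vaultId: string,
    tran: DBTransaction,
  ): Promise<void> {
    tran.ops.push(`acl:${vaultId}`);
  }
}

class VaultManager {
  public async createVault(
    vaultId: string,
    tran: DBTransaction,
  ): Promise<void> {
    tran.ops.push(`vault:${vaultId}`);
  }
}

// The service handler is the transaction boundary: a single transaction
// context is shared by the transitive closure of domain calls
async function createVaultHandler(
  db: DB,
  acl: ACL,
  vaultManager: VaultManager,
  vaultId: string,
): Promise<Array<string>> {
  return db.withTransactionF(async (tran) => {
    await vaultManager.createVault(vaultId, tran);
    await acl.setVaultPerm(vaultId, tran);
    return tran.ops;
  });
}
```

Because the domain methods also accept the transaction as an optional last parameter, they remain individually atomic when called outside a handler.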
The EFS will continue to use the old leveldb until it also transitions to the new 5.0.0 js-db. The EFS runs its own database atm, so it's independent of the main PK database. Conversion of the EFS to the new db will be a separate PR; it requires separate testing because it has to remove its own locking in favour of transaction-native locking.
While integrating this new DB, we will also start experimenting with more formal model-checking tests against our codebase in order to test concurrent phenomena.
Issues Fixed
- `NodeGraph` bucket operations #244

Tasks

- [ ] `if (tran == null)` `db.withTranF`/`G` blocks: use `tran => function()` to avoid code bloat. For generators, where possible just pass the generator down with an arrow function. Generally try to avoid wrapping it in a `withTransactionG` if possible.
- [ ] Replace the `ACL` and `Sigchain` implementation of `withTransactionF` with the db version of it. The `Sigchain` version needs to use the transaction's locking now.
- [ ] `ts-ignore` comments for logger usage.

Final checklist