Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New meta API #5012

Closed
BohuTANG opened this issue Apr 23, 2022 · 18 comments
Closed

New meta API #5012

BohuTANG opened this issue Apr 23, 2022 · 18 comments
Assignees
Labels
A-meta Area: databend meta serive

Comments

@BohuTANG
Copy link
Member

BohuTANG commented Apr 23, 2022

Summary

Consider moving the meta API to KV api, we have the follows operations:

  • Create a Database <db_name>
1. Begin transaction
2. Check the database by <__fd_databases/<tenant_id>/<db_name>
    2.1 If exists return error
    2.2 Else get a <database_id>
    2.3 Write <<__fd_databases/<tenant_id>/<db_name>, <value>> to KV
3. Commit
  • Drop a Database <db_name>
1. Begin transaction
2. Check the database by <__fd_databases/<tenant_id>/<db_name>
    2.1 If exists, get the <database_id> from the value
    2.2 Else return error
    2.3 Delete the record by the prefix `__fd_databases/<tenant_id>/<db_name>`
    2.4 Delete all the table record by the prefix `__fd_tables/<tenant_id>/<database_id>`
3. Commit
  • Alter a Database From <db_name> to <new_db_name>
1. Begin transaction
2. Check the database by <__fd_databases/<tenant_id>/<db_name>
    3.1 If exists, get the <database_id> from value
    3.2 Else return error
    3.3 Check the <new_db_name> exists or not
    3.3 Update the key: `__fd_databases/<tenant_id>/<new_db_name>`
4. Commit
  • Create a Table <table_name>
1. Begin transaction
2. Check the database by <__fd_databases/<tenant_id>/<db_name>
3. Get the <database_id>
4. Check the table name <table_name> is exists or not
    4.1 If exists return error
    4.2 Else get a <table_id>
    4.3 Write <<__fd_tables/<tenant_id>/<database_id>/<table_id>, <value>> to KV
5. Commit
  • Drop a Table <table_name>
1. Begin transaction
2. Check the database by <__fd_databases/<tenant_id>/<db_name>
3. Get the <database_id>
4. Check the table name <table_name> is exists or not
    4.1 If exists get the <table_id>
    4.2 Else return error
    4.3 Delete <<__fd_tables/<tenant_id>/<database_id>/<table_id>>
5. Commit

@drmingdrmer @lichuang Help to see if there is something wrong with these logic above? And would like to know what is the underlying logic of current meta API.

@BohuTANG BohuTANG added the A-meta Area: databend meta serive label Apr 23, 2022
@drmingdrmer
Copy link
Member

In the current impl, creating a database involves updating two keyspaces:
db_name -> db_id and db_id -> db_meta.

This will make renaming a DB easy and troubleless.

We're still using the two-keyspace design?

@BohuTANG
Copy link
Member Author

As #4918 discussed, with this issue's create table operation, there are some self defined function like get_database_id, how can we use it in the meta transaction api?

@drmingdrmer
Copy link
Member

To batch-delete tables, without the need to understand business for metasrv, tables have to be collected in a unique prefix: __fd_tables/<tenant_id>/<db_id>/.

The kv namespaces would look like:

__fd_tenants/<tenant_id>                          -> (ver, tenant_meta)

__fd_databases/<tenant_id>/by_name/<db_name>      -> (ver, db_id)
__fd_databases/<tenant_id>/by_id/<db_id>          -> (ver, db_meta)

__fd_tables/<tenant_id>/<db_id>/by_name/<tb_name> -> (ver, tb_id)
__fd_tables/<tenant_id>/<db_id>/by_id/<tb_id>     -> (ver, tb_meta)

And the operations would be done by constraining the versions of resources it depends on.
E.g., db depends on tenant, and table depends on both tenant and db.

Every txn is done by several read from metasrv and one conditional write to metasrv:

Create db:

create_db(tenant_id, db_name, db_meta) {

    read("__fd_tenants/<tenant_id>") -> (tenant_ver, tenant_meta);
    read("__fd_databases/<tenant_id>/by_name/<db_name>") -> (db_id_ver, _db_id)

    if db_id_ver != 0 {
        return;
    }

    read("__seq") -> (db_id); // fetch a new id 

    write(
        If(
            Record("__fd_tenants/<tenant_id>").ver                     == tenant_ver
         && Record("__fd_databases/<tenant_id>/by_name/<db_name>").ver == 0
        ),
        Then(
            Write("__fd_databases/<tenant_id>/by_name/<db_name>", (ver=auto_incr(), db_id)),
            Write("__fd_databases/<tenant_id>/by_id/<db_id>",     (ver=auto_incr(), db_meta)), 
        ),
        Else()
    )
}

Delete db and all its tables:

delete_db(tenant_id, db_name) {

    read("__fd_tenants/<tenant_id>") -> (tenant_ver, tenant_meta);
    read("__fd_databases/<tenant_id>/by_name/<db_name>") -> (db_id_ver, db_id)

    if db_id_ver == 0 {
        return;
    }

    read("__fd_databases/<tenant_id>/by_id/<db_id>") -> (db_ver, db_meta)

    if db_ver == 0 {
        return;
    }

    write(
        If(
            Record("__fd_tenants/<tenant_id>").ver                     == tenant_ver
         && Record("__fd_databases/<tenant_id>/by_name/<db_name>").ver == db_id_ver
         && Record("__fd_databases/<tenant_id>/by_id/<db_id>").ver     == db_ver
        ),
        Then(
            Delete("__fd_databases/<tenant_id>/by_name/<db_name>"),
            Delete("__fd_databases/<tenant_id>/by_id/<db_id>"),
            DeleteByPrefix("__fd_tables/<tenant_id>/<db_id>/")
        ),
        Else()
    )
}

Create table:

create_table(tenant_id, db_name, tb_name, tb_meta) {

    read("__fd_tenants/<tenant_id>") -> (tenant_ver, tenant_meta);
    read("__fd_databases/<tenant_id>/by_name/<db_name>") -> (db_id_ver, db_id)

    if db_id_ver == 0 {
        return;
    }

    read("__fd_databases/<tenant_id>/by_id/<db_id>") -> (db_ver, db_meta)

    if db_ver == 0 {
        return;
    }

    read("__fd_tables/<tenant_id>/<db_id>/by_name/<tb_name>") -> (tb_id_ver, tb_id)

    if tb_id_ver != 0 {
        return;
    }

    read("__seq") -> (tb_id); // fetch a new id 

    write(
        If(
            Record("__fd_tenants/<tenant_id>").ver                          == tenant_ver
         && Record("__fd_databases/<tenant_id>/by_name/<db_name>").ver      == db_id_ver
         && Record("__fd_databases/<tenant_id>/by_id/<db_id>").ver          == db_ver
         && Record("__fd_tables/<tenant_id>/<db_id>/by_name/<tb_name>").ver == 0
        ),
        Then(
            Write("__fd_tables/<tenant_id>/<db_id>/by_name/<tb_name>", (ver=auto_incr(), tb_id)), 
            Write("__fd_tables/<tenant_id>/<db_id>/by_id/<tb_id>", (ver=auto_incr(), tb_meta)), 
        ),
        Else()
    )
}

@drmingdrmer
Copy link
Member

As #4918 discussed, with this issue's create table operation, there are some self defined function like get_database_id, how can we use it in the meta transaction api?

@BohuTANG does the above comment explain what you want? 🤔

@BohuTANG
Copy link
Member Author

👍 Good to me.

@BohuTANG
Copy link
Member Author

To better meet the needs, welcome to describe the pseudo-code operation for the new meta api:

  1. Data Share operations(This should be one of the most complex case I think) @youngsofun
  2. Time Travel(Specific has not yet been determined, but we can describe in general terms) @dantengsky
  3. Alter database rename (draft in impl alter database rename #4984) @TCeason

@youngsofun
Copy link
Member

youngsofun commented Apr 24, 2022

I have some questions. take Create db for example
@drmingdrmer

  1. To my understanding, tenant_ver only reflect the change of tenant_meta? so no need to check tenant_ver here?
  2. why not do read("__seq") -> (db_id) in Then, since it will write something.

Create db:

  create_db(tenant_id, db_name, db_meta) {

    read("__fd_tenants/<tenant_id>") -> (tenant_ver, tenant_meta);
    read("__fd_databases/<tenant_id>/by_name/<db_name>") -> (db_id_ver, _db_id)

    if db_id_ver != 0 {
        return;
    }

    read("__seq") -> (db_id); // fetch a new id 

    write(
        If(
            Record("__fd_tenants/<tenant_id>").ver                     == tenant_ver
         && Record("__fd_databases/<tenant_id>/by_name/<db_name>").ver == 0
        ),
        Then(
            Write("__fd_databases/<tenant_id>/by_name/<db_name>", (ver=auto_incr(), db_id)),
            Write("__fd_databases/<tenant_id>/by_id/<db_id>",     (ver=auto_incr(), db_meta)), 
        ),
        Else()
    )
}

@drmingdrmer
Copy link
Member

  • the whole create_db is performed in one request using the small language, and the lowercase write is the Transactional part of it?
  • the whole create_db is performed in client, and the lowercase write is the Transactional part of it and performed in one request?

These two are similar: the reading part can be done on either the client-side or server-side.
As @flaneur2020 suggested, the first step is to provide a single conditional Write API to support txn.
And the conditional Write is actually part of the small language so that we could just extend it if needed.

  • To my understanding, tenant_ver only reflect the change of tenant_meta? so no need to check tenant_ver here?

I am not quite sure about this. But I thought a database probably relies on tenant.
E.g. maybe changes on some field of tenant will prevent the user from creating a db.
It can be removed.

  • why not do read("__seq") -> (db_id) in Then, since it will write something.

To write a value that is not a const value, it needs some other syntax to be well defined.
I was trying to define Write in the same semantic as the upsertkv RPC, to keep simplicity.

But it's totally feasible, IMHO.

@TCeason
Copy link
Collaborator

TCeason commented Apr 24, 2022

In my mind, Alter database db_name rename to new_db_name; like this:

rename_db(tenant_id, db_name, new_db_name) {
	read("__fd_tenants/<tenant_id>") -> (tenant_ver, tenat_meta);
	read("__fd_tenants/<tenant_id>/by_name/<db_name>") -> (db_id_ver, db_id);
	if db_id_ver == 0 {
		return UnknownDatabase;
	}
        
	read("__fd_databases/<tenant_id>/by_id/<db_id>") -> (db_ver, db_meta);
	read("__fd_tenants/<tenant_id>/by_name/<new_db_name>") -> (new_db_id_ver, new_db_mata);
	if new_db_id_ver != {
		return DatabaseAlreadyExists;
	}
	
	write(
	If(
            Record("__fd_tenants/<tenant_id>").ver == tenant_ver
         && Record("__fd_databases/<tenant_id>/by_name/<db_name>").ver == 0
        ),
        Then(
            Write("__fd_databases/<tenant_id>/by_name/<new_db_name>", (ver=auto_incr(), db_id)),
            Write("__fd_databases/<tenant_id>/by_id/<db_id>", (ver=auto_incr(), db_meta)), 
            Delete("__fd_databases/<tenant_id>/by_name/<db_name>"),
        ),
        Else()
	)
}

@drmingdrmer
Copy link
Member

@TCeason
Looks good to me :DD

rename_db(tenant_id, db_name, new_db_name) {
	read("__fd_tenants/<tenant_id>") -> (tenant_ver, tenat_meta);

	read("__fd_tenants/<tenant_id>/by_name/<db_name>") -> (db_id_ver, db_meta);

typo: db_meta -> db_id

             Write("__fd_databases/<tenant_id>/by_id/<db_id>", (ver=auto_incr(), db_meta)), 

IMHO, renaming a database does not need to update the version of db_meta, since it actually does not modify db_meta.

@youngsofun
Copy link
Member

youngsofun commented Apr 24, 2022



// same as create database
create_share(tenant_id, share_name, db_meta) {
 read("__fd_tenants/<tenant_id>") -> (tenant_ver, tenant_meta);
 read("__fd_shares/<tenant_id>/by_name/<share_name>") -> (share_id_ver, share_id)
 if share_id_ver != 0 {
     return ShareAlreadyExist;
 }
 read("__seq") -> (share_id); // fetch a new id 
 write(
     If(
         Record("__fd_tenants/<tenant_id>").ver                     == tenant_ver
      && Record("__fd_shares/<tenant_id>/by_name/<share_name>").ver == 0
     ),
     Then(
         Write("__fd_shares/<tenant_id>/by_name/<share_name>", (ver=auto_incr(), share_id)),
         Write("__fd_shares/<tenant_id>/by_id/<sahre_id>",     (ver=auto_incr(), share_meta)), 
     ),
     Else()
 )
}


grant_share_to_tenant(provider_id, consumer_id, share_name, share_outbound_meta, share_inbound_meta) {
 read("__fd_tenants/<consumer_id>") -> (consumer_ver, tenant_meta);
 if consumer_ver == 0 {
     return UnknownTenant;
 }
 
 read("__fd_tenants/<provider_id>") -> (provider_ver, tenant_meta);
 read("__fd_shares/<provider_id>/by_name/<share_name>") -> (share_id_ver, share_id)
 if share_id_ver == 0 {
     return UnknownShare;
 }
 read("__fd_shares/<provider_id>/by_id/<share_id>") -> (share_ver, share_meta)
 if share_ver == 0 {
     return UnknownShare;
 }
 
 read("__fd_share_outbounds/<provider_id>/<share_id>") -> (share_out_ver, share_meta)
 if share_out_ver != 0 {
     return;
 }

 write(
     If(
        Record("__fd_tenants/<consumer_id>").ver                      == consumer_ver
         Record("__fd_shares/<provider_id>/by_name/<share_name>").ver == share_id_ver
         Record("__fd_shares/<provider_id>/by_id/<share_id>").ver     == share_ver
      && Record("__fd_share_outbounds/<share_id>/<consumer_id>").ver  == 0
     ),
     Then(
         Write("__fd_share_outbounds/<share_id>/<consumer_id>", (ver=auto_incr(), share_outbound_meta)),
         Write("__fd_share_inbounds/<consumer_id>/<share_id>",  (ver=auto_incr(), share_inbound_meta)), 
     ),
     Else()
 )
 
}


create_database_from_share (consumer_id, provider_id, share_name, db_name) {
read("__fd_databases/<consumer_id>/by_name/<db_name>") -> (db_id_ver, _db_id)
 if db_id_ver != 0 {
     return DBAlreadyExists;
 }

 read("__fd_tenants/<consumer_id>") -> (tenant_ver, tenant_meta);
 read("__fd_shares/<provider_id>/by_name/<share_name>") -> (share_id_ver, share_id)
 if share_id_ver == 0 {
     return UnknownShare;
 }
 read("__fd_shares/<provider_id>/by_id/<share_id>") -> (share_ver, share_meta)
 if share_ver == 0 {
     return UnknownShare;
 }
 read("__fd_share_inbounds/<consumer_id>/<share_id>) -> (share_in_ver, share_in_meta)
 if share_in_ver == 0 {
     return UnknownShare;
 }
 
 if share_in_meta.db != None {
    return ShareAlreadyInUse;
 }
 
 share_in_meta.db = db_name
 db_meta = db_meta_from_share(provider_id, share_id)
 read("__seq") -> (db_id); // fetch a new id 
    
 write(
     If(
         Record("__fd_tenants/<tenant_id>").ver                     == tenant_ver
      && Record("__fd_share_inbounds/<consumer_id>/<share_id>").ver == share_in_ver
      && Record("__fd_databases/<tenant_id>/by_name/<db_name>").ver == 0
     ),
     Then(
         Write("__fd_share_inbounds/<consumer_id>/<share_id>", (ver=auto_incr(), share_in_meta))
         Write("__fd_databases/<tenant_id>/by_name/<db_name>", (ver=auto_incr(), db_id)),
         Write("__fd_databases/<tenant_id>/by_id/<db_id>",     (ver=auto_incr(), db_meta)), 
     ),
     Else()
 )
}

grant_privilege_to_share (tenat_id, share_name, db_name, obj, priv) {
 read("__fd_tenants/<tenat_id>") -> (tenate_ver, tenant_meta);
 read("__fd_shares/<tenat_id>/by_name/<share_name>") -> (share_id_ver, share_id)
 if share_id_ver == 0 {
     return UnknownShare;
 }
 read("__fd_shares/<provider_id>/by_id/<share_id>") -> (share_ver, share_meta)
 if share_ver == 0 {
     return UnknownShare;
 }
 read("__seq") -> (db_id); // fetch a new id 
 
 read("__fd_share_privilege/<share_id>/<obj.type>/<obj.name>") -> (priv_ver, old_priv)
 if priv in old_priv:
 	return
 priv = old_priv|priv

 write(
     If(
         Record("__fd_tenants/<tenant_id>").ver                     == tenant_ver
      && Record("__fd_databases/<tenant_id>/by_name/<db_name>").ver == 0
      && Record("__fd_share_privilege/<share_id>/<obj.type>/<obj.name>).ver == priv_ver, 
     ),
     Then(
         Write("__fd_share_privilege/<share_id>/<obj.type>/<obj.name>",  (ver=auto_incr(), priv)), 
     ),
     Else()
 )
}


revoke_share_from_tenant(provider_id, consumer_id, share_name) {
// delete outbound, inbound
// consumer delete db by itself?
}

delete_share(tenant_id, db_name, db_meta) {
 // delete  name, id, outbound, priv, inbound
 // consumer delete db by itself?
}

revoke_privilege_from_share {
 // same as grant_privilege_to_share
 // delete the kv if priv is none
}

@youngsofun
Copy link
Member

Use some short-circuit in If

  1. only checking existence if not care about the meta
  2. if change of a=> change of b, only check the ver of b
  3. if delete a => delete b, only check the ver of b

@drmingdrmer
Copy link
Member

grant_share_to_tenant(provider_id, consumer_id, share_name, share_outbound_meta, share_inbound_meta) {
    read("__fd_tenants/<consumer_id>") -> (consumer_ver, tenant_meta);
    if consumer_ver == 0 {
        return UnknownTenant;
    }
    
    read("__fd_tenants/<provider_id>") -> (provider_ver, tenant_meta);
    read("__fd_shares/<provider_id>/by_name/<share_name>") -> (share_id_ver, share_id)
    if share_id_ver == 0 {
        return UnknownShare;
    }
    read("__fd_shares/<provider_id>/by_id/<share_id>") -> (share_ver, share_meta)
    if share_ver == 0 {
        return UnknownShare;
    }
    
    read("__fd_share_outbounds/<provider_id>/<share_id>") -> (share_out_ver, share_meta)
    if share_out_ver != 0 {
        return;
    }

    write(
        If(
           Record("__fd_tenants/<consumer_id>").ver                      == consumer_ver
            Record("__fd_shares/<provider_id>/by_name/<share_name>").ver == share_id_ver
            Record("__fd_shares/<provider_id>/by_id/<share_id>").ver     == share_ver
         && Record("__fd_share_outbounds/<share_id>/<consumer_id>").ver  == 0
        ),
        Then(
            Write("__fd_share_outbounds/<share_id>/<consumer_id>", (ver=auto_incr(), share_outbound_meta)),
            Write("__fd_share_inbounds/<consumer_id>/<share_id>",  (ver=auto_incr(), share_inbound_meta)), 
        ),
        Else()
    )
}

@youngsofun
Does it need to assert __fd_share_inbounds/<consumer_id>/<share_id> is None? i.e., ther ver is 0.

@youngsofun
Copy link
Member

Does it need to assert __fd_share_inbounds/<consumer_id>/<share_id> is None? i.e., ther ver is 0.

I was wondering about that too.

the setting is: key1 and key2 are all ways created/deleted at the same time.

  1. is it ok to check one of them to save one operation? and assert another only in test, or in debug mode?
  2. what to do in the application level when meeting inconsistency (one exists and the other not) in addition to logging it? go on with the create/drop op?

@drmingdrmer
Copy link
Member

Does it need to assert __fd_share_inbounds/<consumer_id>/<share_id> is None? i.e., ther ver is 0.

I was wondering about that too.

the setting is: key1 and key2 are all ways created/deleted at the same time.

  1. is it ok to check one of them to save one operation? and assert another only in test, or in debug mode?

I do not know. It is up to the application and has to enumerate every possible write operation on them.
But checking both of them will always be consistent.

  1. what to do in the application level when meeting inconsistency (one exists and the other not) in addition to logging it? go on with the create/drop op?

It will be a severe bug IMHO. The best way is to stop doing anything :)

@lichuang
Copy link
Contributor

maybe this create_db impl has some problem:

read("__seq") -> (db_id); // fetch a new id

the db_id also may be acquired by other at the same time.

may be move this alloc db id action in Txn?

create_db(tenant_id, db_name, db_meta) {

    read("__fd_tenants/<tenant_id>") -> (tenant_ver, tenant_meta);
    read("__fd_databases/<tenant_id>/by_name/<db_name>") -> (db_id_ver, _db_id)

    if db_id_ver != 0 {
        return;
    }

    read("__seq") -> (db_id); // fetch a new id 

    write(
        If(
            Record("__fd_tenants/<tenant_id>").ver                     == tenant_ver
         && Record("__fd_databases/<tenant_id>/by_name/<db_name>").ver == 0
        ),
        Then(
            Write("__fd_databases/<tenant_id>/by_name/<db_name>", (ver=auto_incr(), db_id)),
            Write("__fd_databases/<tenant_id>/by_id/<db_id>",     (ver=auto_incr(), db_meta)), 
        ),
        Else()
    )
}

@drmingdrmer
Copy link
Member

the db_id also may be acquired by other at the same time.

may be move this alloc db id action in Txn?

A seq number generated by metasrv will never be reused. This is safe.

@drmingdrmer
Copy link
Member

Most of the implementation tasks are done, except for some refactoring and testing jobs.

Is it time to close this issue since the design of txn-based schema-API is nailed now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-meta Area: databend meta serive
Projects
None yet
Development

No branches or pull requests

5 participants