Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should things and channels entity ID be UUIDv4? #403

Closed
drasko opened this issue Sep 21, 2018 · 4 comments
Closed

Should things and channels entity ID be UUIDv4? #403

drasko opened this issue Sep 21, 2018 · 4 comments

Comments

@drasko
Copy link
Contributor

drasko commented Sep 21, 2018

Currently we use Postgres serial sequece number. This ensures consistence (no-conflicting UUID) but is not generic enough - i.e. switching to another DB like CockroachDB is changing the format of ID field.

Should we introduce UUIDv4 here, knowing the drawbacks: https://rclayton.silvrback.com/do-you-really-need-a-uuid-guid, or find some other solution in which we can keep ID derived broth from Postgres and Cockroach.

Also interesting article: https://rob.conery.io/2014/05/28/a-better-id-generator-for-postgresql/

@drasko
Copy link
Contributor Author

drasko commented Sep 21, 2018

@chombium @nmarcetic @anovakovic01 @dusanb94 @srados @blokovi what are your comments on this?

@drasko drasko changed the title Should things and channels entity ID be UUIDv4 Should things and channels entity ID be UUIDv4? Sep 21, 2018
@robconery
Copy link

If you need globally unique IDs then a UUID is just fine. PostgreSQL and other systems have a fine way of indexing them too, so that's not really a concern as long as your key is a type uuid. This has the added benefit of portability - so if you switch to another system for some ridiculous reason (I ❤️ Postgres) you can be confident your ID scheme will be portable.

My approach was aimed at bigint and huge data sets that might be distributed. Serials almost always work, are incredibly fast and pretty hard to blow up. I don't think most developers understand just how many ints they can pack into an int.

Finally - the article you referenced has a number of flame-waving weirdnesses. A uuid in PostgreSQL is NOT a 36-bit string, it won't degrade "performance" but you might have to consider a tradeoff with write speed vs. read. Writes will likely be a nanosecond slower, if that even matters. Indexing a uuid in PostgreSQL will be more time-consuming than an int, but if you index concurrently than who the f** cares. If you use a natural key for anything I will find you and pour sticky toffee in your socks. Finally: presenting an ID to a user is the very last of any DBA's concerns, and is a pile of steaming bullshit.

UUIDs are fine primary keys, particularly when they need to be truly, globally unique. Otherwise yeah - they're overkill.

@drasko
Copy link
Contributor Author

drasko commented Sep 23, 2018

@robconery UUIDv4 I mentioned are of type string, and unrelated to Postgres - generated by this package for example: https://github.com/satori/go.uuid. I am not sure that other DBs would have uuid type (but I am not really DB ninja), and idea here is to have possibility to eventually support other DBs.

TBH, for Mainflux is is not likely to happen, Postges here works just fine and it's a good use-case. We thought that metadata (identities) here needed strong consistency, although it might turn out in the future that people are ready to trade this consistency for easier scalability and flexibility of easily changing data models without complicated migrations.

The only thing that we did try and had very good results is replacing Postgres with CockroachDB v2, which is 1:1 replacement - we did not change one line of code. Cockroach is really easy to scale, and in our Kubernetes deployment it was behaving quite nice. BUt it generated different serial IDs than Postgres (non-icremental). As a note - we use int64 AFAIK to store these indexes - and it works fine with both Postgres and Cockroach.

Generating UUIDv4 with aforementioned library might in some rare cases lead to inconsistency - i.e. conflicts. This is why Twitter created project Snowflake:

which was later archived.

Another interesting article: https://preparingforcodinginterview.wordpress.com/2017/03/21/unique-id-generation-in-distributed-systems/

This project looks very interesting: https://github.com/segmentio/ksuid

There is also this: https://github.com/ericelliott/cuid

@robconery from what I see both ksuid and cuid are strings. Do you know how this would affect performance in Postgres?

@anovakovic01
Copy link
Contributor

This change will be done by resolving #475.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants