-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: improve DB query performance #803
Conversation
- queries improved for MySQL and Postgres (including get-by-cred-name, find-by-name-like, delete-by-name) by performing case-insensitive search more efficiently details: - product requirements - The credential name itself is **case-sensitive** data (you store `/FooBar`, you get `/FooBar` back). But the cred search (credhub api's get-by-cred-name, get-by-cred-name-like, find-by-name endpoints) needs to be **case-insensitive** (you search `/fooBAR`, you still get `/FooBar` back). - CredHub has to work with MySQL, Postgres, and H2 (though H2 is only for testing). - problem statements (current state) - For MySQL, the queries currently uppercase or lowercase the search: `upper(credential.name) = upper(searched_term)` (sometimes it's uppercase, sometimes lowercase; it's not consistent), and without any relevant index, a full table scan is performed (I assume because it has to freshly compute the uppercased value of the name column for every single row). - For Postgres, a `lower(credential.name)` index is present, but some queries still do "uppercase" (those generated by Spring JPA lib from java code), resulting in full table scan. - solution: Add a new generated column that computes and stores the lowercased version of the credential names, and add index on it. - based on performance testing with local setup (240k cred entries in DB), this improves end-to-end performance by 3 times on average. Since this measurement includes fixed costs like local credhub cli processing time, local credhub server processing time, and local network latency, the DB query performance improvement must be much higher (more than 10 times based on MySQL team's analysis). - the generated column syntax in postgres & mysql does not quite work in H2 (despite H2 docs saying it works: https://h2database.com/html/features.html#generated_columns). Perhaps our H2 version is too old. But a similar syntax does work (tests passed). This is fine because we don't need to optimize H2 performance since it's just for testing. [internal tracking: https://vmw-jira.broadcom.net/browse/TPCF-26571]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a good performance improvement on mysql and postgres.
Just one question on H2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we have issues with name_lowercase not being populated when inserting new rows on H2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that name_lowercase is indeed being populated when inserting new rows. Otherwise our tests under H2 mode wouldn't be passing, since the code is searching by name_lowercase now. E.g. this test inserts a new row and then search for that (the underlying query is searching by the name_lowercase column).
- context: the performance improvement brought by indexing (#803) was reverted (#815) due to migration error (issue #813) for some users. This commit brings back that index, but doing so safely to avoid issues like #813 (#813 happened to some users due to migration file numbering conflict, this commit uses a fresh migration number (v59) to avoid that). - Unfortunately, for the users who happened to have installed credhub v2.12.94 successfully (aka did not encounter #813), their mysql DB would already have an index called "credential_name_lowercase". So in order for the migration in this commit to work for both those who have installed credhub v2.12.94 and those who have not, this commit's migration instead creates the index under a different name: "credential_name_lowercase_index" (mysql does not support the "CREATE INDEX...IF NOT EXIST" syntax). This would create some DB inefficiency (due to the presence of two functionally identical indexes) for those who have installed credhub v2.12.94 (which we think is a smaller subset of users), but the system would still work and likely a net improvement compared to before due to the presence of indexing. And we would instruct these users to manually drop the old, duplicate index ("credential_name_lowercase").
- context: the performance improvement brought by indexing (#803) was reverted (#815) due to migration error (issue #813) for some users. This commit brings back that index, but doing so safely to avoid issues like #813 (#813 happened to some users due to migration file numbering conflict, this commit uses a fresh migration number (v59) to avoid that). - Unfortunately, for the users who happened to have installed credhub v2.12.94 successfully (aka did not encounter #813), their mysql DB would already have an index called "credential_name_lowercase". So in order for the migration in this commit to work for both those who have installed credhub v2.12.94 and those who have not, this commit's migration instead creates the index under a different name: "credential_name_lowercase_index" (mysql does not support the "CREATE INDEX...IF NOT EXIST" syntax). This would create some DB inefficiency (due to the presence of two functionally identical indexes) for those who have installed credhub v2.12.94 (which we think is a smaller subset of users), but the system would still work and likely a net improvement compared to before due to the presence of indexing. And we would instruct these users to manually drop the old, duplicate index ("credential_name_lowercase").
details:
/FooBar
, you get/FooBar
back). But the cred search (credhub api's get-by-cred-name, get-by-cred-name-like, find-by-name endpoints) needs to be case-insensitive (you search/fooBAR
, you still get/FooBar
back).upper(credential.name) = upper(searched_term)
(sometimes it's uppercase, sometimes lowercase; it's not consistent), and without any relevant index, a full table scan is performed (I assume because it has to freshly compute the uppercased value of the name column for every single row).lower(credential.name)
index is present, but some queries still do "uppercase" (those generated by Spring JPA lib from java code), resulting in full table scan.[internal tracking: https://vmw-jira.broadcom.net/browse/TPCF-26571]