feat: improve DB query performance #803

peterhaochen47 · 2024-10-23T18:12:18Z

queries improved for MySQL and Postgres (including get-by-cred-name, find-by-name-like, delete-by-name) by performing case-insensitive search more efficiently

details:

product requirements
- The credential name itself is case-sensitive data (you store /FooBar, you get /FooBar back). But the cred search (credhub api's get-by-cred-name, get-by-cred-name-like, find-by-name endpoints) needs to be case-insensitive (you search /fooBAR, you still get /FooBar back).
- CredHub has to work with MySQL, Postgres, and H2 (though H2 is only for testing).
problem statements (current state)
- For MySQL, the queries currently uppercase or lowercase the search: upper(credential.name) = upper(searched_term) (sometimes it's uppercase, sometimes lowercase; it's not consistent), and without any relevant index, a full table scan is performed (I assume because it has to freshly compute the uppercased value of the name column for every single row).
- For Postgres, a lower(credential.name) index is present, but some queries still do "uppercase" (those generated by Spring JPA lib from java code), resulting in full table scan.
solution: Add a new generated column that computes and stores the lowercased version of the credential names, and add index on it, and search by that.
- based on performance testing with local setup (240k cred entries in DB), this improves end-to-end performance by 3 times on average. Since this measurement includes fixed costs like local credhub cli processing time, local credhub server processing time, and local network latency, the DB query performance improvement must be much higher (more than 10 times based on MySQL team's analysis).
- the generated column syntax in postgres & mysql does not quite work in H2 (despite H2 docs saying it works: https://h2database.com/html/features.html#generated_columns). Perhaps our H2 version is too old. But a similar syntax does work (tests passed). This is fine because we don't need to optimize H2 performance since it's just for testing.

[internal tracking: https://vmw-jira.broadcom.net/browse/TPCF-26571]

- queries improved for MySQL and Postgres (including get-by-cred-name, find-by-name-like, delete-by-name) by performing case-insensitive search more efficiently details: - product requirements - The credential name itself is **case-sensitive** data (you store `/FooBar`, you get `/FooBar` back). But the cred search (credhub api's get-by-cred-name, get-by-cred-name-like, find-by-name endpoints) needs to be **case-insensitive** (you search `/fooBAR`, you still get `/FooBar` back). - CredHub has to work with MySQL, Postgres, and H2 (though H2 is only for testing). - problem statements (current state) - For MySQL, the queries currently uppercase or lowercase the search: `upper(credential.name) = upper(searched_term)` (sometimes it's uppercase, sometimes lowercase; it's not consistent), and without any relevant index, a full table scan is performed (I assume because it has to freshly compute the uppercased value of the name column for every single row). - For Postgres, a `lower(credential.name)` index is present, but some queries still do "uppercase" (those generated by Spring JPA lib from java code), resulting in full table scan. - solution: Add a new generated column that computes and stores the lowercased version of the credential names, and add index on it. - based on performance testing with local setup (240k cred entries in DB), this improves end-to-end performance by 3 times on average. Since this measurement includes fixed costs like local credhub cli processing time, local credhub server processing time, and local network latency, the DB query performance improvement must be much higher (more than 10 times based on MySQL team's analysis). - the generated column syntax in postgres & mysql does not quite work in H2 (despite H2 docs saying it works: https://h2database.com/html/features.html#generated_columns). Perhaps our H2 version is too old. But a similar syntax does work (tests passed). This is fine because we don't need to optimize H2 performance since it's just for testing. [internal tracking: https://vmw-jira.broadcom.net/browse/TPCF-26571]

duanemay

This should be a good performance improvement on mysql and postgres.
Just one question on H2.

duanemay · 2024-10-23T21:00:19Z

...i/src/main/resources/db/migration/h2/V59__add_lowercase_credential_name_column_and_index.sql

Would we have issues with name_lowercase not being populated when inserting new rows on H2?

I assume that name_lowercase is indeed being populated when inserting new rows. Otherwise our tests under H2 mode wouldn't be passing, since the code is searching by name_lowercase now. E.g. this test inserts a new row and then search for that (the underlying query is searching by the name_lowercase column).

- context: the performance improvement brought by indexing (#803) was reverted (#815) due to migration error (issue #813) for some users. This commit brings back that index, but doing so safely to avoid issues like #813 (#813 happened to some users due to migration file numbering conflict, this commit uses a fresh migration number (v59) to avoid that). - Unfortunately, for the users who happened to have installed credhub v2.12.94 successfully (aka did not encounter #813), their mysql DB would already have an index called "credential_name_lowercase". So in order for the migration in this commit to work for both those who have installed credhub v2.12.94 and those who have not, this commit's migration instead creates the index under a different name: "credential_name_lowercase_index" (mysql does not support the "CREATE INDEX...IF NOT EXIST" syntax). This would create some DB inefficiency (due to the presence of two functionally identical indexes) for those who have installed credhub v2.12.94 (which we think is a smaller subset of users), but the system would still work and likely a net improvement compared to before due to the presence of indexing. And we would instruct these users to manually drop the old, duplicate index ("credential_name_lowercase").

duanemay approved these changes Oct 23, 2024

View reviewed changes

peterhaochen47 merged commit 56a4b42 into main Oct 24, 2024
2 checks passed

jochenehret mentioned this pull request Nov 11, 2024

Database migration in credhub-release 2.12.94 fails. #813

Closed

3 tasks

peterhaochen47 mentioned this pull request Nov 12, 2024

Roll back problematic mysql db migrations (issue #813) #814

Closed

peterhaochen47 mentioned this pull request Nov 12, 2024

feat: add back index on credential lowercase name #816

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: improve DB query performance #803

feat: improve DB query performance #803

peterhaochen47 commented Oct 23, 2024 •

edited

Loading

duanemay left a comment

duanemay Oct 23, 2024

peterhaochen47 Oct 23, 2024

feat: improve DB query performance #803

feat: improve DB query performance #803

Conversation

peterhaochen47 commented Oct 23, 2024 • edited Loading

duanemay left a comment

Choose a reason for hiding this comment

duanemay Oct 23, 2024

Choose a reason for hiding this comment

peterhaochen47 Oct 23, 2024

Choose a reason for hiding this comment

peterhaochen47 commented Oct 23, 2024 •

edited

Loading