Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve DB query performance #803

Merged
merged 1 commit into from
Oct 24, 2024

Conversation

peterhaochen47
Copy link
Member

@peterhaochen47 peterhaochen47 commented Oct 23, 2024

  • queries improved for MySQL and Postgres (including get-by-cred-name, find-by-name-like, delete-by-name) by performing case-insensitive search more efficiently

details:

  • product requirements
    • The credential name itself is case-sensitive data (you store /FooBar, you get /FooBar back). But the cred search (credhub api's get-by-cred-name, get-by-cred-name-like, find-by-name endpoints) needs to be case-insensitive (you search /fooBAR, you still get /FooBar back).
    • CredHub has to work with MySQL, Postgres, and H2 (though H2 is only for testing).
  • problem statements (current state)
    • For MySQL, the queries currently uppercase or lowercase the search: upper(credential.name) = upper(searched_term) (sometimes it's uppercase, sometimes lowercase; it's not consistent), and without any relevant index, a full table scan is performed (I assume because it has to freshly compute the uppercased value of the name column for every single row).
    • For Postgres, a lower(credential.name) index is present, but some queries still do "uppercase" (those generated by Spring JPA lib from java code), resulting in full table scan.
  • solution: Add a new generated column that computes and stores the lowercased version of the credential names, and add index on it, and search by that.
    • based on performance testing with local setup (240k cred entries in DB), this improves end-to-end performance by 3 times on average. Since this measurement includes fixed costs like local credhub cli processing time, local credhub server processing time, and local network latency, the DB query performance improvement must be much higher (more than 10 times based on MySQL team's analysis).
    • the generated column syntax in postgres & mysql does not quite work in H2 (despite H2 docs saying it works: https://h2database.com/html/features.html#generated_columns). Perhaps our H2 version is too old. But a similar syntax does work (tests passed). This is fine because we don't need to optimize H2 performance since it's just for testing.

[internal tracking: https://vmw-jira.broadcom.net/browse/TPCF-26571]

- queries improved for MySQL and Postgres (including get-by-cred-name,
find-by-name-like, delete-by-name) by performing case-insensitive search
more efficiently

details:
- product requirements
    - The credential name itself is **case-sensitive** data (you store
    `/FooBar`, you get `/FooBar` back). But the cred search (credhub api's
    get-by-cred-name, get-by-cred-name-like, find-by-name endpoints) needs
    to be **case-insensitive** (you search `/fooBAR`, you still get `/FooBar` back).
    - CredHub has to work with MySQL, Postgres, and H2 (though H2 is only for testing).
- problem statements (current state)
    - For MySQL, the queries currently uppercase or lowercase the search:
    `upper(credential.name) = upper(searched_term)` (sometimes it's uppercase,
    sometimes lowercase; it's not consistent), and without any relevant index,
    a full table scan is performed (I assume because it has to freshly compute
    the uppercased value of the name column for every single row).
    - For Postgres, a `lower(credential.name)` index is present, but
    some queries still do "uppercase" (those generated by Spring JPA lib from
    java code), resulting in full table scan.
- solution: Add a new generated column that computes and stores the lowercased
version of the credential names, and add index on it.
    - based on performance testing with local setup (240k cred entries in DB),
    this improves end-to-end performance by 3 times on average. Since this
    measurement includes fixed costs like local credhub cli processing time,
    local credhub server processing time, and local network latency,
    the DB query performance improvement must be much higher (more than 10 times
    based on MySQL team's analysis).
    - the generated column syntax in postgres & mysql does not quite work in H2
    (despite H2 docs saying it works: https://h2database.com/html/features.html#generated_columns).
    Perhaps our H2 version is too old. But a similar syntax does work (tests passed).
    This is fine because we don't need to optimize H2 performance since it's
    just for testing.

[internal tracking: https://vmw-jira.broadcom.net/browse/TPCF-26571]
Copy link
Member

@duanemay duanemay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a good performance improvement on mysql and postgres.
Just one question on H2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we have issues with name_lowercase not being populated when inserting new rows on H2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that name_lowercase is indeed being populated when inserting new rows. Otherwise our tests under H2 mode wouldn't be passing, since the code is searching by name_lowercase now. E.g. this test inserts a new row and then search for that (the underlying query is searching by the name_lowercase column).

@peterhaochen47 peterhaochen47 merged commit 56a4b42 into main Oct 24, 2024
2 checks passed
peterhaochen47 added a commit that referenced this pull request Nov 12, 2024
- context: the performance improvement brought by indexing (#803)
was reverted (#815) due to migration error (issue #813) for some users.
This commit brings back that index, but doing so safely to avoid issues
like #813 (#813 happened to some users due to migration file numbering conflict,
this commit uses a fresh migration number (v59) to avoid that).

- Unfortunately, for the users who happened to have installed
credhub v2.12.94 successfully (aka did not encounter #813),
their mysql DB would already have an index called
"credential_name_lowercase". So in order for the migration in this
commit to work for both those who have installed credhub v2.12.94
and those who have not, this commit's migration instead creates
the index under a different name: "credential_name_lowercase_index"
(mysql does not support the "CREATE INDEX...IF NOT EXIST" syntax).
This would create some DB inefficiency (due to the presence of two
functionally identical indexes) for those who have installed credhub v2.12.94
(which we think is a smaller subset of users), but the system
would still work and likely a net improvement compared to before
due to the presence of indexing. And we would instruct these users
to manually drop the old, duplicate index ("credential_name_lowercase").
peterhaochen47 added a commit that referenced this pull request Nov 13, 2024
- context: the performance improvement brought by indexing (#803)
was reverted (#815) due to migration error (issue #813) for some users.
This commit brings back that index, but doing so safely to avoid issues
like #813 (#813 happened to some users due to migration file numbering conflict,
this commit uses a fresh migration number (v59) to avoid that).

- Unfortunately, for the users who happened to have installed
credhub v2.12.94 successfully (aka did not encounter #813),
their mysql DB would already have an index called
"credential_name_lowercase". So in order for the migration in this
commit to work for both those who have installed credhub v2.12.94
and those who have not, this commit's migration instead creates
the index under a different name: "credential_name_lowercase_index"
(mysql does not support the "CREATE INDEX...IF NOT EXIST" syntax).
This would create some DB inefficiency (due to the presence of two
functionally identical indexes) for those who have installed credhub v2.12.94
(which we think is a smaller subset of users), but the system
would still work and likely a net improvement compared to before
due to the presence of indexing. And we would instruct these users
to manually drop the old, duplicate index ("credential_name_lowercase").
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

2 participants