-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refact: optimize stelae database by hashing composite keys; small ergonomic improvements #44
Conversation
Two notable changes: Remove `inserts.rs` and `queries.rs` modules and move existing methods to it's own `models/*/manager.rs` module. This change makes it really clear which model the method belongs to, and hopefully increase codebase legibility. It should now be quite obvious where in our codebase we're calling out to database operations. Introduce `TxManager` trait, which holds all the methods that work with a database transaction. Currently these methods are functions that insert change objects in the database. Ideally we would want the `Manager` and `TxManager` to be a single trait, but we would need to know which Pool or Transaction object we use at compile time. This seemed too tricky to figure out quickly, so instead opt to call out to `TxManager` when managing transactions, and `Manager` when managing regular connection calls (outside of a transaction). Fix insert for multiple stele. Previously everything was getting applied to a connection, instead of a transaction. This was because we used `DatabaseConnection` struct everywhere. What we instead wanted was the `DatabaseTransaction` struct, which runs the sqlx queries/inserts within a transaction. Add error logging to git2::walk method and fix the `find_blob` method to not fail during the git repo walk.
- This commit aims at optimizing the `db.sqlite3` size. - The sqlite db size is now down from 47 MB to 30 MB for one stele by hashing composite keys. - Redundant string columns are removed. - Add `utils/md5.rs` module for utility hashing. The composite columns are now hashed to make the db size more compact. - Update versions endpoint following the refactor - Update database model, structs and SQL statements following the optimization.
The purpose of `stelae update` will be twofold: 1) Pull down any changes to the default stelae using TAF updater. 2) Update database. Currently stelae only does the latter. Leave an inline comment with our long term goals.
Progress on #36 Added `CliError` enum that maps any/all errors related to working with stelae CLI. Updated `run` to not return any errors, but instead to decide how to stop the process. We expect to currently return error code 1 on any found stelae errors, and instruct the users to inspect the local logs. Still have issues with centralizing errors found during initializing actix `App` instance. The reason this error handling is tricky is because we're passing in the generic, opaque `App` type to both `app.rs` and `routes.rs`. I tried initializing the `init_app` in `app.rs` outside of the `HttpServer::new(..)`, but the problem was that `App` did not implement `Clone` trait, so could not get it working quickly. This is something we'd like to resolve in the future, so leave the process exits uncentralized for now, and leave a comment in `app.rs`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
The only thing I'm unclear on is why you have some single use traits for database transactions. For example the trait that defines find_all_document_versions_by_mpath_and_publication
, it's only ever implemented once. I usually think of a trait indicating that something is going to be implemented more than once.
|
||
/// Convert a `Status` enum to an integer. | ||
#[must_use] | ||
pub const fn to_int(&self) -> i64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor, but Postgres supports enums https://www.postgresql.org/docs/current/datatype-enum.html I don't know how easy it is to actually map Rust enums to Postgres enums, but it's something worth bearing in mind. Because seeing say ElementAdded
when debugging raw SQL output is much nicer than just seeing 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. We'll probably completely transition away from using Postgres. Then I'll remove all the Postgres statements and generic connections that we currently have in the code, so this might not be important?
tracing::error!("Error: {:?}", err); | ||
process::exit(1); | ||
tracing::error!("Error: {err:?}"); | ||
// NOTE: We should not need to exit code 1 here (or in any of the closures in `routes.rs`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I'm surprised the error here doesn't get propogated to the error in run()
that your already catching. But I don't think panicing here is so bad, because it's just at app startup anyway, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I'll say we leave it here for now
Thanks for the review. My thought for traits is that they were a convenient way of mapping out all the queries, because then we can mock/test the database queries in our testing framework. Do you think there might be a better way of doing it? I'll test my hypothesis by adding tests hopefully in the near future! |
Ah yes, traits make sense for testing. I can't say I've ever actually mocked DB calls, in any language. What are the advantages? I suppose the main one is that it speeds up tests? It seems a bit brittle to me, that the mocked responses could become stale, creating false positives. And that some extra coverage is gained by testing against a real database. But anyway, using traits to aid tests is a legitimate use case, so all good. |
Closes #39
A couple of notable changes:
Database improvements.
Moved database functions from
statements/inserts.rs
andstatements/queries.rs
tomodels/*/manager.rs
modules. This change shows a gigantic git diff for all those functions, but is mostly just codebase reorganizing.Add
md-5
crate that hashes the composite key columns as the primary key, instead of using composite keys everywhere.This improvement brings the database size down significantly, since we're not storing expensive duplicate column text values for tables.
Commits:
Ergonomics and QoL improvements:
stelae insert-history
intostelae update
.app.rs
androutes.rs
modules, since returning/propagating errors from closures when we return an opaqueApp
type was causing us compiling issues. I tried to explain the reasoning behind the issue in anapp.rs
comment. I think we can fix/resolve this at a later point?