refactor: Keep parsed sqlx-data.json in a cache instead of reparsing #1684

Merged

Conversation

CosmicHorrorDev (Contributor)
I didn't delve too much into how things are used, but from what I can tell, when using the offline data cache, each compile-time-verified query parses the entire sqlx-data.json file. This change switches to caching the parsed sqlx-data.json files in a two-layer cache, where the first layer is keyed by the path to sqlx-data.json (since there can be multiple) and the second layer is keyed by the query hash.

This did involve changing some of the types returned for errors, since this now uses Strings instead of serde::de::Error::custom. I didn't think this was a huge issue since the error type is a Box<dyn std::error::Error>.

Comparison

Note: the numbers here are in the same ballpark as what I see at work. We have a workspace with over 200 queries that uses --merged to create a workspace-level sqlx-data.json file. That file ends up a hair over 500 KiB.

Setup

Cargo.toml

[package]
name = "sqlx-can-make-checks-slow"
version = "0.1.0"
edition = "2021"

[dependencies]
sqlx = { version = "0.5.10", features = ["offline", "runtime-tokio-rustls", "sqlite"] }
tokio = { version = "1.16.1", features = ["full"] }

[patch.crates-io]
sqlx-macros = { path = "../../lib/sqlx/sqlx-macros" }

main.rs is generated with a Python script called gen_main.py:

#! /usr/bin/env python

import sys

PRELUDE = """
use sqlx::{Connection, SqliteConnection};

#[derive(Debug)]
#[allow(dead_code)]
struct QueryData {
    id: i64,
    val1: String,
    val2: i64,
    val3: Option<String>,
    val4: Option<i64>,
    val5: String,
    val6: i64,
    val7: Option<String>,
    val8: Option<i64>,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut conn = SqliteConnection::connect("sqlite:repro.db").await?;
"""

MAIN_END = """

    Ok(())
}
"""

QUERY_FUNC_TEMPLATE = """
async fn query{i}(conn: &mut SqliteConnection) -> sqlx::Result<QueryData> {{
    sqlx::query_as!(
        QueryData,
        "
            -- Comment for uniqueness {i}
            SELECT
                id,
                val1,
                val2,
                val3,
                val4,
                val5,
                val6,
                val7,
                val8
            FROM Foo
        "
    ).fetch_one(conn).await
}}
"""


def main():
    num_queries = int(sys.argv[1])

    main_body = ""
    query_funcs = ""

    for i in range(1, num_queries + 1):
        main_body += f"\n    dbg!(query{i}(&mut conn).await?);"
        query_funcs += QUERY_FUNC_TEMPLATE.format(i=i)

    print(f"{PRELUDE}{main_body}{MAIN_END}{query_funcs}")


if __name__ == "__main__":
    main()

I used 200 queries for this test, so ./gen_main.py 200 > main.rs generates the main file. Here's the output of ./gen_main.py 1:

use sqlx::{Connection, SqliteConnection};

#[derive(Debug)]
#[allow(dead_code)]
struct QueryData {
    id: i64,
    val1: String,
    val2: i64,
    val3: Option<String>,
    val4: Option<i64>,
    val5: String,
    val6: i64,
    val7: Option<String>,
    val8: Option<i64>,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut conn = SqliteConnection::connect("sqlite:repro.db").await?;

    dbg!(query1(&mut conn).await?);

    Ok(())
}

async fn query1(conn: &mut SqliteConnection) -> sqlx::Result<QueryData> {
    sqlx::query_as!(
        QueryData,
        "
            -- Comment for uniqueness 1
            SELECT
                id,
                val1,
                val2,
                val3,
                val4,
                val5,
                val6,
                val7,
                val8
            FROM Foo
        "
    ).fetch_one(conn).await
}

And here is the table for the SQLite database, created on the command line with $ sqlite3 repro.db:

create table Foo (
    id integer primary key autoincrement,
    val1 text not null,
    val2 int not null,
    val3 text,
    val4 int,
    val5 text not null,
    val6 int not null,
    val7 text,
    val8 int
);

Running

With all of the setup out of the way, here are the numbers. Both check times come from:

$ hyperfine --warmup 1 --prepare 'touch src/main.rs' 'cargo check'
  • sqlx-data.json size: 305 KiB
  • cargo check on master: (mean ± σ): 6.298 s ± 0.035 s
  • cargo check on this branch: (mean ± σ): 574.8 ms ± 3.5 ms

@abonander (Collaborator)

You may also be interested in the discussion in #1598, and in particular, #1598 (comment)

@CosmicHorrorDev (Contributor, Author)

I dove down the rabbit hole for a bit, and it seems like there is a lot of discussion on restructuring how query information is returned from the database and potentially stored. The one-file-per-query implementation would mitigate the same issue this PR addresses, and keeping a cached pool for communicating with the DB is a similar approach to what this PR does. (I was just looking into that today, since I noticed each query opens a new connection; this makes running cargo sqlx prepare take over 3 minutes at my work, though most of the devs use local databases where it isn't a noticeable issue.)

Having looked into the discussion more, I'm really not sure where this PR stands. It seems to be an improvement to the current offline workflow, but that workflow may be changing drastically.

@abonander (Collaborator)

I think this PR still has merit because we'll probably put out another point release or two before 0.6.0, especially since we have so many PRs in the pipeline.
