Make Prepared play nicer with connection pools #95

Closed
Abscissa opened this issue Mar 1, 2017 · 18 comments · Fixed by #157

@Abscissa

Abscissa commented Mar 1, 2017

As @Marenz pointed out over here, since prepared statements are (as per the way MySQL works) tied to a specific connection, using them in combination with connection pools is awkward and of questionable benefit.

Need some mechanism, probably a connection-independent wrapper for Prepared, to help alleviate those issues.

@Abscissa
Author

Been doing a lot of thinking about prepared statements and connection pools, and wanted to put down my thoughts so far.

Current downsides:

The current design of prepared statements (while a vast improvement over the pre-v1.0.0 struct Command-based interface, IMHO) has a few downsides:

  1. They don't play well with connection pools (see here) because they are tied to a particular connection (which is inherent to the way the MySQL client/server protocol works). (A rough illustration follows this list.)

  2. When using vibe.d, they cannot be used across fibers or after a LockedConnection is released, or bad things may happen (again, because they are inherently tied to a connection, as per the communications protocol). See #97 ("Make sure it's clear/safe that Prepared shouldn't be shared across vibe fibers"). I suspect this might also be the root of #153 ("Using prepared fails sometimes with 'Acquiring waiter that is already in use'"), though I'm uncertain.

  3. They expose a duplicate exec/query interface that mirrors that of Connection. Not the most DRY, and leads to some awkwardness in the docs, IMO.

  4. Every instance of Prepared has both exec functions (for INSERT/UPDATE/etc) and query... functions (for SELECT/etc). But the SQL is already chosen at construction, so no Prepared will ever be able to use both exec and query...: For every Prepared, either exec is never valid or query... is never valid. This has always struck me as messy and uncomfortable.

  5. Prepared vs PreparedImpl: This situation is a bit messy, too. The real implementation and API of prepared statements is in PreparedImpl. Prepared is just a refcounted wrapper on top of it that makes a mess of the ddox-generated documentation, AND no longer accomplishes anything anyway now that PreparedImpl no longer has any dtor (for various complicated reasons relating to struct dtors, the auto-purge feature from v1.1.4, and recognizing that manual release of statements from the server isn't strictly necessary, being per-connection after all).
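
To make problems 1 and 2 concrete, here's roughly the kind of code that runs into them. This is just an illustrative sketch; the identifiers only loosely follow the current v1.x API and nothing here is verbatim from the docs:

// Illustrative sketch only (names loosely follow the current v1.x API).
auto pool = new MySQLPool(connectionString);
Prepared stmt;

{
    auto conn = pool.lockConnection();
    // The statement is created on -- and silently bound to -- this connection.
    stmt = conn.prepare("SELECT name FROM users WHERE id = ?");
}   // conn is released back to the pool here...

// ...but stmt still refers to that one specific connection internally.
// Executing it now, or from another fiber holding a different pooled
// connection, is the "bad things may happen" scenario from point 2.
stmt.setArgs(42);
auto rows = stmt.query();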

Why these problems?

I believe, for the most part, these issues are ultimately symptomatic of the Prepared abstraction not accurately matching the reality. Thus, it needs high-level rethinking:

Currently, Prepared (just like the original functionality in the old pre-v1.0.0 Command struct) is designed around the low-level reality that MySQL ties prepared statements to individual connections. But instead of treating prepared statements as something owned by a connection (as they realistically are), Prepared flips this around: a Prepared has a connection, rather than the reality that a connection has a prepared statement.

It's also wrong on a higher level: Conceptually, to a database's human user, a prepared statement is...an SQL statement...that just has special "slots" for parameters and provides certain benefits. To a user, that's it, nothing more. That these particular statements happen to be tied to individual connections is merely an implementation detail of the communications protocol.

I believe these disconnects are the main cause of the five above problems with Prepared.

Proposed solution for v2.0.0:

Originally, I was thinking about (if anything at all) maybe offering an additional abstraction on top of Prepared which manages prepared statements across multiple connections. But that would only address the first problem above, not all five, and I now believe it would likely create additional mess and problems due to the original disconnect only being covered up, not resolved.

So I think a re-design is warranted, and here's what I'm currently thinking:

  • Get rid of PreparedImpl. Just have a Prepared struct and be done with it. In the unlikely case anyone really does need deterministic server-side release of prepared statements, they can just make a trivial RefCounted wrapper over Prepared (as they would need to do since v1.1.4 anyway): Make an alias this object wrapping Prepared. Call Prepared.release in the dtor. Wrap it in RefCounted!T. Done.

  • Get rid of Prepared's reference to a connection, as well as all of Prepared's exec/query functions. To execute a prepared statement, instead of calling exec/query on a Prepared, you'd pass a Prepared in place of an SQL string to an overload of Connection.exec or Connection.query....

  • Since the reality is that "a connection HAS a prepared statement" and "prepared statements are a feature of a connection", not the other way around, each Connection would internally manage its own set of prepared statements, indexed by some connection-independent ID (probably just the SQL string itself, at least initially). If the user tries to use a statement that hasn't been prepared on this connection, the connection automatically creates it. If the user manually releases a prepared statement, the prepared statement updates some master "to release" list and each Connection can check this list whenever appropriate and release its own (similar to what Connection already does for the sake of auto-purge). Offering finer-grained per-connection releasing would be possible, if ever needed (or perhaps even preferable to shoehorning a "release this statement on all connections" solution). No Prepared would ever have to know that connection pools (or connections) even exist, let alone which one is being used. Neatly encapsulated.

I think that neatly addresses all five problems above.
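
Usage under this proposal might look roughly like this. All names here are tentative sketches of the overloads described above, not final API:

// Tentative sketch of the proposed v2.0.0 usage (names are not final).
// A Prepared is just "SQL plus argument slots"; no connection lives inside it.
Prepared stmt = prepare("INSERT INTO item (name, price) VALUES (?, ?)");
stmt.setArgs("widget", 3.99);

auto pool = new MySQLPool(connectionString);
auto conn = pool.lockConnection();

// The Prepared is passed where an SQL string would normally go. If this
// particular connection hasn't registered the statement yet (keyed by the
// SQL string), it transparently does so before executing.
auto rowsAffected = conn.exec(stmt);

// The same Prepared can later be handed to a *different* pooled connection;
// that connection simply registers its own server-side copy on first use.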

Does all this mean mysql-native is getting too high-level for its charter?

Mysql-native was originally intended as a low-level client lib on which higher-level DB libs, ORMs, etc, could be based. There is need for and value in that.

Yes, IMO, modern mysql-native has been growing higher level than its original charter, and this Prepared redesign takes that further. However, I believe these higher-level interfaces are more than worthwhile and moving in a very good direction for the average D database user.

So what of the need for a stripped-down low-level API?

First of all, there's nothing stopping other high-level DB libs from basing their MySQL/MariaDB support on top of mysql-native's higher-level interfaces. That's entirely viable, and may even make things easier for the lib developers. And mysql-native does intend to reduce any overhead it does have and keep it to a minimum, even for high-level functionality (IMO, that's part of D's own charter, after all).

But aside from that: For the sake of sensible code hygiene and maintainability, my intention for the near-term future of this library is to clean up the internal design and separate all the low-level communications code out of the higher-level interfaces. This will have the additional benefit of opening the door for further cleanup of the internal API, which will then become a new, optional, low-level API - one which I think should work better than the old original pre-v1.0.0 design (which I believe had somewhat of a high/low-level identity crisis, and the API had various dark, dangerous corners because of that). In the long term, I'm hoping that will also open up the door for additional DB backends, such as Postgres.

@schveiguy
Collaborator

Regarding prepared statements: I generally use them once. I only use prepared statements instead of SQL directly because of the security features and benefits of doing so (I'm used to building SQL strings with PHP and doing escapes, and it's really annoying). I think possibly the only time I used the advantage of executing a prepared statement more than once was to insert an array of data, but that's very few and far between.

Regarding the abstractions, I like the idea of the prepared statement being an "implementation detail" for a connection, and having the connection automatically build prepared statements when it needs to, freeing them when it wants to. You may even be able to tune the freeing of statements based on some last-used cache. However, I also agree that this makes the abstraction too high-level.

I've been wanting to port mysql-native to iopipe as a test to see how easy network protocols might be with iopipe, and I also think there is a large benefit to having direct access to the lower-level protocols. For one, you can avoid the whole Variant thing if you have all the pre-determined knowledge of the data layout. So I'm in favor of having the low-level protocol split out into its own piece, and having the high level abstractions (which are very nice) in their own library. But I'm not sure how that would fit into your plans. At worst, I may be able to port your higher-level code on top of my port.

For what it's worth, I'm doing a similar thing with iopipe, where I don't want to deal with the actual i/o, but just the nice layer on top of it. Eventually, iopipe will depend on some low-level i/o library like https://github.com/MartinNowak/io.

@MartinNowak

MartinNowak commented Jan 16, 2018

> They don't play well with connection pools (see here) because they are tied to a particular connection (which is inherent to the way the MySQL client/server protocol works).

It's inherent because prepared statements can't be deleted otherwise.

> Get rid of Prepared's reference to a connection, as well as all of Prepared's exec/query functions. To execute a prepared statement, instead of calling exec/query on a Prepared, you'd pass a Prepared in place of an SQL string to an overload of Connection.exec or Connection.query....

Yes, the Cassandra driver works neatly like that: https://docs.datastax.com/en/developer/java-driver/3.1/manual/statements/prepared/

> Since the reality is that "a connection HAS a prepared statement" and "prepared statements are a feature of a connection", not the other way around, each Connection would internally manage its own set of prepared statements, indexed by some connection-independent ID (probably just the SQL string itself, at least initially). If the user tries to use a statement that hasn't been prepared on this connection, the connection automatically creates it.

Careful with implicit expenses; I'd prefer an explicit method over hidden performance issues.

> If the user manually releases a prepared statement, the prepared statement updates some master "to release" list and each Connection can check this list whenever appropriate and release its own (similar to what Connection already does for the sake of auto-purge). Offering finer-grained per-connection releasing would be possible, if ever needed (or perhaps even preferable to shoehorning a "release this statement on all connections" solution). No Prepared would ever have to know that connection pools (or connections) even exist, let alone which one is being used. Neatly encapsulated.

This looks like a known idea: a prepared statement cache.
http://www.theserverside.com/news/1365244/Why-Prepared-Statements-are-important-and-how-to-use-them-properly
You're going to need a cache for many tasks anyhow, so being able to look up a prepared statement for a given query is useful even for a single connection.

Another good idea: make the ConnectionPool thread/fiber-affine, so application code always gets the same connection.
https://stackoverflow.com/a/6094798/2371032
People who want to juggle multiple connections could deal with the issue on their own.

@quickfur

For SQL statements that are only executed once, I much prefer Adam Ruppe's sqlite.d API, where you just bind SQL placeholders directly via a variadic template function:

import arsd.sqlite;
auto db = new SQLite("database.sqlite");
auto rs = db.query("SELECT * FROM mytable WHERE name=? AND price<?", "John Doe", 3.14159);
foreach (row; rs) { ... }

Of course, this is not very efficient if you're running the same SQL very frequently. But conceivably, the query method could store a cache of prepared statements keyed by string (or the string's .ptr, if it's literally the same code that's running in a loop), and skip the SQL compilation step if there's already a prepared statement that just needs to be rebound.
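
Something along these lines, for instance. This is purely a sketch; the cache, prepareOnServer, and the Prepared methods used here are made-up placeholders:

// Purely a sketch; `prepareOnServer`, `setArgs`, and `query` are placeholders.
Prepared[string] cache;   // could also be keyed by the string's .ptr

auto query(Args...)(string sql, Args args)
{
    auto p = sql in cache;
    if (p is null)
    {
        // Pay the SQL-compilation cost only the first time this string shows up.
        cache[sql] = prepareOnServer(sql);
        p = sql in cache;
    }
    p.setArgs(args);     // just rebind the placeholders
    return p.query();    // execute the already-prepared statement
}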

@Marenz

Marenz commented Jan 16, 2018

I like the idea of a connection-agnostic prepared statement. It's actually exactly what I have built on top of mysql-native currently to use them at all. My usage for a prepared statement looks like this:

auto con = managers.mysql.get();

alias Prep = GPrep!"SELECT `end` FROM `match`
                    WHERE match_id =
                        (SELECT MAX(match_id)
                         FROM match_player
                         WHERE player_id = ? AND
                               match_id != ?)";

auto prep = con.gprep(Prep());

prep.setArgs(usrobj.playerid, match_id);

auto val = prep.queryValue();

Implemented like this:

struct GPrep ( string Sql )
{
    import util.fnv1a;
    immutable size_t id = fnv1a64(Sql); // fast hash function
    immutable sql = Sql;
}

    auto gprep ( Prep ) ( ref typeof(this.get()) con, Prep prep )
        if (__traits(isSame, TemplateOf!Prep, GPrep))
    {
        if (auto prep_st = prep.id in con.prepared_statements)
        {
            assert (prep_st.sql == prep.sql);
            assert (prep_st.isPrepared);
            return *prep_st;
        }
        else
        {
            return con.prepared_statements[prep.id] =
                prepare(con, prep.sql);
        }
    }

@quickfur

@schveiguy Adam Ruppe's sqlite.d uses Variant internally to store columns, and personally I find Variant a pain in the neck to use. It's a fine idea in theory, but in real-world code it's just painful to use because you need to know the exact underlying representation of the data in order to be able to get anything meaningful out of it. There's no way to just pass it to std.conv.to and say "I don't care what the DB driver's internal data type is, just convert from whatever it is to type T if it's possible, otherwise throw an exception".

In my latest fit of NIH, I reinvented sqlite.d to do something like this instead:

struct Row {
    Column opIndex(size_t col) { ... }
    ...
}

struct Column {
    T opCast(T)() {
        import std.conv : to;
        switch (sqliteType) {
            case SQLITE_INT: ... return value.to!T;
            case SQLITE_VARCHAR: ... return value.to!T;
            ... // etc.
        }
    }
}

This allows me to write stuff like:

auto rs = db.query("SELECT * FROM myTable");
foreach (row; rs) {
    int field1 = row[0].to!int;
    string field2 = row[1].to!string;
    ... // etc.
}

Of course, once that's working, I went further by allowing automatic transcription to struct if the field names matched the column names:

struct S { string name; float price; }
auto rs = db.query("SELECT name AS name, price AS price FROM myTable");
foreach (row; rs) {
    S s = row.to!S;
    ... // do stuff with s
}

And of course, taken to the logical conclusion, automatic conversion of the result set to an array:

struct S { ... }
auto rs = db.query("...");
S[] data = rs.to!(S[]);

No fuss, no muss, abstract away the dirty implementation details, and let the underlying code worry about how to actually map database types to struct fields.
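
The transcription itself is simple enough to do with compile-time reflection. Roughly (sketch only; it's called transcribe here just to avoid colliding with std.conv.to, and a Row.opIndex overload taking a column name by string is assumed):

import std.conv : to;
import std.traits : FieldNameTuple;

// Sketch only: fills each struct field from the identically-named column,
// reusing the Column -> D-type conversion (opCast / to!T) shown earlier.
T transcribe(T)(Row row)
    if (is(T == struct))
{
    T result;
    static foreach (name; FieldNameTuple!T)
    {
        __traits(getMember, result, name) =
            row[name].to!(typeof(__traits(getMember, result, name)));
    }
    return result;
}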

@Abscissa
Author

Some responses:

@schveiguy:

> I've been wanting to port mysql-native to iopipe as a test to see how easy network protocols might be with iopipe

I've had some trouble trying to make changes to mysqln's low-level i/o code in the past (and it is a bit of a mess), so one of my current top priorities for mysqln is to seriously clean up the internal protocol code. First step is to move all comms-handling code into the "protocol" sub-package (#144). Then, some further cleanup, including some of the "consume..." stuff. (I've actually been wanting to get to this for a long while, but mysqln unittests were...umm...seriously lacking at the time, so I needed to hold off until now, when I can be reasonably sure low-level changes aren't causing major breakages.)

Porting mysqln to iopipe might be difficult and problematic before that's done, so I'd recommend waiting until then, though you're certainly free to try.

@quickfur:

> For SQL statements that are only executed once, I much prefer Adam Ruppe's sqlite.d API, where you just bind SQL placeholders directly via a variadic template function:

Yea, @schveiguy IIRC has also suggested that, and I agree it's a good feature. It's on the roadmap for mysqln: #118

I don't want any further delays to v2.0.0 besides getting in this reworking of Prepared, but I'm hoping to have #118 in the following release.

@quickfur:

> There's no way to just pass [a Variant] to std.conv.to and say "I don't care what the DB driver's internal data type is, just convert from whatever it is to type T if it's possible, otherwise throw an exception".

Doesn't Variant.coerce do that? If not, that is disappointing.

@quickfur

Variant.coerce works for a limited set of built-in types, such as coercing an int to a long, or parsing a string to an int or vice versa, but it is unable to handle general conversions that std.conv.to can handle. For example:

import std.conv;
import std.variant;

struct MyValue {
        int impl;
        auto opCast(T : MyOtherValue)() { // std.conv.to knows how to use this
                return MyOtherValue(impl);
        }
}

struct MyOtherValue {
        int impl;
}

void main() {
        auto mv = MyValue(123);
        MyOtherValue mov2 = mv.to!MyOtherValue; // OK

        Variant v = mv;
        MyOtherValue mov = v.coerce!MyOtherValue; // NG
}

The line marked // NG causes a compile-time error because coerce doesn't support the desired target type.

@Abscissa
Author

Some refinements to my Prepared plans for v2.0.0:

  • struct Prepared itself shouldn't have any "release" or "register" functionality of its own, at all. That should be considered entirely a characteristic of the communications channel, not a characteristic of statements.

  • Connection should have a .release(Prepared) to manually release a prepared statement from an individual connection. It'll safely do nothing if the statement hasn't been registered on the connection, or has already been released (as defined by the statement's SQL string).

  • Any functionality to manually release a Prepared from ALL connections should be in ConnectionPool. I'm undecided whether this would be essential for v2.0.0 or could be deferred until v2.1.0 or so.

  • Connection and ConnectionPool should also have .register(Prepared) (or is there a better name?) to manually create a prepared statement on a connection, for when the user prefers an eager setup of Prepared over lazy as-needed setup. If register is NOT called manually, then a prepared statement will still be automatically registered when it's actually used. (A rough code sketch follows this list.)

  • When a statement is manually registered on a ConnectionPool (as opposed to a Connection), the ConnectionPool will manually register it on all currently open connections. After this point, the statement will automatically be registered on all new connections, immediately upon each new connection's creation, until ConnectionPool.release is called on the statement.

  • In the future, other functionality could be added: Connection[Pool].releaseAllPrepared, Connection[Pool].releaseStalePrepared, bool Connection[Pool].autoReleaseStalePrepared, etc.

  • To clarify: From the perspective of a Connection, the uniqueness or identity of a prepared statement is to be defined by the statement's SQL string, not by the whole Prepared object (which wouldn't be realistic anyway, since Prepared is a struct, not a class).

  • All registration/release of prepared statements is subject to being internally delayed, at the Connection's discretion, until the next time a statement is issued on the Connection. This is necessary because there may be data pending on the connection. This delay is also completely safe, because a registered prepared statement cannot be used anyway until pending data is purged (as of v1.1.4, the next statement issued automatically purges any remaining pending data).
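
In rough code form, the surface I'm picturing is something like this. It's a sketch only; nothing below is final, and the real Connection/ConnectionPool obviously carry far more state than shown:

// Rough sketch only; real classes carry far more state than shown here.
struct Prepared { string sql; /* plus args, param/column info, etc. */ }

class Connection
{
    private bool[string] registered;  // statements known on *this* connection

    // Eager setup; a no-op if the statement (keyed by its SQL) already exists here.
    void register(Prepared p) { registered[p.sql] = true; /* + server round-trip */ }

    // Per-connection release; safely does nothing if never registered here.
    void release(Prepared p) { registered.remove(p.sql); /* wire release may be deferred */ }
}

class ConnectionPool
{
    private Connection[] open;
    private bool[string] registerOnNewConnections;

    // Register on all currently open connections, and on every connection
    // created later, until release() is called for the same statement.
    void register(Prepared p)
    {
        registerOnNewConnections[p.sql] = true;
        foreach (c; open) c.register(p);
    }

    void release(Prepared p)
    {
        registerOnNewConnections.remove(p.sql);
        foreach (c; open) c.release(p);
    }
}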

@schveiguy
Collaborator

Sounds awesome!

@schveiguy
Collaborator

schveiguy commented Jan 31, 2018

> Variant.coerce works for a limited set of built-in types

But why would this matter? MySQL uses standard types such as int or string, not specialized types, I would think.

More: std.conv.to should be able to work with Variant. I would think that's something that's quite easy to do, as long as you can switch over all the possible stored types.
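
Something like this, maybe (just a sketch; the list of candidate types is made up and would need to match whatever the driver actually stores):

import std.conv : to;
import std.meta : AliasSeq;
import std.variant : Variant;

// Sketch: try each type the driver might have stored, then convert with to!T.
// The type list is illustrative only.
T variantTo(T)(Variant v)
{
    static foreach (U; AliasSeq!(bool, int, long, double, string))
    {
        static if (is(typeof(U.init.to!T)))  // only where a conversion exists
        {
            if (v.type == typeid(U))
                return v.get!U.to!T;
        }
    }
    throw new Exception("cannot convert " ~ v.type.toString ~ " to " ~ T.stringof);
}

unittest
{
    Variant v = 42;
    assert(variantTo!string(v) == "42");
}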

@quickfur

It matters if the database backend uses, e.g., special structures for storing dates (if the database engine has native support for it). If I'm swapping backends from one that uses, say, Date to store dates, to another that uses plain strings, then I don't want to be bothered to have to change my code to rewrite all instances of Date as string. Especially if it's generic code, all I really want to be able to say is, "convert column x to Date for me, I don't care how, just do it if you can, throw if you can't". I shouldn't have to know what underlying types could possibly be used to store column x.

@schveiguy
Collaborator

I think the solution is to get std.conv.to to work with Variant.

@schveiguy
Collaborator

> Porting mysqln to iopipe might be difficult and problematic before that's done, so I'd recommend waiting until then, though you're certainly free to try.

It's not a straightforward "just use a different streaming library" kind of change, so it will fundamentally require rethinking how it's done.

What I will appreciate, however, is having an example implementation I can look at and see how things actually are done.

I may even have to read the protocol spec anyway.

@Abscissa
Author

Abscissa commented Feb 2, 2018

Me:

> All registration/release of prepared statements is subject to being internally delayed, at the Connection's discretion, until the next time a statement is issued on the Connection. This is necessary because there may be data pending on the connection. This delay is also completely safe, because a registered prepared statement cannot be used anyway until pending data is purged (as of v1.1.4, the next statement issued automatically purges any remaining pending data).

Turns out that delaying statement registration until after the next purge has some ugly consequences I wasn't anticipating[*]. Not sure why I was thinking it was needed anyway though - auto-purging when a statement is created isn't going to cause the kind of nasty surprise that auto-purging on statement release could've caused.

So creating a prepared statement won't be delayed: It'll occur immediately, auto-purging any data that might be pending. As before, only release will be delayed to avoid a surprise auto-purging.

([*] It would've meant that an instance of Prepared could potentially be in a not-yet-ready state, lacking certain necessary info from the server. This state would have necessarily been distinct from "was created, but is now released", and it led to a bunch of mess.)

@quickfur

quickfur commented Feb 2, 2018

@schveiguy I looked at the std.variant code before. It's not so easy to just "make it work with std.conv.to", because Variant is supposed to be an open-ended union of all possible types. This is currently done via runtime typeids, which is why you need to know the underlying type before you can extract it (TypeInfos do not have a "can convert to" method, after all, so the best you can do is "is typeid(X) == typeid(Y)?"). But std.conv.to requires knowledge of the source and target types at compile-time. Presently, I don't have any good ideas about how to put the two together.

In my NIH SQLite binding, what I did was essentially:

struct Column {
    T opCast(T)() {
        import std.conv : to;
        switch (sqliteType) {
            case SQLITE_INTEGER: return impl.getInteger().to!T;
            case SQLITE_VARCHAR: return impl.getString().to!T;
            ...
        }
    }
}

I used opCast because that's what std.conv.to understands, so user code could just call to!X on a Column object to get whatever type X they want.

Now, arguably, I could have used Algebraic to encode the SQLite types and returned that instead, and presumably I'd be able to loop over the possible types at runtime and figure out which one would convert to the user's requested type. But it seemed like so much work for what the above admittedly-hackish opCast can do, in addition to having an uglier user-facing API.

Not to mention that by deferring the actual call to the SQLite library inside opCast, I saved the cost of converting SQLite types to D types for every column if the user code doesn't actually need to read every column. (E.g., if column C in the result set is only accessed for a subset of rows.)

@schveiguy
Collaborator

Well, maybe then the answer is to use Algebraic instead. Surely the set of types the database supports is finite.
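
For instance, something along these lines (a sketch only; the type list is illustrative, and std.variant.visit does the dispatch):

import std.conv : to;
import std.variant : Algebraic, visit;

// Sketch: a closed set of column types, converted to whatever the caller
// asks for. The type list is illustrative only.
alias DBValue = Algebraic!(long, double, string);

T dbTo(T)(DBValue v)
{
    return v.visit!(
        (long x)   => x.to!T,
        (double x) => x.to!T,
        (string x) => x.to!T
    );
}

unittest
{
    assert(dbTo!int(DBValue(42L)) == 42);
    assert(dbTo!double(DBValue("3.5")) == 3.5);
}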

@schveiguy
Collaborator

And does std.conv.to support Algebraic?

Abscissa added a commit that referenced this issue Feb 4, 2018