Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Sqlite support for sub-queries and CTE's #1816

Merged
merged 21 commits into from
Jun 7, 2022

Conversation

tyrelr
Copy link
Contributor

@tyrelr tyrelr commented Apr 18, 2022

Improves the datatype and null tracking for sub-queries and common table expressions through a series of independent changes:

  • separating the 'OpenEphemeral' and 'OpenAutoindex' operations from the regular/normal cursor read operations
  • unify tracking nullability, inline with the datatype for registers AND cursors, instead of a separate hashtable only tracking cursors, so that nullability will be retained when going through the register. Retrieve not-null column metadata at the same time, and include temp-tables schema too.
  • track all branches of OpCode execution, including following both sides of branch instructions, then merging the possible result rows.

@tyrelr
Copy link
Contributor Author

tyrelr commented Apr 18, 2022

Due to the size of the change, this could cause changes in the datatype for some queries. In particular:

  • Bind-variable values are now known to be nullable in the query output
  • If a query returns a combination of datatypes using the same column, it may pick a different datatype now.

Downsides of this approach include:

  • Issues with multi-type column handling are not resolved
  • Sqlite doesn't guarantee OpCode stability. Implementing more opcodes increases exposure to breaking changes.
  • Increased complexity makes reasoning about behaviour even more difficult, it may be worth extending the QueryLogger to be able log evidence? Trace-logging the program state at each 'break;' abandoning a particular path-of-execution, or at each 'ResultRow' could be helpful for diagnostics.

@tyrelr tyrelr changed the title Ephemeral tables Improve Sqlite support for sub-queries and CTE's Apr 18, 2022
@liningpan
Copy link
Contributor

This feature seems to be causing very severe performance and correctness regression. Nullability inference doesn't seem to work nearly as well as previous versions. Replacing HashMap with Vec in the logging module can only cut runtime by half, which is still several orders of magnitude slower than 0.5.x.

My main question is that is it really necessary to simulate all possible execution path? Meaning that can we reduce the search space by tracking unique (instruction, Vec<registry type>, Vec<cursor/pointer type>) tuples, which means as long as the register and cursor state stay the same, we don't need to search the an instruction multiple times.

@liningpan
Copy link
Contributor

Just for reference, the example query in this issue #1921 searched 1769536 states.

INSERT INTO x (a_id, b_id, c_id) VALUES ((SELECT 1 FROM a), (SELECT 2 FROM b), (SELECT 3 FROM c)) RETURNING id

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants