- Implement an MVP version for running TPH benchmark read queries.
- Benchmark
- Improve
- Implement an MVP version for running TPH benchmark transaction queries.
- Benchmark
- Improve
- Rinse and Repeat
- implement .dbinfo command. Pass
tests/test_dot_commands.rs cli_dbinfo
test.
Scan
- Implement DbHeader
- SqliteContextProvider: parse db header from file for tables and schemas
- Create DiskManager for centralizing physical ops instead of using BufferPool.
- Centralize init process in Database.
- Implement DbMeta to be able to parse DbHeader and schema objects
- Page cell_ptrs()
- DbMeta parses leaf_table_cells for first page
- Implement SchemaObject::parse(&LeafTableCell)
- Add parsing Columns from sql statement to SchemaObject::parse
- Implement field DbMeta.schema_objects
- Implement this conversion in SqliteContextProvider::new. From schema objects we can get table (name, cols, data types) for SqliteContextProvider potentially we need to convert sqlite type to arrow_schema types.
- ExecScan: implement Physical TableScan
- Implement one page scan
- PhysicalPlanner.plan(): Replace hardcoded ExecApplesScan with actual Scan.
- Scan Many B+tree pages
- implement dfs or bfs for scanning across multiple pages.
Projection
- select * from table1
- Implement Project by Column Name in ExecProjection: select col1 from table1
- select col1, col2 from table1
Selection
- Where
select col1 from table1 where col2='value'
- Where
IN
Aggregation
- Count:
select name, count(1) from apples group by name;
- Max:
select name, max(color) from apples group by name;
- Average
Component
- basic SQL to Logical Plan
- basic Logical Plan execution
- physical plan
- ColumnValue and DataRecord
- Parsing database
- Parsing table
- replace hardcoded ExecApplesScan by actual sqlite table scan
TODO, not prioritized yet.
Layering
ExecScan (PhysicalPlan)
---
BTree Module
---
BufferPool
where is it?
- above BufferPool, calls by PhysicalPlan (i.e. ExecScan) to scan
all table pages when reading from table (i.e.
select * from apples
) - should not access DiskManager of File directly,
- should call BufferPool to get page on disk.
example sequence
- tbl_name="apples" is parsed from sql str like select name from apples
- exec_scan calls BTree module to scan specific table name "apples" from query
- maybe BTree returns TableLeafCell, and they are parsed to DataRecord in ExecScan?
TODO
- Stage 1: can work with normal table
- scan table when table is on 1 page (table
apples
ororanges
insample.db
).- pass test
cli_sql_scan_table_single_page
- pass test
- scan table that spans multiple pages: interior and table leaf pages.
- add CellTableInterior (similar to TableLeafCell)
- traverse tree with DFS or BFS to return all leaf cells by going thru pointer in interior cell.
- pass test
cli_sql_scan_table_multiple_pages
- scan table when table is on 1 page (table
- Stage 2: can work with index table
- IndexLeafCell and IndexInteriorCell
- ???
References
- Sqlite dev.to series: Part 3 - Understanding the B-Tree and its role on database design
- (optional) Chapter 6, book Subsankar in readings.
- https://jvns.ca/blog/2014/10/02/how-does-sqlite-work-part-2-btrees/
- https://saveriomiroddi.github.io/SQLIte-database-file-format-diagrams/
-
Buffer Pool is a in-memory cache of pages from the database file on disk.
-
All access methods (read and write) MUST go through the buffer pool and not the database file directly.
-
SqliteContextProvider
andDbMeta
when SQLite starts, use buffer pool for parsing db header and metadata from first page. -
maintain dirty-flag for each page, set if page is modified.
-
Buffer Replacement Policy
implement LRU policy for page eviction when buffer is full.- maintain timestamp of when page was last accessed.
- when buffer is full, evict page with oldest timestamp.
- store timestamp in a data structure that allows efficient sorting and retrieving smallest.
TODO
TODO
- implement LogRecord
- LogManager
- CheckpointManager
- LogRecovery: reads log file from disk, redo and undo.