Simple embedded database for java
This project is a practice for implementing a minimal database system. The practice includes different indexing mechanisms (B+Tree and Bitmaps for example), dealing with data storage on disk (Page and Page buffers), caching (LRU), locking (Reader-Writer Lock), parsing and other topics.
Notice: the idea of the project is still evolving! Anything that you may read here is open for changes in future.
The library implemented in this project repository empowers a java application to store data in collection-field
formats and it provides the sensation - and minimal features - of a database system.
The project is meant to be a practice so that the developer gets involved with some software engineering aspects and gain knowledge, and it may not solve a real world problem that other database implementations (including the embedded ones) don't solve already.
I started studying a tutorial called "Let's Build a Simple Database" which is about "writing a sqlite clone from scratch in C" till I met BTree algorithm section. Later, I understood more about BTree and B+Tree from "B Trees and B+ Trees. How they are useful in Databases" on Youtube, and eventually found myself working on this mini project.
The thought process, progress and more details of this project is explained on a youtube playlist called "Write a database from scratch". If you are visiting this repository from Youtube, welcome. If not, I suggest you to take a look at the playlist.
- Basic implementation of Tree-Based
Unique Index Management
using B+Tree - Cluster implementation of B+Tree
- All "collections" should have a default cluster B+Tree implementation despite their Primary key. Perhaps the Primary Key should point to the cluster id of a collection object (traditionally known as a row of a table).
- Implement UnsignedLong serializer, which would be cluster key type
- Update
SchemeManager
logic - Update
FieldIndexManagerProvider
logic (name should get updated as well, right?)
- Storage management of B+Tree
- Storage management using
CompactFileIndexStorageManager
andOrganizedFileIndexStorageManager
- Additional Storage Management implementation using
DatabaseStorageManager
(theDiskDatabaseStorageManager
usesPageBuffer
s)
- Storage management using
- Generic decorator pattern for
UniqueTreeIndexManager
- LRU Cache implementation for
UniqueTreeIndexManager
decorator. (More tests required) - Tree Index Manager Locks:
- Async decorator for
UniqueTreeIndexManager
to decorate multiple objects using a single decorator (locking operations of multiple IndexManagers using a single decorator) - Async decorator for
DuplicateIndexManager
to decorate multiple objects using a single decorator (locking operations of multiple IndexManagers using a single decorator) - The implementation should be able to combine lock for both Duplicate and Unique Index Managers (The final implementation may not even need the decorators, so not marking this as done for now, while it's implemented)
- Async decorator for
- Providing solution for differentiating
zero
(number) andnull
in a byte array. (Flag byte is used, which is not the best solution, bitmaps can help here) - Non-unique Index Manager
- B+Tree and
LinkedList
(binary) implementation - Bitmap implementation
- B+Tree and
- Database Storage Manager
- Disk Database Storage Manager Basics (Page Buffer)
- Write Queue to make writes Async (is it even safe/possible?)
- Query
- Implement
QueryableInterface
to support quicker query operations on IndexManagers (Done forLT
,LTE
,GT
,GTE
,EQ
) - Implement Query Class
- Implement
- Logging
Open Problems:
- Can't perform
query
operation on fields that are not indexed. Doing so right now will make us use cluster index, load objects into memory to perform comparisons, and the result would beIterator<V>
where V is cluster id. This means that these objects may later get loaded into memory again. We need a solution to avoid loading objects twice (once for query, once for the higher level operation such as read/update/delete) - Removed object tracer doesn't properly track pointers and size of the open space in the db file.
- If multiple sequential objects of different sizes are removed from DB file, multiple trace pointers are stored instead of a single one with larger size. Therefore, sequential removes should join as one.
- When a part trace object is used to refill db file, we should still store the remaining part as trace objects.