Skip to content

sepgh/testudo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Testudo

Simple embedded database for java

This project is a practice for implementing a minimal database system. The practice includes different indexing mechanisms (B+Tree and Bitmaps for example), dealing with data storage on disk (Page and Page buffers), caching (LRU), locking (Reader-Writer Lock), parsing and other topics.


Introduction

Notice: the idea of the project is still evolving! Anything that you may read here is open for changes in future.

The library implemented in this project repository empowers a java application to store data in collection-field formats and it provides the sensation - and minimal features - of a database system.

The project is meant to be a practice so that the developer gets involved with some software engineering aspects and gain knowledge, and it may not solve a real world problem that other database implementations (including the embedded ones) don't solve already.

I started studying a tutorial called "Let's Build a Simple Database" which is about "writing a sqlite clone from scratch in C" till I met BTree algorithm section. Later, I understood more about BTree and B+Tree from "B Trees and B+ Trees. How they are useful in Databases" on Youtube, and eventually found myself working on this mini project.

The thought process, progress and more details of this project is explained on a youtube playlist called "Write a database from scratch". If you are visiting this repository from Youtube, welcome. If not, I suggest you to take a look at the playlist.

Development Progress

  • Basic implementation of Tree-Based Unique Index Management using B+Tree
  • Cluster implementation of B+Tree
  • All "collections" should have a default cluster B+Tree implementation despite their Primary key. Perhaps the Primary Key should point to the cluster id of a collection object (traditionally known as a row of a table).
    • Implement UnsignedLong serializer, which would be cluster key type
    • Update SchemeManager logic
    • Update FieldIndexManagerProvider logic (name should get updated as well, right?)
  • Storage management of B+Tree
    • Storage management using CompactFileIndexStorageManager and OrganizedFileIndexStorageManager
    • Additional Storage Management implementation using DatabaseStorageManager (the DiskDatabaseStorageManager uses PageBuffers)
  • Generic decorator pattern for UniqueTreeIndexManager
  • LRU Cache implementation for UniqueTreeIndexManager decorator. (More tests required)
  • Tree Index Manager Locks:
    • Async decorator for UniqueTreeIndexManager to decorate multiple objects using a single decorator (locking operations of multiple IndexManagers using a single decorator)
    • Async decorator for DuplicateIndexManager to decorate multiple objects using a single decorator (locking operations of multiple IndexManagers using a single decorator)
    • The implementation should be able to combine lock for both Duplicate and Unique Index Managers (The final implementation may not even need the decorators, so not marking this as done for now, while it's implemented)
  • Providing solution for differentiating zero (number) and null in a byte array. (Flag byte is used, which is not the best solution, bitmaps can help here)
  • Non-unique Index Manager
    • B+Tree and LinkedList (binary) implementation
    • Bitmap implementation
  • Database Storage Manager
    • Disk Database Storage Manager Basics (Page Buffer)
    • Write Queue to make writes Async (is it even safe/possible?)
  • Query
    • Implement QueryableInterface to support quicker query operations on IndexManagers (Done for LT, LTE, GT, GTE, EQ)
    • Implement Query Class
  • Logging

Open Problems:

  • Can't perform query operation on fields that are not indexed. Doing so right now will make us use cluster index, load objects into memory to perform comparisons, and the result would be Iterator<V> where V is cluster id. This means that these objects may later get loaded into memory again. We need a solution to avoid loading objects twice (once for query, once for the higher level operation such as read/update/delete)
  • Removed object tracer doesn't properly track pointers and size of the open space in the db file.
    • If multiple sequential objects of different sizes are removed from DB file, multiple trace pointers are stored instead of a single one with larger size. Therefore, sequential removes should join as one.
    • When a part trace object is used to refill db file, we should still store the remaining part as trace objects.

About

Simple embedded java database

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages