
This is an educational project for understanding the basic concepts of log-structured databases. It stores key-value pairs in plain-text (.txt) files, using two separate files.


sineshashi/SSTDBEngine


Basic Database Implementation

This project implements a basic database-like storage solution in Python. It is designed for educational purposes to gain an understanding of key database concepts, design patterns, and code modularity.

Objective

The main objective of this project is to create a simplified database-like system that demonstrates essential concepts, including key-value storage, sorted blocks, temporary and persistent storage, and basic read and write operations.

Motivation

The motivation behind this project arose during my study of data-intensive applications and the concept of SSTables (Sorted String Tables). As I delved into SSTables, I was intrigued by their fundamental role in various database systems for efficient data storage and retrieval. This led me to embark on a journey to comprehend the core principles of SSTables and their implementations.

Features

  • Line Class: Represents individual key-value pairs.
  • SortedBlock Class: Manages sorted blocks of key-value pairs, with support for getting, setting, and merging operations.
  • TemporaryWrite Class: Provides temporary write storage for incoming data, with automatic transfer to persistent storage on overflow.
  • PersistentRead Class: Handles reading and merging of data from a persistent storage file.
  • Storage Class: Integrates temporary and persistent storage to provide a complete storage solution.
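The way these pieces could fit together can be sketched in a few lines. This is not the code from this repository: the names SketchBlock and SketchStorage, the limit parameter, and the rule that newer values win on merge are all assumptions made for illustration, and the "persistent" side is kept in memory here rather than in a file.

```python
import bisect


class SketchBlock:
    """Hypothetical sorted block: parallel lists of keys and values, keys ascending."""

    def __init__(self):
        self.keys = []
        self.values = []

    def _append(self, key, value):
        # Internal helper: caller guarantees key is larger than all existing keys.
        self.keys.append(key)
        self.values.append(value)

    def set(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value        # overwrite an existing key
        else:
            self.keys.insert(i, key)      # insert while keeping keys sorted
            self.values.insert(i, value)

    def get(self, key, default=None):
        i = bisect.bisect_left(self.keys, key)  # binary search
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return default

    def merge(self, newer):
        """Return a new block; on duplicate keys the value from `newer` wins."""
        out = SketchBlock()
        i = j = 0
        while i < len(self.keys) and j < len(newer.keys):
            if self.keys[i] < newer.keys[j]:
                out._append(self.keys[i], self.values[i])
                i += 1
            elif self.keys[i] > newer.keys[j]:
                out._append(newer.keys[j], newer.values[j])
                j += 1
            else:
                out._append(newer.keys[j], newer.values[j])
                i += 1
                j += 1
        # At most one of the two tails below is non-empty.
        for k in range(i, len(self.keys)):
            out._append(self.keys[k], self.values[k])
        for k in range(j, len(newer.keys)):
            out._append(newer.keys[k], newer.values[k])
        return out


class SketchStorage:
    """Hypothetical store: a small write buffer merged into a larger block on overflow."""

    def __init__(self, limit=2):
        self.limit = limit
        self.buffer = SketchBlock()      # stands in for temporary write storage
        self.persistent = SketchBlock()  # stands in for the persistent file

    def set(self, key, value):
        self.buffer.set(key, value)
        if len(self.buffer.keys) >= self.limit:
            # Overflow: merge the buffer into the persistent block and start fresh.
            self.persistent = self.persistent.merge(self.buffer)
            self.buffer = SketchBlock()

    def get(self, key, default=None):
        missing = object()
        value = self.buffer.get(key, missing)  # recent writes are checked first
        if value is missing:
            return self.persistent.get(key, default)
        return value
```

Because both inputs to merge are already sorted, the merge is a single linear pass, which is the property that makes sorted blocks attractive for this kind of storage.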

Usage

  1. Clone the repository.
  2. Navigate to the project directory.
  3. Run the unit tests: python -m unittest discover tests

Current State and Limitations

  • The current implementation is a basic educational project and may not be suitable for production use.
  • All data is loaded into RAM at startup, and lookups then use binary search over it. This can become a bottleneck as the data grows; it could be resolved by distributing the data across several read files instead of maintaining a single persistent file.
  • The system's efficiency and performance may not be optimized for handling large datasets and high concurrency.

Potential Improvements

  • Replace the single persistent file with distributed persistent read files, because loading all the data into memory may not be feasible.
  • Implement advanced indexing and data structures to enhance data retrieval efficiency.
  • Explore parallel processing and optimized storage mechanisms for better performance with larger datasets.
  • Consider implementing caching mechanisms to reduce disk I/O and improve read speeds.
  • Introduce concurrency control mechanisms to handle simultaneous read and write operations.
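One common way to avoid loading a whole sorted file into memory is a sparse index: keep only every N-th key together with its byte offset, then seek to the nearest preceding offset and scan forward. The sketch below is an illustration of that idea, not something this project implements. The function names, the every parameter, and the tab-separated "key<TAB>value" line format are assumptions; the actual file format used by this project may differ.

```python
import bisect


def build_sparse_index(path, every=2):
    """Record the key and byte offset of every `every`-th line of a sorted file.

    Assumes one 'key<TAB>value' pair per line, sorted by key (hypothetical format).
    """
    index_keys, index_offsets = [], []
    with open(path, "rb") as f:
        line_no = 0
        while True:
            offset = f.tell()          # byte offset where this line starts
            line = f.readline()
            if not line:
                break
            if line_no % every == 0:
                key = line.decode("utf-8").split("\t", 1)[0]
                index_keys.append(key)
                index_offsets.append(offset)
            line_no += 1
    return index_keys, index_offsets


def lookup(path, index_keys, index_offsets, key):
    """Find `key` by seeking to the closest indexed key at or before it, then scanning."""
    pos = bisect.bisect_right(index_keys, key) - 1
    if pos < 0:
        return None                    # key sorts before the first key in the file
    with open(path, "rb") as f:
        f.seek(index_offsets[pos])
        for raw in f:
            k, v = raw.decode("utf-8").rstrip("\r\n").split("\t", 1)
            if k == key:
                return v
            if k > key:
                return None            # passed the spot where the key would be
    return None
```

Only the index lives in memory, so its size shrinks as every grows, at the cost of scanning up to every lines per lookup.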
