The KLiK Engine is a C++ Powered File Search Engine for the Enron Email Sample Dataset
C++
PHP 5.6.40
SQL 14.0
HTML5
CSS3
- Enron Email Dataset
- Size: 1.54 GB
Visual Studio 2017
WampServer Stack 3.0.6
Windows 10
MySQL Database 8.0.13
MySQLx APIs
C++ Boost Library
BootStrap v4.2.1
Details of important Features of the Application
- Forward Indexing:
300000 files/s
Incremental Processing (10000 files): 10 min
Total Time: 3hr
- Reverse Indexing
352000 files/s
Incremental Processing (10000 files): 15 min
Total Time: 2.4hr
- Querying
Single Word Querying: 0.1 - 0.7 sec
Multi Word Querying: 0.4 - 2.3 sec
- Implementation of the C++ Boost Library to facilitate I/O processes, since the dataset had many small files.
- Email files loaded into memory in increments of 10000, followed by mass processing of all loaded files. After that, the memory was freed and the process was started anew for the next 10000 files.
- Stopping words filtered out of the email files.
- Implementation of MySQLx APIs for SQL connections.
- Implementation of unordered maps for memory performance enhancement.
- Time calculation of the entire process as well as the incremental processes.
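
The batching, stop-word filtering and unordered-map steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual indexer: the type names, the `buildForwardIndex` and `countWords` functions, and the short stop-word list are all hypothetical, and the documents are passed in as strings instead of being read from disk with Boost.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical types: document id -> (word -> occurrence count).
using WordCounts = std::unordered_map<std::string, int>;
using ForwardIndex = std::unordered_map<int, WordCounts>;

// Illustrative stop-word list only; the project's real list is not shown here.
const std::unordered_set<std::string> kStopWords = {
    "the", "a", "an", "and", "of", "to", "in", "is"};

// Count the words of one email, lower-casing tokens, stripping punctuation
// and skipping stop words.
WordCounts countWords(const std::string& text) {
    WordCounts counts;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        std::string word;
        for (unsigned char c : token) {
            if (std::isalnum(c)) word.push_back(static_cast<char>(std::tolower(c)));
        }
        if (word.empty() || kStopWords.count(word) != 0) continue;
        ++counts[word];
    }
    return counts;
}

// Process documents in fixed-size batches (10000 in the project).
// In a real indexer each batch would be read from disk at this point.
ForwardIndex buildForwardIndex(const std::vector<std::string>& docs,
                               std::size_t batchSize) {
    if (batchSize == 0) batchSize = 1;  // avoid an endless loop
    ForwardIndex index;
    for (std::size_t start = 0; start < docs.size(); start += batchSize) {
        const std::size_t end = std::min(start + batchSize, docs.size());
        // "Load" the batch into memory.
        std::vector<std::string> batch(
            docs.begin() + static_cast<std::ptrdiff_t>(start),
            docs.begin() + static_cast<std::ptrdiff_t>(end));
        for (std::size_t i = 0; i < batch.size(); ++i) {
            index[static_cast<int>(start + i)] = countWords(batch[i]);
        }
        // The batch vector is destroyed here, releasing its memory
        // before the next batch is loaded.
    }
    return index;
}
```

Keeping only one batch of raw text alive at a time bounds peak memory use, while the unordered maps give average constant-time lookups when counting words.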
- Use of the forward index for reverse index creation.
- Incremental file processing, like in forward indexing.
- Time calculation for the incremental and complete processes.
- Implementation of ranking to ease later searching.
- Implementation of Relevance Ranking.
- Implementation of Search Normalization to prevent misuse of the ranking system by too many repetitions of the same word in a single file.
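
As a rough sketch of the reverse indexing ideas above, the snippet below inverts a forward index into posting lists and stores a dampened weight per posting. The names (`Posting`, `invert`, `dampedWeight`) are hypothetical, and the `1 + ln(count)` dampening is only one common way to stop a heavily repeated word from dominating; the README does not say which normalization formula the project actually uses.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical types: document id -> (word -> occurrence count).
using WordCounts = std::unordered_map<std::string, int>;
using ForwardIndex = std::unordered_map<int, WordCounts>;

// One entry of a posting list: which document, how often, and its rank weight.
struct Posting {
    int docId;
    int count;
    double weight;
};
using ReverseIndex = std::unordered_map<std::string, std::vector<Posting>>;

// Assumed normalization: grows logarithmically, so 1000 repetitions of a word
// count for far less than 1000 times a single occurrence.
double dampedWeight(int count) {
    return count > 0 ? 1.0 + std::log(static_cast<double>(count)) : 0.0;
}

// Turn "document -> words" into "word -> documents".
ReverseIndex invert(const ForwardIndex& forward) {
    ReverseIndex reverse;
    for (const auto& doc : forward) {
        for (const auto& wordCount : doc.second) {
            reverse[wordCount.first].push_back(
                Posting{doc.first, wordCount.second, dampedWeight(wordCount.second)});
        }
    }
    // Pre-rank each posting list so the searcher sees the best documents first.
    for (auto& entry : reverse) {
        std::sort(entry.second.begin(), entry.second.end(),
                  [](const Posting& a, const Posting& b) {
                      if (a.weight != b.weight) return a.weight > b.weight;
                      return a.docId < b.docId;  // deterministic tie-break
                  });
    }
    return reverse;
}
```

Sorting the posting lists at index time moves work out of the query path, which is one plausible reading of "ranking to ease later searching".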
- Implementation of the reverse index in searching.
- Calculation of document score and inverse document score for relevance ranking.
- Retrieval of the search query/string from the GUI.
- Top 15 results returned from the calculated search results.
- Stopping words safely removed from the search string.
- Score calculation for each result, with results ordered by score in descending order.
- Scores of results for different keywords belonging to the same file are multiplied to get a common score.
- Implementation of ordered maps for automatic ordering of results with respect to their scores.
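
The search steps listed above could look roughly like the sketch below. It is illustrative only: `parseQuery`, `search`, the `Posting` struct and the stop-word list are hypothetical names, and the inverse document score is assumed to be `1 + ln(N / df)`, which may differ from the formula the project uses.

```cpp
#include <cctype>
#include <cmath>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical posting: a document id plus a precomputed document score.
struct Posting {
    int docId;
    double weight;
};
using ReverseIndex = std::unordered_map<std::string, std::vector<Posting>>;

// Illustrative stop-word list only.
const std::unordered_set<std::string> kStopWords = {
    "the", "a", "an", "and", "of", "to", "in", "is"};

// Split the raw query string into lower-case keywords, dropping stop words.
std::vector<std::string> parseQuery(const std::string& query) {
    std::vector<std::string> words;
    std::istringstream in(query);
    std::string token;
    while (in >> token) {
        std::string word;
        for (unsigned char c : token) {
            if (std::isalnum(c)) word.push_back(static_cast<char>(std::tolower(c)));
        }
        if (!word.empty() && kStopWords.count(word) == 0) words.push_back(word);
    }
    return words;
}

// Return up to `limit` (docId, score) pairs, best score first.
std::vector<std::pair<int, double>> search(const ReverseIndex& index,
                                           const std::string& query,
                                           std::size_t totalDocs,
                                           std::size_t limit = 15) {
    std::unordered_map<int, double> scores;
    for (const std::string& word : parseQuery(query)) {
        auto it = index.find(word);
        if (it == index.end() || it->second.empty()) continue;
        // Assumed inverse document score: rarer words get a larger factor.
        const double idf = 1.0 + std::log(static_cast<double>(totalDocs) /
                                          static_cast<double>(it->second.size()));
        for (const Posting& p : it->second) {
            const double score = p.weight * idf;
            auto found = scores.find(p.docId);
            if (found == scores.end()) {
                scores[p.docId] = score;
            } else {
                found->second *= score;  // same file hit by several keywords
            }
        }
    }
    // An ordered (multi)map keeps results sorted by score, highest first.
    std::multimap<double, int, std::greater<double>> ranked;
    for (const auto& entry : scores) ranked.emplace(entry.second, entry.first);

    std::vector<std::pair<int, double>> top;
    for (const auto& entry : ranked) {
        if (top.size() == limit) break;
        top.emplace_back(entry.second, entry.first);
    }
    return top;
}
```

A `std::multimap` is used here instead of a `std::map` so that two documents with the same score do not overwrite each other.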
- Created in PHP/HTML5 & CSS3.
- Implementation of the BootStrap4 framework for a presentable interface.
- Passing of the input search query to the C++ Searcher script and receiving the list of results as output.
- Display of all results with the email subject as title, along with the file path.
- The result titles are file links that open a new browser window displaying all of the relevant file content.
- Implementation of time calculation on the GUI so the user can see the query time as well.
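
If the query time is measured inside the C++ searcher (the README does not say whether it is measured there or in PHP), a small timing helper along these lines would be enough. `secondsTaken` is a hypothetical name, and the same idea would apply to timing each 10000-file increment during indexing.

```cpp
#include <chrono>
#include <utility>

// Run `work` once and return the elapsed wall-clock time in seconds.
// steady_clock is used because it never jumps backwards.
template <typename Work>
double secondsTaken(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

The returned value could then be printed alongside the results so the GUI can display it.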
- Optimization (in components like indexing)
- Implementation of more advanced indexing and ranking algorithms
- Continuous bug fixes and improvements
A huge thanks to the wonderful team without which this entire project would not have been possible. Check out their profiles and star their repos! :)
msaad1999 | mshaharyar17 | ahmed | aitasadduq |
Check out the complete project for this login system. KLiK is a complete Social Media website, along with a Complete Login/Registration system, Profile system, Chat room, Forum system and Blog/Polls/Event Management System.
Check out KLiK here
Do star my projects! :)
If you liked my work, please show support by starring the repository! It means a lot to me, and it is all I'm asking for.