Simple Search Engine

Royal Institute of Technology KTH - Stockholm


A simple search engine that indexes a corpus of documents and searches for words with specific query parameters. This project is part of the course ID1020 Algorithms and Data Structures.

This repository contains code written during the fall semester of 2016 by Simone Stefani.

Structure

Description

Index: a HashMap that stores every indexed word as a key-value pair, mapping the word to its list of postings.
ResultDocument: an object that links a word (or a set of words) to a document that contains it. It refers to a specific document and carries properties related to the words, such as hits, popularity, and relevance (as tf-idf).
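The two structures above could be sketched roughly as follows. This is only an illustration: the class `IndexSketch`, the nested `Posting` class, its fields, and the `tfIdf` helper are hypothetical names, not the repository's actual code, and the tf-idf formula shown is one common variant (term frequency normalised by document length, base-10 logarithm for the inverse document frequency).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and fields are illustrative, not the repository's classes.
final class IndexSketch {

    /** One posting: a document plus a word-related property (hits). */
    static final class Posting {
        final String docName;
        int hits; // number of occurrences of the word in this document

        Posting(String docName) {
            this.docName = docName;
        }
    }

    // word -> list of postings
    final Map<String, List<Posting>> index = new HashMap<>();

    /** Records one occurrence of word in docName. */
    void insert(String word, String docName) {
        List<Posting> postings = index.computeIfAbsent(word, k -> new ArrayList<>());
        for (Posting p : postings) {
            if (p.docName.equals(docName)) {
                p.hits++; // word already seen in this document
                return;
            }
        }
        Posting created = new Posting(docName);
        created.hits = 1;
        postings.add(created);
    }

    /** Assumed variant: tf-idf = (hits / docLength) * log10(totalDocs / docsWithWord). */
    static double tfIdf(int hits, int docLength, int totalDocs, int docsWithWord) {
        double tf = (double) hits / docLength;
        double idf = Math.log10((double) totalDocs / docsWithWord);
        return tf * idf;
    }
}
```

With this sketch, inserting the same word twice for one document yields a single posting with two hits, while a second document produces a second posting in the same list.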

The search engine contains two other HashMaps:

DocumentsLength: keeps track of the length of each processed document.
Cache: contains cached queries.
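A minimal sketch of how these two maps might be used is shown below. The class `AuxiliaryMapsSketch`, the method names, and the choice of the raw query string as the cache key are assumptions made for illustration; the repository may key and populate its maps differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: names and key types are illustrative assumptions.
final class AuxiliaryMapsSketch {

    // document name -> number of words processed so far
    final Map<String, Integer> documentsLength = new HashMap<>();

    // query string -> previously computed result (list of document names)
    final Map<String, List<String>> cache = new HashMap<>();

    int evaluations = 0; // counts how often a query was actually evaluated

    /** Called once per word while a document is being indexed. */
    void countWord(String docName) {
        documentsLength.merge(docName, 1, Integer::sum);
    }

    /** Returns the cached result if present; otherwise evaluates and stores it. */
    List<String> search(String query, Function<String, List<String>> evaluate) {
        List<String> cached = cache.get(query);
        if (cached != null) {
            return cached;
        }
        evaluations++;
        List<String> result = evaluate.apply(query);
        cache.put(query, result);
        return result;
    }
}
```

Repeating an identical query then costs a single map lookup, and the per-document lengths are available when computing a length-normalised relevance such as tf-idf.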

The postings (resultDocuments) for each word are kept sorted as they are inserted. Consequently, they can be retrieved through binary search.
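One way to keep a list sorted at insertion time is to locate the insertion point with a binary search first. The sketch below assumes postings are ordered by document name and represents each posting by that name alone; `SortedPostingsSketch` and its methods are hypothetical, and the repository may well use its own binary search rather than `Collections.binarySearch`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: postings are reduced to document names for brevity.
final class SortedPostingsSketch {

    final List<String> postings = new ArrayList<>(); // always kept sorted

    /** Inserts docName at its sorted position; returns false if already present. */
    boolean insert(String docName) {
        int pos = Collections.binarySearch(postings, docName);
        if (pos >= 0) {
            return false; // already in the list
        }
        // For a missing key, binarySearch returns -(insertionPoint) - 1.
        postings.add(-(pos + 1), docName);
        return true;
    }

    /** Looks a document up in O(log n) comparisons. */
    boolean contains(String docName) {
        return Collections.binarySearch(postings, docName) >= 0;
    }
}
```

Note that on an `ArrayList` the lookup is logarithmic, but the insertion itself still shifts elements, so each insert remains linear in the worst case.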

When the user's query string is processed, a parsedQuery is returned in the form of nested sub-query objects. Consequently, a complex query can be evaluated by analysing the parsedQuery recursively and then combining the results of the fundamental queries with operators.
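The recursive evaluation of nested sub-queries could look roughly like the sketch below. Everything here is an assumption for illustration: the `Query` interface, the `Term` and `Binary` classes, and the operator symbols `&` (intersection), `|` (union), and `-` (difference) are hypothetical, and the index is simplified to a map from word to a set of document names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: class names and operator symbols are illustrative assumptions.
final class QuerySketch {

    /** A parsed query is either a single term or two sub-queries joined by an operator. */
    interface Query {
        Set<String> eval(Map<String, Set<String>> index);
    }

    /** Fundamental query: all documents that contain one word. */
    static final class Term implements Query {
        final String word;

        Term(String word) {
            this.word = word;
        }

        public Set<String> eval(Map<String, Set<String>> index) {
            Set<String> docs = index.get(word);
            // Copy so that callers can modify the result without touching the index.
            return docs == null ? new TreeSet<String>() : new TreeSet<String>(docs);
        }
    }

    /** Nested query: evaluates both sides recursively, then combines them. */
    static final class Binary implements Query {
        final char op;
        final Query left;
        final Query right;

        Binary(char op, Query left, Query right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }

        public Set<String> eval(Map<String, Set<String>> index) {
            Set<String> result = left.eval(index);
            Set<String> other = right.eval(index);
            switch (op) {
                case '&': result.retainAll(other); break; // intersection
                case '|': result.addAll(other); break;    // union
                case '-': result.removeAll(other); break; // difference
                default: throw new IllegalArgumentException("unknown operator: " + op);
            }
            return result;
        }
    }
}
```

Because a `Binary` node only calls `eval` on its children, queries of any nesting depth are handled by the same two classes.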
