This project is the J-component of the Parallel and Distributed Computing course at VIT Chennai. The main idea is to search for and match words efficiently across the text files in a directory. The files are indexed using multiprocessing, and the resulting index is then searched using a trie. In other words, this project is word autocomplete built on multiprocessing and a trie.
Click here to read the report.
- Execute the following command to download (clone) the project:
git clone https://github.com/krunalmk/TriePDC.git
- If you downloaded the project as a ZIP archive instead, extract it.
- Open a terminal in the project folder.
- Execute the following:
- To index the text files, run:
python3 reindexthefiles.py
- To get parallel prefix matches for your input, run:
python3 main.py <your word>
python3 main.py guten  # Example: get autocomplete suggestions from the text files for the prefix "guten".
1. The text of every text file in the current directory is read.
2. Characters such as '.', ''', ',' and ';' are removed.
3. From the cleaned text of step 2, an index mapping each word to the files and line numbers where it occurs is built and stored in JSON format in data.json. Its structure is as follows:
{
  word: {
    "File": {
      "filename1.txt": {
        "Line": [i1, i2, i3, ..., in]
      },
      "filename2.txt": {
        "Line": [j1, j2, j3, ..., jn]
      }
    }
  }
}
4. Multiprocessing is used so that the files are indexed in parallel by several processes, which speeds up indexing.
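The indexing steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's actual reindexthefiles.py: the helper names (`clean_line`, `index_file`, `merge_results`, `build_index`) are hypothetical, "punctuation" is assumed to mean `string.punctuation`, line numbers are assumed to be 1-based, and each worker process is assumed to index one whole file at a time via `multiprocessing.Pool.map`.

```python
from __future__ import annotations

import json
import string
from multiprocessing import Pool
from pathlib import Path

# Assumption: the characters to strip are exactly string.punctuation (. ' , ; etc.).
_STRIP = str.maketrans("", "", string.punctuation)


def clean_line(line: str) -> list[str]:
    """Remove punctuation from one line and split it into words."""
    return line.translate(_STRIP).split()


def index_file(path: str) -> tuple[str, dict[str, list[int]]]:
    """Worker task (hypothetical): map each word in one file to its 1-based line numbers."""
    occurrences: dict[str, list[int]] = {}
    with open(path, encoding="utf-8", errors="ignore") as f:
        for lineno, line in enumerate(f, start=1):
            for word in clean_line(line):
                lines = occurrences.setdefault(word, [])
                if not lines or lines[-1] != lineno:  # record each line only once
                    lines.append(lineno)
    return Path(path).name, occurrences


def merge_results(results) -> dict:
    """Merge per-file results into the data.json layout {word: {"File": {name: {"Line": [...]}}}}."""
    index: dict = {}
    for filename, occurrences in results:
        for word, lines in occurrences.items():
            files = index.setdefault(word, {"File": {}})["File"]
            files[filename] = {"Line": lines}
    return index


def build_index(paths: list[str], processes: int | None = None) -> dict:
    """Index the files in parallel (one file per task), then merge in the parent process."""
    with Pool(processes) as pool:
        return merge_results(pool.map(index_file, paths))


if __name__ == "__main__":
    txt_files = sorted(str(p) for p in Path(".").glob("*.txt"))
    with open("data.json", "w", encoding="utf-8") as out:
        json.dump(build_index(txt_files), out)
```

Giving each process a whole file keeps the workers independent: they share no state, and only the parent process writes the merged index, so no locking is needed.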
- The data from data.json is read and inserted into a trie.
- The trie makes prefix search efficient: reaching the node for a prefix takes one step per character of the prefix, independent of how many words are indexed. For more information on tries, click here.
- The query prefix (entered by the user on the command line) is then matched against the trie.
- If a match is found, each matching word is returned along with the names of the files and the line numbers where it occurs. You have got the results! Yayy!
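The search steps can be sketched the same way. Again, this is a hypothetical sketch, not the project's main.py: `TrieNode`, `Trie`, `load_trie` and `search_prefix` are illustrative names, the sketch assumes data.json has the layout shown earlier, and it stores each word's file/line entry on the trie node where that word ends. Walking down to the prefix node costs one step per character of the prefix; collecting the suggestions then visits only the subtree below that node.

```python
from __future__ import annotations

import json
import sys


class TrieNode:
    """One character position; `files` is set only on nodes where a complete word ends."""

    __slots__ = ("children", "files")

    def __init__(self) -> None:
        self.children: dict[str, TrieNode] = {}
        self.files: dict | None = None  # the word's {"filename": {"Line": [...]}} entry


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str, files: dict) -> None:
        """Create (or reuse) one node per character, then attach the word's file entry."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.files = files

    def search_prefix(self, prefix: str) -> dict[str, dict]:
        """Return {word: files} for every stored word that starts with `prefix`."""
        node = self.root
        for ch in prefix:  # walk down one level per character of the prefix
            node = node.children.get(ch)
            if node is None:
                return {}  # no stored word has this prefix
        results: dict[str, dict] = {}
        stack = [(node, prefix)]
        while stack:  # depth-first walk over the subtree below the prefix node
            current, word = stack.pop()
            if current.files is not None:
                results[word] = current.files
            for ch, child in current.children.items():
                stack.append((child, word + ch))
        return results


def load_trie(index: dict) -> Trie:
    """Build a trie from the data.json layout {word: {"File": {...}}}."""
    trie = Trie()
    for word, entry in index.items():
        trie.insert(word, entry["File"])
    return trie


if __name__ == "__main__" and len(sys.argv) > 1:
    with open("data.json", encoding="utf-8") as f:
        trie = load_trie(json.load(f))
    for word, files in sorted(trie.search_prefix(sys.argv[1]).items()):
        for filename, info in files.items():
            print(f"{word}: {filename}, lines {info['Line']}")
```

Saved under a hypothetical name such as trie_sketch.py next to data.json, running `python3 trie_sketch.py guten` would print every indexed word starting with "guten", together with the file names and line numbers where it occurs.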