This is the second lab project for the course COMP6521. The goal is to merge and deduplicate two files of the same format using Bitmap indexes, and to analyze the running time and the number of I/O operations of the process. The two files contain about 10,000 and 5,000 lines, respectively. They need to be placed in /src/Data_Files.
- Language: Java
- Method: Bitmap index
- Test Framework: JUnit4
- Generate three Bitmap indexes based on EmpID, Gender, and Dept information.
- Compress the three Bitmap indexes to generate three compressed files.
- Using the EmpID Bitmap index, find the corresponding lines in the original file and write them to a new file. This step both sorts and deduplicates the data.
- Merge the two files to get the final output file. This step deduplicates the data again.
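
The index, compression, and sort/deduplicate steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class and method names are hypothetical, run-length encoding is assumed as the compression scheme (the scheme is not named here), keeping the first occurrence of a duplicated EmpID is an assumption, and everything is held in memory rather than read from disk in blocks.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch class; names are illustrative only.
public class BitmapIndexSketch {

    /**
     * Builds a bitmap index: one BitSet per distinct key,
     * with bit i set when line i carries that key.
     * A TreeMap keeps the keys in sorted (lexicographic) order.
     */
    static TreeMap<String, BitSet> buildIndex(List<String> keys) {
        TreeMap<String, BitSet> index = new TreeMap<>();
        for (int line = 0; line < keys.size(); line++) {
            index.computeIfAbsent(keys.get(line), k -> new BitSet()).set(line);
        }
        return index;
    }

    /**
     * Assumed compression scheme: run-length encoding.
     * Returns alternating run lengths over the first `length` bits,
     * always starting with a (possibly empty) run of 0s.
     */
    static List<Integer> runLengthEncode(BitSet bits, int length) {
        List<Integer> runs = new ArrayList<>();
        boolean current = false; // value of the run being counted
        int run = 0;
        for (int i = 0; i < length; i++) {
            if (bits.get(i) == current) {
                run++;
            } else {
                runs.add(run);
                current = !current;
                run = 1;
            }
        }
        runs.add(run);
        return runs;
    }

    /**
     * Walks the EmpID index in key order and emits one line per key.
     * Assumption: the first occurrence of a duplicated key is kept.
     */
    static List<String> sortAndDeduplicate(List<String> lines,
                                           TreeMap<String, BitSet> empIdIndex) {
        List<String> out = new ArrayList<>();
        for (BitSet bits : empIdIndex.values()) {
            out.add(lines.get(bits.nextSetBit(0)));
        }
        return out;
    }
}
```

Because the index keys are strings, the sorted order is lexicographic; this matches numeric order only if all EmpID values have the same width. The same key-ordered walk could be applied to the two sorted outputs when merging them, dropping a line whenever its key equals the previous one.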