| Folder/file | Purpose |
| --- | --- |
| 20230401T120858Z-001 | Data that is used |
| src | Source code of the assignment |
| my_solution.ipynb | Test case of my solution in ipynb |
| Alternate_Solution.ipynb | Alternate solution |
| Requirements.txt | Requirements to run |
| Reuslt.png | Auto-generated output |
Objective: Design and develop a pipeline to extract data from unstructured documents.
I propose two solutions:
- my_solution.ipynb: The traditional approach. The image is first preprocessed with OpenCV, and OCR libraries are then used to extract information from it.
- Alternate_Solution.ipynb: A pre-trained deep learning model for text detection and recognition, built with PyTorch and OCR, which can be customized for our use. I selected it because of its robust two-stage ML pipeline: text detection (localizing words) followed by text recognition (identifying all characters in each word), which gives it an edge in extracting information.
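
To make the shape of both pipelines concrete, here is a minimal pure-Python sketch of the flow they share: preprocess, detect word regions, then recognize each region. It is not the code from either notebook. Every name in it (`binarize`, `detect_by_columns`, `describe_crop`, `extract_words`) is hypothetical, and the detector and recognizer are toy stand-ins for the OpenCV/OCR steps and the pre-trained PyTorch model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale pixels (0-255), row-major
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


@dataclass
class Word:
    box: Box
    text: str


def binarize(image: Image, threshold: int = 128) -> Image:
    """Preprocessing stand-in: dark pixels become 0 (ink), the rest 255."""
    return [[0 if px < threshold else 255 for px in row] for row in image]


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by box out of the image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def detect_by_columns(image: Image) -> List[Box]:
    """Toy detector (assumption): each run of adjacent ink columns is one box."""
    height, width = len(image), len(image[0])
    boxes: List[Box] = []
    start = None
    for x in range(width + 1):  # one extra step closes a run at the right edge
        has_ink = x < width and any(image[y][x] == 0 for y in range(height))
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            boxes.append((0, start, height, x))
            start = None
    return boxes


def describe_crop(patch: Image) -> str:
    """Toy recognizer (assumption): reports crop size instead of reading text."""
    return f"{len(patch[0])}x{len(patch)}"


def extract_words(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[Word]:
    """Stage 1 localizes words; stage 2 reads each cropped word."""
    return [Word(box, recognize(crop(image, box))) for box in detect(image)]


# A tiny 2x7 "page" with two dark regions separated by background.
page = [
    [250, 30, 40, 250, 250, 20, 250],
    [250, 250, 35, 250, 250, 25, 250],
]
words = extract_words(binarize(page), detect_by_columns, describe_crop)
for word in words:
    print(word.box, word.text)
```

Because the detector and recognizer are passed in as plain callables, a real text-detection model and a real recognition model could be substituted without changing `extract_words`.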