Design & develop a pipeline to extract data from unstructured documents

pitbuk101/Text-Recognizing-with-Trained-Model

Folder/file : Purpose

  • 20230401T120858Z-001 : Data that is used

  • src : Source code of the assignment

  • my_solution.ipynb : Test case of my solution in ipynb

  • Alternate_Solution.ipynb : Alternate solution

  • Requirements.txt : Requirements to run

  • Reuslt.png : Auto-generated output

Objective: Design & develop a pipeline to extract data from unstructured documents

I propose two solutions:

  • my_solution.ipynb : The traditional approach. The image is first preprocessed with OpenCV, and OCR libraries are then used to extract the information from it.

  • Alternate_Solution.ipynb : A deep learning approach that uses a pretrained text detection and recognition model with PyTorch and OCR, which can be customized for our use case. I selected it because of its robust ML pipeline: it first performs text detection (localizing words) and then text recognition (identifying all the characters in each word), which gives it an edge in extracting information seamlessly.
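The detect-then-recognize flow described above can be sketched in plain Python. This is only an illustrative skeleton, not the code from either notebook: the names `Word`, `crop`, and `run_pipeline` are hypothetical, the image is assumed to be a grayscale list of pixel rows, and `detect` and `recognize` are placeholders for whatever detection model and OCR engine is actually plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: a box is (x, y, width, height) in pixel coordinates.
Box = Tuple[int, int, int, int]
# Assumption: an image is a list of rows of grayscale values.
Image = Sequence[Sequence[int]]


@dataclass
class Word:
    """One detected word: where it is and what it says."""
    box: Box
    text: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region described by `box` out of `image`."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in image[y:y + h]]


def run_pipeline(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[List[List[int]]], str],
) -> List[Word]:
    """Stage 1 localizes words, stage 2 reads each cropped word."""
    words = [Word(box, recognize(crop(image, box))) for box in detect(image)]
    # Return words in reading order: top to bottom, then left to right.
    words.sort(key=lambda word: (word.box[1], word.box[0]))
    return words
```

Keeping the two stages as separate callables means the detector or the recognizer can be swapped (for example, a pretrained model in place of a classical OCR call) without touching the rest of the pipeline.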
