- Table detection: Detectron2
- Table line detection: U-Net architecture plus rule-based post-processing
- OCR: EasyOCR
The training data is private and cannot be shared. Public tabular datasets are available online, and you can label your own images with labelme (wkentaro/labelme on GitHub), a Python tool for polygonal image annotation (polygon, rectangle, circle, line, point and image-level flag annotation).
Reference datasets:
- https://www.icst.pku.edu.cn/cpdp/sjzy/index.htm
- https://paperswithcode.com/dataset/icdar-2013
- https://doc-analysis.github.io/tablebank-page/
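If you label your own images with labelme, the annotations are stored as one JSON file per image, with a `shapes` list holding a `label` and `points` for each region. The sketch below shows one possible way to turn those shapes into corner-format bounding boxes for a detector. The function names `labelme_to_boxes` and `load_labelme` are hypothetical and not part of this repository.

```python
import json
from typing import Dict, List


def labelme_to_boxes(annotation: Dict) -> List[Dict]:
    """Convert labelme shapes into [x_min, y_min, x_max, y_max] boxes."""
    boxes = []
    for shape in annotation.get("shapes", []):
        # Works for polygons and for rectangles stored as two corner points.
        xs = [point[0] for point in shape["points"]]
        ys = [point[1] for point in shape["points"]]
        boxes.append({
            "label": shape["label"],
            "bbox": [min(xs), min(ys), max(xs), max(ys)],
        })
    return boxes


def load_labelme(path: str) -> List[Dict]:
    """Read one labelme JSON file and return its boxes (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return labelme_to_boxes(json.load(f))
```

Boxes in this corner format correspond to what Detectron2 calls `BoxMode.XYXY_ABS`. How this repository actually prepares its training data is not documented here, so treat the conversion as an illustration only.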
Configuration parameters are set in the file base_config.yaml.
sh scripts/train.sh
sh scripts/infer_table_line.sh
Step 1: Table detection
Step 2: Table line detection
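The rule-based part of the table line step is not described in this README. As a rough illustration only, one common rule is to take the binary line mask produced by the segmentation model and keep the rows or columns that are mostly covered by line pixels, merging adjacent hits into a single line. The function `line_positions` and its `min_cover` threshold are assumptions made for this sketch, not the repository's actual rules.

```python
from typing import List, Sequence


def line_positions(mask: Sequence[Sequence[int]], axis: int = 0,
                   min_cover: float = 0.8) -> List[int]:
    """Return centre indices of rows (axis=0) or columns (axis=1)
    whose fraction of non-zero pixels is at least min_cover."""
    if not mask or not mask[0]:
        return []
    lines = mask if axis == 0 else list(zip(*mask))
    hits = [
        i for i, line in enumerate(lines)
        if sum(1 for value in line if value) / len(line) >= min_cover
    ]
    # Merge runs of adjacent indices (a thick line) into one centre position.
    positions: List[int] = []
    run: List[int] = []
    for i in hits:
        if run and i != run[-1] + 1:
            positions.append((run[0] + run[-1]) // 2)
            run = []
        run.append(i)
    if run:
        positions.append((run[0] + run[-1]) // 2)
    return positions
```

Calling it once with `axis=0` and once with `axis=1` gives the horizontal and vertical line positions of a fully ruled table. Tables with partial or missing rules would need additional logic.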
Input:
Output:
Step 1: Table detection
Step 2: Table line detection
Step 3: Crop the image into cells according to the detected lines
Step 4: Run OCR on each cell
Step 5: Save the result as a CSV/Excel file
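Steps 3 to 5 can be sketched as follows, assuming the line step yields the x positions of the vertical lines and the y positions of the horizontal lines. The names `cells_from_lines`, `table_to_csv`, and the `ocr_cell` callback are hypothetical and used only for illustration. The OCR engine is passed in as a function so the sketch does not depend on any particular library.

```python
import csv
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def cells_from_lines(xs: Sequence[int], ys: Sequence[int]) -> List[List[Box]]:
    """Build a row-major grid of cell boxes from vertical (xs) and
    horizontal (ys) line positions."""
    xs, ys = sorted(set(xs)), sorted(set(ys))
    return [
        [(x0, y0, x1, y1) for x0, x1 in zip(xs, xs[1:])]
        for y0, y1 in zip(ys, ys[1:])
    ]


def table_to_csv(image, xs: Sequence[int], ys: Sequence[int],
                 ocr_cell: Callable[[object, Box], str],
                 out_path: str) -> List[List[str]]:
    """Run ocr_cell on every cell box and write the text grid to a CSV file."""
    grid = cells_from_lines(xs, ys)
    rows = [[ocr_cell(image, box).strip() for box in row] for row in grid]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return rows
```

With EasyOCR, `ocr_cell` could crop a NumPy image as `image[y0:y1, x0:x1]` and join the strings returned by `reader.readtext(crop, detail=0)`. Merged cells and borderless tables are not handled by this grid assumption.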
sh scripts/infer_table_ocr.sh
Input: ./datasets/demo_examples/demo2.png
Output: ./results/demo.csv
docker run --name table_extraction nam157/table_extraction:v1.0.0