
Fine-tuned the VGG16 model for real-time recognition of handwritten mathematical notations, incorporating dynamic bounding boxes and multi-symbol segmentation for enhanced accuracy.

Roodaki/Math-Vision

Math Vision:
Computer Vision for Handwritten Mathematical Notation Recognition

demo.mp4

Introduction

Math Vision is an application that uses computer vision to recognize and classify handwritten mathematical notation. It applies transfer learning to VGG16, a convolutional neural network, and fine-tunes it on the Kaggle "Handwritten Math Symbols" dataset so that it can accurately recognize a wide variety of mathematical symbols and expressions. Possible applications include educational tools, digital note-taking systems, and automated grading platforms.

Features

  1. Data Processing and Preparation:
    • The dataset used is the Kaggle "Handwritten Math Symbols" dataset, which consists of 100,000+ 45x45 pixel JPEG files. These files contain English alphanumeric symbols, math operators, set operators, and basic predefined math functions.
    • The application loads, normalizes and splits images into training, development, and test sets. Preprocessed data is saved to a compressed .npz file for training or fine-tuning a CNN model.
  2. Transfer Learning with VGG16: Model training uses transfer learning with VGG16, a convolutional neural network pre-trained on ImageNet. The model is fine-tuned on the preprocessed dataset to recognize mathematical symbols and expressions. Both the trained weights and the complete model are saved in .h5 format for future use.
  3. Drawing Functionality: Users can draw on the canvas with various brush sizes and utilize undo/redo actions, along with a clear button for erasing all content.
  4. Fast Prediction: The application predicts the corresponding math symbols by passing the handwriting drawn on the canvas through the fine-tuned CNN. The confidence of each prediction is displayed as colored text: green above 90%, yellow above 80%, and red above 60%.
  5. Dynamic Bounding Box: As the user draws, the canvas renders padded bounding boxes around the symbols in real time. Multiple bounding boxes can appear on a single canvas, which gives users clearer visual feedback.
  6. Recognition of Multiple Symbols: The model recognizes and classifies multiple handwritten symbols captured in a single image, allowing users to input complex mathematical expressions in one go.
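The confidence colors and padded bounding boxes described above can be illustrated with a small NumPy-only sketch. This is not the project's implementation: the function names, the default padding of 2 pixels, and the column-gap segmentation rule (symbols are assumed to be separated by at least one empty pixel column) are all assumptions made for illustration. The README also does not say what is shown at or below 60% confidence, so the sketch returns `None` in that case.

```python
import numpy as np


def confidence_color(confidence):
    """Map a confidence in [0, 1] to the display color listed in the features."""
    if confidence > 0.90:
        return "green"
    if confidence > 0.80:
        return "yellow"
    if confidence > 0.60:
        return "red"
    return None  # Assumption: behavior at or below 60% is not documented.


def padded_bounding_boxes(canvas, padding=2):
    """Return one (x_min, y_min, x_max, y_max) box per symbol, left to right.

    Assumption: symbols are separated by at least one empty pixel column,
    so each run of inked columns is treated as one symbol.
    """
    ink = canvas > 0
    height, width = ink.shape
    inked_columns = ink.any(axis=0)
    boxes = []
    x = 0
    while x < width:
        if not inked_columns[x]:
            x += 1
            continue
        start = x
        while x < width and inked_columns[x]:
            x += 1
        end = x - 1
        # Rows that contain ink within this run of columns.
        rows = np.flatnonzero(ink[:, start:end + 1].any(axis=1))
        y_min, y_max = int(rows[0]), int(rows[-1])
        # Grow the box by the padding, clipped to the canvas edges.
        boxes.append((
            max(start - padding, 0),
            max(y_min - padding, 0),
            min(end + padding, width - 1),
            min(y_max + padding, height - 1),
        ))
    return boxes


# Two synthetic "symbols" on a blank 20x30 canvas.
canvas = np.zeros((20, 30), dtype=np.uint8)
canvas[5:10, 3:8] = 255
canvas[2:18, 15:20] = 255
boxes = padded_bounding_boxes(canvas)
```

Each box could then be cropped, resized to the model's input size, and classified separately. Touching or overlapping symbols would need a stronger method, such as connected-component analysis.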

Project Structure

The project follows a specific structure to organize its files and directories:

math-notation-recognition-app/
├── data/
│   ├── dataset/                       # Directory containing the original dataset for math notation recognition.
│   └── processed_data/
│       └── math_notation_dataset.npz  # Compressed numpy file containing preprocessed data.
│
├── models/
│   ├── saved_models/
│   │   ├── model_weights.weights.h5   # Weights of the trained model.
│   │   └── trained_model.h5           # Complete trained model including architecture and weights.
│   │
│   └── train_model.py                 # Python script for training the model using transfer learning with VGG16.
│
├── ui/
│   ├── resources/
│   │   └── styles/
│   │       └── stylesheet.qss         # Stylesheet file for UI styling.
│   │
│   ├── canvas_widget.py               # Widget for drawing on the canvas.
│   ├── main_window.py                 # Main application window.
│   └── prediction_result_widget.py    # Widget for displaying prediction results.
│
├── utils/
│   ├── bounding_box.py                # Utility functions for calculating bounding boxes.
│   ├── data_processing.py             # Module for loading, preprocessing, and splitting image data.
│   ├── constants.py                   # File for storing constant values.
│   ├── image_processing_utils.py      # Utility functions for image processing.
│   └── symbol_segmentation_utils.py   # Utility functions for symbol segmentation.
│
├── main.py                            # Main script file responsible for initializing the application and setting up the main window.
├── .gitignore                         # Specifies which files and directories should be ignored by Git version control.
├── requirements.txt                   # Lists the project's dependencies.
└── README.md                          # Documentation file providing information about the project.
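To make the role of `math_notation_dataset.npz` more concrete, here is a minimal sketch of the normalize, split, and save steps. It is a stand-in for `utils/data_processing.py`, not a copy of it: the function names, the array keys (`x_train`, `y_train`, and so on), the 80/10/10 split, and the use of plain NumPy shuffling instead of scikit-learn are assumptions, and the images are random synthetic data rather than the Kaggle dataset.

```python
import os
import tempfile

import numpy as np


def normalize_and_split(images, labels, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Scale pixels to [0, 1], shuffle, and split into train/dev/test sets."""
    images = np.asarray(images, dtype=np.float32) / 255.0
    labels = np.asarray(labels)
    order = np.random.default_rng(seed).permutation(len(images))
    images, labels = images[order], labels[order]
    n_dev = int(round(len(images) * dev_fraction))
    n_test = int(round(len(images) * test_fraction))
    n_train = len(images) - n_dev - n_test
    return {
        "x_train": images[:n_train],
        "y_train": labels[:n_train],
        "x_dev": images[n_train:n_train + n_dev],
        "y_dev": labels[n_train:n_train + n_dev],
        "x_test": images[n_train + n_dev:],
        "y_test": labels[n_train + n_dev:],
    }


def save_splits(path, splits):
    """Write all arrays to a single compressed .npz archive."""
    np.savez_compressed(path, **splits)


# Synthetic stand-in for 45x45 grayscale symbol images with 5 classes.
rng = np.random.default_rng(1)
fake_images = rng.integers(0, 256, size=(100, 45, 45), dtype=np.uint8)
fake_labels = rng.integers(0, 5, size=100)
splits = normalize_and_split(fake_images, fake_labels)

# Round-trip through a temporary file to show how the archive is read back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_dataset.npz")
    save_splits(path, splits)
    with np.load(path) as loaded:
        reloaded_shapes = {name: loaded[name].shape for name in loaded.files}
```

A training script can then load the archive once with `np.load` instead of re-reading and re-normalizing every image file.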

Getting Started

Requirements

  • Python libraries:
    • NumPy
    • TensorFlow
    • scikit-learn
    • OpenCV
    • Pillow
    • PyQt5
  • "Handwritten Math Symbols" Dataset: Download the dataset from Kaggle and place it in data/dataset/.
  • System Requirements
    • OS: Any platform supported by PyQt5, OpenCV, TensorFlow, and Keras.
    • Memory: At least 8 GB RAM is recommended for training models.

Environment Setup

  1. Install Python: Ensure Python 3 is installed. If not, download and install it from python.org.
  2. Clone the Repository:
    git clone https://github.com/Roodaki/Math-Vision.git
    cd Math-Vision
    
  3. Install required packages: Use pip to install the necessary Python libraries:
    pip install -r requirements.txt
    
  4. Run the program:
    python main.py
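If the program fails to start, a quick check that the listed libraries are importable can narrow down the cause. The helper below is a hypothetical convenience script, not part of the repository. It uses only the standard library, and the import names are the usual ones for these packages (for example `cv2` for OpenCV and `PIL` for Pillow).

```python
import importlib.util

# Display name -> import name for the libraries listed under Requirements.
REQUIRED_MODULES = {
    "NumPy": "numpy",
    "TensorFlow": "tensorflow",
    "scikit-learn": "sklearn",
    "OpenCV": "cv2",
    "Pillow": "PIL",
    "PyQt5": "PyQt5",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the display names of modules that cannot be located."""
    return [
        label
        for label, import_name in modules.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = find_missing()
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can usually be fixed by re-running the `pip install -r requirements.txt` step in the same Python environment.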
    

Future Developments

  • Real-time Prediction: Implement real-time prediction capabilities to classify symbols as they are drawn on the canvas. This enhancement will provide immediate feedback to users and improve the interactive experience.
  • Camera Support: Integrate camera support to enable users to capture handwritten math symbols directly from a webcam or camera-equipped device. This feature will expand the application's usability and facilitate real-time input.
  • Integration with Additional Datasets: Incorporate additional datasets containing handwritten math symbols to further train and validate the model. This step will improve the model's accuracy and robustness across a wider range of handwritten styles and symbols.

License

This project is licensed under the MIT License - see the LICENSE file for details.
