- Table of Contents
- Introduction
- Features
- Project Structure
- Getting Started
- Future Developments
- License
Math Vision is an application that uses computer vision to recognize and classify handwritten mathematical notation. It applies transfer learning with VGG16, a convolutional neural network, fine-tuned on the Kaggle "Handwritten Math Symbols" dataset, to achieve high accuracy across a wide variety of mathematical symbols and expressions. Possible applications include educational tools, digital note-taking systems, and automated grading platforms.
Data Processing and Preparation:
- The dataset is the Kaggle "Handwritten Math Symbols" dataset, which consists of more than 100,000 JPEG images of 45x45 pixels. The images contain English alphanumeric symbols, math operators, set operators, and basic predefined math functions.
- The application loads, normalizes, and splits the images into training, development, and test sets. The preprocessed data is saved to a compressed .npz file for training or fine-tuning a CNN model.
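The normalize, split, and save steps described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual `data_processing.py`: the function name `split_and_save`, the 80/10/10 split ratios, the array key names, and the synthetic stand-in images are all assumptions, and a plain NumPy permutation is used here in place of whatever splitting helper the project relies on.

```python
import os
import tempfile

import numpy as np


def split_and_save(images, labels, out_path, dev_frac=0.1, test_frac=0.1, seed=0):
    """Scale pixels to [0, 1], shuffle, split into train/dev/test, and save as .npz.

    Hypothetical helper; the split fractions and key names are assumptions.
    """
    images = np.asarray(images, dtype=np.float32) / 255.0  # normalize 8-bit pixels
    labels = np.asarray(labels)

    # Shuffle images and labels with the same permutation.
    order = np.random.default_rng(seed).permutation(len(images))
    images, labels = images[order], labels[order]

    n_test = int(len(images) * test_frac)
    n_dev = int(len(images) * dev_frac)
    n_train = len(images) - n_dev - n_test

    splits = {
        "x_train": images[:n_train],
        "y_train": labels[:n_train],
        "x_dev": images[n_train:n_train + n_dev],
        "y_dev": labels[n_train:n_train + n_dev],
        "x_test": images[n_train + n_dev:],
        "y_test": labels[n_train + n_dev:],
    }
    np.savez_compressed(out_path, **splits)
    return splits


# Synthetic stand-in for the real dataset: 100 random 45x45 grayscale images.
rng = np.random.default_rng(42)
fake_images = rng.integers(0, 256, size=(100, 45, 45), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=100)

out_file = os.path.join(tempfile.mkdtemp(), "math_notation_dataset.npz")
splits = split_and_save(fake_images, fake_labels, out_file)

# Reload the archive to confirm what was written.
with np.load(out_file) as data:
    print({name: data[name].shape for name in data.files})
```

Fixing the seed keeps the split reproducible between runs, which matters if the .npz file is regenerated before fine-tuning.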
Transfer Learning with VGG16: Model training uses transfer learning with VGG16, a convolutional neural network pre-trained on ImageNet. The model is fine-tuned on the preprocessed dataset to recognize mathematical symbols and expressions. Both the trained weights and the complete model are saved in the h5 format for later use.
Drawing Functionality: Users can draw on the canvas with various brush sizes, undo and redo actions, and erase all content with a clear button.
Fast Prediction: The application predicts the math symbols drawn on the canvas by passing the handwriting through the fine-tuned CNN. Each prediction is shown with its accuracy score in colored text: green for scores above 90%, yellow for scores above 80%, and red for scores above 60%.
Dynamic Bounding Box: The canvas draws padded bounding boxes around the drawn symbols in real time. Multiple bounding boxes can appear on a single canvas, which gives users clearer visual feedback.
Recognition of Multiple Symbols: The model recognizes and classifies multiple handwritten symbols captured in a single image, so users can enter a complex mathematical expression in one go.
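The color tiers used by Fast Prediction can be expressed as a small lookup. The sketch below is illustrative only: the function name `accuracy_color` is hypothetical, scores are assumed to be fractions in the range 0 to 1, "above" is read as a strict comparison, and returning `None` for scores at or below 60% is an assumption, since no color is specified for that range.

```python
from typing import Optional


def accuracy_color(score: float) -> Optional[str]:
    """Map a prediction score in [0, 1] to a display color name.

    Tiers are checked from highest to lowest, so each score gets the
    color of the highest threshold it exceeds.
    """
    if score > 0.90:
        return "green"
    if score > 0.80:
        return "yellow"
    if score > 0.60:
        return "red"
    # Assumption: no color is documented for scores at or below 60%.
    return None


for score in (0.97, 0.85, 0.72, 0.40):
    print(f"{score:.0%} -> {accuracy_color(score)}")
```

Checking the thresholds in descending order avoids overlapping ranges: a score of 95% also exceeds 80% and 60%, but only the first matching tier is returned.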
The project follows a specific structure to organize its files and directories:
math-notation-recognition-app/
├── data/
│   ├── dataset/  # Directory containing the original dataset for math notation recognition.
│   └── processed_data/
│       └── math_notation_dataset.npz  # Compressed numpy file containing preprocessed data.
│
├── models/
│   ├── saved_models/
│   │   ├── model_weights.weights.h5  # Weights of the trained model.
│   │   └── trained_model.h5  # Complete trained model including architecture and weights.
│   │
│   └── train_model.py  # Python script for training the model using transfer learning with VGG16.
│
├── ui/
│   ├── resources/
│   │   └── styles/
│   │       └── stylesheet.qss  # Stylesheet file for UI styling.
│   │
│   ├── canvas_widget.py  # Widget for drawing on the canvas.
│   ├── main_window.py  # Main application window.
│   └── prediction_result_widget.py  # Widget for displaying prediction results.
│
├── utils/
│   ├── bounding_box.py  # Utility functions for calculating bounding boxes.
│   ├── data_processing.py  # Module for loading, preprocessing, and splitting image data.
│   ├── constants.py  # File for storing constant values.
│   ├── image_processing_utils.py  # Utility functions for image processing.
│   └── symbol_segmentation_utils.py  # Utility functions for symbol segmentation.
│
├── main.py  # Main script file responsible for initializing the application and setting up the main window.
├── .gitignore  # Specifies which files and directories should be ignored by Git version control.
├── requirements.txt  # Lists the project's dependencies.
└── README.md  # Documentation file providing information about the project.
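The tree describes `utils/bounding_box.py` and `utils/symbol_segmentation_utils.py` only briefly. As a rough idea of what padded, per-symbol bounding boxes involve, here is a dependency-free sketch. It is not the project's code: the function name `find_symbol_boxes`, the list-of-lists mask, and the flood-fill approach are assumptions made for illustration, and the real utilities may well use OpenCV instead.

```python
from collections import deque
from typing import List, Tuple

# (left, top, right, bottom) in inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def find_symbol_boxes(mask: List[List[int]], padding: int = 1) -> List[Box]:
    """Return one padded bounding box per 8-connected blob of ink pixels.

    `mask` is a row-major grid where non-zero cells are ink. Boxes are
    clipped to the grid and returned sorted left to right.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []

    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill this blob while tracking its extent.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        inside = 0 <= nx < width and 0 <= ny < height
                        if inside and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            # Add padding, then clip the box to the canvas bounds.
            boxes.append((
                max(left - padding, 0),
                max(top - padding, 0),
                min(right + padding, width - 1),
                min(bottom + padding, height - 1),
            ))
    return sorted(boxes)


# Tiny canvas with two separate strokes.
canvas = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(find_symbol_boxes(canvas, padding=1))  # [(0, 0, 3, 3), (5, 0, 7, 4)]
```

One limitation of this sketch is that symbols drawn as disconnected strokes, such as "=" or "i", come out as separate boxes, so a real segmentation step would need to merge nearby boxes.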
- Python libraries:
  - NumPy
  - TensorFlow
  - scikit-learn
  - OpenCV
  - Pillow
  - PyQt5
- "Handwritten Math Symbols" Dataset: Download the dataset from Kaggle and place it in data/dataset/.
- System Requirements:
  - OS: Any platform supported by PyQt5, OpenCV, TensorFlow, and Keras.
  - Memory: At least 8 GB RAM is recommended for training models.
- Install Python: Ensure Python 3.X is installed. If not, download and install it from python.org.
- Clone the Repository:
git clone https://github.com/Roodaki/Math-Vision.git
cd Math-Vision
- Install required packages: Use pip to install the necessary Python libraries:
pip install -r requirements.txt
- Run the program:
python main.py
Real-time Prediction: Implement real-time prediction capabilities to classify symbols as they are drawn on the canvas. This enhancement will provide immediate feedback to users and improve the interactive experience.
Camera Support: Integrate camera support to enable users to capture handwritten math symbols directly from a webcam or camera-equipped device. This feature will expand the application's usability and facilitate real-time input.
Integration with Additional Datasets: Incorporate additional datasets containing handwritten math symbols to further train and validate the model. This step will improve the model's accuracy and robustness across a wider range of handwritten styles and symbols.
This project is licensed under the MIT License - see the LICENSE file for details.