Computer Vision Projects

Welcome to the Computer Vision Projects repository! This collection of projects demonstrates the power of computer vision techniques applied to real-world problems. From image classification to object detection, each project is designed to help you explore and learn the fundamentals of computer vision.

About

This repository contains a diverse range of computer vision projects that utilize state-of-the-art models and libraries. Each project is structured with clean and well-documented code, making it easy to understand and replicate the results.

Whether you're a beginner looking to learn the basics or an experienced practitioner, these projects will help you deepen your knowledge of computer vision concepts such as image recognition, segmentation, and object detection.

Projects

1. Image Classification

Description: A project that classifies images into predefined categories using convolutional neural networks (CNNs).
Key Features:
- Utilizes transfer learning with pre-trained models.
- Achieves high accuracy on various image datasets.
Technologies Used: Python, TensorFlow, Keras

2. Object Detection

Description: Detect and classify objects in images and videos in real time.
Key Features:
- Implements YOLO and SSD models.
- Real-time object tracking and bounding box creation.
Technologies Used: Python, OpenCV, PyTorch

3. Image Segmentation

Description: Segment different regions of an image using deep learning techniques.
Key Features:
- U-Net architecture for accurate pixel-wise segmentation.
- Applications in medical imaging, autonomous driving, etc.
Technologies Used: Python, PyTorch, OpenCV

(Add additional projects here as needed)

Model Explanations

To better understand the working of these projects, here are some details on the key architectures:

1. YOLO (You Only Look Once):

Many detectors work in two stages: they first propose regions that may contain an object and then classify each region. YOLO instead treats localization and classification as a single step, predicting both in one pass over the image. This makes it faster and well suited to real-time object detection. YOLO models are open source and backed by an active community. Read more on the GitHub Page.

How it Works:

Image Division: The image is divided into an N x N grid, with each grid cell responsible for detecting objects whose center falls within it.
Bounding Boxes and Confidence Scores: For each grid cell, YOLO predicts bounding boxes (center coordinates, height, and width) and a confidence score for each box. The confidence score indicates how likely it is that the box contains an object.
Class Prediction: Each bounding box is assigned a probability for each class (like person, smoke, ball), indicating what type of object it likely contains.
Non-Maximum Suppression: Finally, YOLO applies a post-processing step to remove duplicate boxes, so that only the most confident detection of each object is kept.
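
The grid assignment and non-maximum suppression steps above can be sketched in plain Python. This is a minimal illustration, not the code used by any YOLO release: the function names (`responsible_cell`, `iou`, `non_max_suppression`), the corner-format boxes `(x1, y1, x2, y2)`, the 448 x 448 image with a 7 x 7 grid, and the 0.5 IoU threshold are all assumptions chosen for the example.

```python
def responsible_cell(cx, cy, img_w, img_h, n):
    """Return (row, col) of the grid cell containing the object center."""
    col = min(int(cx / img_w * n), n - 1)
    row = min(int(cy / img_h * n), n - 1)
    return row, col


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep a box only if it does not overlap a better-scoring kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical example values: an object centered at (224, 112) in a 448x448 image.
cell = responsible_cell(224, 112, 448, 448, 7)
print(cell)  # (1, 3)

# Two heavily overlapping detections and one separate detection.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is dropped as a duplicate; the third box does not overlap either and is kept.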

Applications of YOLO Architecture: Owing to its high inference speed, it is used in video surveillance, self-driving cars, etc. (Fun fact: YOLOv3 was used during COVID-19 for estimating social distance violations in public.)

The Future for YOLO:

YOLO is fast, but its fixed grid limits accuracy on small and crowded objects. Future versions may use dynamic grids that adapt to scene complexity, and integrating transformer architectures and attention mechanisms could improve feature extraction and context understanding. Neural architecture search and hardware-specific optimization are further directions, aiming for higher accuracy at low computational cost.

2. SSD (Single Shot MultiBox Detector) Model:

This model, like YOLO, combines object localization and classification in a single step. The key difference is that it uses feature maps from multiple layers to scan the image at different scales. The process consists of two parts: extracting feature maps, and applying convolution filters to them to detect objects.

Why use SSD?: Compared to YOLO, the SSD model is better at accurately detecting objects of widely varying sizes, even in complex scenes. It is a reasonable choice when you need a balance between accuracy and speed.

How it Works:

Image Division: The image is divided by a series of grids at multiple scales, taken from several feature maps. As mentioned above, this lets the grids detect objects of varying sizes.
Anchor Boxes: Each grid cell contains multiple anchor boxes of different shapes and sizes, each a guess at where an object might appear. With the help of these boxes, the model can predict multiple objects in the same area.
Class and Location Predictions: For each anchor box, SSD predicts the class and the precise location of the object.
Non-Maximum Suppression: To remove duplicate detections, it keeps only the most confident predictions to ensure accurate results.
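
As a rough sketch of the multi-scale grids and anchor boxes described above, the snippet below generates anchor boxes for a few feature maps. The feature map sizes, scales, and aspect ratios are hypothetical values chosen for illustration, and `default_boxes` is a name introduced here, not a function from any SSD library. Boxes are `(cx, cy, w, h)` normalized to the range 0 to 1.

```python
import math


def default_boxes(feature_map_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return (cx, cy, w, h) anchor boxes for one square feature map."""
    boxes = []
    for row in range(feature_map_size):
        for col in range(feature_map_size):
            # Center of this grid cell in normalized image coordinates.
            cx = (col + 0.5) / feature_map_size
            cy = (row + 0.5) / feature_map_size
            for ar in aspect_ratios:
                # Same area for every aspect ratio, different shape.
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes


# Assumed (feature map size, box scale) pairs: fine grids get small boxes,
# coarse grids get large boxes.
layers = [(8, 0.2), (4, 0.4), (2, 0.6)]

all_boxes = []
for size, scale in layers:
    all_boxes.extend(default_boxes(size, scale))

print(len(all_boxes))  # (64 + 16 + 4) cells * 3 boxes = 252
print(all_boxes[0])    # (0.0625, 0.0625, 0.2, 0.2)
```

In a real detector, the network predicts a class and an offset for every one of these boxes, and non-maximum suppression then removes the duplicates.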

Applications of SSD Models:

This model is best used for traffic monitoring (vehicle classification), manufacturing quality control (to detect minor defects in different types of products), and retail (instant inventory counts, crowded shelves with different items can be analysed too).

The Future of SSD:

The future of SSD lies in removing its dependence on fixed feature pyramids and static default boxes. Improvements to look for include adaptive architectures that adjust to the features of their input, and default boxes that are learned for particular detection scenarios. Another major area is feature enhancement through cross-layer interaction and semantic-aware fusion. Lightweight backbones and feature reuse strategies could reduce computational overhead with minimal loss in detection accuracy, and integration with edge computing and custom hardware accelerators would expand deployment possibilities. Learn more about SSD Architecture here.

3. U-Net:

This convolutional neural network architecture is used for image segmentation. It is popular because it works well even with limited training data. U-Net was first used heavily in healthcare (for example, identifying tumors) but is now applied in many other domains as well.

Why use U-Net?:

U-Net is particularly effective for tasks that require precise outlines of objects, as it can generate high-quality segmentation maps. The architecture retains important spatial information despite downsampling, which makes it ideal for scenarios where fine detail is necessary, such as medical imaging or satellite imagery.

How it Works:

Two-Part Structure: U-Net consists of an encoder and a decoder. The encoder reduces the image's size and extracts important features. The decoder then enlarges this reduced representation back to the original size, focusing on reconstructing the details.
Skip Connections: These directly link encoder layers to the matching decoder layers, carrying over features that would otherwise be lost during downsampling so that the output retains the details needed for accurate segmentation.
Pixel-by-Pixel Classification: At the final stage, U-Net evaluates each pixel in the image to classify it into specific categories (like background, tumor, etc.).
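
To make the encoder, decoder, and skip connections more concrete, the sketch below traces feature map shapes through a U-Net-like network without using any deep learning library. It rests on several assumptions that are not stated in this README: padded convolutions that preserve spatial size, 2x downsampling and upsampling at each level, channel counts that double per encoder level starting from 64, and four levels. `trace_unet_shapes` is a hypothetical helper written for this example.

```python
def trace_unet_shapes(height, width, base_channels=64, depth=4):
    """Trace (name, channels, height, width) through a U-Net-like network.

    Decoder entries report the channel count right after the skip
    connection is concatenated, before convolutions reduce it again.
    """
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2 ** depth")

    shapes = []
    skips = []
    channels, h, w = base_channels, height, width

    # Encoder: record features for the skip connection, then halve the size.
    for level in range(depth):
        shapes.append((f"enc{level}", channels, h, w))
        skips.append(channels)
        h, w = h // 2, w // 2
        channels *= 2

    shapes.append(("bottleneck", channels, h, w))

    # Decoder: double the size, halve the channels, concatenate the skip.
    for level in reversed(range(depth)):
        h, w = h * 2, w * 2
        channels //= 2
        shapes.append((f"dec{level}", channels + skips[level], h, w))

    return shapes


shapes = trace_unet_shapes(256, 256)
for name, c, h, w in shapes:
    print(f"{name:<10} channels={c:<5} size={h}x{w}")
```

For a 256 x 256 input the bottleneck is 16 x 16 with 1024 channels, and the last decoder level is back at 256 x 256 with 128 channels after concatenation (64 upsampled plus 64 from the skip connection). A final 1x1 convolution would then map each pixel to class scores.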

Applications of U-Net:

Apart from being used in the medical field for tumor detection, organ segmentation, and pathology research, U-Net is also used in agriculture for weed detection and crop health monitoring, in image restoration to remove noise, and even in artistic applications to transfer and combine styles or enhance features.

The Future of U-Net:

As U-Net continues to evolve, its integration with advanced techniques like generative adversarial networks (GANs) could enhance its capabilities in generating high-quality segmentation maps. You can also explore this repository to learn more about U-Net and its applications.

Installation

To get started with any of these projects, clone this repository to your local machine:

git clone https://github.com/Aryan-Chharia/Computer-Vision-Projects.git

Then, navigate to the project directory and install the required dependencies:

cd <project-folder>
pip install -r requirements.txt

Make sure you have Python 3.8+ installed on your system.

Usage

Each project contains detailed instructions in its respective folder. For general usage:

Navigate to the project folder.
Run the main script to start training/inference.
Follow the instructions in the README of each project.

Example for running an object detection model:

python object_detection.py --input <input_image_or_video>

Technologies Used

Languages: Python
Frameworks: TensorFlow, PyTorch, Keras
Libraries: OpenCV, Scikit-learn, Matplotlib, NumPy

Contributing

We welcome contributions from the community! To contribute:

Fork the repository.
Create a new branch.
```
git checkout -b feature-branch
```
Make your changes and commit them.
```
git commit -m 'Add new feature'
```
Push to the branch.
```
git push origin feature-branch
```
Open a pull request. Check the Contributing Guidelines for more details.

👥 Our Valuable Contributors ❤️✨

Thanks to all the amazing people who have contributed to Computer-Vision-Projects! 💖

License

This repository is licensed under the MIT License. See the LICENSE file for more details.

