This project focuses on the classification of urban noise using various machine learning algorithms. The aim is to develop robust models capable of accurately identifying different types of urban sounds, leveraging the UrbanSound8K dataset. The implemented models include Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and Random Forest (RF).
- Project Description
- Key Features
- Dataset
- Implemented Algorithms
- Methodology
- Results and Discussion
- Challenges
- Future Directions
- System Requirements
- Software Requirements
- Installation
- Usage
- License
- Acknowledgements
Urban noise, an inevitable by-product of urbanization and industrialization, has become a major concern for both policymakers and city inhabitants. This project aims to classify urban noises using machine learning techniques to aid in noise monitoring and management.
- Dataset: Utilized the UrbanSound8K dataset, which contains 8,732 labeled sound excerpts from 10 different classes.
- Machine Learning Models: Implemented models include DNN, CNN, LSTM, and Random Forest.
- Feature Extraction: Used Mel-frequency cepstral coefficients (MFCCs), chroma features, spectral contrast, and other spectral descriptors.
- Evaluation Metrics: Evaluated models using accuracy, precision, recall, F1-score, and confusion matrices.
The UrbanSound8K dataset contains 8,732 labeled sound excerpts from 10 different classes:
- Air Conditioner
- Car Horn
- Children Playing
- Dog Bark
- Drilling
- Engine Idling
- Gun Shot
- Jackhammer
- Siren
- Street Music
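To make the data concrete, the following minimal sketch loads the dataset's metadata and extracts MFCC features for a single clip. It assumes the standard UrbanSound8K layout (`metadata/UrbanSound8K.csv` plus `audio/fold1` … `audio/fold10`); the `DATA_DIR` path is a placeholder.

```python
import os

import librosa
import numpy as np
import pandas as pd

DATA_DIR = "UrbanSound8K"  # placeholder path to the extracted dataset
meta = pd.read_csv(os.path.join(DATA_DIR, "metadata", "UrbanSound8K.csv"))

row = meta.iloc[0]
path = os.path.join(DATA_DIR, "audio", f"fold{row.fold}", row.slice_file_name)

# Resample everything to a common rate so all clips share one format.
signal, sr = librosa.load(path, sr=22050)

# 40 MFCCs per frame, averaged over time to get one fixed-length vector per clip.
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)
features = np.mean(mfcc.T, axis=0)
print(row["class"], features.shape)  # e.g. ('dog_bark', (40,))
```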
- Normalization: Ensured all audio files had the same sampling rate and format.
- Noise Reduction: Applied techniques to filter out background noise and improve audio quality.
- Data Splitting: Divided the dataset into training (70%), validation (15%), and test (15%) sets.
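A sketch of the 70/15/15 split using scikit-learn; the stand-in `X` and `y` arrays are placeholders for the extracted feature vectors and labels. Stratifying keeps the class balance consistent across the three splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays; in the project these come from the feature-extraction step.
X = np.random.rand(8732, 40)        # one 40-dim MFCC vector per excerpt
y = np.random.randint(0, 10, 8732)  # ten class labels

# First carve off 70% for training, then split the rest evenly (15%/15%).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)
print(len(X_train), len(X_val), len(X_test))  # roughly 6112, 1310, 1310
```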
- Architecture: Input layer, several hidden layers with ReLU activation, and an output layer with softmax activation.
- Optimization: Used the Adam optimizer and categorical cross-entropy loss function.
- Performance: Achieved the highest accuracy at 94.5%.
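A minimal Keras sketch of such a network; the layer widths and dropout rate are illustrative assumptions, not the exact architecture from the notebook. Note that categorical cross-entropy expects one-hot encoded labels.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(40,)),               # e.g. 40 averaged MFCCs per clip
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),                     # regularization
    layers.Dense(10, activation="softmax"),  # one unit per sound class
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # expects one-hot labels
              metrics=["accuracy"])
model.summary()
```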
- Architecture: Multiple convolutional layers followed by pooling layers, dropout layers for regularization, and dense layers.
- Feature Extraction: Effective in capturing spatial hierarchies in audio spectrograms.
- Performance: Achieved 90% accuracy.
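A sketch of a comparable CNN operating on spectrogram "images"; the input shape and filter counts are assumptions for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(128, 128, 1)),  # spectrogram treated as a 1-channel image
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),               # regularization between blocks
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```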
- Architecture: Sequential layers with LSTM units to capture temporal dependencies in audio data.
- Challenge: Struggled with capturing long-range dependencies.
- Performance: Achieved 79% accuracy.
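A sketch of such an LSTM, fed MFCC frames as a time sequence; the frame count and layer sizes are illustrative assumptions (about four seconds of audio at a 512-sample hop yields roughly 173 frames).

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(173, 40)),            # time steps x MFCC coefficients
    layers.LSTM(128, return_sequences=True),  # keep per-step outputs for stacking
    layers.LSTM(64),                          # final state summarizes the clip
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```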
- Ensemble Learning: Combines multiple decision trees to reduce overfitting and improve generalization.
- Feature Importance: Capable of evaluating the significance of each feature.
- Performance: Achieved 87% accuracy.
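A scikit-learn sketch, including the feature-importance ranking mentioned above; the tree count and stand-in data are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in data; in the project these are the extracted feature vectors.
X_train = np.random.rand(200, 40)
y_train = np.random.randint(0, 10, 200)

rf = RandomForestClassifier(n_estimators=300, random_state=42)
rf.fit(X_train, y_train)

# Rank features by their contribution to the forest's splits.
ranked = np.argsort(rf.feature_importances_)[::-1]
print("Top 5 features:", ranked[:5])
```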
1. Data Preprocessing:
   - Loaded the UrbanSound8K dataset and handled missing or corrupted audio files.
   - Normalized audio levels and applied noise reduction techniques.
   - Split the data into training, validation, and test sets.
2. Feature Extraction:
   - Converted audio data into a format suitable for machine learning algorithms.
   - Extracted features such as MFCCs, chroma features, spectral contrast, and other spectral descriptors.
3. Model Building:
   - Implemented DNN, CNN, LSTM, and RF models with appropriate architectures and hyperparameters.
   - Used TensorFlow/Keras for the neural network models and scikit-learn for the Random Forest.
4. Model Training:
   - Trained the models using the training dataset and optimized hyperparameters.
   - Monitored validation performance to prevent overfitting.
5. Model Evaluation:
   - Evaluated the models using accuracy, precision, recall, F1-score, and confusion matrices to determine performance (see the sketch below).
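A sketch of this evaluation step; the stand-in forest and random data are placeholders for any of the trained models and the held-out test split.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Stand-in model and data; substitute any fitted model and the real test split.
X_train, y_train = np.random.rand(300, 40), np.random.randint(0, 10, 300)
X_test, y_test = np.random.rand(100, 40), np.random.randint(0, 10, 100)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

y_pred = model.predict(X_test)

# Per-class precision, recall, and F1-score, plus overall accuracy.
print(classification_report(y_test, y_pred, zero_division=0))
# Rows are true classes, columns predicted; off-diagonal entries show
# which sound types the model confuses with each other.
print(confusion_matrix(y_test, y_pred))
```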
The performance metrics for each model are summarized below:
| Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| DNN | 94.5% | 95.1% | 94.0% | 94.6% |
| CNN | 90.0% | 91.0% | 92.1% | 91.5% |
| Random Forest | 87.0% | 86.5% | 87.8% | 87.1% |
| LSTM | 79.0% | 78.7% | 79.1% | 77.4% |
- DNN: Demonstrated superior performance with the highest accuracy and F1-score. Effective in capturing both low-level and high-level features.
- CNN: Performed well with a high accuracy, leveraging convolutional layers to extract spatial features from audio spectrograms.
- Random Forest: Showed good generalization capabilities with balanced performance metrics.
- LSTM: Highlighted the importance of capturing temporal dependencies, though performance indicated a need for further optimization.
The high performance of DNN and CNN models suggests that these architectures are well-suited for urban noise classification, potentially aiding in the development of robust and scalable noise monitoring systems.
- Data Quality: Addressing imbalances and noise in the dataset through data augmentation and transfer learning.
- Computational Limits: Implementing model compression techniques to enable real-time processing on resource-constrained devices.
- Model Robustness: Developing models that can adapt to varying environmental conditions and maintain high performance.
- Automated Sound Event Detection: Enhance the automation of sound event detection in continuous audio streams.
- Feature Optimization: Improve feature selection and extraction processes for better model efficiency.
- Edge Computing: Explore the deployment of models on edge devices for real-time noise classification.
- Community Involvement: Utilize crowdsourcing for data collection to diversify and expand the dataset.
- Processor: Intel Core i7-13650HX (20 CPUs) @ 2.60GHz
- Memory: 16 GB RAM
- Graphics: NVIDIA GeForce RTX 4060
- Operating System: Windows 11
- Storage: 1 TB SSD
- Python Version: 3.11.9
- Libraries:
- librosa - 0.10.2.post1
- numpy - 1.26.4
- pandas - 2.2.2
- scikit-learn - 1.4.2
- seaborn - 0.13.2
- matplotlib - 3.9.0
- tensorflow - 2.16.1
- keras - 3.3.3
- warnings (part of the Python standard library)
- IDE: Visual Studio Code
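For reference, a `requirements.txt` matching the pinned versions above might look like this (`warnings` ships with Python and needs no entry):

```text
librosa==0.10.2.post1
numpy==1.26.4
pandas==2.2.2
scikit-learn==1.4.2
seaborn==0.13.2
matplotlib==3.9.0
tensorflow==2.16.1
keras==3.3.3
```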
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/Urban-Noise-Classification-ML.git
   cd Urban-Noise-Classification-ML
   ```

2. Install the required libraries:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the Jupyter Notebook:

   ```bash
   jupyter notebook
   ```
- Open the provided Jupyter Notebook (`ehb328eUrbanSound.ipynb`) to explore the implementation details.
- Follow the steps in the notebook to preprocess the data, extract features, train the models, and evaluate their performance.
This project is licensed under the MIT License - see the LICENSE file for details.
- The UrbanSound8K dataset creators
- Istanbul Technical University, Electronics and Communications Engineering Department