Python library for handling audio datasets.
Updated Jul 6, 2023 - Python
Trainable categorization tool
💧 In memory dataset filtering
Multi-Language Dataset Cleaner/Creator for Mozilla's DeepSpeech Framework
Cleaning Discord data for NLP
Command-line filter for GitHub repositories that contain "samples" rather than a real project, framework, or library
🚀 Use this tool whenever you need to look through a huge pile of images and cannot rely on a file explorer, or when you are working on a remote headless machine. It can also move files from one folder to another, creating the destination if it does not exist. Work in progress.
[ACL 2024 (Findings)] ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation
Face recognition approach by exploring information jointly in space, scale and orientation.
A simple library that wraps common data processing tasks into an easy-to-use preprocessing engine. The library currently supports transforming CSV files loaded into pandas DataFrames.
A set of tools to generate and label datasets from academic papers
Module of the Open City Toolkit to visualize use of open datasets by applications:
Fast Spark Expression - Write column expressions quickly and easily like a string
Data Cleaning - A project which takes all colleges in the US, and narrows down the suitable colleges by slicing, dicing and concatenating startup activity data and crime statistics.
Compare pictures, keep 2