A corpus builder for evaluation of plagiarism detection tools
Updated Dec 12, 2016 - PHP
Information Retrieval Lab
A clean Fusha Arabic tagged corpus.
Scrimshaw parses IRC logs stored in the driftwood format for quotes attributable to a given user. Written in Rust.
Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!
The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing
Create a corpus for fine-tuning an OpenAI model
Generate pseudo-English sentences for research in semantic composition
Natively log WeeChat channel and private messages, CTCP, and notices, in the driftwood standard. Written in Python.
A prototype for generating language in a grounded simulation of a simple hunter-gatherer world
A set of corpus-based sampling & analysis M4L devices
A full-text article retrieval pipeline for biomedical literature.
A parser for annotated MuseScore 3 files.
A corpus of Ukrainian Twitter texts + instructions for downloading and filtering texts.
Augmentation scripts for the bAbI Dialog Tasks dataset
AutoCorpus is a tool backed by a large language model (LLM) for automatically generating corpus files for fuzzing.
Golden Arabic corpus built to test Assem's arabicstemmer and other Arabic stemmers
Bitextor generates translation memories from multilingual websites