Various tools for manipulating Pony Preservation Project related data

effusiveperiscope/PPPDataset

What is this

Tools for manipulating PPP voice and text data, plus an attempt to build combined synthetic training data for pony-related LLMs. Made by a codelet.

ppp.py

Point it at your Sliced Dialogue directory. It can be used to reformat datasets for training voice models (the included example is for PITS). It also has an updated version of horsewords.clean for ARPAbet substitutions.
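As a rough illustration of what reformatting a dataset for voice training can look like, here is a minimal sketch that is not taken from ppp.py. It assumes each audio clip is a `.flac` file with a sibling `.txt` transcript of the same name, and it writes generic `path|transcript` lines. The function name `build_filelist` and the pipe-delimited layout are hypothetical; check what PITS actually expects before using it.

```python
from pathlib import Path


def build_filelist(root, audio_ext=".flac"):
    """Return 'audio_path|transcript' lines for every clip that has a transcript.

    Assumption: each clip has a sibling .txt file with the same stem.
    """
    lines = []
    for audio in sorted(Path(root).rglob(f"*{audio_ext}")):
        label = audio.with_suffix(".txt")
        if not label.is_file():
            continue  # skip clips that have no transcript file
        # Collapse runs of whitespace so each entry stays on one line.
        text = " ".join(label.read_text(encoding="utf-8").split())
        if text:
            lines.append(f"{audio.as_posix()}|{text}")
    return lines


if __name__ == "__main__":
    # "Sliced Dialogue" is a placeholder path; point it at your own copy.
    for line in build_filelist("Sliced Dialogue"):
        print(line)
```

Returning a list instead of writing a file directly keeps the sketch easy to split into train and validation sets afterwards.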

FiMFiction Tools

Various tools for trimming/sampling the fimficOmegaV3 dataset.
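For a sense of what sampling a large dataset can involve, here is a minimal sketch of reservoir sampling, which picks a fixed-size uniform random sample in one pass without loading everything into memory. It is not the method these tools necessarily use, and `reservoir_sample` is a hypothetical helper. The layout of fimficOmegaV3 is not assumed here; the function accepts any iterable of records.

```python
import random


def reservoir_sample(items, k, seed=0):
    """Return up to k items chosen uniformly at random from an iterable."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item  # replace with probability k / (i + 1)
    return sample


if __name__ == "__main__":
    # Stand-in records; in practice these might be lines or stories read lazily.
    stories = (f"story {n}" for n in range(100_000))
    print(reservoir_sample(stories, k=5))
```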

fimfarchive.py

A tool for accessing and manipulating a downloaded copy of the Fimfarchive archive.

Text training data

Tools for scraping sources like the FiMFiction wiki for episode summaries, episode titles/transcripts, and Wikipedia episode synopses, as well as experiments in using LLMs via the oobabooga API to create synthetic training data.
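A synthetic-data request to a local LLM might look roughly like the sketch below. It is not the code from this repository. It assumes an OpenAI-compatible completions endpoint at `http://127.0.0.1:5000/v1/completions`, which recent versions of oobabooga's text-generation-webui expose when started with the API enabled; older versions used a different route. The prompt template and the names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; adjust host, port, and route to match your own setup.
API_URL = "http://127.0.0.1:5000/v1/completions"


def build_request(summary, max_tokens=200, url=API_URL):
    """Build a POST request asking the model for a Q&A pair about an episode."""
    prompt = (
        f"Episode summary:\n{summary}\n\n"
        "Write one question and answer about this episode.\n"
    )
    payload = {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.7}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(summary, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(summary, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Keeping request construction separate from the network call makes the prompt template easy to inspect and adjust without a model loaded.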
