A large collection (1.25 billion lines) of fan fiction text.
This is a flattened, text-only, tokenized representation of the entire corpus, concatenated with no delineation between different stories. Dialog has been heuristically removed by filtering out the regions between quote marks.
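
The exact filtering heuristic is not specified here, so the following is only a minimal sketch of what quote-based dialog removal might look like. The function name `strip_dialog`, the `QUOTED_SPAN` pattern, and the choice to handle both straight and curly double quotes are assumptions made for illustration, not the procedure actually used to build this corpus.

```python
import re

# Assumed pattern (not from the dataset authors): an opening straight or curly
# double quote, any run of non-quote characters on the same line, then a
# closing straight or curly double quote.
QUOTED_SPAN = re.compile(r'["\u201c][^"\u201c\u201d\n]*["\u201d]')


def strip_dialog(line: str) -> str:
    """Drop text between double quote marks and normalize whitespace."""
    without_quotes = QUOTED_SPAN.sub(" ", line)
    # Collapse the gaps left behind so tokens stay single-space separated.
    return " ".join(without_quotes.split())


if __name__ == "__main__":
    sample = 'She turned and said , " I am leaving . " Then she left .'
    print(strip_dialog(sample))
```

A heuristic of this kind leaves a line unchanged when a quote mark is unbalanced, and it also removes quoted material that is not dialog (for example, quoted titles), which is one reason such filtering is only approximate.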