diff --git a/current/aut-spark-submit-app.md b/current/aut-spark-submit-app.md
index d8ae11a..2b3a455 100644
--- a/current/aut-spark-submit-app.md
+++ b/current/aut-spark-submit-app.md
@@ -137,7 +137,7 @@ spark-submit --class io.archivesunleashed.app.CommandLineAppRunner path/to/aut-f
 ## Plain Text
 
 This extractor outputs a directory of CSV files or a single CSV file with the
-following columns: `crawl_date`, `domain`, `url`, and `text`.
+following columns: `content` (Boilerplate, HTTP headers, and HTML removed).
 
 Directory of CSV files:
 