We automatically pull daily news data from major national news sites (ABC, CBS, CNN, LA Times, NBC, NPR, NYT, Politico, ProPublica, USA Today, and WaPo) using GitHub Actions workflows. Refer to the respective JSON files for the latest version.
A script for downloading article text and parsing some features (e.g., publication date and authors) is available here: https://gist.github.com/dwillis/7e6a2571d64688243879ed349e88787c
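For readers who want a feel for what downloading and parsing an article involves, here is a minimal, standard-library-only sketch. It is not the linked script and may differ from it substantially. The function names `fetch_html` and `parse_article` are hypothetical, and the sketch assumes the page exposes the publication date in an `article:published_time` meta tag and authors in a comma-separated `author` meta tag, which many news sites do but not all.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ArticleMetaParser(HTMLParser):
    """Collect the <title>, <meta> key/value pairs, and <p> text from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Open Graph tags use "property"; older tags use "name".
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                self.meta.setdefault(key.lower(), attrs["content"])
        elif tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)


def fetch_html(url, timeout=30):
    """Download a page and decode it (hypothetical helper; needs network access)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def parse_article(html):
    """Extract title, publication date, authors, and body text from raw HTML."""
    parser = ArticleMetaParser()
    parser.feed(html)
    parser.close()
    # Assumption: authors are comma-separated in a single "author" meta tag.
    authors = [a.strip() for a in parser.meta.get("author", "").split(",") if a.strip()]
    return {
        "title": parser.title.strip(),
        "published": parser.meta.get("article:published_time"),
        "authors": authors,
        "text": "\n\n".join(parser.paragraphs),
    }
```

Calling `parse_article(fetch_html(url))` on an article URL returns a dictionary of the extracted fields. Fields that the page does not expose come back as `None` or an empty list, so real use would need per-site fallbacks.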
The June 2023 full-text dump is available on Harvard Dataverse: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ZNAKK6