When working with large datasets, it is usually necessary to specify the format and method used to pull the data from an existing database. Retrieving data is easy when an API is available. In this project, we used a Python scraping script to retrieve all files from the website, since the original EDA had no API. See the file in the code folder for details.
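As a rough illustration of the scraping approach, the sketch below collects file links from a page's HTML using only the Python standard library. It is not the project's actual script (see the code folder for that). The `LinkCollector` class, the `collect_file_links` helper, the `.txt` extension filter, and the example URL are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag encountered in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_file_links(html, base_url, extensions=(".txt",)):
    """Return absolute URLs of links whose URL ends with one of `extensions`."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the page they were found on.
        absolute = urljoin(base_url, href)
        if absolute.lower().endswith(tuple(extensions)):
            links.append(absolute)
    return links


if __name__ == "__main__":
    # Hypothetical page content and URL, used only to demonstrate the helper.
    sample_html = '<a href="/poems/1.txt">Poem 1</a> <a href="/about">About</a>'
    print(collect_file_links(sample_html, "https://example.org/archive/"))
```

For a live page, the HTML string could be fetched first (for example with `urllib.request.urlopen`) and then passed to `collect_file_links`, subject to the site's terms of use.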
I have tried to document the data management plans for this project in a structured way. They provide guidance across the entire project lifecycle, covering data collection, documentation, storage, sharing, and preservation.
- Data Collection - file formats, naming conventions, version control
  - Script, data, results, docs
- Documentation and Metadata - methodology, code, data dictionaries, metadata standard, README files
- Storage and Backup - requirements, backup and retention schedules, access controls
  - 3-2-1 rule
- Preservation - see https://zenodo.org/records/10316549
- Sharing and Reuse
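
The 3-2-1 rule in the storage plan is commonly stated as: keep three copies of the data, on two different types of storage media, with one copy off-site. A minimal sketch of how backup copies might be made and verified follows. The function names and directory layout are hypothetical, and the code does not check media type or location, which remain up to whoever runs the backup.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, backup_dirs):
    """Copy `source` into each backup directory and verify each copy by checksum."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for directory in backup_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Calling `back_up("data/poems.csv", ["/mnt/external/backup", "/mnt/cloud/backup"])` (both paths are placeholders) would leave the original plus two verified copies, which matches the "three copies" part of the rule.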
The visual representations are saved in the data_visualization folder. You can also check the code folder to see and test the code on your own data or on EDA.
This is a data visualization project examining Dickinson's textual variants. There are interesting connections and relationships between base texts and their corresponding variants.