add shakespeare data #748

Merged
merged 6 commits into from
Sep 16, 2024
1 change: 1 addition & 0 deletions README.md
@@ -69,6 +69,7 @@ If you are using TidyTuesday to teach data-related skills, [please let us know](
| 35 | `2024-08-27` | [The Power Rangers Franchise](data/2024/2024-08-27/readme.md) | [Power Rangers: Seasons and episodes data](https://www.kaggle.com/datasets/karetnikovn/power-rangers-dataset/data) | [National Power Rangers Day (August 28)](https://www.nationaldaycalendar.com/national-day/national-power-rangers-day-august-28) |
| 36 | `2024-09-03` | [Stack Overflow Annual Developer Survey 2024](data/2024/2024-09-03/readme.md) | [Stack Overflow Annual Developer Survey 2024](https://survey.stackoverflow.co/) | [Stack Overflow Annual Developer Survey Results](https://survey.stackoverflow.co/2024/) |
| 37 | `2024-09-10` | [Economic Diversity and Student Outcomes](data/2024/2024-09-10/readme.md) | [Opportunity Insights: College-Level Data for 139 Selective American Colleges](https://opportunityinsights.org/data/) | [Economic diversity and student outcomes at the University of Texas at Dallas](https://www.nytimes.com/interactive/projects/college-mobility/university-of-texas-at-dallas) |
| 38 | `2024-09-17` | [Shakespeare Dialogue](data/2024/2024-09-17/readme.md) | [The Complete Works of William Shakespeare](https://shakespeare.mit.edu/) | [shakespeare](https://github.com/nrennie/shakespeare) |

***

4,218 changes: 4,218 additions & 0 deletions data/2024/2024-09-17/hamlet.csv

Large diffs are not rendered by default.

2,554 changes: 2,554 additions & 0 deletions data/2024/2024-09-17/macbeth.csv

Large diffs are not rendered by default.

Binary file added data/2024/2024-09-17/macbeth.png
22 changes: 22 additions & 0 deletions data/2024/2024-09-17/meta.yaml
@@ -0,0 +1,22 @@
title: "Shakespeare Dialogue"
article:
  title: "shakespeare"
  url: "https://github.com/nrennie/shakespeare"
data_source:
  title: "The Complete Works of William Shakespeare"
  url: "https://shakespeare.mit.edu/"
images:
  # Please include at least one image, and up to three images
  - file: "macbeth.png"
    alt: >
      Bar chart of number of lines per scene in Macbeth. Act IV Scene III has the most with around 275 lines, while Act I Scene I and Act V Scene VI have the least with fewer than 25 lines each.
  - file: "romeo_juliet.png"
    alt: >
      Bar chart of number of lines per scene in Romeo and Juliet. Act V Scene III has the most with over 300 lines, while the Act I and Act II Prologues have the least with fewer than 25 lines each.
credit:
  # We want to thank you for curating this dataset! If you do not want a
  # particular type of credit, please delete the related line.
  post: "Nicola Rennie"
  linkedin: "https://www.linkedin.com/in/nicola-rennie"
  mastodon: "@nrennie@fosstodon.org"
  github: "https://github.com/nrennie"
76 changes: 76 additions & 0 deletions data/2024/2024-09-17/readme.md
@@ -0,0 +1,76 @@
# Shakespeare Dialogue

This week we're exploring dialogue in Shakespeare plays. The dataset this week comes from [shakespeare.mit.edu](https://shakespeare.mit.edu/) (via [github.com/nrennie/shakespeare](https://github.com/nrennie/shakespeare)), which is the Web's first edition of the Complete Works of William Shakespeare. The site has offered Shakespeare's plays and poetry to the internet community since 1993.

Dialogue from Hamlet, Macbeth, and Romeo and Juliet is provided for this week. Which play has the most stage directions compared to dialogue? Which play has the longest lines of dialogue? Which character speaks the most?
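
As a possible starting point for the "who speaks the most" and "stage directions versus dialogue" questions, here is a minimal sketch in base R. It runs on a few toy rows modelled on the opening of Macbeth rather than on the real files. It also assumes that stage directions are marked as `[stage direction]` in the `character` column and have a missing `line_number`; neither is confirmed by the data dictionary, so check with something like `unique(macbeth$character)` before reusing it.

```r
# Toy rows using the documented column names (act, scene, character,
# dialogue, line_number). The values are illustrative, not the real file.
# ASSUMPTION: stage directions are labelled "[stage direction]".
toy <- data.frame(
  act = rep("Act I", 7),
  scene = rep("Scene I", 7),
  character = c(
    "[stage direction]", "First Witch", "First Witch",
    "Second Witch", "Second Witch", "Third Witch", "First Witch"
  ),
  dialogue = c(
    "Thunder and lightning. Enter three Witches",
    "When shall we three meet again",
    "In thunder, lightning, or in rain?",
    "When the hurlyburly's done,",
    "When the battle's lost and won.",
    "That will be ere the set of sun.",
    "Where the place?"
  ),
  line_number = c(NA, 1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Separate stage directions from spoken dialogue.
is_direction <- toy$character == "[stage direction]"
spoken <- toy[!is_direction, ]

# Count dialogue rows per character, largest first.
lines_per_character <- sort(table(spoken$character), decreasing = TRUE)
top_speaker <- names(lines_per_character)[1]

# Share of all rows that are stage directions.
direction_share <- mean(is_direction)

print(lines_per_character)
```

Swapping `toy` for `hamlet`, `macbeth`, or `romeo_juliet` should give per-play results, provided the stage-direction label matches. Note that this counts rows, so it measures how many lines a character has, not how many words.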

Thank you to [Nicola Rennie](https://github.com/nrennie) for curating this week's dataset.

## The Data

```r
# Option 1: tidytuesdayR package
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2024-09-17')
## OR
tuesdata <- tidytuesdayR::tt_load(2024, week = 38)

hamlet <- tuesdata$hamlet
macbeth <- tuesdata$macbeth
romeo_juliet <- tuesdata$romeo_juliet

# Option 2: Read directly from GitHub

hamlet <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-09-17/hamlet.csv')
macbeth <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-09-17/macbeth.csv')
romeo_juliet <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-09-17/romeo_juliet.csv')
```

## How to Participate

- [Explore the data](https://r4ds.hadley.nz/), watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about **causation** in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a [shiny app](https://shiny.posit.co/), or some other piece of data-science-related output, using R or another programming language.
- [Share your output and the code used to generate it](../../../sharing.md) on social media with the #TidyTuesday hashtag.
- [Submit your own dataset!](../../../.github/pr_instructions.md)

### Data Dictionary

# `hamlet.csv`

|variable |class |description |
|:-----------|:---------|:-------------------------------------------------------------|
|act |character |Act number. |
|scene |character |Scene number. |
|character |character |Name of character speaking or whether it's a stage direction. |
|dialogue |character |Text of dialogue or stage direction. |
|line_number |double |Dialogue line number. |

# `macbeth.csv`

|variable |class |description |
|:-----------|:---------|:-------------------------------------------------------------|
|act |character |Act number. |
|scene |character |Scene number. |
|character |character |Name of character speaking or whether it's a stage direction. |
|dialogue |character |Text of dialogue or stage direction. |
|line_number |double |Dialogue line number. |

# `romeo_juliet.csv`

|variable |class |description |
|:-----------|:---------|:-------------------------------------------------------------|
|act |character |Act number. |
|scene |character |Scene number. |
|character |character |Name of character speaking or whether it's a stage direction. |
|dialogue |character |Text of dialogue or stage direction. |
|line_number |double |Dialogue line number. |
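
All three files share the same five columns, so a per-scene summary only needs `act`, `scene`, and `line_number`. The sketch below counts dialogue lines per scene in base R. It uses toy rows, not the real data, and assumes that stage-direction rows have a missing `line_number`, which the data dictionary does not state.

```r
# Toy rows using the documented column names; values are illustrative only.
# ASSUMPTION: stage-direction rows carry NA in line_number.
toy <- data.frame(
  act = rep("Act I", 6),
  scene = c("Scene I", "Scene I", "Scene I", "Scene II", "Scene II", "Scene II"),
  character = c(
    "[stage direction]", "First Witch", "First Witch",
    "Duncan", "Duncan", "Duncan"
  ),
  line_number = c(NA, 1, 2, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Keep only rows that have a dialogue line number.
spoken <- toy[!is.na(toy$line_number), ]

# Count rows per act and scene.
lines_per_scene <- aggregate(
  line_number ~ act + scene,
  data = spoken,
  FUN = length
)
names(lines_per_scene)[names(lines_per_scene) == "line_number"] <- "n_lines"

# Longest scenes first.
lines_per_scene <- lines_per_scene[order(-lines_per_scene$n_lines), ]

print(lines_per_scene)
```

The resulting `n_lines` column can be passed to `barplot()` or a plotting package to produce a lines-per-scene bar chart.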

### Cleaning Script

```r
# Clean data provided by <https://github.com/nrennie/shakespeare/tree/main/data>. No cleaning was necessary.
hamlet <- readr::read_csv("https://raw.githubusercontent.com/nrennie/shakespeare/main/data/hamlet.csv")
romeo_juliet <- readr::read_csv("https://raw.githubusercontent.com/nrennie/shakespeare/main/data/romeo_juliet.csv")
macbeth <- readr::read_csv("https://raw.githubusercontent.com/nrennie/shakespeare/main/data/macbeth.csv")
```