diff --git a/Join_Reshape_Reports/Join_Reshape_Reports.html b/Join_Reshape_Reports/Join_Reshape_Reports.html
index 0cdfe0f..80a0028 100644
--- a/Join_Reshape_Reports/Join_Reshape_Reports.html
+++ b/Join_Reshape_Reports/Join_Reshape_Reports.html
@@ -95,29 +95,23 @@
pivot_longer
pivot_wider
StudentID GPA_change
-1 1 2.4470452
-2 2 0.6414608
+1 1 1.573612
+2 2 -0.469586
Dataset 2
@@ -225,20 +219,20 @@ StudentID GPA_change Semester
-1 3 -1.1902074 Spring
-2 4 -0.5641581 Fall
+ StudentID GPA_change Semester
+1 3 -0.07326713 Spring
+2 4 -2.57229916 Fall
Datasets combined
bind_rows(a0,a1)
StudentID GPA_change Semester
-1 1 2.4470452 <NA>
-2 2 0.6414608 <NA>
-3 3 -1.1902074 Spring
-4 4 -0.5641581 Fall
+ StudentID GPA_change Semester
+1 1 1.57361180 <NA>
+2 2 -0.46958602 <NA>
+3 3 -0.07326713 Spring
+4 4 -2.57229916 Fall
When we refer to merging or joining, we usually do not mean appending or adding observations to a dataset in this way. Instead, joining usually means adding columns or variables to our dataframe. R does have a bind_cols function, but in the context below, using bind_cols is unhelpful and results in mismatched records.
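To make the contrast concrete, here is a minimal sketch on made-up data. The tibbles `gpa` and `semester` and their values are hypothetical stand-ins, not the workshop's own datasets. It shows `bind_cols()` pairing rows purely by position, while `left_join()` matches them on the `StudentID` key.

```r
library(dplyr)

# Hypothetical toy data: the same four students, stored in different row orders
gpa <- tibble(
  StudentID = c(1, 2, 3, 4),
  GPA_change = c(0.5, -0.2, 0.1, -1.0)
)
semester <- tibble(
  StudentID = c(4, 3, 2, 1),
  Semester = c("Fall", "Spring", "Fall", "Spring")
)

# bind_cols() pastes columns side by side by row position and ignores StudentID,
# so row 1 pairs student 1's GPA change with student 4's semester
mismatched <- bind_cols(gpa, semester)

# left_join() matches rows on the key column instead
joined <- left_join(gpa, semester, by = "StudentID")
joined
```

Because both inputs contain a `StudentID` column, `bind_cols()` also has to rename the duplicates, which is another hint that it is the wrong tool when the rows are meant to be matched by a key.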
@@ -644,302 +638,176 @@ pivot_wider
R Markdown provides a straightforward way to create reports that combine code and the output from that code with text commentary. This allows for the creation of automated, reproducible reports. R Markdown can knit together your analysis results with text and output it directly into HTML, PDF, or Word documents. In fact, we have been using R Markdown to generate the webpage for all of our R Open Labs workshops!
+Quarto provides a straightforward way to create reports that combine code and the output from that code with text commentary. This allows for the creation of automated, reproducible reports. Quarto can knit together your analysis results with text and output it directly into HTML, PDF, or Word documents. In fact, we have been using Quarto to generate the webpage for all of our R Open Labs workshops!
+Quarto is very similar to R Markdown, the older tool in which these workshops were originally created. The syntax and behind-the-scenes functionality of the two are similar, but Quarto is designed to be more compatible with other languages like Python and Julia. In most cases, you can convert old R Markdown .Rmd documents into Quarto documents with no changes.
+R Markdown has three components.
+Quarto has three components.
```
To create a new R Markdown document (.Rmd), select File -> New File -> R Markdown.
-You will have the option to select the output: we’ll use the default HTML for this workshop. Give the document a title and enter your name as author: this will create the header for you at the top of your new .html page! RStudio will create a new R Markdown document filled with examples of code chunks and text.
+To create a new Quarto document (.qmd), select File -> New File -> Quarto Document.
+You will have the option to select the output: we’ll use the default HTML for this workshop. Give the document a title and enter your name as author: this will create the header for you at the top of your new .html page! RStudio will create a new Quarto document filled with examples of code chunks and text.
At the top of the page is the optional Yet Another Markup Language (YAML) header. This header is a powerful way to edit the formatting of your report (e.g. figure dimensions, presence of a table of contents, identifying the location of a bibliography file).
+
+ ---
+ title: "beginR: Joining, Reshaping & Reproducible Reports"
+ author:
+   name: University of North Carolina at Chapel Hill
+ execute:
+   echo: true
+ format:
+   html:
+     theme: spacelab
+     toc: true
+     toc-location: left
+     page-layout: article
+ ---
At the top of the page is the optional Yet Another Markup Language (YAML) header. This header is a powerful way to edit the formatting of your report (e.g. figure dimensions, presence of a table of contents, identifying the location of a bibliography file).
R code chunks are surrounded by ``` fences. Inside the curly braces, it specifies that this code chunk will use R code (other programming languages are supported), then it names this chunk “setup”. Names are optional.
After the name, you specify options on whether you want the code or its results to be displayed in the final document. For this chunk, the include=FALSE option tells R Markdown that we want this code to run, but we do not want it to be displayed in the final HTML document. The R code inside the chunk, knitr::opts_chunk$set(echo = TRUE), tells R Markdown to display the R code along with the results of the code in the HTML output for all code chunks below.
+ ```{r}
+ #| label: setup
+ #| warning: false
+ library(tidyverse)
+ ```
R code chunks are surrounded by ``` fences. Inside the curly braces, it specifies that this code chunk will use R. #| precedes options for this code chunk. In this case:
#| label: setup names this chunk “setup”. (Names are optional.)
#| warning: false tells Quarto to hide any warnings generated by our code in the final HTML document.
Finally, the code library(tidyverse) is executed as usual. When creating a document, you can use the buttons at the top right of the code chunk to run all code before this chunk or to run the code in this chunk, respectively.
Use CTRL+ALT+i (PC) or CMD+OPTION+i (Mac) to insert R code blocks.
+ ## Quarto
+
+ Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see <https://quarto.org>.
+
+ ## Running Code
+ When you click the **Render** button a document will be generated that includes both content and the output of embedded code. You can embed code like this:
This is plain text with simple formatting added. The ##
tells R Markdown that “R Markdown” is a section header. The **
around “Knit” tells R Markdown to make that word bold.
The RStudio team has helpfully condensed these code chunk and text formatting options into a cheatsheet.
-You can get pretty far with options in the R Markdown cheatsheet, but R Markdown is a very powerful, flexible language that we do not have time to fully cover. More detailed references are:
-https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf
-https://bookdown.org/yihui/rmarkdown
+This is plain text with simple formatting added. The ## tells Quarto that “Quarto” and “Running Code” are section headers. The ** around “Render” tells Quarto to make that word bold.
The Posit team has helpfully condensed these code chunk and text formatting options into a cheatsheet.
+You can get pretty far with options in the Quarto cheatsheet, but Quarto is a very powerful, flexible tool that we do not have time to cover fully. More detailed references are available here: https://quarto.org/docs/authoring/markdown-basics.html
Newer versions of RStudio provide a visual editor for R Markdown documents. This can be accessed by toggling between the “Source” and “Visual” options in the top left corner of your .Rmd script editor pane. (Note: In some older versions of RStudio, this is available as a compass-shaped icon in the top right corner instead).
+RStudio provides a visual editor for Quarto documents. This can be accessed by toggling between the “Source” and “Visual” options in the top left corner of your .qmd script editor pane.
Once activated, this interface is similar to word processing software such as Microsoft Word: shortcuts for bolding, italics, etc. are usually the same, and there are icons and drop-down menus available for lists, bullets, links, and more.
You can still use CTRL+ALT+i (PC) or CMD+OPTION+i (Mac) to insert R code blocks, or use the Insert>Code Chunk>R menu in the visual editor.
Read more about the visual editor here:
-https://rstudio.github.io/visual-markdown-editing/
Click the Knit button, and RStudio will generate an HTML report based on your R Markdown document.
-Let’s try creating an R Markdown document to explore the US cheese consumption data and review what we learned in weeks 1-3.
-consumption <- read_csv("data/clean_cheese.csv")
Rows: 48 Columns: 17
-── Column specification ────────────────────────────────────────────────────────
-Delimiter: ","
-dbl (17): Year, Cheddar, American Other, Mozzarella, Italian other, Swiss, B...
-
-ℹ Use `spec()` to retrieve the full column specification for this data.
-ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
-Include one of the following in your document. We’ve used eval = FALSE
here to prevent this code chunk from running!
head(consumption)
-tail(consumption)
-summary(consumption)
knitr::kable
By default, R Markdown will display tables the way they appear in the R console. We can use the knitr::kable function to get cleaner tables.
knitr::kable(head(consumption), caption = "The first six rows of the cheese consumption data")
Year | Cheddar | American Other | Mozzarella | Italian other | Swiss | Brick | Muenster | Cream and Neufchatel | Blue | Other Dairy Cheese | Processed Cheese | Foods and spreads | Total American Chese | Total Italian Cheese | Total Natural Cheese | Total Processed Cheese Products |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1970 | 5.79 | 1.20 | 1.19 | 0.87 | 0.88 | 0.10 | 0.17 | 0.61 | 0.15 | 0.41 | 3.32 | 2.20 | 7.00 | 2.05 | 11.37 | 5.53 |
1971 | 5.91 | 1.42 | 1.38 | 0.92 | 0.94 | 0.11 | 0.19 | 0.62 | 0.15 | 0.41 | 3.55 | 2.31 | 7.32 | 2.29 | 12.03 | 5.86 |
1972 | 6.01 | 1.67 | 1.57 | 1.02 | 1.06 | 0.10 | 0.22 | 0.63 | 0.17 | 0.56 | 3.51 | 2.62 | 7.68 | 2.59 | 13.01 | 6.13 |
1973 | 6.07 | 1.76 | 1.76 | 1.03 | 1.06 | 0.11 | 0.21 | 0.66 | 0.18 | 0.65 | 3.31 | 2.68 | 7.83 | 2.80 | 13.49 | 5.99 |
1974 | 6.31 | 2.16 | 1.86 | 1.09 | 1.18 | 0.11 | 0.23 | 0.70 | 0.16 | 0.61 | 3.42 | 2.92 | 8.47 | 2.95 | 14.41 | 6.34 |
1975 | 6.04 | 2.11 | 2.11 | 1.12 | 1.09 | 0.09 | 0.24 | 0.74 | 0.16 | 0.57 | 3.35 | 3.35 | 8.15 | 3.23 | 14.27 | 6.69 |
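For a tidier table, `knitr::kable` also accepts arguments for rounding and relabelling columns. The sketch below is illustrative only: the data frame `cheese` is a hypothetical two-row, three-column excerpt of the values shown in the table, and the column labels and caption are made up for the example.

```r
library(knitr)

# Hypothetical excerpt standing in for the full cheese consumption table
cheese <- data.frame(
  Year = c(1970, 1971),
  Cheddar = c(5.79, 5.91),
  Mozzarella = c(1.19, 1.38)
)

# digits rounds numeric columns; col.names replaces the header text
tbl <- kable(
  cheese,
  digits = 1,
  col.names = c("Year", "Cheddar (lbs)", "Mozzarella (lbs)"),
  caption = "Per-person consumption of two cheeses"
)
tbl
```

Relabelling in `col.names` changes only the printed header, so the underlying dataframe keeps its original variable names for later code.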
We’ve covered two ways to add a new variable to a dataframe.
-Note: R allows non-standard variable names that include spaces, parentheses, and other special characters. The way to refer to variable names that contain wonky symbols is to use the backtick symbol `
, found at the top left of your keyboard with the tilde ~
.
The base R way covered in lesson 1 using the $
operator and with()
function
#Base R way, covered in lesson 1
-consumption$amer_ital_ratio <- with(consumption, `Total American Cheese` / `Total Italian Cheese`)
Error in eval(substitute(expr), data, enclos = parent.frame()): object 'Total American Cheese' not found
-Oops. Better check the variable names.
-consumption$amer_ital_ratio1 <- with(consumption, `Total American Chese` / `Total Italian Cheese`)
The tidyverse way covered in lesson 3 using the mutate()
function
#Tidyverse way, covered in lesson 3
-consumption <- mutate(consumption, amer_ital_ratio2 = `Total American Chese` / `Total Italian Cheese`)
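The misspelled-name error above is easy to avoid by listing the column names before typing them. The following is a minimal sketch on hypothetical toy data (the tibble `cheese` and its column names are invented for illustration); it checks the names first and then builds the same kind of ratio in base R and with `mutate()`.

```r
library(dplyr)

# Hypothetical toy data with non-syntactic column names (they contain spaces)
cheese <- tibble(
  Year = c(1970, 1971),
  `Total American` = c(7.00, 7.32),
  `Total Italian` = c(2.05, 2.29)
)

# names() shows the exact spelling to copy into backticks
names(cheese)

# Base R: [[ ]] takes the column name as an ordinary string, so no backticks are needed
cheese$ratio_base <- cheese[["Total American"]] / cheese[["Total Italian"]]

# Tidyverse: backticks inside mutate()
cheese <- mutate(cheese, ratio_tidy = `Total American` / `Total Italian`)
cheese
```

Both approaches produce the same values; which one to use is mostly a matter of whether the rest of the analysis is written in base R or in tidyverse style.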
consumption <- select(consumption, Year, Cheddar, Mozzarella, `Cream and Neufchatel`)
consumption <- rename(consumption, Cream_and_Neufchatel = `Cream and Neufchatel`)
ggplot(consumption, aes(x = Year)) +
-geom_point(aes(y = Cheddar, col = "Cheddar")) +
- geom_point(aes(y = Mozzarella, col = "Mozzarella")) +
- geom_point(aes(y = Cream_and_Neufchatel, col = "Cream and Neufchatel")) +
- ylab("Consumption in Pounds Per Person")
Click the Render button, and RStudio will generate an HTML report based on your document.
+Let’s try creating a Quarto document to explore the US cheese consumption data and review what we learned in weeks 1-3.
+
+ ### Data Import
+
+ ```{r}
+ consumption <- read_csv("data/clean_cheese.csv")
+ ```
+
+ ### Useful functions for exploring dataframes
+
+ Include one of the following in your document. We've used `#| eval: FALSE` here to prevent this code chunk from running!
+
+ ```{r}
+ #| label: misc
+ #| eval: FALSE
+ head(consumption)
+ tail(consumption)
+ summary(consumption)
+ ```
+
+ ### Tables and `knitr::kable`
+
+ By default, Quarto will display tables the way they appear in the R console. We can use the `knitr::kable` function to get cleaner tables.
+
+ ```{r}
+ #| label: kable
+ knitr::kable(head(consumption), caption = "The first six rows of the cheese consumption data")
+ ```
+
+ ### Adding a new variable
+
+ We've covered two ways to add a new variable to a dataframe.
+
+ **Note:** R allows non-standard variable names that include spaces, parentheses, and other special characters. The way to refer to variable names that contain wonky symbols is to use the backtick symbol `` ` ``, found at the top left of your keyboard with the tilde `~`.
+
+ The base R way covered in lesson 1 using the `$` operator and `with()` function
+
+ ```{r}
+ #| label: ratio1
+ #| error: TRUE
+ #Base R way, covered in lesson 1
+ consumption$amer_ital_ratio <- with(consumption, `Total American Cheese` / `Total Italian Cheese`)
+ ```
+
+ Oops. Better check the variable names.
+
+ ```{r}
+ #| label: ratio2
+ consumption$amer_ital_ratio1 <- with(consumption, `Total American Chese` / `Total Italian Cheese`)
+ ```
+
+ The tidyverse way covered in lesson 3 using the `mutate()` function
+
+ ```{r}
+ #| label: ratio3
+ #Tidyverse way, covered in lesson 3
+ consumption <- mutate(consumption, amer_ital_ratio2 = `Total American Chese` / `Total Italian Cheese`)
+ ```
+
+ ### Selecting Columns
+
+ ```{r}
+ #| label: select
+ consumption <- select(consumption, Year, Cheddar, Mozzarella, `Cream and Neufchatel`)
+
+ ```
+
+ ### Renaming Columns
+
+ ```{r}
+ #| label: rename
+ consumption <- rename(consumption, Cream_and_Neufchatel = `Cream and Neufchatel`)
+ ```
+
+ ### Plotting
+
+ ```{r}
+ #| label: plot1
+ #| fig.width: 8
+ #| fig.height: 5
+ ggplot(consumption, aes(x = Year)) +
+ geom_point(aes(y = Cheddar, col = "Cheddar")) +
+ geom_point(aes(y = Mozzarella, col = "Mozzarella")) +
+ geom_point(aes(y = Cream_and_Neufchatel, col = "Cream and Neufchatel")) +
+ ylab("Consumption in Pounds Per Person")
+ ```
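The plot in the report adds one `geom_point()` layer per cheese. As an alternative sketch that ties back to `pivot_longer`, the data can be reshaped to long format first so that a single layer and the colour aesthetic do the same job. The data frame `wide` below is a hypothetical three-row excerpt, and the names `Cheese` and `Pounds` are chosen only for this example.

```r
library(tidyr)
library(ggplot2)

# Hypothetical excerpt in the same wide layout as the cheese table
wide <- data.frame(
  Year = c(1970, 1971, 1972),
  Cheddar = c(5.79, 5.91, 6.01),
  Mozzarella = c(1.19, 1.38, 1.57)
)

# Reshape to one row per Year-cheese pair
long <- pivot_longer(wide, cols = -Year, names_to = "Cheese", values_to = "Pounds")

# A single geom_point() layer; the legend comes from the Cheese column
p <- ggplot(long, aes(x = Year, y = Pounds, col = Cheese)) +
  geom_point() +
  ylab("Consumption in Pounds Per Person")
p
```

With the long layout, adding another cheese to the plot means adding a column to the data before reshaping, with no change to the plotting code.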
R Markdown also provides a nifty way to incorporate a bibliography and references. We’ll have an example of this in the exercises, but here’s a brief summary of the steps required to use a BibTeX bibliography.
+Quarto also provides a nifty way to incorporate a bibliography and references. We’ll have an example of this in the exercises, but here’s a brief summary of the steps required to use a BibTeX bibliography.
bibliography: references.bib
bibliography: references.bib
Use pivot_wider
to create a new dataframe with a row for each customer_state
and a column for each product_category_name_english
. Name this dataframe products
.
Run the two lines of code below (make sure your dataframe from step 4 is called products
!)
-products <- ungroup(products) #remove grouping
-products[is.na(products)] <- 0 #replace missing data with zeroes
+products <- ungroup(products) #remove grouping
+products[is.na(products)] <- 0 #replace missing data with zeroes
Use ggpairs
or other Exploratory Data Analysis techniques to look for relationships between purchases of small_appliances
,consoles_games
,air_conditioning
, and construction_tools_safety
. (Remember to run library(GGally)
before using ggpairs
).
Repeat problems 3-6 with order_products_value
(i.e. the amount spent vs the quantity purchased). Do you see different patterns? Explore other product categories.
Choose 5 states. Which of these states have the most similar patterns of spending (as measured by correlation)?
Download the cheese RStudio Project file and extract the R Project contained within. Then, knit the cheeseConsumption.Rmd report. It should generate an HTML report for you.
In the cheeseConsumption.Rmd file, find the code chunk named setup. Change echo=FALSE to echo=TRUE. Try knitting the document again. What changed? Did this affect the whole document?
In the cheeseConsumption.Rmd file, find the code chunk named import. Change message=FALSE to message=TRUE. Try knitting the document again. What changed? Did this affect the whole document?
Create another R Markdown document analyzing cheese production data contained in the state_milk_productions.csv file. You can use the data dictionary found here to make sense of the different variables. Hint: you’ll need to use the group-by %>% summarize idiom we learned in Week 3 to sum up all the state level data within each year. You’ll probably want to feed the output from that group-by %>% summarize step into knitr::kable() to get a prettier table for your report.
Once you have created an R Markdown report analyzing cheese production, send the entire R Project to a friend (or us!) and ask them to knit that .Rmd document. If they have RStudio and the tidyverse installed, they should be able to seamlessly generate the exact report you generated, without having to make any changes.
Download the cheese RStudio Project file and extract the R Project contained within. Then, render the cheeseConsumption.qmd report. It should generate an HTML report for you.
In the cheeseConsumption.qmd file, find the code chunk named setup. Change #| echo: FALSE to #| echo: TRUE. Try rendering the document again. What changed? Did this affect the whole document?
In the cheeseConsumption.qmd file, find the code chunk named import. Change #| message: FALSE to #| message: TRUE. Try rendering the document again. What changed? Did this affect the whole document?
Create another Quarto document analyzing cheese production data contained in the state_milk_productions.csv file. You can use the data dictionary found here to make sense of the different variables. Hint: you’ll need to use the group-by %>% summarize idiom we learned in Week 3 to sum up all the state level data within each year. You’ll probably want to feed the output from that group-by %>% summarize step into knitr::kable() to get a prettier table for your report.
Once you have created a Quarto report analyzing cheese production, send the entire R Project to a friend (or us!) and ask them to render that .qmd document. If they have RStudio and the tidyverse installed, they should be able to seamlessly generate the exact report you generated, without having to make any changes.