Merge pull request #10 from unsw-edu-au/import

import updates
unsw-edu-au · Jun 13, 2024 · f2d9a7b
2 parents 9e1507f + ab350d0
commit f2d9a7b

Showing 4 changed files with 80 additions and 57 deletions.
diff --git a/appendix.qmd b/appendix.qmd
@@ -4,10 +4,9 @@ format: html
 editor: visual
 ---
 
-# how to install R and RStudio on your machine 
-
-The marvellous Danielle Navarro has LOTS of useful R learning resources on her YouTube channel. [This playlist](https://www.youtube.com/playlist?list=PLRPB0ZzEYegOZivdelOuEn-R-XUN-DOjd) about how to install R and RStudio is particularly useful; no matter which operating system you are dealing with... Dani has you covered.  
+# how to install R and RStudio on your machine
 
+The marvellous Danielle Navarro has LOTS of useful R learning resources on her YouTube channel. [This playlist](https://www.youtube.com/playlist?list=PLRPB0ZzEYegOZivdelOuEn-R-XUN-DOjd) about how to install R and RStudio is particularly useful; no matter which operating system you are dealing with... Dani has you covered.
 
 # how to install packages
 
@@ -19,7 +18,6 @@ Install a package by typing the following command with the name of the package y
 
 ```         
 install.packages("packagename")
-
 ```
 
 ## option 2
@@ -32,35 +30,26 @@ Alternatively, search for the package you would like to install in the packages
 
 ## useful packages for psychology
 
-
-- `tidyverse` this is a cluster of super helpful data wrangling and visualisation tools. 
-- `here` this package helps direct R to the correct place for files based on the current working directory.
-- `janitor` this package helps us clean up data - especially awkward variable names.
-- `qualtRics` this package is helpful in reading in data files from Qualtrics... except for .sav SPSS format files! (see next)
-- `haven` this package is a good one for reading in .sav SPSS format files
-- `sjplot` this package is helpful for making a 'codebook' of your variables and values from imported .sav files
-- `surveytoolbox` this package is helpful in drawing out the value labels of variables imported in .sav format
--- note: because `surveytoolbox` is on github and not CRAN, you'll want to do the following two steps *in the console*. Note that we do this in the console since we only need to do it once! If the install asks you about updating packages, go ahead and do it!
----(1) install the `devtools` package: install.packages("devtools") 
----(2) install via github: devtools::install_github("martinctc/surveytoolbox") 
-- `ufs` this package (short for user friendly science) is a nice tool for computing the internal reliability of scales
--- note: one of the commands we will use in `ufs` requires the `psych` package to be installed (but doesn't need to be loaded via `library()`). Ensure you install that first. Two steps:
-----(1) install the `remotes`` package: install.packages("remotes") 
-----(2) install via github_lab: remotes::install_gitlab('r-packages/ufs') 
-- `apa` nice for making statistical output into APA style
-- `gt` nice for making your tables look pretty
-- `apaTables` makes nice APA-styled tables of correlation, ANOVA, regression etc. output
-- `report` is a package to help with results reporting
-- `psych` is an umbrella package for lots of common psych tasks
-- `ez` is a great package for stats, including analysis of variance
-- `emmeans` is helpful for comparing specific means in a factorial design
-
+-   `tidyverse` this is a cluster of super helpful data wrangling and visualisation tools.
+-   `here` this package helps direct R to the correct place for files based on the current working directory.
+-   `janitor` this package helps us clean up data - especially awkward variable names.
+-   `qualtRics` this package is helpful in reading in data files from Qualtrics... except for .sav SPSS format files! (see next)
+-   `haven` this package is a good one for reading in .sav SPSS format files
+-   `sjPlot` this package is helpful for making a 'codebook' of your variables and values from imported .sav files
+-   `surveytoolbox` this package is helpful in drawing out the value labels of variables imported in .sav format. Note: because `surveytoolbox` is on GitHub and not CRAN, you'll want to do the following two steps *in the console* (we do this in the console since we only need to do it once; if the install asks you about updating packages, go ahead and do it): (1) install the `devtools` package with `install.packages("devtools")`, then (2) install from GitHub with `devtools::install_github("martinctc/surveytoolbox")`
+-   `ufs` this package (short for user friendly science) is a nice tool for computing the internal reliability of scales. Note: one of the commands we will use in `ufs` requires the `psych` package to be installed (but it doesn't need to be loaded via `library()`), so install `psych` first. Then install `ufs` in two steps: (1) install the `remotes` package with `install.packages("remotes")`, then (2) install from GitLab with `remotes::install_gitlab('r-packages/ufs')`
+-   `apa` nice for making statistical output into APA style
+-   `gt` nice for making your tables look pretty
+-   `apaTables` makes nice APA-styled tables of correlation, ANOVA, regression etc. output
+-   `report` is a package to help with results reporting
+-   `psych` is an umbrella package for lots of common psych tasks
+-   `ez` is a great package for stats, including analysis of variance
+-   `emmeans` is helpful for comparing specific means in a factorial design
 
 # using inline code
 
 > JR maybe this piece needs to go in a separate chapter about writing with RMarkdown, papaja etc
 
-
 ```{r eval = FALSE}
 
 #pulls from the exclusions_summary tabyl created above
@@ -79,21 +68,16 @@ Use of inline code is really helpful in avoiding transcription errors and saving
 
 > INSERT INLINE EXAMPLE HERE
 
-
 # helpful console commands
 
-- names(objectname) - returns a list of variable names for that dataframe, making it less likely you will type things incorrectly
-- getwd() - returns the path to the current working directory. Run this in the console.
-- rm(objectname) - removes the object from your global environment. Can be helpful in cleaning up any 'test' objects you make while troubleshooting code.
-- ?package - brings up the Help info for that package
-- ?function - brings up the Help info for that function
+-   names(objectname) - returns a list of variable names for that dataframe, making it less likely you will type things incorrectly
+-   getwd() - returns the path to the current working directory. Run this in the console.
+-   rm(objectname) - removes the object from your global environment. Can be helpful in cleaning up any 'test' objects you make while troubleshooting code.
+-   ?package - brings up the Help info for that package
+-   ?function - brings up the Help info for that function
 
 # useful keyboard shortcuts
 
-Option-Command-I = inserts a new code chunk
-Command-Enter = runs the chunk of code that your cursor is in
-
+Option-Command-I = inserts a new code chunk; Command-Enter = runs the line of code (or selection) that your cursor is in; Shift-Command-Enter = runs the whole chunk that your cursor is in
 
 # commonly encountered errors
-
-
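A possible console helper for the package list in `appendix.qmd` above: a minimal sketch that reports which packages still need installing. The function name `missing_packages` and the selection of packages are illustrative assumptions, not part of this commit; `installed.packages()` and `setdiff()` are base R.

```r
# Hypothetical helper (not part of the commit): returns the names in `pkgs`
# that are not yet installed on this machine.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# A few of the CRAN packages from the list above.
cran_pkgs <- c("tidyverse", "here", "janitor", "haven")
to_install <- missing_packages(cran_pkgs)
print(to_install)

# Run once, in the console, if anything is missing:
# install.packages(to_install)
```

The install line is left commented out so the check can be re-run safely; packages hosted on GitHub or GitLab (`surveytoolbox`, `ufs`) still need the separate steps described in the list.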
diff --git a/data/my_csv_data.csv b/data/my_csv_data.csv
@@ -0,0 +1,6 @@
+age,gender,score
+24,M,45
+22,F,67
+21,M,33
+18,M,44
+23,F,78
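For reference, the file above can be sanity-checked from the console. This is a sketch using base R's `read.csv()` on the same six lines pasted in as text, an assumption made so the example is self-contained; the chapter itself reads the file with `read_csv()` and `here()`. The object names `csv_text` and `check` are illustrative.

```r
# The contents of data/my_csv_data.csv, pasted in as text.
csv_text <- "age,gender,score
24,M,45
22,F,67
21,M,33
18,M,44
23,F,78"

# Base R reader; stringsAsFactors = FALSE keeps gender as character.
check <- read.csv(text = csv_text, stringsAsFactors = FALSE)

dim(check)            # number of rows, then number of columns
sapply(check, class)  # the type R has guessed for each column
mean(check$score)     # quick check that score is usable as a number
```

Base `read.csv()` stores whole numbers as integer, whereas `read_csv()` from `readr` reports them as double; both count as numeric in R.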
diff --git a/data/my_excel_data.xlsx b/data/my_excel_data.xlsx
diff --git a/import.qmd b/import.qmd
@@ -2,40 +2,59 @@
 
 # Packages for this chapter
 
+
 ```{r}
+#| warning: false
+#| message: false
 library(tidyverse)
+library(readxl)
 library(here)
 library(janitor)
 library(haven)
 library(sjPlot)
+library(surveytoolbox) 
+
+# note surveytoolbox installs from github
 # remotes::install_github("martinctc/surveytoolbox")
-library(surveytoolbox)
 ```
 
+# How to read data into R
 
-## Reading in Excel spreadsheets
+The code you need to read your data into R depends on the kind of data you are dealing with. Here we will demo how to read .csv files and Excel spreadsheets, and then go deep on how to deal with data from Qualtrics.
 
-This is gobbledygook.
+## Reading in .csv files
 
-### Reading in .csv
+Exporting data in its simplest form (comma separated values) means that your data is readable by most software, trackable by version control systems, and lightweight. 
 
-## Reading in SPSS 
+Use the following code to read in a .csv file. Remember that we use the `here()` function to tell R where to find the data file, relative to the top level of the project folder.
 
-## Reading in Qualtrics data
+```{r}
+data1 <- read_csv(here("data", "my_csv_data.csv"))
+```
+> NOTE: the message above prints in red in RStudio, but it's not an error. It is just a message telling you that this dataset has 5 rows and 3 columns. It also has information about the type of data that R thinks each variable is. Here R thinks the gender variable is character (strings/text) and the age and score variables are double (R speak for numeric).
 
-### Reading in .sav
+## Reading in Excel spreadsheets
 
-# Read in the data
+Sometimes your data is in .xlsx format. You can use the `readxl` package to read spreadsheets into R. You can get a sense for the first few rows of your dataframe using the `head()` function. 
 
-Remember the file setup described above? This is where that starts to be important. Remember, our working directory (i.e., where R thinks "here" is) was set via the Rproj file -- so it is the "Williams Lab Core R" folder. You can check this by typing `getwd()` or `here()` in the console. 
+```{r}
+data2 <- read_xlsx(here("data", "my_excel_data.xlsx"))
 
-For most of this core script, we'll be using data from a file called sampledata.sav, which should be in the data subfolder from the zipped file. If not, sort that out now!
+head(data2)
 
-A .sav file is in SPSS format. When you export from Qualtrics into .sav/SPSS format, it retains helpful information like question wording and response labels. If you export straight to .csv, you lose that info and will find yourself cross-checking back to Qualtrics. So, strong word of advice to always export to .sav.
+```
 
-The code below uses the `here` command to direct R to the data folder *from the working directory*, and then the .sav file within it.
 
-The `glimpse` command gives a nice overview of the variables, their type, and a preview of the data.
+## Reading in Qualtrics data
+
+If you are collecting survey data, you are probably using Qualtrics. You can export your Qualtrics data in lots of different formats, but we advocate for exporting it as a .sav file. 
+
+Yes, this is the format typically used by SPSS. When you export from Qualtrics into .sav/SPSS format, it retains helpful information like question wording and response labels. If you export straight to .csv, you lose that info and will find yourself cross-checking back to Qualtrics.
+So, strong word of advice: always export to .sav. It is handy because it keeps that extra information about your variables in a set of labels that you can use down the track.
+
+From here, we'll be using data from a file called sampledata.sav, which you can find in the `data` folder. We are using `read_sav()` from the `haven` package. 
+
+The `glimpse` function gives a nice overview of the variables, their type, and a preview of the data.
 
 ```{r}
 
@@ -45,18 +64,22 @@ glimpse(data)
 
 ```
 
-
 These variable names won't be very nice to work with, given their awkward and inconsistent capitalisation. Actual Qualtrics exports are even messier!
 
 The `clean_names` function from `janitor` helps clean them up!
 
-`data_cleanednames <-` at the start of the line saves the change to a new dataframe. Alternately, you could write it back to the same dataframe (e.g., `data <-` ), but this should be done very intentionally as it makes it harder to backtrack to the source of an error. The general rule is to create a new dataframe each time you implement a big change on the data.
+Here we take our `data` dataframe, pipe it (`%>%`) into `clean_names()`, and assign the result (`<-`) to a new object called `data_cleanednames`.
+
+Alternatively, you could write it back to the same dataframe (e.g., `data <-`), but this should be done very intentionally: when you overwrite dataframes, it can be difficult to trace an error back to its source.
+
+The general rule is to create a new dataframe each time you implement a big change on the data.
 
 The `glimpse` command here shows you that you effectively cleaned the variable names!
 
 ```{r}
 
-data_cleanednames <- clean_names(data)
+data_cleanednames <- data %>%
+  clean_names()
 
 glimpse(data_cleanednames)
 
@@ -68,6 +91,10 @@ If you look at the variable types at the right of the `glimpse` output, you'll s
 
 Having this information on hand is really helpful when working with your data!
 
+## Codebooks & Data dictionaries
+
+When you have a really large survey dataset, information about what each variable refers to is essential to a reproducible analysis. Often it is helpful to create a codebook or data dictionary that you can share alongside the datafile, which helps the user understand what the numbers in the file refer to and where they came from.
+
 The `view_df` function from the `sjPlot` package creates a really nicely formatted html file that includes variable names, question wording, response options, and response labelling. This code saves the html file to the `output_files` folder using the `here` package (which starts where your Rproj file is). This html file is nice as a reference for your own use or to share with a supervisor or collaborator!
 
 ```{r }
@@ -85,14 +112,20 @@ datadictionary <- data_cleanednames %>%
   data_dict()
 ```
 
-Let's say you just want to know the question wording or response labels for a particular variable, you can do this via code rather than checking the whole dataset. The `extract_vallab` command from `surveytoolbox` returns the value labels for a given variable.
+Let's say you just want to know the question wording or response labels for a particular variable. You can do this with code rather than checking the whole dataset. The `extract_vallab` command from `surveytoolbox` returns the value labels for a given variable.
 
+Here we are interested in what the values in the `demographicscateg` variable refer to.
+
 ```{r }
 data_cleanednames %>%
   extract_vallab("demographicscateg")
 ```
 
-There are (evidently) times when packages *do not like* labelled data. So, here are a few tools for removing labels from the `haven` package. Keep these up your sleeve for problem solving later! `zap_labels` and `zap_label` not surprisingly removes the labels - the first removes the value labels and the second removes the variable labels! The code below makes a new data dictionary of the zapped dataframe and glimpses the new dataframe to confirm the labels are gone.
+There are (evidently) times when packages *do not like* labelled data. So, here are a few tools for removing labels from the `haven` package. Keep these up your sleeve for problem solving later! 
+
+`zap_labels` and `zap_label` each remove labels. Yes, it would be nice if those functions were easier to distinguish! The first (`zap_labels`) zaps the value labels, and the second (`zap_label`) zaps the variable label.
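To see the difference between the two functions on something small, here is a sketch using a made-up labelled variable. The vector `agree` and its labels are invented for illustration and are not taken from sampledata.sav; `labelled()`, `zap_labels()` and `zap_label()` are all from `haven`, which is assumed to be installed.

```r
library(haven)

# A small made-up labelled variable (hypothetical, not from sampledata.sav).
agree <- labelled(
  c(1, 2, 1),
  labels = c(Yes = 1, No = 2),   # value labels
  label = "Do you agree?"        # variable label
)

no_value_labels <- zap_labels(agree)    # drops the Yes/No value labels
no_variable_label <- zap_label(agree)   # drops the "Do you agree?" variable label

# Inspect what is left on each version.
attributes(no_value_labels)
attributes(no_variable_label)
```

Comparing the two `attributes()` printouts is a quick way to confirm which kind of label each function has removed before applying it to a whole dataframe.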
+
+The code below makes a new data dictionary of the zapped dataframe and glimpses the new dataframe to confirm the labels are gone.
 
 ```{r }