diff --git a/00-introduction.md b/00-introduction.md index 87f7f1b4..55e0b010 100644 --- a/00-introduction.md +++ b/00-introduction.md @@ -275,7 +275,7 @@ First, lets see what directory we are in. To do so, type the following command into the script: -```r +``` r getwd() ``` @@ -299,7 +299,7 @@ not try to interpret as code. Edit your script to include a comment on the purpose of commands you are learning, e.g.: -```r +``` r # this command shows the current working directory getwd() ``` @@ -330,7 +330,7 @@ What if you weren't? You can set your home directory using the `setwd()` command. Enter this command in your script, but *don't run* this yet. -```r +``` r # This sets the working directory setwd() ``` @@ -343,7 +343,7 @@ advantage of RStudio's Tab-autocompletion method, to select `home`, `dcuser`, and `dc_genomics_r` directory. The path in your script should look like this: -```r +``` r # This sets the working directory setwd("/home/dcuser/dc_genomics_r") ``` @@ -423,12 +423,12 @@ function's behavior. For example the function `round()` will round a number with a decimal: -```r +``` r # This will round a number to the nearest integer round(3.14) ``` -```output +``` output [1] 3 ``` @@ -440,7 +440,7 @@ do this, but you may first need to read the help to find out how. To see the hel name: -```r +``` r ?round() ``` @@ -451,11 +451,11 @@ also see what arguments we can pass to this function to modify its behavior. You can also see a function's argument using the `args()` function: -```r +``` r args(round) ``` -```output +``` output function (x, digits = 0, ...) NULL ``` @@ -469,11 +469,11 @@ a different value. We can explicitly set the digits parameter when we call the function: -```r +``` r round(3.14159, digits = 2) ``` -```output +``` output [1] 3.14 ``` @@ -483,18 +483,18 @@ when we used `args()`. In the case below that means that `x` is 3.14159 and digits is 2. 
-```r +``` r round(3.14159, 2) ``` -```output +``` output [1] 3.14 ``` Finally, what if you are using `?` to get help for a function in a package not installed on your system, such as when you are running a script which has dependencies. -```r +``` r ?geom_point() ``` diff --git a/01-r-basics.md b/01-r-basics.md index 9f02c60a..99b98bb2 100644 --- a/01-r-basics.md +++ b/01-r-basics.md @@ -111,7 +111,7 @@ assign '1' to the object 'first_value' as shown. Remember to leave a comment in above (using the '#') to explain what you are doing: -```r +``` r # this line creates the object 'first_value' and assigns it the value '1' first_value <- 1 @@ -166,7 +166,7 @@ Create the following objects; give each object an appropriate name Here as some possible answers to the challenge: -```r +``` r human_chr_number <- 23 gene_name <- 'pten' ensemble_url <- 'ftp://ftp.ensemblgenomes.org/pub/bacteria/release-39/fasta/bacteria_5_collection/escherichia_coli_b_str_rel606/' @@ -232,7 +232,7 @@ may or may not be a good thing depending on how you look at it. -```r +``` r # gene_name has the value 'pten' or whatever value you used in the challenge. # We will now assign the new value 'tp53' gene_name <- 'tp53' @@ -242,7 +242,7 @@ You can also remove an object from R's memory entirely. The `rm()` function will delete the object. -```r +``` r # delete the object 'gene_name' rm(gene_name) ``` @@ -334,44 +334,44 @@ their modes. Try to guess what the mode will be before you look at the solution -```r +``` r mode(chromosome_name) ``` -```output +``` output [1] "character" ``` -```r +``` r mode(od_600_value) ``` -```output +``` output [1] "numeric" ``` -```r +``` r mode(chr_position) ``` -```output +``` output [1] "character" ``` -```r +``` r mode(spock) ``` -```output +``` output [1] "logical" ``` -```r +``` r mode(pilot) ``` -```error +``` error Error in eval(expr, envir, enclos): object 'pilot' not found ``` @@ -395,44 +395,44 @@ to check their classes. 
-```r +``` r class(chromosome_name) ``` -```output +``` output [1] "character" ``` -```r +``` r class(od_600_value) ``` -```output +``` output [1] "numeric" ``` -```r +``` r class(chr_position) ``` -```output +``` output [1] "character" ``` -```r +``` r class(spock) ``` -```output +``` output [1] "logical" ``` -```r +``` r class(pilot) ``` -```error +``` error Error in eval(expr, envir, enclos): object 'pilot' not found ``` @@ -453,22 +453,22 @@ called `pilot` that was the **name** "Earhart", we need to enclose `Earhart` in quotation marks. -```r +``` r pilot <- "Earhart" mode(pilot) ``` -```output +``` output [1] "character" ``` -```r +``` r pilot <- "Earhart" typeof(pilot) ``` -```output +``` output [1] "character" ``` @@ -492,11 +492,11 @@ can be added, multiplied, divided, etc. R provides several mathematical These can be used with literal numbers: -```r +``` r (1 + (5 ** 0.5))/2 ``` -```output +``` output [1] 1.618034 ``` @@ -506,13 +506,13 @@ by R) a numeric object: -```r +``` r # multiply the object 'human_chr_number' by 2 human_chr_number * 2 ``` -```output +``` output [1] 46 ``` @@ -530,11 +530,11 @@ functions. Hint: remember the `round()` function can take 2 arguments. ## Solution -```r +``` r round((1 + sqrt(5))/2, digits = 3) ``` -```output +``` output [1] 1.618 ``` @@ -557,7 +557,7 @@ ways to create a vector is to use the `c()` function - the "concatenate" or multiple values, separate each value with a comma: -```r +``` r # Create the SNP gene name vector snp_genes <- c("OXTR", "ACTN3", "AR", "OPRM1") @@ -569,28 +569,28 @@ Another useful function that gives both of these pieces of information is the `str()` (structure) function. 
-```r +``` r # Check the mode, length, and structure of 'snp_genes' mode(snp_genes) ``` -```output +``` output [1] "character" ``` -```r +``` r length(snp_genes) ``` -```output +``` output [1] 4 ``` -```r +``` r str(snp_genes) ``` -```output +``` output chr [1:4] "OXTR" "ACTN3" "AR" "OPRM1" ``` @@ -604,7 +604,7 @@ when we start working with data frames. Let's create a few more vectors to play around with: -```r +``` r # Some interesting human SNPs # while accuracy is important, typos in the data won't hurt you here @@ -619,12 +619,12 @@ the name of the vector followed by square brackets. In those square brackets we place the index (e.g. a number) in that bracket as follows: -```r +``` r # get the 3rd value in the snp vector snps[3] ``` -```output +``` output [1] "rs6152" ``` @@ -633,13 +633,13 @@ through to the final number of items in your vector. You can also retrieve a range of numbers: -```r +``` r # get the 1st through 3rd value in the snp vector snps[1:3] ``` -```output +``` output [1] "rs53576" "rs1815739" "rs6152" ``` @@ -648,13 +648,13 @@ a vector, you pass a **vector of indices**; a vector that has the numbered positions you wish to retrieve. -```r +``` r # get the 1st, 3rd, and 4th value in the snp vector snps[c(1, 3, 4)] ``` -```output +``` output [1] "rs53576" "rs6152" "rs1799971" ``` @@ -664,13 +664,13 @@ examples](https://thomasleeper.com/Rcourse/Tutorials/vectorindexing.html)). Also, several of these subsetting expressions can be combined: -```r +``` r # get the 1st through the 3rd value, and 4th value in the snp vector # yes, this is a little silly in a vector of only 4 values. snps[c(1:3,4)] ``` -```output +``` output [1] "rs53576" "rs1815739" "rs6152" "rs1799971" ``` @@ -680,7 +680,7 @@ Once you have an existing vector, you may want to add a new item to it. 
To do so, you can use the `c()` function again to add your new value: -```r +``` r # add the gene "CYP1A1" and "APOA5" to our list of snp genes # this overwrites our existing vector snp_genes <- c(snp_genes, "CYP1A1", "APOA5") @@ -689,11 +689,11 @@ snp_genes <- c(snp_genes, "CYP1A1", "APOA5") We can verify that "snp\_genes" contains the new gene entry -```r +``` r snp_genes ``` -```output +``` output [1] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" "APOA5" ``` @@ -701,35 +701,35 @@ Using a negative index will return a version of a vector with that index's value removed: -```r +``` r snp_genes[-6] ``` -```output +``` output [1] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" ``` We can remove that value from our vector by overwriting it with this expression: -```r +``` r snp_genes <- snp_genes[-6] snp_genes ``` -```output +``` output [1] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" ``` We can also explicitly rename or add a value to our index using double bracket notation: -```r +``` r snp_genes[6]<- "APOA5" snp_genes ``` -```output +``` output [1] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" "APOA5" ``` @@ -770,11 +770,11 @@ F) True There is one last set of cool subsetting capabilities we want to introduce. It is possible within R to retrieve items in a vector based on a logical evaluation or numerical comparison. For example, let's say we wanted get all of the SNPs in our vector of SNP positions that were greater than 100,000,000. We could index using the '>' (greater than) logical operator: -```r +``` r snp_positions[snp_positions > 100000000] ``` -```output +``` output [1] 154039662 ``` @@ -801,11 +801,11 @@ can be better understood if you examine what the expression "snp\_positions > 10 evaluates to: -```r +``` r snp_positions > 100000000 ``` -```output +``` output [1] FALSE FALSE FALSE TRUE ``` @@ -813,11 +813,11 @@ The output above is a logical vector, the 4th element of which is TRUE. 
When you pass a logical vector as an index, R will return the true values: -```r +``` r snp_positions[c(FALSE, FALSE, FALSE, TRUE)] ``` -```output +``` output [1] 154039662 ``` @@ -836,11 +836,11 @@ We can use the `which()` function to return the indices of any item that evaluates as TRUE in our comparison: -```r +``` r which(snp_positions > 100000000) ``` -```output +``` output [1] 4 ``` @@ -852,12 +852,12 @@ pre-determined value (e.g 100000000) we can use an object that can take on whatever value we need. So for example: -```r +``` r snp_marker_cutoff <- 100000000 snp_positions[snp_positions > snp_marker_cutoff] ``` -```output +``` output [1] 154039662 ``` @@ -876,14 +876,14 @@ but the `is.NA()` function will return a logical vector, with TRUE for any NA value: -```r +``` r # current value of 'snp_genes': # chr [1:7] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" NA "APOA5" is.na(snp_genes) ``` -```output +``` output [1] FALSE FALSE FALSE FALSE FALSE FALSE ``` @@ -893,7 +893,7 @@ will return TRUE for any value in your collection that is in the vector you are searching: -```r +``` r # current value of 'snp_genes': # chr [1:7] "OXTR" "ACTN3" "AR" "OPRM1" "CYP1A1" NA "APOA5" @@ -903,7 +903,7 @@ the vector you are searching: c("ACTN3","APOA5") %in% snp_genes ``` -```output +``` output [1] TRUE TRUE ``` @@ -938,27 +938,27 @@ c. `snp_positions` ## Solution -```r +``` r mode(snps) ``` -```output +``` output [1] "character" ``` -```r +``` r mode(snp_chromosomes) ``` -```output +``` output [1] "character" ``` -```r +``` r mode(snp_positions) ``` -```output +``` output [1] "numeric" ``` @@ -980,30 +980,30 @@ c. To the `snp_positions` vector add: 116792991 ## Solution -```r +``` r snps <- c(snps, "rs662799") snps ``` -```output +``` output [1] "rs53576" "rs1815739" "rs6152" "rs1799971" "rs662799" ``` -```r +``` r snp_chromosomes <- c(snp_chromosomes, "11") # did you use quotes? 
snp_chromosomes ``` -```output +``` output [1] "3" "11" "X" "6" "11" ``` -```r +``` r snp_positions <- c(snp_positions, 116792991) snp_positions ``` -```output +``` output [1] 8762685 66560624 67545785 154039662 116792991 ``` @@ -1030,13 +1030,13 @@ b. Add 2 NA values to the end of `snp_genes` ## Solution -```r +``` r snp_genes <- snp_genes[-5] snp_genes <- c(snp_genes, NA, NA) snp_genes ``` -```output +``` output [1] "OXTR" "ACTN3" "AR" "OPRM1" "APOA5" NA NA ``` @@ -1060,12 +1060,12 @@ Using indexing, create a new vector named `combined` that contains: ## Solution -```r +``` r combined <- c(snp_genes[1], snps[1], snp_chromosomes[1], snp_positions[1]) combined ``` -```output +``` output [1] "OXTR" "rs53576" "3" "8762685" ``` @@ -1084,11 +1084,11 @@ What type of data is `combined`? ## Solution -```r +``` r typeof(combined) ``` -```output +``` output [1] "character" ``` @@ -1109,7 +1109,7 @@ tutorial](https://r4ds.had.co.nz/vectors.html#lists). In this one example, we wi a named list and show you how to retrieve items from the list. -```r +``` r # Create a named list using the 'list' function and our SNP examples # Note, for easy reading we have placed each item in the list on a separate line # Nothing special about this, you can do this for any multiline commands @@ -1125,7 +1125,7 @@ snp_data <- list(genes = snp_genes, str(snp_data) ``` -```output +``` output List of 4 $ genes : chr [1:7] "OXTR" "ACTN3" "AR" "OPRM1" ... $ refference_snp: chr [1:5] "rs53576" "rs1815739" "rs6152" "rs1799971" ... 
@@ -1136,26 +1136,26 @@ List of 4 To get all the values for the `position` object in the list, we use the `$` notation: -```r +``` r # return all the values of position object snp_data$position ``` -```output +``` output [1] 8762685 66560624 67545785 154039662 116792991 ``` To get the first value in the `position` object, use the `[]` notation to index: -```r +``` r # return first value of the position object snp_data$position[1] ``` -```output +``` output [1] 8762685 ``` ::::::::::::::::::::::::::::::::::::::::: diff --git a/03-basics-factors-dataframes.md b/03-basics-factors-dataframes.md index e5da1bcf..b32037ac 100644 --- a/03-basics-factors-dataframes.md +++ b/03-basics-factors-dataframes.md @@ -168,7 +168,7 @@ use tab autocompletion. **If you use tab autocompletion you avoid typos and errors in file paths.** Use it! -```r +``` r ## read in a CSV file and save it as 'variants' variants <- read.csv("/home/dcuser/r_data/combined_tidy_vcf.csv") @@ -192,13 +192,13 @@ including some summary statistics as well as well as the "structure" of the data frame. Let's examine what each of these functions can tell us: -```r +``` r ## get summary statistics on a data frame summary(variants) ``` -```output +``` output sample_id CHROM POS ID Length:801 Length:801 Min. : 1521 Mode:logical Class :character Class :character 1st Qu.:1115970 NA's:801 @@ -268,7 +268,7 @@ There is a lot to work with, so we will subset the first three columns into a new data frame using the `data.frame()` function. -```r +``` r ## put the first three columns of variants into a new data frame called subset subset <- data.frame(variants[, c(1:3, 6)]) @@ -278,13 +278,13 @@ Now, let's use the `str()` (structure) function to look a little more closely at how data frames work: -```r +``` r ## get the structure of a data frame str(subset) ``` -```output +``` output 'data.frame': 801 obs. of 4 variables: $ sample_id: chr "SRR2584863" "SRR2584863" "SRR2584863" "SRR2584863" ... 
$ CHROM : chr "CP000819.1" "CP000819.1" "CP000819.1" "CP000819.1" ... @@ -319,22 +319,22 @@ Ok, thats a lot up unpack! Some things to notice. - ```r + ``` r mode(variants) ``` - ```output + ``` output [1] "list" ``` - ```r + ``` r class(variants) ``` - ```output + ``` output [1] "data.frame" ``` @@ -361,7 +361,7 @@ factors. To do this we'll take a look at just the alternate alleles. We can use to access or extract a column by its name in data frames (or to extract objects within named lists). -```r +``` r ## extract the "ALT" column to a new object alt_alleles <- subset$ALT @@ -370,11 +370,11 @@ alt_alleles <- subset$ALT Let's look at the first few items in our factor using `head()`: -```r +``` r head(alt_alleles) ``` -```output +``` output [1] "G" "T" "T" "CTTTTTTTT" "CCGCGC" "T" ``` @@ -383,7 +383,7 @@ single-nucleotide alleles (SNPs). We can use some of the vector indexing skills from the last episode. -```r +``` r snps <- c(alt_alleles[alt_alleles == "A"], alt_alleles[alt_alleles=="T"], alt_alleles[alt_alleles=="G"], @@ -397,23 +397,23 @@ example, we can try to generate a plot of this character vector as it is right now: -```r +``` r plot(snps) ``` -```warning +``` warning Warning in xy.coords(x, y, xlabel, ylabel, log): NAs introduced by coercion ``` -```warning +``` warning Warning in min(x): no non-missing arguments to min; returning Inf ``` -```warning +``` warning Warning in max(x): no non-missing arguments to max; returning -Inf ``` -```error +``` error Error in plot.window(...): need finite 'ylim' values ``` @@ -423,18 +423,18 @@ as categories (i.e. a factor vector); we will create a new object to avoid confusion using the `factor()` function: -```r +``` r factor_snps <- factor(snps) ``` Let's learn a little more about this new type of vector: -```r +``` r str(factor_snps) ``` -```output +``` output Factor w/ 4 levels "A","C","G","T": 1 1 1 1 1 1 1 1 1 1 ... ``` @@ -454,11 +454,11 @@ the first few items in our factor are all "A"s. 
We can see how many items in our vector fall into each category: -```r +``` r summary(factor_snps) ``` -```output +``` output A C G T 211 139 154 203 ``` @@ -485,7 +485,7 @@ values. For example, suppose we want to know how many of our variants had each possible SNP we could generate a plot: -```r +``` r plot(factor_snps) ``` @@ -501,7 +501,7 @@ What if we wanted to order our plot according to the numerical value (i.e., in descending order of SNP frequency)? We can enforce an order on our factors: -```r +``` r ordered_factor_snps <- factor(factor_snps, levels = names(sort(table(factor_snps)))) ``` @@ -522,7 +522,7 @@ to see why this works): Now we see our plot has be reordered: -```r +``` r plot(ordered_factor_snps) ``` @@ -547,19 +547,19 @@ it will look for a CRAN repository to install from. So, for example, to install (which you'll do in the next few lessons), you would use the following command: -```r +``` r # install a package from CRAN install.packages("ggplot2") ``` -```output +``` output The following package(s) will be installed: - ggplot2 [3.5.1] These packages will be installed into "~/work/genomics-r-intro/genomics-r-intro/renv/profiles/lesson-requirements/renv/library/linux-ubuntu-jammy/R-4.4/x86_64-pc-linux-gnu". # Installing packages -------------------------------------------------------- - Installing ggplot2 ... OK [linked from cache] -Successfully installed 1 package in 6.7 milliseconds. +Successfully installed 1 package in 6.5 milliseconds. ``` :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -610,44 +610,44 @@ l. `variants[variants$REF == "A",]` a. -```r +``` r variants[1, 1] ``` -```output +``` output [1] "SRR2584863" ``` b. -```r +``` r variants[2, 4] ``` -```output +``` output [1] NA ``` c. -```r +``` r variants[801, 29] ``` -```output +``` output [1] "T" ``` d. 
-```r +``` r variants[2, ] ``` -```output +``` output sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP VDB 2 SRR2584863 CP000819.1 263235 NA G T 85 NA FALSE NA NA 6 0.096133 RPB MQB BQB MQSB SGB MQ0F ICB HOB AC AN DP4 MQ @@ -661,12 +661,12 @@ variants[2, ] e. -```r +``` r variants[-1, ] ``` -```output +``` output sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF 2 SRR2584863 CP000819.1 263235 NA G T 85 NA FALSE NA NA 3 SRR2584863 CP000819.1 281923 NA G T 217 NA FALSE NA NA @@ -700,22 +700,22 @@ variants[-1, ] f. -```r +``` r variants[1:4, 1] ``` -```output +``` output [1] "SRR2584863" "SRR2584863" "SRR2584863" "SRR2584863" ``` g. -```r +``` r variants[1:10, c("REF", "ALT")] ``` -```output +``` output REF 1 T 2 G @@ -743,12 +743,12 @@ variants[1:10, c("REF", "ALT")] h. -```r +``` r variants[, c("sample_id")] ``` -```output +``` output [1] "SRR2584863" "SRR2584863" "SRR2584863" "SRR2584863" "SRR2584863" [6] "SRR2584863" ``` @@ -756,11 +756,11 @@ variants[, c("sample_id")] i. -```r +``` r head(variants) ``` -```output +``` output sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF 1 SRR2584863 CP000819.1 9972 NA T G 91 NA FALSE NA NA 2 SRR2584863 CP000819.1 263235 NA G T 85 NA FALSE NA NA @@ -794,11 +794,11 @@ head(variants) j. -```r +``` r tail(variants) ``` -```output +``` output sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP 796 SRR2589044 CP000819.1 3444175 NA G T 184 NA FALSE NA NA 9 797 SRR2589044 CP000819.1 3481820 NA A G 225 NA FALSE NA NA 12 @@ -832,12 +832,12 @@ tail(variants) k. -```r +``` r variants$sample_id ``` -```output +``` output [1] "SRR2584863" "SRR2584863" "SRR2584863" "SRR2584863" "SRR2584863" [6] "SRR2584863" ``` @@ -845,12 +845,12 @@ variants$sample_id l. 
-```r +``` r variants[variants$REF == "A", ] ``` -```output +``` output sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP 11 SRR2584863 CP000819.1 2407766 NA A C 104 NA FALSE NA NA 9 12 SRR2584863 CP000819.1 2446984 NA A C 225 NA FALSE NA NA 20 @@ -906,7 +906,7 @@ the screen. You can create a new data frame object by assigning them to a new object name: -```r +``` r # create a new data frame containing only observations from SRR2584863 SRR2584863_variants <- variants[variants$sample_id == "SRR2584863", ] @@ -916,17 +916,17 @@ SRR2584863_variants <- variants[variants$sample_id == "SRR2584863", ] dim(SRR2584863_variants) ``` -```output +``` output [1] 25 29 ``` -```r +``` r # get a summary of the data frame summary(SRR2584863_variants) ``` -```output +``` output sample_id CHROM POS ID Length:25 Length:25 Min. : 9972 Mode:logical Class :character Class :character 1st Qu.:1331794 NA's:25 @@ -1012,12 +1012,12 @@ This can be a good thing when R gets it right, or a bad thing when the result is not what you expect. Consider: -```r +``` r snp_chromosomes <- c('3', '11', 'X', '6') typeof(snp_chromosomes) ``` -```output +``` output [1] "character" ``` @@ -1026,20 +1026,20 @@ we have explicitly told R to consider them as characters. However, even if we re the quotes from the numbers, R would coerce everything into a character: -```r +``` r snp_chromosomes_2 <- c(3, 11, 'X', 6) typeof(snp_chromosomes_2) ``` -```output +``` output [1] "character" ``` -```r +``` r snp_chromosomes_2[1] ``` -```output +``` output [1] "3" ``` @@ -1048,40 +1048,40 @@ another. 
Consider the following vector of characters, which all happen to be valid numbers: -```r +``` r snp_positions_2 <- c("8762685", "66560624", "67545785", "154039662") typeof(snp_positions_2) ``` -```output +``` output [1] "character" ``` -```r +``` r snp_positions_2[1] ``` -```output +``` output [1] "8762685" ``` Now we can coerce `snp_positions_2` into a numeric type using `as.numeric()`: -```r +``` r snp_positions_2 <- as.numeric(snp_positions_2) typeof(snp_positions_2) ``` -```output +``` output [1] "double" ``` -```r +``` r snp_positions_2[1] ``` -```output +``` output [1] 8762685 ``` @@ -1089,11 +1089,11 @@ Sometimes coercion is straight forward, but what would happen if we tried using `as.numeric()` on `snp_chromosomes_2` -```r +``` r snp_chromosomes_2 <- as.numeric(snp_chromosomes_2) ``` -```warning +``` warning Warning: NAs introduced by coercion ``` @@ -1101,11 +1101,11 @@ If we check, we will see that an `NA` value (R's default value for missing data) has been introduced. -```r +``` r snp_chromosomes_2 ``` -```output +``` output [1] 3 11 NA 6 ``` @@ -1114,11 +1114,11 @@ try to coerce the `factor_snps` vector into a numeric mode look at the result: -```r +``` r as.numeric(factor_snps) ``` -```output +``` output [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 @@ -1151,7 +1151,7 @@ If you need to coerce an entire column you can overwrite it using an expression like this one: -```r +``` r # make the 'REF' column a character type column variants$REF <- as.character(variants$REF) @@ -1160,7 +1160,7 @@ variants$REF <- as.character(variants$REF) typeof(variants$REF) ``` -```output +``` output [1] "character" ``` @@ -1209,23 +1209,23 @@ individual column. Let's look at the "DP" or filtered depth. This value shows th reads that support each of the reported variants. 
-```r +``` r max(variants$DP) ``` -```output +``` output [1] 79 ``` You can sort a data frame using the `order()` function: -```r +``` r sorted_by_DP <- variants[order(variants$DP), ] head(sorted_by_DP$DP) ``` -```output +``` output [1] 2 2 2 2 2 2 ``` @@ -1242,12 +1242,12 @@ variants with the greatest filtered depth ("DP"). ## Solution -```r +``` r sorted_by_DP <- variants[order(variants$DP, decreasing = TRUE), ] head(sorted_by_DP$DP) ``` -```output +``` output [1] 79 46 41 29 29 27 ``` @@ -1258,14 +1258,14 @@ variants with the greatest filtered depth ("DP"). You can rename columns: -```r +``` r colnames(variants)[colnames(variants) == "sample_id"] <- "strain" # check the column name (hint names are returned as a vector) colnames(variants) ``` -```output +``` output [1] "strain" "CHROM" "POS" "ID" [5] "REF" "ALT" "QUAL" "FILTER" [9] "INDEL" "IDV" "IMF" "DP" @@ -1282,7 +1282,7 @@ We can save data to a file. We will save our `SRR2584863_variants` object to a .csv file using the `write.csv()` function: -```r +``` r write.csv(SRR2584863_variants, file = "data/SRR2584863_variants.csv") ``` @@ -1335,11 +1335,11 @@ frame: -```r +``` r head(Ecoli_metadata) ``` -```output +``` output # A tibble: 6 × 7 sample generation clade strain cit run genome_size @@ -1382,52 +1382,52 @@ H) Save the edited Ecoli\_metadata data frame as "exercise\_solution.csv" in you ## Solution -```r +``` r dim(Ecoli_metadata) ``` -```output +``` output [1] 30 7 ``` -```r +``` r levels(as.factor(Ecoli_metadata$cit)) ``` -```output +``` output [1] "minus" "plus" "unknown" ``` -```r +``` r table(as.factor(Ecoli_metadata$cit)) ``` -```output +``` output minus plus unknown 9 9 12 ``` -```r +``` r Ecoli_metadata[7, 7] ``` -```output +``` output # A tibble: 1 × 1 genome_size 1 4.62 ``` -```r +``` r median(Ecoli_metadata$genome_size) ``` -```output +``` output [1] 4.625 ``` -```r +``` r colnames(Ecoli_metadata)[colnames(Ecoli_metadata) == "sample"] <- "sample_id" Ecoli_metadata$genome_size_bp <- 
Ecoli_metadata$genome_size * 1000000 write.csv(Ecoli_metadata, file = "exercise_solution.csv") diff --git a/04-bioconductor-vcfr.md b/04-bioconductor-vcfr.md index 8e290e5d..541e74d5 100644 --- a/04-bioconductor-vcfr.md +++ b/04-bioconductor-vcfr.md @@ -37,7 +37,7 @@ Since access to the [Bioconductor](https://bioconductor.org/) repository is not The first step is to install a package that *is* on CRAN, `BiocManager`. This package will allow us to use it to install packages from Bioconductor. You can think of Bioconductor kind of like an alternative app store for your phone, except instead of apps you are installing packages, and instead of your phone it's your local R package library. -```r +``` r # install the BiocManager from CRAN using the base R install.packages() function install.packages("BiocManager") ``` @@ -45,7 +45,7 @@ install.packages("BiocManager") To check if this worked (and also so you can make a note of the version for reproducibility purposes), you can run `BiocManager::version()` and it should give you the version number. -```r +``` r # to make sure it worked, check the version BiocManager::version() ``` @@ -61,7 +61,7 @@ Just be aware that installing packages that have many dependencies can take a wh :::::::::::::::::::::::::::::::::::::::::::::::::: -```r +``` r # install the vcfR package from bioconductor using BiocManager::install() BiocManager::install("vcfR") ``` diff --git a/05-dplyr.md b/05-dplyr.md index 2a04388b..1423fb58 100644 --- a/05-dplyr.md +++ b/05-dplyr.md @@ -48,7 +48,7 @@ packages give you access to more functions. You need to install a package and then load it to be able to use it. -```r +``` r install.packages("dplyr") ## installs dplyr package install.packages("tidyr") ## installs tidyr package install.packages("ggplot2") ## installs ggplot2 package @@ -59,7 +59,7 @@ You might get asked to choose a CRAN mirror -- this is asking you to choose a site to download the package from. 
The choice doesn't matter too much; I'd recommend choosing the RStudio mirror. -```r +``` r library("dplyr") ## loads in dplyr package to use library("tidyr") ## loads in tidyr package to use library("ggplot2") ## loads in ggplot2 package to use @@ -114,7 +114,7 @@ Now let's load our vcf .csv file using `read_csv()`: Similar to `str()`, which comes built into R, `glimpse()` is a `dplyr` function that (as the name suggests) gives a glimpse of the data frame. -```output +``` output Rows: 801 Columns: 29 $ sample_id "SRR2584863", "SRR2584863", "SRR2584863", "SRR2584863", … @@ -155,11 +155,11 @@ In the above output, we can already gather some information about `variants`, su To select columns of a data frame, use `select()`. The first argument to this function is the data frame (`variants`), and the subsequent arguments are the columns to keep. -```r +``` r select(variants, sample_id, REF, ALT, DP) ``` -```output +``` output # A tibble: 801 × 4 sample_id REF ALT DP @@ -180,11 +180,11 @@ To select all columns *except* certain ones, put a "-" in front of the variable to exclude it. -```r +``` r select(variants, -CHROM) ``` -```output +``` output # A tibble: 801 × 28 sample_id POS ID REF ALT QUAL FILTER INDEL IDV IMF DP @@ -207,11 +207,11 @@ select(variants, -CHROM) `dplyr` also provides useful functions to select columns based on their names. For instance, `ends_with()` allows you to select columns that ends with specific letters. For instance, if you wanted to select columns that end with the letter "B": -```r +``` r select(variants, ends_with("B")) ``` -```output +``` output # A tibble: 801 × 8 VDB RPB MQB BQB MQSB SGB ICB HOB @@ -241,7 +241,7 @@ Hint: look at for a function called `contains()`, which can be found in the help ## Solution -```r +``` r # First, we select "POS" and all columns with letter "i". This will contain columns Indiv and FILTER. 
variants_subset <- select(variants, POS, contains("i")) # Next, we remove columns Indiv and FILTER @@ -249,7 +249,7 @@ variants_result <- select(variants_subset, -Indiv, -FILTER) variants_result ``` -```output +``` output # A tibble: 801 × 7 POS sample_id ID INDEL IDV IMF ICB @@ -275,12 +275,12 @@ We can also get to `variants_result` in one line of code: ## Alternative solution -```r +``` r variants_result <- select(variants, POS, contains("i"), -Indiv, -FILTER) variants_result ``` -```output +``` output # A tibble: 801 × 7 POS sample_id ID INDEL IDV IMF ICB @@ -304,11 +304,11 @@ variants_result To choose rows, use `filter()`: -```r +``` r filter(variants, sample_id == "SRR2584863") ``` -```output +``` output # A tibble: 25 × 29 sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF @@ -333,12 +333,12 @@ filter(variants, sample_id == "SRR2584863") Here are a few examples: -```r +``` r # rows for which the reference genome has T or G filter(variants, REF %in% c("T", "G")) ``` -```output +``` output # A tibble: 340 × 29 sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP @@ -358,12 +358,12 @@ filter(variants, REF %in% c("T", "G")) # MQ , Indiv , gt_PL , gt_GT , gt_GT_alleles ``` -```r +``` r # rows that have TRUE in the column INDEL filter(variants, INDEL) ``` -```output +``` output # A tibble: 101 × 29 sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP @@ -383,12 +383,12 @@ filter(variants, INDEL) # MQ , Indiv , gt_PL , gt_GT , gt_GT_alleles ``` -```r +``` r # rows that don't have missing data in the IDV column filter(variants, !is.na(IDV)) ``` -```output +``` output # A tibble: 101 × 29 sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP @@ -415,12 +415,12 @@ existing at that site. 
`filter()` can be useful for selecting mutations that have a QUAL score above a certain threshold: -```r +``` r # rows with QUAL values greater than or equal to 100 filter(variants, QUAL >= 100) ``` -```output +``` output # A tibble: 666 × 29 sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP @@ -443,13 +443,13 @@ filter(variants, QUAL >= 100) `filter()` allows you to combine multiple conditions. You can separate them using a `,` as arguments to the function, they will be combined using the `&` (AND) logical operator. If you need to use the `|` (OR) logical operator, you can specify it explicitly: -```r +``` r # this is equivalent to: # filter(variants, sample_id == "SRR2584863" & QUAL >= 100) filter(variants, sample_id == "SRR2584863", QUAL >= 100) ``` -```output +``` output # A tibble: 19 × 29 sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP @@ -477,12 +477,12 @@ filter(variants, sample_id == "SRR2584863", QUAL >= 100) # MQ , Indiv , gt_PL , gt_GT , gt_GT_alleles ``` -```r +``` r # using `|` logical operator filter(variants, sample_id == "SRR2584863", (MQ >= 50 | QUAL >= 100)) ``` -```output +``` output # A tibble: 23 × 29 sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF @@ -517,11 +517,11 @@ Hint: to flip logical values such as TRUE to a FALSE, we can use to negation sym ## Solution -```r +``` r filter(variants, POS >= 1e6 & POS <= 2e6, QUAL > 200, !INDEL) ``` -```output +``` output # A tibble: 77 × 29 sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF DP @@ -558,13 +558,13 @@ part of `dplyr`. If you use RStudio, you can type the pipe with or Cmd + Shift + M if you're using a Mac. 
-```r +``` r variants %>% filter(sample_id == "SRR2584863") %>% select(REF, ALT, DP) ``` -```output +``` output # A tibble: 25 × 3 REF ALT DP @@ -599,7 +599,7 @@ If we want to create a new object with this smaller version of the data we can do so by assigning it a new name: -```r +``` r SRR2584863_variants <- variants %>% filter(sample_id == "SRR2584863") %>% select(REF, ALT, DP) @@ -609,11 +609,11 @@ This new object includes all of the data from this sample. Let's print it (a tibble shows only its first rows) to confirm it's what we want: -```r +``` r SRR2584863_variants ``` -```output +``` output # A tibble: 25 × 3 REF ALT DP @@ -633,11 +633,11 @@ SRR2584863_variants Similar to the `head()` and `tail()` functions, we can also look at the first or last six rows using the tidyverse function `slice()`. `slice()` is more versatile because it lets users specify any range of rows to view: -```r +``` r SRR2584863_variants %>% slice(1:6) ``` -```output +``` output # A tibble: 6 × 3 REF ALT DP @@ -650,11 +650,11 @@ SRR2584863_variants %>% slice(1:6) ``` -```r +``` r SRR2584863_variants %>% slice(10:25) ``` -```output +``` output # A tibble: 16 × 3 REF ALT DP @@ -690,14 +690,14 @@ Showing only the 5th through 11th rows of columns `REF`, `ALT`, and `POS`. ## Solution -```r +``` r variants %>% filter(sample_id == "SRR2584863" & DP >= 10) %>% slice(5:11) %>% select(sample_id, DP, REF, ALT, POS) ``` -```output +``` output # A tibble: 7 × 5 sample_id DP REF ALT POS @@ -729,12 +729,12 @@ We can use `mutate` to add a column (`POLPROB`) to our `variants` data frame tha the probability of a polymorphism at that site given the data. -```r +``` r variants %>% mutate(POLPROB = 1 - (10 ^ -(QUAL/10))) ``` -```output +``` output # A tibble: 801 × 30 sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV IMF @@ -768,13 +768,13 @@ line to the above code to only show those columns. 
## Solution -```r +``` r variants %>% mutate(POLPROB = 1 - 10 ^ -(QUAL/10)) %>% select(sample_id, POS, QUAL, POLPROB) ``` -```output +``` output # A tibble: 801 × 4 sample_id POS QUAL POLPROB @@ -806,13 +806,13 @@ We can use `group_by()` to tally the number of mutations detected in each sample using the function `tally()`: -```r +``` r variants %>% group_by(sample_id) %>% tally() ``` -```output +``` output # A tibble: 3 × 2 sample_id n @@ -824,12 +824,12 @@ variants %>% Since counting or tallying values is a common use case for `group_by()`, the alternative function `count()` was created to bypass `group_by()`: -```r +``` r variants %>% count(sample_id) ``` -```output +``` output # A tibble: 3 × 2 sample_id n @@ -849,12 +849,12 @@ variants %>% ## Solution -```r +``` r variants %>% count(INDEL) ``` -```output +``` output # A tibble: 2 × 2 INDEL n @@ -891,7 +891,7 @@ to use `na.rm = TRUE` (`rm` stands for remove). So to view the mean, median, maximum, and minimum filtered depth (`DP`) for each sample: -```r +``` r variants %>% group_by(sample_id) %>% summarize( @@ -901,7 +901,7 @@ variants %>% max_DP = max(DP)) ``` -```output +``` output # A tibble: 3 × 5 sample_id mean_DP median_DP min_DP max_DP @@ -928,23 +928,23 @@ It can sometimes be useful to transform the "long" tidy format, into the wide fo `pivot_wider()` takes a data frame as the first argument, and two further arguments: the column whose values will become the new column names and the column whose values will fill the cells in the wide data. -```r +``` r variants_wide <- variants %>% group_by(sample_id, CHROM) %>% summarize(mean_DP = mean(DP)) %>% pivot_wider(names_from = sample_id, values_from = mean_DP) ``` -```output +``` output `summarise()` has grouped output by 'sample_id'. You can override using the `.groups` argument. 
``` -```r +``` r variants_wide ``` -```output +``` output # A tibble: 1 × 4 CHROM SRR2584863 SRR2584866 SRR2589044 @@ -954,12 +954,12 @@ variants_wide The opposite operation of `pivot_wider()` is taken care of by `pivot_longer()`. We specify the names of the new columns, and here add `-CHROM` as this column shouldn't be affected by the reshaping: -```r +``` r variants_wide %>% pivot_longer(-CHROM, names_to = "sample_id", values_to = "mean_DP") ``` -```output +``` output # A tibble: 3 × 3 CHROM sample_id mean_DP diff --git a/06-data-visualization.md b/06-data-visualization.md index 759c1133..17527ee9 100644 --- a/06-data-visualization.md +++ b/06-data-visualization.md @@ -51,37 +51,37 @@ The idea of **mapping** is crucial in **ggplot**. One familiar example is to *ma First, we need to install the `ggplot2` package. -```r +``` r install.packages("ggplot2") ``` Now, let's load the `ggplot2` package: -```r +``` r library(ggplot2) ``` We will also use some of the other tidyverse packages we used in the last episode, so we need to load them as well. 
-```r +``` r library(readr) library(dplyr) ``` -```output +``` output Attaching package: 'dplyr' ``` -```output +``` output The following objects are masked from 'package:stats': filter, lag ``` -```output +``` output The following objects are masked from 'package:base': intersect, setdiff, setequal, union @@ -92,18 +92,18 @@ As we can see from above output **`ggplot2`** has been already loaded along with ## Loading the dataset -```r +``` r variants <- read.csv("https://raw.githubusercontent.com/datacarpentry/genomics-r-intro/main/episodes/data/combined_tidy_vcf.csv") ``` Explore the *structure* (types of columns and number of rows) of the dataset using [dplyr](https://dplyr.tidyverse.org/index.html)'s [`glimpse()`](https://dplyr.tidyverse.org/reference/glimpse.html) (for more info, see the [Data Wrangling and Analyses with Tidyverse](https://datacarpentry.org/genomics-r-intro/05-dplyr/) episode): -```r +``` r glimpse(variants) # Show a snapshot of the rows and columns ``` -```output +``` output Rows: 801 Columns: 29 $ sample_id "SRR2584863", "SRR2584863", "SRR2584863", "SRR2584863", … @@ -140,7 +140,7 @@ $ gt_GT_alleles "G", "T", "T", "CTTTTTTTT", "CCGCGC", "T", "A", "A", "AC Alternatively, we can display the first few rows (vertically) of the table using [`head()`](https://www.geeksforgeeks.org/get-the-first-parts-of-a-data-set-in-r-programming-head-function/): -```r +``` r head(variants) ``` @@ -162,7 +162,7 @@ head(variants) To build a ggplot, we will use the following basic template that can be used for different types of plots: -```r +``` r ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>() ``` @@ -170,14 +170,14 @@ ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>() `data` argument -```r +``` r ggplot(data = variants) ``` - define a mapping (using the aesthetic (`aes`) function), by selecting the variables to be plotted and specifying how to present them in the graph, e.g. as x and y positions or characteristics such as size, shape, color, etc. 
-```r +``` r ggplot(data = variants, aes(x = POS, y = DP)) ``` @@ -191,7 +191,7 @@ ggplot(data = variants, aes(x = POS, y = DP)) To add a geom to the plot use the `+` operator. Because we have two continuous variables, let's use [`geom_point()`](https://ggplot2.tidyverse.org/reference/geom_point.html) (i.e., a scatter plot) first: -```r +``` r ggplot(data = variants, aes(x = POS, y = DP)) + geom_point() ``` @@ -201,7 +201,7 @@ ggplot(data = variants, aes(x = POS, y = DP)) + The `+` in the **`ggplot2`** package is particularly useful because it allows you to modify existing `ggplot` objects. This means you can easily set up plot templates and conveniently explore different types of plots, so the above plot can also be generated with code like this: -```r +``` r # Assign plot to a variable coverage_plot <- ggplot(data = variants, aes(x = POS, y = DP)) @@ -217,7 +217,7 @@ coverage_plot + - The `+` sign used to add new layers must be placed at the end of the line containing the *previous* layer. If, instead, the `+` sign is added at the beginning of the line containing the new layer, **`ggplot2`** will not add the new layer and will return an error message. -```r +``` r # This is the correct syntax for adding layers coverage_plot + geom_point() @@ -232,7 +232,7 @@ coverage_plot Building plots with **`ggplot2`** is typically an iterative process. We start by defining the dataset we'll use, lay out the axes, and choose a geom: -```r +``` r ggplot(data = variants, aes(x = POS, y = DP)) + geom_point() ``` @@ -242,7 +242,7 @@ ggplot(data = variants, aes(x = POS, y = DP)) + Then, we start modifying this plot to extract more information from it. 
For instance, we can add transparency (`alpha`) to avoid over-plotting: -```r +``` r ggplot(data = variants, aes(x = POS, y = DP)) + geom_point(alpha = 0.5) ``` @@ -252,7 +252,7 @@ ggplot(data = variants, aes(x = POS, y = DP)) + We can also set a single color for all the points: -```r +``` r ggplot(data = variants, aes(x = POS, y = DP)) + geom_point(alpha = 0.5, color = "blue") ``` @@ -262,7 +262,7 @@ ggplot(data = variants, aes(x = POS, y = DP)) + Or, to color each sample in the plot differently, you can map a variable to the **color** aesthetic. **`ggplot2`** will assign a different color to each distinct value of that variable. Here is an example where we color by **`sample_id`**: -```r +``` r ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + geom_point(alpha = 0.5) ``` @@ -272,7 +272,7 @@ ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + Notice that we can change the geom layer and the colors will still be determined by **`sample_id`**: -```r +``` r ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + geom_line(alpha = 0.5) ``` @@ -282,7 +282,7 @@ ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + To make our plot more readable, we can add axis labels: -```r +``` r ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + geom_point(alpha = 0.5) + labs(x = "Base Pair Position", @@ -294,7 +294,7 @@ ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + To add a *main* title to the plot, we use [the title argument for the `labs()` function](https://ggplot2.tidyverse.org/reference/labs.html): -```r +``` r ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + geom_point(alpha = 0.5) + labs(x = "Base Pair Position", @@ -307,7 +307,7 @@ ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + Now the figure is complete and ready to be exported and saved to a file. 
This can be achieved easily using [`ggsave()`](https://ggplot2.tidyverse.org/reference/ggsave.html), which by default writes the most recently generated figure in a format (e.g., `jpeg`, `png`, `pdf`) chosen from the file extension. So, for example, to create a pdf version of the above figure with a dimension of $6\times4$ inches: -```r +``` r ggsave("depth.pdf", width = 6, height = 4) ``` @@ -326,7 +326,7 @@ relevant axis labels. ## Solution -```r +``` r ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) + geom_point() + labs(x = "Base Pair Position", @@ -342,7 +342,7 @@ relevant axis labels. To further customize the plot, we can change the default font format: -```r +``` r ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + geom_point(alpha = 0.5) + labs(x = "Base Pair Position", @@ -358,7 +358,7 @@ ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) + **`ggplot2`** has a special technique called *faceting* that allows the user to split one plot into multiple plots (panels) based on a factor (variable) included in the dataset. We will use it to split our mapping quality plot into three panels, one for each sample. -```r +``` r ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) + geom_point() + labs(x = "Base Pair Position", @@ -371,7 +371,7 @@ ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) + This looks okay, but it would be easier to read if the plot facets were stacked vertically rather than horizontally. The `facet_grid()` function allows you to explicitly specify how you want your plots to be arranged via formula notation (`rows ~ columns`; a dot (`.`) means no faceting on that side of the formula). 
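The formula notation can be sketched on a small toy data set. This is only an illustration, assuming `ggplot2` is installed; the data frame `toy_variants`, its sample names, and its values are invented here and are not the lesson's `variants` table.

```r
library(ggplot2)

# Hypothetical stand-in data: three samples, four positions each (values are made up)
toy_variants <- data.frame(
  sample_id = rep(c("sample_A", "sample_B", "sample_C"), each = 4),
  POS = rep(c(100, 200, 300, 400), times = 3),
  MQ = c(60, 58, 37, 60, 60, 25, 60, 59, 48, 60, 60, 31)
)

base_plot <- ggplot(data = toy_variants, aes(x = POS, y = MQ, color = sample_id)) +
  geom_point()

# rows ~ columns: sample_id on the row side gives one panel per sample, stacked vertically
stacked <- base_plot + facet_grid(sample_id ~ .)

# sample_id on the column side gives the panels side by side instead
side_by_side <- base_plot + facet_grid(. ~ sample_id)
```

Printing `stacked` or `side_by_side` draws the corresponding layout; the only difference between the two is which side of the `~` holds `sample_id`.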
-```r +``` r ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) + geom_point() + labs(x = "Base Pair Position", @@ -384,7 +384,7 @@ ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) + Plots with a white background are usually more readable when printed. We can set the background to white using the function [`theme_bw()`](https://ggplot2.tidyverse.org/reference/ggtheme.html). Additionally, you can remove the grid: -```r +``` r ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) + geom_point() + labs(x = "Base Pair Position", @@ -409,7 +409,7 @@ relevant axis labels. ## Solution -```r +``` r ggplot(data = variants, aes(x = POS, y = QUAL, color = sample_id)) + geom_point() + labs(x = "Base Pair Position", @@ -428,7 +428,7 @@ relevant axis labels. We can create barplots using the [`geom_bar`](https://ggplot2.tidyverse.org/reference/geom_bar.html) geom. Let's make a barplot showing the number of variants for each sample that are indels. -```r +``` r ggplot(data = variants, aes(x = INDEL, fill = sample_id)) + geom_bar() + facet_grid(sample_id ~ .) @@ -449,7 +449,7 @@ remove the legend from the plot. ## Solution -```r +``` r ggplot(data = variants, aes(x = INDEL, color = sample_id)) + geom_bar(show.legend = FALSE) + facet_grid(sample_id ~ .) @@ -466,7 +466,7 @@ ggplot(data = variants, aes(x = INDEL, color = sample_id)) + We can create density plots using the [`geom_density`](https://ggplot2.tidyverse.org/reference/geom_density.html) geom, which shows the distribution of a variable in the dataset. 
Let's plot the distribution of `DP` -```r +``` r ggplot(data = variants, aes(x = DP)) + geom_density() ``` @@ -486,7 +486,7 @@ Use [`geom_density`](https://ggplot2.tidyverse.org/reference/geom_density.html) ## Solution -```r +``` r ggplot(data = variants, aes(x = DP, fill = sample_id)) + geom_density(alpha = 0.5) + theme_bw() diff --git a/07-r-help.md b/07-r-help.md index 7e4b469d..73ed8c71 100644 --- a/07-r-help.md +++ b/07-r-help.md @@ -124,7 +124,7 @@ the name of the object, in this case the `iris` data frame, and passing a filename to the `file=` argument. -```r +``` r saveRDS(iris, file="iris.rds") # By convention, we use the .rds file extension ``` @@ -151,11 +151,11 @@ they come up commonly: the number tells you what ordinal number begins the line, for example: -```r +``` r 1:101 # generates the sequence of numbers from 1 to 101 ``` -```output +``` output [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 diff --git a/config.yaml b/config.yaml deleted file mode 100644 index d35bbb68..00000000 --- a/config.yaml +++ /dev/null @@ -1,88 +0,0 @@ -#------------------------------------------------------------ -# Values for this lesson. -#------------------------------------------------------------ - -# Which carpentry is this (swc, dc, lc, or cp)? -# swc: Software Carpentry -# dc: Data Carpentry -# lc: Library Carpentry -# cp: Carpentries (to use for instructor training for instance) -# incubator: The Carpentries Incubator -carpentry: 'dc' - -# Overall title for pages. 
-title: 'Intro to R and RStudio for Genomics' - -# Date the lesson was created (YYYY-MM-DD, this is empty by default) -created: '2018-03-12' - -# Comma-separated list of keywords for the lesson -keywords: 'software, data, lesson, The Carpentries' - -# Life cycle stage of the lesson -# possible values: pre-alpha, alpha, beta, stable -life_cycle: 'beta' - -# License of the lesson materials (recommended CC-BY 4.0) -license: 'CC-BY 4.0' - -# Link to the source repository for this lesson -source: 'https://github.com/datacarpentry/genomics-r-intro' - -# Default branch of your lesson -branch: 'main' - -# Who to contact if there are any issues -contact: 'team@carpentries.org' - -# Navigation ------------------------------------------------ -# -# Use the following menu items to specify the order of -# individual pages in each dropdown section. Leave blank to -# include all pages in the folder. -# -# Example ------------- -# -# episodes: -# - introduction.md -# - first-steps.md -# -# learners: -# - setup.md -# -# instructors: -# - instructor-notes.md -# -# profiles: -# - one-learner.md -# - another-learner.md - -# Order of episodes in your lesson -episodes: -- 00-introduction.Rmd -- 01-r-basics.Rmd -- 02-data-prelude.Rmd -- 03-basics-factors-dataframes.Rmd -- 04-bioconductor-vcfr.Rmd -- 05-dplyr.Rmd -- 06-data-visualization.Rmd -- 07-r-help.Rmd - -# Information for Learners -learners: - -# Information for Instructors -instructors: - -# Learner Profiles -profiles: - -# Customisation --------------------------------------------- -# -# This space below is where custom yaml items (e.g. 
pinning -# sandpaper and varnish versions) should live - - -url: 'https://datacarpentry.github.io/genomics-r-intro' -analytics: carpentries -lang: en diff --git a/depth.pdf b/depth.pdf index cda997d7..d2301a92 100644 Binary files a/depth.pdf and b/depth.pdf differ diff --git a/md5sum.txt b/md5sum.txt index 5a3d8b1b..9eb61cb8 100644 --- a/md5sum.txt +++ b/md5sum.txt @@ -1,19 +1,19 @@ "file" "checksum" "built" "date" -"CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2024-07-02" -"LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2024-07-02" -"config.yaml" "b91cd97fa3b408bd1ac0a00e67ab3219" "site/built/config.yaml" "2024-07-02" -"index.md" "7f9c30e6487338a0c3f8ecc4018873ab" "site/built/index.md" "2024-07-02" -"episodes/00-introduction.Rmd" "e1354ed92fb458179c8c00b00ee1cf55" "site/built/00-introduction.md" "2024-07-02" -"episodes/01-r-basics.Rmd" "ba3faa27a6f2eb8087acf99679a7ac03" "site/built/01-r-basics.md" "2024-07-02" -"episodes/02-data-prelude.Rmd" "ab2b1fd3cdaae919f9e409f713a0a8ad" "site/built/02-data-prelude.md" "2024-07-02" -"episodes/03-basics-factors-dataframes.Rmd" "109ed19fade231fe8e1da43903b06539" "site/built/03-basics-factors-dataframes.md" "2024-07-02" -"episodes/04-bioconductor-vcfr.Rmd" "10eb69b4697d7ecb9695d36c0d974208" "site/built/04-bioconductor-vcfr.md" "2024-07-02" -"episodes/05-dplyr.Rmd" "f74055bd8677338a213e0a0c6c430119" "site/built/05-dplyr.md" "2024-07-02" -"episodes/06-data-visualization.Rmd" "0b45534421bad05f040b24c40b6da71b" "site/built/06-data-visualization.md" "2024-07-02" -"episodes/07-r-help.Rmd" "1a7610b0efbaebfdd03ff4540125a790" "site/built/07-r-help.md" "2024-07-02" -"instructors/instructor-notes.md" "78f6fe6109a0eb19a16ec6663941da7f" "site/built/instructor-notes.md" "2024-07-02" -"learners/discuss.md" "522bcb192adf6702a2e3cb2f0d1412b5" "site/built/discuss.md" "2024-07-02" -"learners/reference.md" "4e0dcbc7892af6f9610d44d356e66617" "site/built/reference.md" "2024-07-02" 
-"learners/setup.md" "8e109a52a3f92f4e454de0d71bf4df11" "site/built/setup.md" "2024-07-02" -"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2024-07-02" -"renv/profiles/lesson-requirements/renv.lock" "14b8719663734ec04dbc8e05400fc767" "site/built/renv.lock" "2024-07-02" +"CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2024-09-03" +"LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2024-09-03" +"config.yaml" "b91cd97fa3b408bd1ac0a00e67ab3219" "site/built/config.yaml" "2024-09-03" +"index.md" "7f9c30e6487338a0c3f8ecc4018873ab" "site/built/index.md" "2024-09-03" +"episodes/00-introduction.Rmd" "e1354ed92fb458179c8c00b00ee1cf55" "site/built/00-introduction.md" "2024-09-03" +"episodes/01-r-basics.Rmd" "ba3faa27a6f2eb8087acf99679a7ac03" "site/built/01-r-basics.md" "2024-09-03" +"episodes/02-data-prelude.Rmd" "ab2b1fd3cdaae919f9e409f713a0a8ad" "site/built/02-data-prelude.md" "2024-09-03" +"episodes/03-basics-factors-dataframes.Rmd" "109ed19fade231fe8e1da43903b06539" "site/built/03-basics-factors-dataframes.md" "2024-09-03" +"episodes/04-bioconductor-vcfr.Rmd" "10eb69b4697d7ecb9695d36c0d974208" "site/built/04-bioconductor-vcfr.md" "2024-09-03" +"episodes/05-dplyr.Rmd" "f74055bd8677338a213e0a0c6c430119" "site/built/05-dplyr.md" "2024-09-03" +"episodes/06-data-visualization.Rmd" "0b45534421bad05f040b24c40b6da71b" "site/built/06-data-visualization.md" "2024-09-03" +"episodes/07-r-help.Rmd" "1a7610b0efbaebfdd03ff4540125a790" "site/built/07-r-help.md" "2024-09-03" +"instructors/instructor-notes.md" "78f6fe6109a0eb19a16ec6663941da7f" "site/built/instructor-notes.md" "2024-09-03" +"learners/discuss.md" "522bcb192adf6702a2e3cb2f0d1412b5" "site/built/discuss.md" "2024-09-03" +"learners/reference.md" "4e0dcbc7892af6f9610d44d356e66617" "site/built/reference.md" "2024-09-03" +"learners/setup.md" "8e109a52a3f92f4e454de0d71bf4df11" "site/built/setup.md" "2024-09-03" 
+"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2024-09-03" +"renv/profiles/lesson-requirements/renv.lock" "f2b28b463ac49b713fd7337f307f7559" "site/built/renv.lock" "2024-09-03" diff --git a/renv.lock b/renv.lock deleted file mode 100644 index 96eb462e..00000000 --- a/renv.lock +++ /dev/null @@ -1,974 +0,0 @@ -{ - "R": { - "Version": "4.4.1", - "Repositories": [ - { - "Name": "carpentries", - "URL": "https://carpentries.r-universe.dev" - }, - { - "Name": "carpentries_archive", - "URL": "https://carpentries.github.io/drat" - }, - { - "Name": "CRAN", - "URL": "https://cran.rstudio.com" - } - ] - }, - "Packages": { - "MASS": { - "Package": "MASS", - "Version": "7.3-60.0.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "grDevices", - "graphics", - "methods", - "stats", - "utils" - ], - "Hash": "b765b28387acc8ec9e9c1530713cb19c" - }, - "Matrix": { - "Package": "Matrix", - "Version": "1.6-5", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "grDevices", - "graphics", - "grid", - "lattice", - "methods", - "stats", - "utils" - ], - "Hash": "8c7115cd3a0e048bda2a7cd110549f7a" - }, - "R6": { - "Package": "R6", - "Version": "2.5.1", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R" - ], - "Hash": "470851b6d5d0ac559e9d01bb352b4021" - }, - "RColorBrewer": { - "Package": "RColorBrewer", - "Version": "1.1-3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R" - ], - "Hash": "45f0398006e83a5b10b72a90663d8d8c" - }, - "base64enc": { - "Package": "base64enc", - "Version": "0.1-3", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R" - ], - "Hash": "543776ae6848fde2f48ff3816d0628bc" - }, - "bit": { - "Package": "bit", - "Version": "4.0.5", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R" - ], - "Hash": "d242abec29412ce988848d0294b208fd" - }, - "bit64": { - "Package": 
"bit64", - "Version": "4.0.5", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "bit", - "methods", - "stats", - "utils" - ], - "Hash": "9fe98599ca456d6552421db0d6772d8f" - }, - "bslib": { - "Package": "bslib", - "Version": "0.7.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "base64enc", - "cachem", - "fastmap", - "grDevices", - "htmltools", - "jquerylib", - "jsonlite", - "lifecycle", - "memoise", - "mime", - "rlang", - "sass" - ], - "Hash": "8644cc53f43828f19133548195d7e59e" - }, - "cachem": { - "Package": "cachem", - "Version": "1.0.8", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "fastmap", - "rlang" - ], - "Hash": "c35768291560ce302c0a6589f92e837d" - }, - "cellranger": { - "Package": "cellranger", - "Version": "1.1.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "rematch", - "tibble" - ], - "Hash": "f61dbaec772ccd2e17705c1e872e9e7c" - }, - "cli": { - "Package": "cli", - "Version": "3.6.2", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "utils" - ], - "Hash": "1216ac65ac55ec0058a6f75d7ca0fd52" - }, - "clipr": { - "Package": "clipr", - "Version": "0.8.0", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "utils" - ], - "Hash": "3f038e5ac7f41d4ac41ce658c85e3042" - }, - "colorspace": { - "Package": "colorspace", - "Version": "2.1-0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "grDevices", - "graphics", - "methods", - "stats" - ], - "Hash": "f20c47fd52fae58b4e377c37bb8c335b" - }, - "cpp11": { - "Package": "cpp11", - "Version": "0.4.7", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R" - ], - "Hash": "5a295d7d963cc5035284dcdbaf334f4e" - }, - "crayon": { - "Package": "crayon", - "Version": "1.5.2", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "grDevices", - "methods", - "utils" - ], - "Hash": 
"e8a1e41acf02548751f45c718d55aa6a" - }, - "digest": { - "Package": "digest", - "Version": "0.6.35", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "utils" - ], - "Hash": "698ece7ba5a4fa4559e3d537e7ec3d31" - }, - "dplyr": { - "Package": "dplyr", - "Version": "1.1.4", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "R6", - "cli", - "generics", - "glue", - "lifecycle", - "magrittr", - "methods", - "pillar", - "rlang", - "tibble", - "tidyselect", - "utils", - "vctrs" - ], - "Hash": "fedd9d00c2944ff00a0e2696ccf048ec" - }, - "evaluate": { - "Package": "evaluate", - "Version": "0.23", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "methods" - ], - "Hash": "daf4a1246be12c1fa8c7705a0935c1a0" - }, - "fansi": { - "Package": "fansi", - "Version": "1.0.6", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "grDevices", - "utils" - ], - "Hash": "962174cf2aeb5b9eea581522286a911f" - }, - "farver": { - "Package": "farver", - "Version": "2.1.1", - "Source": "Repository", - "Repository": "CRAN", - "Hash": "8106d78941f34855c440ddb946b8f7a5" - }, - "fastmap": { - "Package": "fastmap", - "Version": "1.1.1", - "Source": "Repository", - "Repository": "CRAN", - "Hash": "f7736a18de97dea803bde0a2daaafb27" - }, - "fontawesome": { - "Package": "fontawesome", - "Version": "0.5.2", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "htmltools", - "rlang" - ], - "Hash": "c2efdd5f0bcd1ea861c2d4e2a883a67d" - }, - "fs": { - "Package": "fs", - "Version": "1.6.4", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "methods" - ], - "Hash": "15aeb8c27f5ea5161f9f6a641fafd93a" - }, - "generics": { - "Package": "generics", - "Version": "0.1.3", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "methods" - ], - "Hash": "15e9634c0fcd294799e9b2e929ed1b86" - }, - "ggplot2": { - "Package": "ggplot2", - 
"Version": "3.5.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "MASS", - "R", - "cli", - "glue", - "grDevices", - "grid", - "gtable", - "isoband", - "lifecycle", - "mgcv", - "rlang", - "scales", - "stats", - "tibble", - "vctrs", - "withr" - ], - "Hash": "44c6a2f8202d5b7e878ea274b1092426" - }, - "glue": { - "Package": "glue", - "Version": "1.7.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "methods" - ], - "Hash": "e0b3a53876554bd45879e596cdb10a52" - }, - "gtable": { - "Package": "gtable", - "Version": "0.3.5", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "glue", - "grid", - "lifecycle", - "rlang" - ], - "Hash": "e18861963cbc65a27736e02b3cd3c4a0" - }, - "highr": { - "Package": "highr", - "Version": "0.10", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "xfun" - ], - "Hash": "06230136b2d2b9ba5805e1963fa6e890" - }, - "hms": { - "Package": "hms", - "Version": "1.1.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "lifecycle", - "methods", - "pkgconfig", - "rlang", - "vctrs" - ], - "Hash": "b59377caa7ed00fa41808342002138f9" - }, - "htmltools": { - "Package": "htmltools", - "Version": "0.5.8.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "base64enc", - "digest", - "fastmap", - "grDevices", - "rlang", - "utils" - ], - "Hash": "81d371a9cc60640e74e4ab6ac46dcedc" - }, - "isoband": { - "Package": "isoband", - "Version": "0.2.7", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "grid", - "utils" - ], - "Hash": "0080607b4a1a7b28979aecef976d8bc2" - }, - "jquerylib": { - "Package": "jquerylib", - "Version": "0.1.4", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "htmltools" - ], - "Hash": "5aab57a3bd297eee1c1d862735972182" - }, - "jsonlite": { - "Package": "jsonlite", - "Version": "1.8.8", - "Source": "Repository", - "Repository": 
"CRAN", - "Requirements": [ - "methods" - ], - "Hash": "e1b9c55281c5adc4dd113652d9e26768" - }, - "knitr": { - "Package": "knitr", - "Version": "1.46", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "evaluate", - "highr", - "methods", - "tools", - "xfun", - "yaml" - ], - "Hash": "6e008ab1d696a5283c79765fa7b56b47" - }, - "labeling": { - "Package": "labeling", - "Version": "0.4.3", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "graphics", - "stats" - ], - "Hash": "b64ec208ac5bc1852b285f665d6368b3" - }, - "lattice": { - "Package": "lattice", - "Version": "0.22-6", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "grDevices", - "graphics", - "grid", - "stats", - "utils" - ], - "Hash": "cc5ac1ba4c238c7ca9fa6a87ca11a7e2" - }, - "lifecycle": { - "Package": "lifecycle", - "Version": "1.0.4", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "glue", - "rlang" - ], - "Hash": "b8552d117e1b808b09a832f589b79035" - }, - "magrittr": { - "Package": "magrittr", - "Version": "2.0.3", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R" - ], - "Hash": "7ce2733a9826b3aeb1775d56fd305472" - }, - "memoise": { - "Package": "memoise", - "Version": "2.0.1", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "cachem", - "rlang" - ], - "Hash": "e2817ccf4a065c5d9d7f2cfbe7c1d78c" - }, - "mgcv": { - "Package": "mgcv", - "Version": "1.9-1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "Matrix", - "R", - "graphics", - "methods", - "nlme", - "splines", - "stats", - "utils" - ], - "Hash": "110ee9d83b496279960e162ac97764ce" - }, - "mime": { - "Package": "mime", - "Version": "0.12", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "tools" - ], - "Hash": "18e9c28c1d3ca1560ce30658b22ce104" - }, - "munsell": { - "Package": "munsell", - "Version": "0.5.1", - "Source": "Repository", - 
"Repository": "CRAN", - "Requirements": [ - "colorspace", - "methods" - ], - "Hash": "4fd8900853b746af55b81fda99da7695" - }, - "nlme": { - "Package": "nlme", - "Version": "3.1-164", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "graphics", - "lattice", - "stats", - "utils" - ], - "Hash": "a623a2239e642806158bc4dc3f51565d" - }, - "pillar": { - "Package": "pillar", - "Version": "1.9.0", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "cli", - "fansi", - "glue", - "lifecycle", - "rlang", - "utf8", - "utils", - "vctrs" - ], - "Hash": "15da5a8412f317beeee6175fbc76f4bb" - }, - "pkgconfig": { - "Package": "pkgconfig", - "Version": "2.0.3", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "utils" - ], - "Hash": "01f28d4278f15c76cddbea05899c5d6f" - }, - "prettyunits": { - "Package": "prettyunits", - "Version": "1.2.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R" - ], - "Hash": "6b01fc98b1e86c4f705ce9dcfd2f57c7" - }, - "printr": { - "Package": "printr", - "Version": "0.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "knitr" - ], - "Hash": "03e0d4cc8152eed9515f517a8153c085" - }, - "progress": { - "Package": "progress", - "Version": "1.2.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "R6", - "crayon", - "hms", - "prettyunits" - ], - "Hash": "f4625e061cb2865f111b47ff163a5ca6" - }, - "purrr": { - "Package": "purrr", - "Version": "1.0.2", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "lifecycle", - "magrittr", - "rlang", - "vctrs" - ], - "Hash": "1cba04a4e9414bdefc9dcaa99649a8dc" - }, - "rappdirs": { - "Package": "rappdirs", - "Version": "0.3.3", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R" - ], - "Hash": "5e3c5dc0b071b21fa128676560dbe94d" - }, - "readr": { - "Package": "readr", - "Version": "2.1.5", - "Source": "Repository", - 
"Repository": "CRAN", - "Requirements": [ - "R", - "R6", - "cli", - "clipr", - "cpp11", - "crayon", - "hms", - "lifecycle", - "methods", - "rlang", - "tibble", - "tzdb", - "utils", - "vroom" - ], - "Hash": "9de96463d2117f6ac49980577939dfb3" - }, - "readxl": { - "Package": "readxl", - "Version": "1.4.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cellranger", - "cpp11", - "progress", - "tibble", - "utils" - ], - "Hash": "8cf9c239b96df1bbb133b74aef77ad0a" - }, - "rematch": { - "Package": "rematch", - "Version": "2.0.0", - "Source": "Repository", - "Repository": "RSPM", - "Hash": "cbff1b666c6fa6d21202f07e2318d4f1" - }, - "renv": { - "Package": "renv", - "Version": "1.0.7", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "utils" - ], - "Hash": "397b7b2a265bc5a7a06852524dabae20" - }, - "rlang": { - "Package": "rlang", - "Version": "1.1.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "utils" - ], - "Hash": "42548638fae05fd9a9b5f3f437fbbbe2" - }, - "rmarkdown": { - "Package": "rmarkdown", - "Version": "2.26", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "bslib", - "evaluate", - "fontawesome", - "htmltools", - "jquerylib", - "jsonlite", - "knitr", - "methods", - "tinytex", - "tools", - "utils", - "xfun", - "yaml" - ], - "Hash": "9b148e7f95d33aac01f31282d49e4f44" - }, - "sass": { - "Package": "sass", - "Version": "0.4.9", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R6", - "fs", - "htmltools", - "rappdirs", - "rlang" - ], - "Hash": "d53dbfddf695303ea4ad66f86e99b95d" - }, - "scales": { - "Package": "scales", - "Version": "1.3.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "R6", - "RColorBrewer", - "cli", - "farver", - "glue", - "labeling", - "lifecycle", - "munsell", - "rlang", - "viridisLite" - ], - "Hash": "c19df082ba346b0ffa6f833e92de34d1" - }, - "stringi": { - "Package": 
"stringi", - "Version": "1.8.4", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "stats", - "tools", - "utils" - ], - "Hash": "39e1144fd75428983dc3f63aa53dfa91" - }, - "stringr": { - "Package": "stringr", - "Version": "1.5.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "glue", - "lifecycle", - "magrittr", - "rlang", - "stringi", - "vctrs" - ], - "Hash": "960e2ae9e09656611e0b8214ad543207" - }, - "tibble": { - "Package": "tibble", - "Version": "3.2.1", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "fansi", - "lifecycle", - "magrittr", - "methods", - "pillar", - "pkgconfig", - "rlang", - "utils", - "vctrs" - ], - "Hash": "a84e2cc86d07289b3b6f5069df7a004c" - }, - "tidyr": { - "Package": "tidyr", - "Version": "1.3.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "cpp11", - "dplyr", - "glue", - "lifecycle", - "magrittr", - "purrr", - "rlang", - "stringr", - "tibble", - "tidyselect", - "utils", - "vctrs" - ], - "Hash": "915fb7ce036c22a6a33b5a8adb712eb1" - }, - "tidyselect": { - "Package": "tidyselect", - "Version": "1.2.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "glue", - "lifecycle", - "rlang", - "vctrs", - "withr" - ], - "Hash": "829f27b9c4919c16b593794a6344d6c0" - }, - "tinytex": { - "Package": "tinytex", - "Version": "0.51", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "xfun" - ], - "Hash": "d44e2fcd2e4e076f0aac540208559d1d" - }, - "tzdb": { - "Package": "tzdb", - "Version": "0.4.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cpp11" - ], - "Hash": "f561504ec2897f4d46f0c7657e488ae1" - }, - "utf8": { - "Package": "utf8", - "Version": "1.2.4", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R" - ], - "Hash": "62b65c52671e6665f803ff02954446e9" - }, - "vctrs": { - "Package": "vctrs", - 
"Version": "0.6.5", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "glue", - "lifecycle", - "rlang" - ], - "Hash": "c03fa420630029418f7e6da3667aac4a" - }, - "viridisLite": { - "Package": "viridisLite", - "Version": "0.4.2", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R" - ], - "Hash": "c826c7c4241b6fc89ff55aaea3fa7491" - }, - "vroom": { - "Package": "vroom", - "Version": "1.6.5", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "bit64", - "cli", - "cpp11", - "crayon", - "glue", - "hms", - "lifecycle", - "methods", - "progress", - "rlang", - "stats", - "tibble", - "tidyselect", - "tzdb", - "vctrs", - "withr" - ], - "Hash": "390f9315bc0025be03012054103d227c" - }, - "withr": { - "Package": "withr", - "Version": "3.0.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "grDevices", - "graphics" - ], - "Hash": "d31b6c62c10dcf11ec530ca6b0dd5d35" - }, - "xfun": { - "Package": "xfun", - "Version": "0.43", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "grDevices", - "stats", - "tools" - ], - "Hash": "ab6371d8653ce5f2f9290f4ec7b42a8e" - }, - "yaml": { - "Package": "yaml", - "Version": "2.3.8", - "Source": "Repository", - "Repository": "CRAN", - "Hash": "29240487a071f535f5e5d5a323b7afbd" - } - } -}