
# "Base" plots in R

The base R **graphics** package offers functions for producing many types of plots, for example:

* scatter plots - plot()
* bar plots - barplot()
* pie charts - pie()
* box plots - boxplot()
* histograms - hist()

## Scatter plots

*A scatter plot has points that show the **relationship** between two sets of data.*

* Simple scatter plot

```{r, eval=T}
# Create 2 vectors
dat1 <- 1:10
dat2 <- dat1^2

# Plot dat2 (y-axis) against dat1 (x-axis)
plot(x=dat1, y=dat2)
```

*Notes*:

* If only one vector is given as input, its values are plotted against their indices (1, 2, 3, ...)
* x and y can also be the **columns** of a matrix or a dataframe, e.g. `plot(x=mat[,1], y=mat[,2])`.
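
The two notes above can be checked with a short sketch. `xy.coords()` (from the grDevices package) is the helper that `plot()` uses to work out the coordinates; the matrix `mat` is a made-up example for illustration only.

```{r}
# Example data (same values as dat2 above)
dat2 <- (1:10)^2

# With a single vector, the values go on the y-axis and the indices on the x-axis
plot(dat2)

# xy.coords() shows the coordinates that are derived from a single vector
coords <- xy.coords(dat2)
coords$x   # the indices of the elements
coords$y   # the values of dat2

# Columns of a matrix can be passed as x and y
mat <- cbind(dat1=1:10, dat2=dat2)
plot(x=mat[,1], y=mat[,2])
```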

* Add arguments:
	* col: color
	* pch: type of point
	* type: "l" for line, "p" for point, "b" for both point and line
	* main: title of the plot
	* cex: size of points (default: 1)

```{r, eval=T}
plot(x=dat1, y=dat2, 
	col="red", 
	pch=2, 
	type="b", 
	main="a pretty scatter plot")
```

* You can play a bit:

```{r, eval=T}
plot(x=dat1, y=dat2, 
	col=1:10, 
	pch=1:10, 
	cex=1:10, 
	type="b", 
	main="an even prettier scatter plot")
```

<h4>Different types of points that you can use:</h4>

<img src="images/plots/pointtype.png" width="450"/>

<h4>About colors</h4>

* Color codes 1 to 8 refer, in that order, to the colors returned by the **palette()** function:

```{r}
# see the 8-color palette:
palette()
```

```{r, echo=FALSE, eval=TRUE}
knitr::kable(
  data.frame(code=1:8, color=palette()), caption = 'default palette()',
  format = "html", table.attr = "style='width:30%;'"
)
```
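
The "play a bit" plot above used `col=1:10` although the palette holds only 8 colors. Numeric color codes larger than the palette length wrap around to the start of the palette. The sketch below checks this with `col2rgb()`; it assumes that the default palette is active.

```{r}
# Number of colors in the current palette (8 for the default palette)
n <- length(palette())

# Codes larger than the palette length wrap around:
# with 8 colors, code 9 maps to color 1 and code 10 to color 2
wrapped <- ((9:10) - 1) %% n + 1
wrapped

# col2rgb() converts a color code or name into its red / green / blue components:
# code 9 and the first palette color give the same components
col2rgb(9)
col2rgb(palette()[1])
```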


* There is a larger set of built-in colors that you can use:

```{r}
# see all 657 possible built-in colors:
colors()

# looking for blue only? You can pick from 66 blueish options:
grep(pattern="blue", x=colors(), value=TRUE)
``` 

You can also find them [here](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf).

* Change the default palette to one of your choice:

```{r}
palette(grep(pattern="blue", x=colors(), value=TRUE))
```

* Change the palette back to the default:

```{r}
palette("default")
```
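
A small sketch of how numeric color codes follow the active palette. The three colors are arbitrary choices for illustration. When a new palette is set, `palette()` returns the previous one, so it can be stored and restored afterwards (an alternative to `palette("default")`).

```{r}
# Set a custom 3-color palette and keep the previous one
oldpal <- palette(c("navy", "gold", "tomato"))

# Color codes 1 to 3 now refer to the custom palette
plot(x=1:3, y=1:3, col=1:3, pch=19, cex=3)

# Check that code 2 is now the second custom color
code2isgold <- all(col2rgb(2) == col2rgb("gold"))
code2isgold

# Restore the palette that was active before
palette(oldpal)
```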

**HANDS-ON**

The **datasets** package (included in the base installation of R) contains pre-made / built-in datasets. You can see them in *Environment* -> change *Global Environment* to *package:datasets*.
<br>
We will use the **Loblolly** dataset (growth of Loblolly pine trees): you can inspect it with `dim(Loblolly)` and `head(Loblolly)`:

* Plot **age** (x-axis) versus **height** (y-axis).
* Change the **title**.
* Change the points type to a **full triangle** (see table of codes / shapes above).
* Change the points color to the color of your choice.
* Change the points size to 0.4.

<details>
<summary>
*Answer*
</summary>

```{r, echo=T, eval=F}
# Plot age (x-axis) versus height (y-axis).
plot(x=Loblolly$age, y=Loblolly$height)
# Change the title.
plot(x=Loblolly$age, y=Loblolly$height,
     main="Age and height of Loblolly pine trees")
# Change the points type to a full triangle.
plot(x=Loblolly$age, y=Loblolly$height,
     main="Age and height of Loblolly pine trees",
     pch=17)
# Change the points color.
plot(x=Loblolly$age, y=Loblolly$height,
     main="Age and height of Loblolly pine trees",
     pch=17,
     col="red")
# Change the points size to 0.4.
plot(x=Loblolly$age, y=Loblolly$height,
     main="Age and height of Loblolly pine trees",
     pch=17,
     col="red",
     cex=0.4)
```

</details>

EXTRA: you can change the color of the points depending on the value they represent. 
<br>
For example, let's say we want to color the **"age 10" points in red** and the other ones in black. <br>
You can look for a way to do it yourself, or check one possibility in the answer below:

<details>
<summary>
*Answer*
</summary>

```{r, echo=T, eval=F}
# Add an extra column named colors to Loblolly that contains, for example, only black values (store in new data frame Loblolly2).
Loblolly2 <- cbind(Loblolly, colors="black")
# Set column colors to "red" in the rows that correspond to age 10.
Loblolly2$colors[Loblolly2$age == 10] <- "red"
# Set parameter `col` of `plot()` with `Loblolly2$colors`.
plot(x=Loblolly2$age, y=Loblolly2$height, 
     col=Loblolly2$colors)
```
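
A more compact variant, sketched under the assumption that only two colors are needed: `ifelse()` builds the color vector directly, without adding a column to the data frame. The name `pointcols` is only an illustration.

```{r, echo=T, eval=F}
# "red" where age is 10, "black" everywhere else
pointcols <- ifelse(Loblolly$age == 10, "red", "black")

# Check how many points get each color
table(pointcols)

# Use the vector as the `col` parameter
plot(x=Loblolly$age, y=Loblolly$height,
     col=pointcols)
```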

</details>


## Bar plots

*A bar chart or bar plot displays rectangular bars with **lengths proportional to the values that they represent.***

* A simple bar plot:

```{r, eval=TRUE}
# Create a vector
mycenter <- rep(x=c("PhDstudent", "Postdoc", "Technician", "PI"), 
                times=c(8,10,5,2))

# Count number of occurrences of each character string
mytable <- table(mycenter)

# Bar plot using that table
barplot(height=mytable)
```

* Customize a bit:
  * col: color
  * main: title of the plot
  * las: orientation of axis labels:
    * 0: all labels parallel to axis
    * 1: x-axis labels parallel / y-axis labels perpendicular
    * 2: both labels perpendicular
    * 3: x-axis labels perpendicular / y-axis labels parallel

```{r, eval=T}
barplot(height=mytable,
	col=1:4,
	main="bar plot",
	las=2)
```

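* Customize the ordering of the bars:

By default, the bars are arranged in alphabetical order of the category names. You can change this with a **factor** whose levels are listed in the desired order (an **ordered factor** in the example below).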

```{r, eval=T}
# Create an ordered factor out of mycenter: the order in which you write the "levels" is the order in which the bars will be plotted
xfact <- factor(x=mycenter, 
	levels=c("PhDstudent", "Postdoc", "Technician", "PI"), 
	ordered=TRUE)

# Produce the table
xfacttable <- table(xfact)

# Plot the same way
barplot(height=xfacttable,
	col=1:4,
	main="reorganized bar plot",
	las=2)
```
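
Whether the factor is declared as ordered does not change the plot: it is the order of the levels that decides the order of the bars. A minimal sketch with a plain factor, re-creating `mycenter` so that it runs on its own (`mylevels`, `plainfact` and `plaintable` are names chosen for this example):

```{r}
mycenter <- rep(x=c("PhDstudent", "Postdoc", "Technician", "PI"),
                times=c(8,10,5,2))
mylevels <- c("PhDstudent", "Postdoc", "Technician", "PI")

# A plain (unordered) factor with explicit levels
plainfact <- factor(x=mycenter, levels=mylevels)

# table() lists the counts in the order of the levels
plaintable <- table(plainfact)
names(plaintable)

# The bars are drawn in that same order
barplot(height=plaintable, las=2)
```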

* We can also produce a stacked bar plot:

```{r, eval=TRUE}
# Create a matrix containing the number and type of employees per research program :
barmat <- matrix(c(8, 10, 9, 2, 6, 4, 5, 3, 14, 13, 16, 4, 11, 10, 8, 5),
	nrow=4,
	dimnames=list(c("Technician", "PhDstudent", "PostDoc", "PI"), c("BG", "CDB", "GRSCC", "SB")))

# Plot barplot
barplot(height=barmat, 
        col=sample(colors(), 4))
```

* Add some parameters:

```{r, eval=T}
# set a random color vector
  # add set.seed(38) (or any other number) to reproduce the randomization.
mycolors <- sample(x=colors(), 
                   size=4)

# plot barplot
  # ylim sets the lower and upper limits of the y-axis: here it leaves room above the bars for a legend
barplot(height=barmat, 
	col=mycolors, 
	ylim=c(0,50),
	main="stacked barplot")
```
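
The upper limit of 50 was picked by hand. As a sketch, it can also be derived from the data: in a stacked bar plot, each bar is as tall as the sum of its column. The 25% of extra space is an arbitrary choice, and `barmat` is re-created so that the example runs on its own.

```{r}
barmat <- matrix(c(8, 10, 9, 2, 6, 4, 5, 3, 14, 13, 16, 4, 11, 10, 8, 5),
                 nrow=4,
                 dimnames=list(c("Technician", "PhDstudent", "PostDoc", "PI"),
                               c("BG", "CDB", "GRSCC", "SB")))

# Height of each stacked bar = sum of the corresponding column
barheights <- colSums(barmat)
barheights

# Leave 25% of extra space above the tallest bar
ymax <- max(barheights) * 1.25

barplot(height=barmat,
        ylim=c(0, ymax),
        main="stacked barplot")
```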

* Add a legend to the plot:
  * "x" and "y" set the legend's position in the plotting area: you can specify the position as coordinates using "x" and "y".
  * if only "x" is used, it can be a keyword that sets the legend position, for example "topleft", "bottomleft", "topright" or "bottomright"
  * Note: `barplot()` (or any other plot function) has to be called **first**

```{r, eval=TRUE}
barplot(height=barmat, 
	col=mycolors, 
	ylim=c(0,50),
	main="stacked barplot")

legend(x="topleft", 
	legend=c("Technician", "PhDstudent", "PostDoc", "PI"),
	fill=mycolors)
```

A more automated way to do this:

```{r, eval=F}
legend(x="topleft", 
	legend=rownames(barmat),
	fill=mycolors)
```

**HANDS-ON**

The dataset **chickwts** is also a built-in dataset from the `datasets` package: the table measures and compares the effectiveness of various feed supplements on the growth rate of chickens.

* Create a barplot of the different **feed supplements**.
* Change the orientation of the x-axis labels.
* Try to re-organize the bars by increasing number of chickens per feed supplement.

<details>
<summary>
*Answer*
</summary>

```{r, eval=F}
# Create a barplot of the different **feed supplements**.
tablefeed <- table(chickwts$feed)
barplot(tablefeed)

# Change the orientation of the x-axis labels.
barplot(tablefeed, las=2)

# Try to re-organize the bars by increasing number of chickens per feed supplement.
  # check tablefeed and write the feed categories in increasing order of counts:
feedfactor <- factor(x=chickwts$feed, 
	levels=c("horsebean", "meatmeal", "casein", "linseed", "sunflower", "soybean"), 
	ordered=TRUE)

  # a less "manual" way to proceed: tablefeed (the output of table() ) is a NAMED vector: sort it and retrieve its names in sorted order
sort(tablefeed)
names(sort(tablefeed))
feedfactor <- factor(x=chickwts$feed, 
    levels=names(sort(tablefeed)),
    ordered=TRUE)

# plot sorted barplot
barplot(table(feedfactor), las=2)
```

</details>

## Pie charts

*A pie chart is a circular chart that is divided into slices, illustrating proportions.*

* Using our previous vector, build a simple pie chart:

```{r}
# Create a vector
mycenter <- rep(x=c("PhDstudent", "Postdoc", "Technician", "PI"), 
         times=c(8,10,5,2))

# Count number of occurrences of each string
mytable <- table(mycenter)

pie(mytable,
	main="pie chart",
	col=c("lightblue", "lightgreen", "salmon", "maroon"))
```

<img src="images/plots/pie1.png" width="450"/>
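
Since a pie chart is about proportions, it can help to print them on the slices. A minimal sketch using the `labels` argument of `pie()`; the label format and the names `counts`, `pct` and `pielabels` are just one possibility.

```{r}
mycenter <- rep(x=c("PhDstudent", "Postdoc", "Technician", "PI"),
                times=c(8,10,5,2))
mytable <- table(mycenter)

# Percentage of each category, rounded to whole numbers
counts <- as.vector(mytable)
pct <- round(100 * counts / sum(counts))
names(pct) <- names(mytable)

# Build labels that combine the category name and its percentage
pielabels <- paste0(names(pct), " (", pct, "%)")

pie(mytable,
    labels=pielabels,
    main="pie chart with percentages")
```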

## Box plots

*A boxplot is a convenient way to describe the **distribution** of the data.*

* A simple boxplot:

```{r, eval=T}
# Create a matrix of 1000 random values from the normal distribution (4 columns, 250 rows)
mat1000 <- matrix(rnorm(1000), 
                  ncol=4)

# Basic boxplot
boxplot(x=mat1000)
```
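
To see the numbers behind the boxes, `boxplot()` can be called with `plot=FALSE`: it then returns the statistics instead of drawing. A sketch (the seed is arbitrary and only there to make the example reproducible; `bstats` is a name chosen for this example):

```{r}
set.seed(1)
mat1000 <- matrix(rnorm(1000),
                  ncol=4)

# Compute the box plot statistics without drawing
bstats <- boxplot(x=mat1000, plot=FALSE)

# One column per box; rows: lower whisker, lower hinge, median, upper hinge, upper whisker
bstats$stats

# The third row holds the median of each column of the matrix
apply(mat1000, 2, median)
```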

* Add some arguments:
	* xlab: x-axis label
	* ylab: y-axis label
	* at: position of each box along the x-axis: here we skip position 3 to allow more space between boxes 1/2 and 3/4

```{r, eval=T}
boxplot(x=mat1000, 
	xlab="sample",
	ylab="expression",
	at=c(1, 2, 4, 5))
```

* Add a horizontal line at y=0 with **abline()**; arguments of abline:
	* h: y value at which the horizontal line is drawn
	* col: color
	* lwd: line thickness
	* lty: line type

*NOTE*: you can create a vertical line with `abline(v=...)` (**v** instead of **h**)

```{r, eval=T}
# First plot the box plot as before:
boxplot(x=mat1000, 
	xlab="sample",
	ylab="expression",
	at=c(1, 2, 4, 5),
	main="my boxplot")
	
# Then run the abline function
abline(h=0, col="red", lwd=3, lty="dotdash")
```

* Line types in R:

<img src="images/linetypes-in-r-line-types.png" width="250"/>

* We can also create a boxplot that plots **a variable against another variable**.
For example, going back to our **Loblolly** data frame, we can create a boxplot of the **height (y-axis) for each age (x-axis)**: one box per age group. Instead of setting parameter **x** we set parameter **formula**, as follows:

```{r, eval=TRUE}
boxplot(formula=Loblolly$height ~ Loblolly$age)
```
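
The formula interface also accepts a `data` argument, so the column names can be written without repeating `Loblolly$`. A sketch of the same plot; the returned object (named `bp` here for illustration) is stored to show the groups that were found:

```{r}
# Same box plot, with the data frame given once through `data`
bp <- boxplot(formula=height ~ age, data=Loblolly)

# One box per age group, and the number of trees in each group
bp$names
bp$n
```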

**HANDS-ON**

Let's go back to our **chickwts** dataset:

* Create a boxplot that represents the chicken **weight** for each type of **feed** supplement.
* Create the boxplot again, but without the **sunflower** and **casein** types of feed supplement (you can create a new data frame called **chickwts2**).
  * *NOTE*: you still see the groups you removed (while there is no data -> no boxes): this is because column `feed` is made of **factors**. Factors retain the original **levels** (groups) even when no data is left for those groups. You can run: `chickwts2$feed <- droplevels(chickwts2$feed)` to "drop" the levels that do not have values left, and plot again.
* Change the boxes' colors.
* Add a legend in the top-left corner of the plot, and remove the x-axis labels.

<details>
<summary>
*Answer*
</summary>

```{r, eval=F, echo=T}
# boxplot of weight / feed supplement
boxplot(chickwts$weight ~ chickwts$feed)
# remove sunflower and casein 
chickwts2 <- chickwts[chickwts$feed != "sunflower" & chickwts$feed != "casein", ]
boxplot(chickwts2$weight ~ chickwts2$feed)
# drop "levels" from column "feed" containing factors
chickwts2$feed <- droplevels(chickwts2$feed)
# plot again after dropping the levels
boxplot(chickwts2$weight ~ chickwts2$feed)
# change colors: create a vector
boxcols <- c("lightgreen", "purple", "maroon", "lightblue")
# boxplot with colors (xaxt will remove the x-axis information)
boxplot(chickwts2$weight ~ chickwts2$feed,
  col=boxcols, xaxt="n")
# add a legend
legend("topleft", 
        legend=names(table(chickwts2$feed)), 
        fill=boxcols)
```

</details>


## Histograms

*A histogram graphically summarizes the **distribution** of the data.*

* A simple histogram

```{r, eval=T}
# Vector of 200 random values from the normal distribution
hist200 <- rnorm(200)

# Plot histogram
hist(x=hist200)
```

* Add parameters:
	* border: color of bar borders
	* breaks: suggested number of bars the data is divided into (R may adjust it to get round break points)
	* cex.main: size of title
	* cex.lab: size of axis labels

```{r, eval=TRUE}
hist(x=hist200,
	border="blue",
	breaks=50,
	main="Histogram",
	xlab="",
	cex.main=2.5,
	cex.lab=2)
```
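
The effect of `breaks` can be inspected with `plot=FALSE`, which returns the bins without drawing. A single number may not give exactly that many bars, whereas a vector of break points is used as given. A sketch (the seed and the limits of the bins are arbitrary choices; `h1`, `h2` and `mybreaks` are names chosen for this example):

```{r}
set.seed(1)
hist200 <- rnorm(200)

# A single number: the resulting number of bars may differ from 50
h1 <- hist(x=hist200, breaks=50, plot=FALSE)
length(h1$counts)

# A vector of break points: 51 break points make 50 bars
mybreaks <- seq(from=floor(min(hist200)), to=ceiling(max(hist200)), length.out=51)
h2 <- hist(x=hist200, breaks=mybreaks, plot=FALSE)
length(h2$counts)

# Draw the histogram with the explicit break points
hist(x=hist200, breaks=mybreaks, main="Histogram with explicit breaks")
```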