This tutorial goes over how to make histograms and boxplots in RStudio. It assumes you know the basics of R and RStudio; if you don't, please follow this series on YouTube to grasp them first.

First off, make sure you have `tidyverse` and `cowplot` installed using:

```
install.packages("tidyverse")
# And
install.packages("cowplot")
```
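If you rerun your script often, a small check can skip packages that are already installed. This is only a sketch, not part of the original workflow: `needed`, `is_installed`, and `missing_pkgs` are names made up for illustration, and the install step is guarded with `interactive()` so that it only fires from a live session such as the RStudio console.

```r
# Packages this tutorial relies on
needed <- c("tidyverse", "cowplot")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

# Assumption: only install from an interactive session (e.g. the RStudio console)
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```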

`tidyverse` includes the ggplot2 package, which provides the `ggplot` function we will use. `cowplot` allows us to combine multiple plots into one image.

In a new R file I have the following at the top:

```
library("tidyverse")
library("cowplot")
# I use an absolute path so I can run the code from any location on the computer
# and not have to worry about the relative location
coasters = read.csv("D:/Intro to Stats/Coasters_2015txt.csv")
```
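Before plotting, it can help to confirm which columns the data actually has. The sketch below uses a small made-up data frame, `coasters_demo`, as a stand-in for the real CSV. Only the `Speed` column name comes from this tutorial; the `Name` column and all of the values are invented for illustration.

```r
# Made-up stand-in for the real CSV; Name and all values are invented
coasters_demo <- data.frame(
  Name  = c("A", "B", "C", "D"),
  Speed = c(40, 55, 62, 85)
)

names(coasters_demo)    # column names you can map inside aes()
str(coasters_demo)      # type of each column
head(coasters_demo, 3)  # first three rows

# A boxplot or histogram of Speed needs a numeric column
speed_is_numeric <- is.numeric(coasters_demo$Speed)
```

With the real data, the same calls would be run on `coasters`.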

Next we will create the necessary plots starting with the boxplot:

```
p1 = ggplot(coasters, aes(y = Speed)) +
  geom_boxplot()
```

The first argument in `ggplot` is the data we want ggplot to parse. In this case that is the `coasters` variable, which holds the data loaded from our CSV. The next argument is the `aes()` function, short for aesthetic, which tells ggplot which variables to use. We set the y variable because ggplot reads a boxplot's data from the y axis, and we tell it to use the `Speed` variable of the `coasters` data. To find out which variables you can use, run `head(coasters)` to output the first few rows of the data. Next we add `geom_boxplot()` to tell ggplot to display the data as a boxplot. We put the plot into a variable so we can reference it later. To view the boxplot, just type `p1` into the console. Here’s how it looks currently:
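For a sense of what the boxplot draws, the numbers behind it can be approximated by hand. This is a rough sketch on made-up speeds (not the real coaster data), assuming the usual convention that the box spans the first to third quartile and that points more than 1.5 times the interquartile range beyond the box are drawn as outliers.

```r
# Made-up speeds, not the real coaster data
speeds <- c(25, 40, 45, 50, 55, 60, 65, 70, 80, 120)

# Box: first quartile, median, third quartile
q <- quantile(speeds, c(0.25, 0.5, 0.75))

# Fences: 1.5 * IQR beyond each edge of the box
iqr <- q[[3]] - q[[1]]
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr

# Anything outside the fences would be plotted as an individual point
outliers <- speeds[speeds < lower_fence | speeds > upper_fence]
outliers
```

Here only the 120 falls outside the fences, so it is the one value that would show up as a separate dot.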

This isn’t the ideal boxplot for our purposes: we would like to display it on top of the histogram we will make, so we want to flip it on its side. To do this, add `coord_flip()` after `geom_boxplot()` like so:

```
p1 = ggplot(coasters, aes(y = Speed)) +
  geom_boxplot() +
  coord_flip()
```

Now we get a result that we prefer:
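As an aside, recent versions of ggplot2 (3.3.0 and later, as far as I know) can draw the boxplot horizontally without `coord_flip()` by mapping the variable to x instead of y. A minimal sketch on made-up data, with `demo` and `p1_alt` as illustrative names:

```r
library(ggplot2)

# Made-up speeds standing in for the real data
demo <- data.frame(Speed = c(22, 41, 43, 47, 52, 53, 54, 58, 61, 66, 72, 85))

# Assumption: ggplot2 >= 3.3.0, where mapping only x gives a horizontal boxplot
p1_alt <- ggplot(demo, aes(x = Speed)) +
  geom_boxplot()
```

The rest of this tutorial keeps the `coord_flip()` version.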

Moving on to the histogram, it will look very similar to the boxplot with a few tweaks. Note that we set the x variable rather than the y.

```
p2 = ggplot(coasters, aes(x = Speed)) +
  geom_histogram(binwidth = 5)
```

We set the x variable in `aes()` since `geom_histogram` reads its data from the x variable. You can set options in `geom_histogram()` such as `binwidth`, the width of each bar in units of the variable, which we set to 5 in this example. `p2` gives us this:
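To make the bin width concrete, here is a base R sketch that counts made-up speeds in bins 5 units wide. The bin edges here start at 20 purely for illustration; ggplot2 chooses its own edges, so the histogram may group the same values slightly differently.

```r
# Made-up speeds, not the real coaster data
speeds <- c(22, 41, 43, 47, 52, 53, 54, 58, 61, 66, 72, 85)

# Bin edges every 5 units; ggplot2 may place its own edges differently
bin_width <- 5
breaks <- seq(20, 90, by = bin_width)

# right = FALSE makes each bin include its left edge, e.g. [50,55)
bins <- cut(speeds, breaks = breaks, right = FALSE)
counts <- table(bins)

counts[counts > 0]   # the non-empty bins and how many speeds fall in each
```

Each bar of the histogram is one of these counts: the bin from 50 up to 55 holds three speeds (52, 53, 54), so its bar would be three units tall.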

Now here comes the fun part: putting the plots together. We’ll finally use a function from `cowplot` to help us with this:

```
plot_grid(p1, p2, align = "v", ncol = 1, rel_heights = c(1, 5))
```

`plot_grid` is a function from `cowplot` that allows us to put multiple plots together. The first arguments of the function should be the plots that we have previously created, in this case `p1` and `p2` in that order (more on that later). Next we set the alignment of the plots to be vertical through `align = "v"`. Since we would like to stack the plots on top of each other, we only need one column: `ncol = 1`. Finally, we set the relative heights of the plots using `rel_heights = c(1, 5)`. `rel_heights` expects a vector, which we can create with the function `c()`. The 1 and 5 indicate the relative heights of the plots. The sum of the vector is 6, so imagine one part of the plot grid taking 1/6 of the total height and the other part 5/6. The order of the numbers in the vector is important as well, since it follows the order of the plots you give in the first part of the function. So, `p1` goes with 1, and `p2` goes with 5. The function should immediately output a plot like so:
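The height arithmetic can be checked quickly in the console. In this sketch the names `p1` and `p2` inside the vector are only labels added for readability; `plot_grid` itself matches heights to plots by position.

```r
# Relative heights as passed to plot_grid(), labelled for readability
rel_heights <- c(p1 = 1, p2 = 5)

# Each plot's share of the total height
height_share <- rel_heights / sum(rel_heights)
height_share   # p1 gets 1/6 of the height, p2 gets 5/6

# A different pair, such as c(1, 3), gives the first plot a quarter instead
alt_share <- c(p1 = 1, p2 = 3) / sum(c(1, 3))
alt_share
```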

While the histogram looks fine, the boxplot is squished thanks to unnecessary elements. You could edit `rel_heights` to something like `rel_heights = c(1, 3)`, making the boxplot bigger while making the histogram smaller. However, we can instead remove unnecessary elements of a plot through the `theme()` function, which must be added to the `ggplot` call. First, let’s remove the X label of the boxplot using `theme()`:

```
p1 = ggplot(coasters, aes(y = Speed)) +
  geom_boxplot() +
  coord_flip() +
  theme(axis.title.x = element_blank())
```

In the `theme()` function, we can set multiple parts of the plot to different elements. For this example, we set the title of the x axis to something called `element_blank()`, which tells ggplot to remove that element. This addition gives us this:

This is a huge improvement over before, but it can still be improved by removing more elements:

```
p1 = ggplot(coasters, aes(y = Speed)) +
  geom_boxplot() +
  coord_flip() +
  theme(axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.y = element_blank())
```

To the `theme()` function, I added more arguments to remove the axis line, the axis ticks, the axis text on the y axis (the numbers), and the title on the y axis using `element_blank()`. Now we get the final result of:

I left the numbers on the x axis of the boxplot there as a nice touch so that people can still tell where the numbers are on the boxplot without having to look at the bottom of the image.
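One optional refinement, not part of the code in this tutorial: the two plots only line up value for value if both use the same range for `Speed`. The sketch below, on a made-up `demo` data frame, passes the same hypothetical `speed_limits` to `coord_flip()` for the boxplot and to `coord_cartesian()` for the histogram. It assumes ggplot2 and cowplot are installed, and the names `box_demo`, `hist_demo`, and `combined` are illustrative only.

```r
library(ggplot2)
library(cowplot)

# Made-up speeds standing in for the real data
demo <- data.frame(Speed = c(22, 41, 43, 47, 52, 53, 54, 58, 61, 66, 72, 85))

# Hypothetical shared range: the data range padded by one bin width (5)
speed_limits <- range(demo$Speed) + c(-5, 5)

# Speed is the y aesthetic here, so the limit goes in ylim before the flip
box_demo <- ggplot(demo, aes(y = Speed)) +
  geom_boxplot() +
  coord_flip(ylim = speed_limits)

# Speed is the x aesthetic here, so the same limit goes in xlim
hist_demo <- ggplot(demo, aes(x = Speed)) +
  geom_histogram(binwidth = 5) +
  coord_cartesian(xlim = speed_limits)

combined <- plot_grid(box_demo, hist_demo, align = "v", ncol = 1,
                      rel_heights = c(1, 5))
```

Typing `combined` into the console should then show the stacked plots with a common horizontal scale.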

Here is the final code:

```
library("tidyverse")
library("cowplot")
coasters = read.csv("D:/Intro to Stats/Coasters_2015txt.csv")
p1 = ggplot(coasters, aes(y = Speed)) +
  geom_boxplot() +
  coord_flip() +
  theme(axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.y = element_blank())
p2 = ggplot(coasters, aes(x = Speed)) +
  geom_histogram(binwidth = 5) +
  # If the lower case `count` on the histogram's y axis is bothersome,
  # you can change it using ylab()
  ylab("Count")
plot_grid(p1, p2, align = "v", ncol = 1, rel_heights = c(1, 5))
```