Lab 2: Exploring Rodents with ggplot2

Seeking Help

Part of learning to program is learning from a variety of resources. Thus, I expect you will use resources that you find on the internet. There is, however, an important balance between copying someone else’s code and using their code to learn. Therefore, if you use external resources, I want to know about it. You can “inform” me of any resources you used by pasting the link to the resource in a code comment next to where you used that resource.

Additionally, you are permitted and encouraged to work with your peers as you complete lab assignments, but you are expected to do your own work. Copying from each other is cheating, and letting people copy from you is also cheating. Please don’t do either of those things.

Download starter .qmd file here.

Download the data - surveys.csv - file here.

Lab Instructions

The questions in this lab are noted with numbers and boldface. Each question will require you to produce code, whether it is one line or multiple lines.

This document is quite plain, meaning it does not have any special formatting. As part of your demonstration of working with Quarto documents, I would encourage you to spice your documents up (e.g., declaring execution options, specifying how your figures should be output, formatting your code output, etc.).

When you are finished with the lab clean it up! It should only include section headers, questions, and your responses. You can leave helpful information like the Data Context for understanding your work, but please remove all instructions (like this section!) other than the bolded, numbered questions themselves.

Tip

Remember to make small changes and render often!

Setup

In the code chunk below, load in the package(s) necessary for your analysis. (You should only need the tidyverse package for this analysis.)

# code for loading packages goes here!

Data Context

The Portal Project is a long-term ecological study being conducted near Portal, AZ. Since 1977, the site has been used to study the interactions among rodents, ants, and plants, as well as their respective responses to climate. To study the interactions among organisms, researchers experimentally manipulated access to 24 study plots. This study has produced over 100 scientific papers and is one of the longest running ecological studies in the U.S.

We will be investigating the animal species diversity and weights found within plots at the Portal study site for a subset of the study data. The data are stored as a comma separated value (CSV) file. Each row holds information for a single animal, and the columns represent:

Column Description
record_id Unique ID for the observation
month month of observation
day day of observation
year year of observation
plot_id ID of a particular plot
species_id 2-letter code
sex sex of animal (“M”, “F”)
hindfoot_length length of the hindfoot in mm
weight weight of the animal in grams
genus genus of animal
species species of animal
taxa e.g. Rodent, Reptile, Bird, Rabbit
plot_type type of plot

Reading the Data into R

0. Using the read_csv() function, write the code necessary to load in the surveys.csv dataset. For simplicity, name the data surveys.

# Code for question 0!

1. What are the dimensions (# of rows and columns) of these data?

2. What are the data types of the variables in this dataset?

3. What kinds of animals are included in the dataset? Birds? Rabbits? ect. (Hint: you will want to look at the variable descriptions to answer this question)

# Code for question 3!

Exploratory Data Analysis with ggplot2

ggplot() graphics are built step by step by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots.

To build a ggplot(), we will use the following basic template that can be used for different types of plots:

ggplot(data = <DATA>,
       mapping = aes(<VARIABLE MAPPINGS>)) +
  <GEOM_FUNCTION>()

Let’s get started!

Scatterplot

# Scatterplot code for question s 4-8! 

4. First, let’s create a scatterplot of the relationship between weight (on the \(x\)-axis) and hindfoot_length (on the \(y\)-axis).

We can see there are a lot of points plotted on top of each other. Let’s try and modify this plot to extract more information from it.

5. Let’s add transparency (alpha) to the points, to make the points more transparent and (possibly) easier to see.

Despite our best efforts there is still a substantial amount of overplotting occurring in our scatterplot. Let’s try splitting the dataset into smaller subsets and see if that allows for us to see the trends a bit better.

6. Facet your scatterplot by species.

7. No plot is complete without axis labels and a title. Include reader friendly labels and a title to your plot.

It takes a larger cognitive load to read text that is rotated. It is common practice in many journals and media outlets to move the \(y\)-axis label to the top of the graph under the title.

8. Specify your \(y\)-axis label to be empty and move the \(y\)-axis label into the subtitle. You may overwrite your code from Q7.

Boxplots

9. Sketch out (by hand or using Excalidraw) a game plan for creating side-by-side boxplots to visualize the distribution of weight within each species. Include an image of your game plan. This can take on your own flavor! The ideas is that you are thinking before coding. I recommend saving this in the same folder as your .qmd file.

# Boxplot code for question 10 - 15!

10. Implement your game plan to create side-by-side boxplots to visualize the distribution of weight within each species.

A fundamental complaint of boxplots is that they do not plot the raw data. However, with ggplot we can add the raw points on top of the boxplots!

11. Add another layer to your previous plot that plots each observation.

Alright, this should look less than optimal. Your points should appear rather stacked on top of each other. To make them less stacked, we need to jitter them a bit, using geom_jitter().

12. Remove the previous layer and include a geom_jitter() layer instead. (You can overwrite your code for Q11)

That should look a bit better! But its really hard to see the points when everything is black.

13. Set the color aesthetic in geom_jitter() to change the color of the points and add set the alpha aesthetic to add transparency.

Note

You are welcome to use whatever color you wish! Some of my favorites are “orange3” and “steelblue”.

Great! Now that you can see the points, you should notice something odd: there are two colors of points still being plotted. Some of the observations are being plotted twice, once from geom_boxplot() as outliers and again from geom_jitter()!

14. Inspect the help file for geom_boxplot() and see how you can remove the outliers from being plotted by geom_boxplot(). Make this change in your code!

Some small changes can make big differences to plots. One of these changes are better labels for a plot’s axes and legend.

15. Modify the \(x\)-axis and \(y\)-axis labels to describe what is being plotted. Be sure to include any necessary units! You might also be getting overlap in the species names – use theme(axis.text.x = ____) or theme(axis.text.y = ____) to turn the species axis labels 45 degrees. (You will need to look at the documentation or Google to find the syntax for this!)

Some people (and journals) prefer for boxplots to be stacked with a specific orientation! Let’s practice changing the orientation of our boxplots.

16. Copy-paste your boxplot code from above. Flip the orientation of your boxplots. If you created horizontally stacked boxplots, your boxplots should now be stacked vertically. If you had vertically stacked boxplots, you should now stack your boxplots horizontally!

# Copy-paste boxplot code. Then modify for question 16!

Notice how vertically stacked boxplots make the species labels more readable than horizontally stacked boxplots. This is good practice!

Conducting Statistical Analyses

Exploratory Data Analysis (EDA) is always a great start to investigating a dataset. Can we see a relationship between rodent weight and hindfoot length? How does rodent weight differ between species? After performing EDA, we can then conduct appropriate statistical analyses to formally investigate what we have seen.

In this section, we are going to conduct a one-way analysis of variance (ANOVA) to compare mean weight between the fourteen species.

Refresher on one-way ANOVA

While a second course in statistics is a pre-requisite for this class, you may want a refresher on conducting a one-way ANOVA.

I have outlined the null and alternative hypotheses we will be testing:

Null

The population mean weight is the same between all fourteen rodent species.

Alternative

At least one rodent species has a different population mean weight.

Look up the help documentation for aov().

17. Using aov(), complete the code below to carry out the analysis.

species_mod <- aov()
Error in aov(): argument "formula" is missing, with no default
summary(species_mod)
Error: object 'species_mod' not found

18. Based on the results of the ANOVA F-test, draw a conclusion in context of the hypotheses. Make sure to cite appropriate output from above.

But wait, the aov() output doesn’t actually tell us which species differ! Install the emmeans package by running install.packages("emmeans") in your console (NOT in the .qmd file). Then, run the code chunk below to load the package.

library(emmeans)

Look up the help documentation for emmeans(). This one might take a bit of Google-ing. Note that we conducted a one-factor model.

19. Using the species_mod from above and the emmeans() function, complete the code below to obtain estimated model mean rodent weights for each species

species_estimates <- emmeans()
Error in emmeans(): argument "object" is missing, with no default
species_estimates
Error: object 'species_estimates' not found

20. Now that you have obtained the estimated model mean rodent weights, conduct pairwise comparisons using the pairs() function from the emmeans package.

# carry out pairwise comparisons between species.

Use Your Graphics Skills for Evil

21. Create the ugliest version of a scatterplot showing the relationship between weight and hindfoot_length (where you started in Q4). Then, explain why you made the decisions you did, and which principles of good graphics you’ve intentionally violated.

Ugliness is subjective, so the goal here is for you to explore the different ways you can customize the finer details of graphics. Make sure your finished masterpiece has appropriate axis labels and a title (after all, even ugly plots need to be correctly labeled!). You are free to add additional variables and layers, modify the aesthetics used, and leverage other packages. Let your creativity shine through!

# Make your ugly graphic here!

Useful References