Writing Vector Functions

Monday, May 11

Today we will…

  • Lecture
    • Function Basics
    • Variable Scope + Environment
  • PA 7: Writing Functions

Follow along

Remember to download, save, and open up the handout for today!

Why write functions?

Functions allow you to automate common tasks!

  • We’ve been using functions since Day 1, but when we write our own, we can customize them!
  • Have you found yourself copy-pasting code and only changing small parts?

Writing functions has 3 big advantages over copy-paste:

  1. Your code is easier to read.
  2. To change your analysis, simply change one function.
  3. You avoid mistakes from copy-paste.

Function Basics

Function Syntax



A (very) Simple Function

Let’s define the function.

  • You must run the code to define the function just once.
add_two <- function(x){
  x + 2
}


Let’s call the function!

add_two(5)
[1] 7

Naming: add_two <-

The name of the function is chosen by the author.

add_two <- function(x){
  x + 2
}

Caution: Function names have no inherent meaning.

  • The name you give to a function does not affect what the function does.
add_three <- function(x){
  x + 7
}
add_three(5)
[1] 12

Arguments

The argument(s) of the function are chosen by the author.

  • Arguments are how we pass external values into the function.
  • They are temporary variables that only exist inside the function body.
  • We give them general names:
    • x, y, z – vectors
    • df – dataframe
    • i, j – indices


add_two <- function(x){
  x + 2
}

Body: { }

The body of the function is where the action happens.

  • The body must be specified within a set of curly brackets.
  • The code in the body will be executed (in order) whenever the function is called.
add_two <- function(x){
  x + 2
}

Output: Last Value

Your function will give back what would normally print out from the last line in the body…

add_two <- function(x){
  x + 2
}


add_two(7)
[1] 9

This is called an implicit return.

Output: return()

…you can also explicitly use the command return().

add_two <- function(x){
  return(x + 2)
}


Style decision

The tidyverse style guide currently suggests using implicit returns since it is more concise and more “idiomatic” to R…

But using return() explicitly is clearer to readers and new learners.

I leave this up to you!

Output: early returns

An explicit return() is necessary when you want to exit a function early, before the last line of the body.


safe_square <- function(x) {
  if (!is.numeric(x)) return(NA)
  x^2
}


safe_square(2)
[1] 4
safe_square("A")
[1] NA
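
A function can contain more than one early return. The sketch below uses a hypothetical helper, safe_mean() (an illustration, not a function from the handout), that exits early for non-numeric or empty input and only computes the mean when both checks pass.

```r
# safe_mean() is a hypothetical example, not a function from the handout.
safe_mean <- function(x) {
  # Early return 1: wrong data type
  if (!is.numeric(x)) return(NA)
  # Early return 2: nothing to average
  if (length(x) == 0) return(NA)
  # Reached only when both checks pass (implicit return)
  mean(x)
}

safe_mean(c(2, 4, 6))   # 4
safe_mean("A")          # NA
safe_mean(numeric(0))   # NA
```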

Output: more than one output

If you need to return more than one object from a function, wrap those objects in a list.

min_max <- function(x){
  lowest <- min(x)
  highest <- max(x)
  
  list(min = lowest, max = highest)
}
vec <- c(346,547,865,346,6758,78,79,362)
min_max(vec)
$min
[1] 78

$max
[1] 6758
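
Once the outputs are wrapped in a list, each piece can be pulled out by name. A minimal sketch, assuming the result is stored in an object called result (a name chosen here for illustration):

```r
min_max <- function(x) {
  list(min = min(x), max = max(x))
}

vec <- c(346, 547, 865, 346, 6758, 78, 79, 362)
result <- min_max(vec)

result$min                # extract by name with $: 78
result[["max"]]           # extract by name with [[ ]]: 6758
result$max - result$min   # use both pieces in a later calculation: 6680
```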

Function Arguments

What if we wanted to write a more general function named add_something()? The function would take two inputs:

  1. x: the vector to add to
  2. something: the value to add to x

How would your function change?

Arguments Cont.

  • If we supply a default value when defining the function, the argument is optional when calling the function.
add_something <- function(x, something = 2){
  x + something
}
  • If a value is not supplied, something defaults to 2.
add_something(x = 5)
[1] 7
add_something(x = 5, something = 6)
[1] 11
  • If we do not supply a default value when defining the function, the argument is required when calling the function.
add_something <- function(x, something){
  x + something
}
add_something(x = 2)
Error in `add_something()`:
! argument "something" is missing, with no default
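
Arguments can be matched by position or by name when the function is called. The sketch below goes back to the version of add_something() that has a default value and shows a few equivalent ways to call it.

```r
add_something <- function(x, something = 2) {
  x + something
}

add_something(5, 6)                   # matched by position: x = 5, something = 6 -> 11
add_something(something = 6, x = 5)   # matched by name, so order does not matter -> 11
add_something(5)                      # something falls back to its default of 2 -> 7
add_something(c(1, 2, 3), 10)         # x can be a whole vector -> 11 12 13
```

Naming the arguments is a little more typing, but it makes the call easier to read.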

Input Validation

When a function requires an input of a specific data type, check that the supplied argument is valid.

add_something <- function(x, something){
  stopifnot(is.numeric(x))
  x + something
}

add_something(x = "statistics", something = 5)
Error in `add_something()`:
! is.numeric(x) is not TRUE

Alternatively, use an if() statement with stop() to write a more informative error message.

add_something <- function(x, something){
  if(!is.numeric(x)){
    stop("Please provide a numeric input for the x argument.")
  }
  x + something
}

add_something(x = "statistics", something = 5)
Error in `add_something()`:
! Please provide a numeric input for the x argument.
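
The same pattern extends to every argument that needs checking. The sketch below adds a second check for something (this extra check and its message are an illustration, not part of the handout) and uses tryCatch() to capture the error message so the script keeps running.

```r
add_something <- function(x, something) {
  if (!is.numeric(x)) {
    stop("Please provide a numeric input for the x argument.")
  }
  # Assumed extra check, mirroring the one for x
  if (!is.numeric(something)) {
    stop("Please provide a numeric input for the something argument.")
  }
  x + something
}

add_something(x = 5, something = 2)   # 7

# tryCatch() captures the error so we can look at its message
# without halting the rest of the script.
msg <- tryCatch(
  add_something(x = 5, something = "two"),
  error = function(e) conditionMessage(e)
)
msg
```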

Variable Scope + Environment

Variable Scope

The location (environment) in which we can find and access a variable is called its scope.

  • We need to think about the scope of variables when we write functions.
  • What variables can we access inside a function?
  • What variables can we access outside a function?

Global Environment

  • The top right pane of RStudio shows you the global environment.
    • This is the current state of all objects you have created.
    • These objects can be accessed anywhere.

Function Environment

  • The code inside a function executes in the function environment.
    • Function arguments and any variables created inside the function only exist inside the function.
      • They disappear when the function code is complete.
    • What happens in the function environment does not affect things in the global environment.
add_two <- function(x) {
  my_result <- x + 2
  return(my_result)
}

Function Environment

We cannot access variables created inside a function outside of the function.

add_two <- function(x) {
  my_result <- x + 2
  return(my_result)
}

add_two(9)
[1] 11
my_result
Error:
! object 'my_result' not found

Name Masking

Name masking occurs when an object in the function environment has the same name as an object in the global environment.

add_two <- function(x) {
  my_result <- x + 2
  return(my_result)
}
my_result <- 2000

The my_result created inside the function is different from the my_result created outside.

add_two(5)
[1] 7
my_result
[1] 2000

Dynamic Lookup

Functions look for objects FIRST in the function environment and SECOND in the environment where the function was defined (for the functions we write here, the global environment).

  • If the object doesn’t exist in either, the code will give an error.
add_two <- function() {
  return(x + 2)
}

add_two()
Error in `add_two()`:
! object 'x' not found
x <- 10

add_two()
[1] 12

It is not good practice to rely on global environment objects inside a function!
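
To see why, compare a function that relies on a global x with one that takes x as an argument. The name add_two_global() is made up for this sketch.

```r
x <- 10

# Relies on a global x: the result silently changes whenever x changes.
add_two_global <- function() {
  x + 2
}

# Takes x as an argument: the result depends only on what is passed in.
add_two <- function(x) {
  x + 2
}

first_call <- add_two_global()    # uses the global x (10) -> 12
x <- 100
second_call <- add_two_global()   # same call, different answer -> 102
add_two(10)                       # unaffected by the global x -> 12
```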

Debugging

(Illustration credit: Allison Horst)

You will make mistakes (create bugs) when coding.

  • Unfortunately, it becomes more and more complicated to debug your code as your code gets more sophisticated.
  • This is especially true with functions!

Debugging Strategies

  • Interactive coding
    • Highlight lines within your function and run them one-by-one to see what happens.
  • print() debugging
    • Add print() statements throughout your code to make sure the values are what you expect.
  • Rubber Ducking
    • Verbally explain your code line by line to a rubber duck (or a human).
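
As a small illustration of print() debugging, the sketch below uses a hypothetical function, center() (not from the handout), with temporary print() calls after each intermediate step. Once the function behaves as expected, the print() lines can be deleted.

```r
# center() is a hypothetical example function.
center <- function(x) {
  avg <- mean(x)
  print(avg)       # temporary check: is the mean what we expect?
  result <- x - avg
  print(result)    # temporary check: do the centered values look right?
  result
}

centered <- center(c(1, 2, 9))
```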

General Function Writing Advice

When you have a concept that you want to turn into a function…

  1. Write a simple example of the code without the function framework.

  2. Generalize the example by assigning variables.

  3. Write the code into a function.

  4. Call the function on the desired arguments.

This structure allows you to address issues as you go.
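
The four steps above can be sketched with a small example. The task (rescaling a vector to run from 0 to 1) and the name rescale_01() are chosen here for illustration only.

```r
# Step 1: a simple example without the function framework
(c(3, 8, 5) - 3) / (8 - 3)

# Step 2: generalize the example by assigning variables
x <- c(3, 8, 5)
(x - min(x)) / (max(x) - min(x))

# Step 3: write the code into a function
rescale_01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Step 4: call the function on the desired arguments
rescale_01(c(3, 8, 5))      # 0.0 1.0 0.4
rescale_01(c(10, 20, 40))
```

If the output in Step 4 does not match Step 1, the problem was introduced while generalizing, which narrows down where to look.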

Let’s Practice

Base R Refresher

  • We can extract components of a vector using [ ]
  • The inputs can be:
    • logical values (TRUE, FALSE)
    • indices (e.g., 1, 2, 3)
x <- 1:5
x[c(TRUE, TRUE, TRUE, FALSE, FALSE)]
[1] 1 2 3
x[1:3]
[1] 1 2 3

above_average()

Goal: Keep only the elements of x greater than the mean.

Fill in the code to create a function named above_average(). The function should keep only the elements of x greater than the mean.

above_average <- function(x) {
  # Step 1: Compute mean of x
  
  
  # Step 2: Subset x to keep only values > mean
  
  
  # Step 3: Return the result
  
}

Option 1: Using Logical Values

Step 1: Find locations where values of x are larger than the mean

x <- 15:25
x > mean(x)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

Step 2: Use this output to extract the desired values from x

x[x > mean(x)]
[1] 21 22 23 24 25

Step 3: Make a function

above_average <- function(x) {
  x[x > mean(x)]
}

Option 2: Using Indices

Step 1: Find indices where values of x are larger than the mean

which(x > mean(x))
[1]  7  8  9 10 11

Step 2: Use this output to extract the desired values from x

x[which(x > mean(x))]
[1] 21 22 23 24 25

Step 3: Make a function

above_average <- function(x) {
  x[which(x > mean(x))]
}
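
The two options agree on complete data but can differ when x contains missing values. The sketch below assumes we add na.rm = TRUE to mean() (an addition not in the versions above) and gives the two variants separate, made-up names so they can be compared side by side.

```r
# Option 1 variant: logical subsetting keeps an NA for each NA comparison
above_average_lgl <- function(x) {
  x[x > mean(x, na.rm = TRUE)]
}

# Option 2 variant: which() returns only the positions that are TRUE
above_average_idx <- function(x) {
  x[which(x > mean(x, na.rm = TRUE))]
}

y <- c(1, NA, 10)
above_average_lgl(y)   # NA 10
above_average_idx(y)   # 10
```

Which behavior is preferable depends on whether missing values should stay visible in the result.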

every_third()

Goal: Return every third element from a vector.

Write down the steps you would need to create a function named every_third() that takes in a vector and returns every third element from that vector (i.e., indices 1, 4, 7, 10, etc.).

Think about:

  • What inputs the function should take.
  • How to identify which positions in the vector are “every third.”
  • How to select those elements from the vector.

Generate Indices

Represent the indices (positions) of each element of x.

x
 [1] 15 16 17 18 19 20 21 22 23 24 25
1:length(x)
 [1]  1  2  3  4  5  6  7  8  9 10 11

Identify Every Third Position

Identify which positions are “every third.”

index   Remainder (index %% 3)   Keep?
1       1                        Yes
2       2                        No
3       0                        No
4       1                        Yes
5       2                        No
6       0                        No
7       1                        Yes


1:length(x) %% 3 == 1
 [1]  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE

Subset x

Grab the elements of x we want to keep.

x
 [1] 15 16 17 18 19 20 21 22 23 24 25


x[1:length(x) %% 3 == 1]
[1] 15 18 21 24

Make it into a function!

every_third <- function(x) {
  x[1:length(x) %% 3 == 1]
}
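
One edge case to keep in mind: when x is empty, 1:length(x) evaluates to c(1, 0) rather than an empty sequence. A possible variant, sketched here as an alternative rather than the handout's solution, swaps in seq_along(x), which returns an empty sequence for an empty vector.

```r
every_third <- function(x) {
  # seq_along(x) gives 1, 2, ..., length(x), and integer(0) when x is empty
  x[seq_along(x) %% 3 == 1]
}

every_third(15:25)        # 15 18 21 24
every_third(integer(0))   # integer(0)
```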

PA 7: Writing Functions

You will write several small functions, then use them to unscramble a message. Many of the functions have been started for you, but none of them are complete as is.

https://xkcd.com/

Collaborative Protocol

During your collaboration, your group will alternate between three roles:

  • Project Manager
    • Reads out the prompt and ensures the group understands what is being asked.
    • Manages resources (e.g., cheatsheets, textbook).
    • Answers the Coder’s questions about syntax based on the resources.
    • Works with the group to debug the code.
    • Encourages the Coder to vocalize and explain their thinking.
  • Computer
    • Types the code specified by the Coder into the Quarto document.
    • Runs the code provided by the Coder.
    • Works with the group to debug the code.
    • Evaluates the output against the question prompt.
  • Coder
    • Confirms they understand what the prompt is asking.
    • Talks with the group about their ideas.
    • Explains their thinking.
    • Directs the Computer what to type.
    • Works with the group to debug the code.

Submission

  • When you have completed the puzzle, you will end with 6 numbers that relate to a TV show. You will each individually submit the name of that TV show.
    • You can ask me if it is correct before you submit.
  • You do not need to submit your code, but you should check your code against the solutions when they are posted!

Starting Roles Today

The person who has the most siblings starts as the Coder; the person with the second most starts as the Project Manager.

To do…

  • PA 7: Writing Functions
    • Due Tuesday by 11:59pm
  • Project Checkpoint 2: Project Proposal and Group Contract
    • Due Friday, 5/15 at 11:59pm.
  • Lab 7: Searching for Efficiency
    • Due Sunday 5/17 at 11:59pm.