Week 7, Part 2: Iteration

1 Learning Objectives

  • Recognize a for() loop in R to handle repeated tasks.
  • Use the map() family of functions in the purrr package to handle repeated tasks.
  • Identify the advantages of using the map() family of functions for repeated tasks.

📖 Readings: 45 min

📽 Optional Videos: 6 min


2 Introduction to Iteration

We just learned the rule of “don’t repeat yourself more than two times” and to instead automate our procedures with functions.

We previously used across() to help eliminate copy-paste when working with data frames. This is a form of iteration in programming: across() “iterates” over variables, applying a function to one variable and then doing the same for the next.
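As a quick reminder of what that looks like, here is a minimal sketch. The heights data frame and its column names are made up for illustration; only across(), mutate(), and ends_with() come from dplyr.

```r
library(dplyr)

# Hypothetical toy data, invented for this example
heights <- tibble(
  id        = 1:3,
  height_in = c(60, 65, 70),
  arm_in    = c(24, 26, 28)
)

# across() visits each selected column in turn and applies the same function:
# here, converting every column whose name ends in "_in" from inches to cm.
heights_cm <- mutate(
  heights,
  across(ends_with("_in"), function(col) col * 2.54)
)

heights_cm$height_in
#> [1] 152.4 165.1 177.8
```

The id column is left alone because it was not selected, which is the same work we would otherwise do by copying and pasting one mutate() line per column.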

This week, we are adding to our toolbox of ways to do efficient iteration rather than repeating code.

3 For loops

while() and for() loops are a common form of iteration that can be extremely useful when logically thinking through a problem. If you are unfamiliar with loops or have not seen them in R, read the R4DS section linked below.
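For readers who have not seen the syntax, the following is a small base R sketch of both kinds of loop. The variable names (squares, total, n) are arbitrary and chosen only for this example.

```r
# A for() loop runs its body once for each element of a sequence.
squares <- numeric(5)               # pre-allocate the output vector
for (i in seq_along(squares)) {
  squares[i] <- i^2                 # fill in one element per pass
}
squares
#> [1]  1  4  9 16 25

# A while() loop runs its body until its condition becomes FALSE.
# Here: how many consecutive integers must we add to reach at least 20?
total <- 0
n <- 0
while (total < 20) {
  n <- n + 1
  total <- total + n
}
n
#> [1] 6
```

The for() loop is the natural choice when the number of repetitions is known ahead of time; while() fits when we only know the stopping condition.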

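Unlike in some other programming languages, explicit loops in R can be slow compared with vectorized code, especially when an object is grown one element at a time inside the loop. Thus, we will avoid using them whenever a vectorized or functional alternative is available!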

📖 Optional Reading: R4DS Ed. 1 12.2: For loops

4 Vectorized Operations

As we discussed at the beginning of the quarter, one of the beautiful things about R is that many functions are vectorized. This means that these functions are built to work with vectors and, specifically, to apply some operation to each element of a vector separately. This is actually a form of iteration!

x <- c(10, 34, 3)
y <- c(3, 35, 1)

x + y
[1] 13 69  4

Since addition is vectorized, we don’t have to use a for loop to add each element of the two vectors x and y together.

In languages that don’t have implicit support for vectorized computations, you would instead have to do:

result <- rep(NA, 3)

for(i in 1:3){
  result[i] <- x[i] + y[i]
}

result
[1] 13 69  4

In other words, we would map the function + to each pair of entries of x and y. For atomic vectors, if a function is vectorized, it will do this automatically!

But what if we want to map a function to each element of a list? Or what if a function isn’t vectorized? This is where the map family of the purrr package comes in.

5 Iteration with purrr

The purrr package in R provides functions that allow us to apply some task (function) to all elements of a list. This supports very computationally efficient iteration!
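To make this concrete, here is a minimal sketch comparing a for() loop with purrr::map(). The measurements list is hypothetical data invented for this example.

```r
library(purrr)

# Hypothetical list of numeric vectors with different lengths
measurements <- list(
  site_a = c(2, 4, 6),
  site_b = c(10, 20),
  site_c = 5
)

# For-loop version: pre-allocate a list, then fill it element by element.
means_loop <- vector("list", length(measurements))
names(means_loop) <- names(measurements)
for (i in seq_along(measurements)) {
  means_loop[[i]] <- mean(measurements[[i]])
}

# map() version: apply mean() to every element and return a list.
means_map <- map(measurements, mean)

means_map$site_b
#> [1] 15
```

Both versions compute the same three means. The map() call states the intent (apply mean() to each element) in one line and keeps the names of the input list without extra bookkeeping.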

Note that there are base functions in R that solve similar problems (apply(), lapply(), tapply(), etc.), but purrr is much easier to use and has more consistent behavior. For that reason, we will not be working with the apply family functions in this course. I will also say that I would not use apply after learning how to use the map family of functions in purrr myself – they are much better!
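One way to see what “more consistent behavior” can mean is the small sketch below; it is a single illustration, not a full comparison of the two families. Base R’s sapply() decides what to return based on its input, while purrr’s map_int() always returns an integer vector.

```r
library(purrr)

# With non-empty input, both return an integer vector of lengths.
sapply(list(1:3, 4:6), length)
#> [1] 3 3
map_int(list(1:3, 4:6), length)
#> [1] 3 3

# With empty input, sapply() falls back to an empty list,
# while map_int() still returns an (empty) integer vector.
empty_sapply <- sapply(list(), length)
empty_map    <- map_int(list(), length)

is.list(empty_sapply)
#> [1] TRUE
is.integer(empty_map)
#> [1] TRUE
```

Because the suffix of the map function fixes the output type, code that comes after it does not need to check what kind of object it received.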

If you feel a bit shaky on lists, please read the review below before continuing.

Note: Review of Lists

A list is a 1-dimensional data structure that has no restrictions on what type of content is stored within it. A list is a “vector”, but it is not an atomic vector - that is, it does not necessarily contain things that are all the same type.

mylist <- list(
    logicals = c(TRUE, TRUE, FALSE, FALSE, TRUE), 
    numeric_vec = 1:12, 
    third_thing = letters[1:2]
    )

mylist
$logicals
[1]  TRUE  TRUE FALSE FALSE  TRUE

$numeric_vec
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

$third_thing
[1] "a" "b"

List components may have names (or not), be homogeneous (or not), have the same length (or not).

Indexing

Indexing necessarily differs between R and Python, and since the list types are also somewhat different (e.g. lists cannot be named in Python), we will treat list indexing in the two languages separately.

Figure 1: The types of indexing are made most memorable with a fantastic visual example from R4DS, which I have repeated here. The images show the following:

  • An unusual pepper shaker, which we’ll call pepper, containing several individual paper packets of pepper.
  • A pepper shaker containing a single individual paper packet of pepper. When a list is indexed with single brackets, pepper[1], the return value is always a list containing the selected element(s).
  • A single individual paper packet of pepper, no longer contained within a pepper shaker. When a list is indexed with double brackets, pepper[[1]], the return value is the selected element.
  • A pile of pepper, free from any containment structures. To actually access the pepper, we have to use double indexing and index both the list object and the sub-object, as in pepper[[1]][[1]].

There are 3 ways to index a list:

  • With single square brackets, just like we index atomic vectors. In this case, the return value is always a list.
mylist[1]
$logicals
[1]  TRUE  TRUE FALSE FALSE  TRUE
mylist[2]
$numeric_vec
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
mylist[c(TRUE, FALSE, TRUE)]
$logicals
[1]  TRUE  TRUE FALSE FALSE  TRUE

$third_thing
[1] "a" "b"
  • With double square brackets. In this case, the return value is the thing inside the specified position in the list, but you also can only get one entry in the main list at a time. You can also get things by name.
mylist[[1]]
[1]  TRUE  TRUE FALSE FALSE  TRUE
mylist[["third_thing"]]
[1] "a" "b"
  • Using x$name. This is equivalent to using x[["name"]]. Note that this does not work on unnamed entries in the list.
mylist$third_thing
[1] "a" "b"

To access the contents of a list object, we have to use double-indexing:

mylist[["third_thing"]][[1]]
[1] "a"
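The difference between single and double brackets is easiest to check by asking R what kind of object comes back. The sketch below rebuilds the same mylist so that it runs on its own; single and double are arbitrary names used only here.

```r
mylist <- list(
  logicals    = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  numeric_vec = 1:12,
  third_thing = letters[1:2]
)

single <- mylist[1]     # still a list (the shaker with one packet)
double <- mylist[[1]]   # the element itself (the packet)

class(single)
#> [1] "list"
class(double)
#> [1] "logical"

length(single)          # one list entry
#> [1] 1
length(double)          # five logical values
#> [1] 5

# $ followed by [[ ]] reaches a single value inside a named element.
mylist$third_thing[[2]]
#> [1] "b"
```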

The map family of functions is so called because all of the function names start with the word map. We can think of these functions as “mapping” a function to all elements of a list. You will learn more about these functions in the reading.
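As a preview of the reading, here is a brief sketch of a few members of the family applied to the mylist object from the review above (rebuilt here so the code runs on its own). The suffix after map_ names the type of vector that is returned.

```r
library(purrr)

mylist <- list(
  logicals    = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  numeric_vec = 1:12,
  third_thing = letters[1:2]
)

# map() always returns a list, one entry per input element.
lengths_list <- map(mylist, length)

# map_int() returns an integer vector.
map_int(mylist, length)
#>    logicals numeric_vec third_thing 
#>           5          12           2 

# map_chr() returns a character vector.
map_chr(mylist, class)
#>    logicals numeric_vec third_thing 
#>   "logical"   "integer" "character" 

# map_lgl() returns a logical vector.
map_lgl(mylist, is.numeric)
#>    logicals numeric_vec third_thing 
#>       FALSE        TRUE       FALSE 
```

In each case the names of the input list carry through to the output, so the results stay labeled by the element they came from.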

📖 Required Reading: R4DS Ed. 1 12.5: The map functions

📽 Optional Video: Iteration with the map() family