my_string <- "Hi, my name is Bond!"
my_string
[1] "Hi, my name is Bond!"
stringr to Work with Strings

Today we will…
- stringr
- lubridate
- dplyr + stringr + lubridate
- git

Follow along
Remember to download, save, and open up the starter notes for this week!
A string is a bunch of characters.
There is a difference between…
…a string (many characters, one object)…
and
…a character vector (vector of strings).
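To make the distinction concrete, here is a small sketch (the variable names are illustrative) that contrasts `length()`, which counts objects, with `str_length()`, which counts the characters inside each string.

```r
library(stringr)

# One string: a single object made of many characters
one_string <- "Hi, my name is Bond!"

# A character vector: several strings stored in one vector
many_strings <- c("Hi,", "my", "name", "is", "Bond!")

length(one_string)       # 1 object
str_length(one_string)   # 20 characters

length(many_strings)     # 5 strings
str_length(many_strings) # characters in each string: 3 2 4 2 5
```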
stringr
Common tasks

Note
The stringr package loads with tidyverse.
Most stringr functions are named str_xxx().

pattern =
The pattern argument appears in many stringr functions.
Let’s explore these functions!
str_detect()
Returns a logical vector indicating whether the pattern was found in each element of the supplied vector.
Use it inside filter() to keep matching rows, or inside summarise() with sum() (to get the total number of matches) or mean() (to get the proportion of matches).

Related Function
str_which() returns the indexes of the strings that contain a match.
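A minimal sketch of `str_detect()` and `str_which()`, plus the `filter()` and `summarise()` patterns described above. The `fruits` vector and `fruit_df` tibble are hypothetical example data.

```r
library(stringr)
library(dplyr)

# Hypothetical example data
fruits <- c("apple", "banana", "cherry", "pineapple")

# One TRUE/FALSE per element: does it contain "apple"?
str_detect(fruits, pattern = "apple")  # TRUE FALSE FALSE TRUE

# Positions of the elements that match
str_which(fruits, pattern = "apple")   # 1 4

fruit_df <- tibble(fruit = fruits)

# Keep only the rows whose fruit contains "apple"
fruit_df |>
  filter(str_detect(fruit, pattern = "apple"))

# Total number and proportion of matches
fruit_df |>
  summarise(
    n_matches    = sum(str_detect(fruit, pattern = "apple")),  # 2
    prop_matches = mean(str_detect(fruit, pattern = "apple"))  # 0.5
  )
```

Because `TRUE` counts as 1 and `FALSE` as 0, `sum()` gives a count and `mean()` gives a proportion.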
str_match()
Returns a character matrix. The first column holds the matched pattern, or NA if the pattern was not found; any extra columns hold the capture groups.
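A short sketch of what `str_match()` returns, using a hypothetical `fruits` vector. The second call previews capture groups `( )`, which come up later in these notes.

```r
library(stringr)

fruits <- c("apple", "banana", "cherry", "pineapple")

# A one-column character matrix: the match, or NA when there is none
m <- str_match(fruits, pattern = "apple")
m

# With capture groups ( ), each group gets its own extra column
m2 <- str_match("Bond, James", pattern = "(\\w+), (\\w+)")
m2  # columns: "Bond, James", "Bond", "James"
```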
str_extract()
Returns a character vector with either NA or the matched pattern, depending on whether the pattern was found.
Warning
str_extract() only returns the first pattern match.
Use str_extract_all() to return every pattern match.
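A minimal sketch contrasting the two, assuming a hypothetical `phrases` vector where one string contains the pattern twice.

```r
library(stringr)

phrases <- c("apple pie and apple juice", "banana bread")

# Only the first match in each string (NA if there is none)
str_extract(phrases, pattern = "apple")      # "apple" NA

# Every match in each string, returned as a list (one element per string)
str_extract_all(phrases, pattern = "apple")
# [[1]] "apple" "apple"
# [[2]] character(0)
```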
Suppose we had a slightly different vector…
str_locate()
Returns an integer matrix with two columns, start and end, giving the starting and ending positions of the first match. The values are NA if the pattern is not found.
Related Function
str_sub() extracts values based on a starting and ending location.
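A small sketch pairing the two functions: `str_locate()` finds the positions and `str_sub()` uses them to pull the text back out. The `agent` string is just an example.

```r
library(stringr)

agent <- "James Bond"

# Start and end positions of the first match (a matrix with columns start, end)
loc <- str_locate(agent, pattern = "Bond")
loc  # start 7, end 10

# Use those positions to extract the matched text
str_sub(agent, start = loc[, "start"], end = loc[, "end"])  # "Bond"
```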
str_subset()
Returns a character vector containing only the elements of the original vector where the pattern was found.
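A minimal sketch, again with a hypothetical `fruits` vector. `str_subset()` gives the same result as indexing the vector with `str_detect()`.

```r
library(stringr)

fruits <- c("apple", "banana", "cherry", "pineapple")

# Keep only the elements that contain the pattern
str_subset(fruits, pattern = "apple")          # "apple" "pineapple"

# Equivalent, written with str_detect()
fruits[str_detect(fruits, pattern = "apple")]  # "apple" "pineapple"
```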
Note
For each of these functions, write down:
str_replace()
Replaces the first matched pattern in each string.
Use it inside mutate() to create or modify a variable.

Related Function
str_replace_all() replaces all matched patterns in each string.
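A minimal sketch comparing `str_replace()` and `str_replace_all()`, then using the latter inside `mutate()`. The `phrases` vector and the tibble are hypothetical.

```r
library(stringr)
library(dplyr)

phrases <- c("the cat sat on the mat", "the end")

# Only the first "the" in each string is replaced
str_replace(phrases, pattern = "the", replacement = "a")
# "a cat sat on the mat" "a end"

# Every "the" in each string is replaced
str_replace_all(phrases, pattern = "the", replacement = "a")
# "a cat sat on a mat" "a end"

# Typical use: create a cleaned-up variable inside mutate()
tibble(phrase = phrases) |>
  mutate(new_phrase = str_replace_all(phrase, pattern = "the", replacement = "a"))
```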
Convert letters in a string to a specific capitalization format.
str_to_lower() converts all letters in a string to lowercase.
str_to_upper() converts all letters in a string to uppercase.
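A quick sketch of both functions on an example string.

```r
library(stringr)

agent <- "James Bond"

str_to_lower(agent)  # "james bond"
str_to_upper(agent)  # "JAMES BOND"
```

Standardizing capitalization like this is handy before detecting or matching patterns, since regex matching is case-sensitive by default.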
str_c()
Join multiple strings into a single character vector.
prompt <- "Hello, my name is"
first <- "James"
last <- "Bond"
str_c(prompt, last, ",", first, last, sep = " ")
[1] "Hello, my name is Bond , James Bond"
Note
Similar to paste() and paste0().
Combine a vector of strings into a single string.
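The function name for this step isn't shown here; one stringr function that does it is `str_flatten()`, and `str_c()` with `collapse =` gives the same result. A sketch, assuming a hypothetical `pieces` vector:

```r
library(stringr)

pieces <- c("My", "name", "is", "Bond")

# Collapse the whole vector into one string, separated by spaces
str_flatten(pieces, collapse = " ")  # "My name is Bond"

# Same result with str_c() and its collapse argument
str_c(pieces, collapse = " ")        # "My name is Bond"
```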
Use variables in the environment to create a string based on {expressions}.
My name is Bond, James Bond
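One way to produce the output above is stringr's `str_glue()`, which fills each `{expression}` with the value of that variable from the environment. This sketch assumes the `first` and `last` variables defined earlier.

```r
library(stringr)

first <- "James"
last <- "Bond"

# Each {...} is evaluated and inserted into the string
str_glue("My name is {last}, {first} {last}")
# My name is Bond, James Bond
```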
Tip
For more details, I would recommend looking up the glue R package!
Refer to the stringr cheatsheet
Remember that str_xxx functions need the first argument to be a vector of strings, not a dataset!
Use them inside dplyr verbs like filter() or mutate().

| name | is_bran | manuf | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100% Bran | TRUE | N | cold | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.00 | 0.33 | 68.40297 |
| 100% Natural Bran | TRUE | Q | cold | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.00 | 1.00 | 33.98368 |
| All-Bran | TRUE | K | cold | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.00 | 0.33 | 59.42551 |
| All-Bran with Extra Fiber | TRUE | K | cold | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.00 | 0.50 | 93.70491 |
| Almond Delight | FALSE | R | cold | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.00 | 0.75 | 34.38484 |
| Apple Cinnamon Cheerios | FALSE | G | cold | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1.00 | 0.75 | 29.50954 |
| Apple Jacks | FALSE | K | cold | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.00 | 1.00 | 33.17409 |
| Basic 4 | FALSE | G | cold | 130 | 3 | 2 | 210 | 2.0 | 18.0 | 8 | 100 | 25 | 3 | 1.33 | 0.75 | 37.03856 |
| Bran Chex | TRUE | R | cold | 90 | 2 | 1 | 200 | 4.0 | 15.0 | 6 | 125 | 25 | 1 | 1.00 | 0.67 | 49.12025 |
| Bran Flakes | TRUE | P | cold | 90 | 3 | 0 | 210 | 5.0 | 13.0 | 5 | 190 | 25 | 3 | 1.00 | 0.67 | 53.31381 |
| Cap'n'Crunch | FALSE | Q | cold | 120 | 1 | 2 | 220 | 0.0 | 12.0 | 12 | 35 | 25 | 2 | 1.00 | 0.75 | 18.04285 |
| Cheerios | FALSE | G | cold | 110 | 6 | 2 | 290 | 2.0 | 17.0 | 1 | 105 | 25 | 1 | 1.00 | 1.25 | 50.76500 |
| Cinnamon Toast Crunch | FALSE | G | cold | 120 | 1 | 3 | 210 | 0.0 | 13.0 | 9 | 45 | 25 | 2 | 1.00 | 0.75 | 19.82357 |
| Clusters | FALSE | G | cold | 110 | 3 | 2 | 140 | 2.0 | 13.0 | 7 | 105 | 25 | 3 | 1.00 | 0.50 | 40.40021 |
| Cocoa Puffs | FALSE | G | cold | 110 | 1 | 1 | 180 | 0.0 | 12.0 | 13 | 55 | 25 | 2 | 1.00 | 1.00 | 22.73645 |
| Corn Chex | FALSE | R | cold | 110 | 2 | 0 | 280 | 0.0 | 22.0 | 3 | 25 | 25 | 1 | 1.00 | 1.00 | 41.44502 |
| Corn Flakes | FALSE | K | cold | 100 | 2 | 0 | 290 | 1.0 | 21.0 | 2 | 35 | 25 | 1 | 1.00 | 1.00 | 45.86332 |
| Corn Pops | FALSE | K | cold | 110 | 1 | 0 | 90 | 1.0 | 13.0 | 12 | 20 | 25 | 2 | 1.00 | 1.00 | 35.78279 |
| Count Chocula | FALSE | G | cold | 110 | 1 | 1 | 180 | 0.0 | 12.0 | 13 | 65 | 25 | 2 | 1.00 | 1.00 | 22.39651 |
| Cracklin' Oat Bran | TRUE | K | cold | 110 | 3 | 3 | 140 | 4.0 | 10.0 | 7 | 160 | 25 | 3 | 1.00 | 0.50 | 40.44877 |
| Cream of Wheat (Quick) | FALSE | N | hot | 100 | 3 | 0 | 80 | 1.0 | 21.0 | 0 | -1 | 0 | 2 | 1.00 | 1.00 | 64.53382 |
| Crispix | FALSE | K | cold | 110 | 2 | 0 | 220 | 1.0 | 21.0 | 3 | 30 | 25 | 3 | 1.00 | 1.00 | 46.89564 |
| Crispy Wheat & Raisins | FALSE | G | cold | 100 | 2 | 1 | 140 | 2.0 | 11.0 | 10 | 120 | 25 | 3 | 1.00 | 0.75 | 36.17620 |
| Double Chex | FALSE | R | cold | 100 | 2 | 0 | 190 | 1.0 | 18.0 | 5 | 80 | 25 | 3 | 1.00 | 0.75 | 44.33086 |
| Froot Loops | FALSE | K | cold | 110 | 2 | 1 | 125 | 1.0 | 11.0 | 13 | 30 | 25 | 2 | 1.00 | 1.00 | 32.20758 |
| Frosted Flakes | FALSE | K | cold | 110 | 1 | 0 | 200 | 1.0 | 14.0 | 11 | 25 | 25 | 1 | 1.00 | 0.75 | 31.43597 |
| Frosted Mini-Wheats | FALSE | K | cold | 100 | 3 | 0 | 0 | 3.0 | 14.0 | 7 | 100 | 25 | 2 | 1.00 | 0.80 | 58.34514 |
| Fruit & Fibre Dates; Walnuts; and Oats | FALSE | P | cold | 120 | 3 | 2 | 160 | 5.0 | 12.0 | 10 | 200 | 25 | 3 | 1.25 | 0.67 | 40.91705 |
| Fruitful Bran | TRUE | K | cold | 120 | 3 | 0 | 240 | 5.0 | 14.0 | 12 | 190 | 25 | 3 | 1.33 | 0.67 | 41.01549 |
| Fruity Pebbles | FALSE | P | cold | 110 | 1 | 1 | 135 | 0.0 | 13.0 | 12 | 25 | 25 | 2 | 1.00 | 0.75 | 28.02576 |
| Golden Crisp | FALSE | P | cold | 100 | 2 | 0 | 45 | 0.0 | 11.0 | 15 | 40 | 25 | 1 | 1.00 | 0.88 | 35.25244 |
| Golden Grahams | FALSE | G | cold | 110 | 1 | 1 | 280 | 0.0 | 15.0 | 9 | 45 | 25 | 2 | 1.00 | 0.75 | 23.80404 |
| Grape Nuts Flakes | FALSE | P | cold | 100 | 3 | 1 | 140 | 3.0 | 15.0 | 5 | 85 | 25 | 3 | 1.00 | 0.88 | 52.07690 |
| Grape-Nuts | FALSE | P | cold | 110 | 3 | 0 | 170 | 3.0 | 17.0 | 3 | 90 | 25 | 3 | 1.00 | 0.25 | 53.37101 |
| Great Grains Pecan | FALSE | P | cold | 120 | 3 | 3 | 75 | 3.0 | 13.0 | 4 | 100 | 25 | 3 | 1.00 | 0.33 | 45.81172 |
| Honey Graham Ohs | FALSE | Q | cold | 120 | 1 | 2 | 220 | 1.0 | 12.0 | 11 | 45 | 25 | 2 | 1.00 | 1.00 | 21.87129 |
| Honey Nut Cheerios | FALSE | G | cold | 110 | 3 | 1 | 250 | 1.5 | 11.5 | 10 | 90 | 25 | 1 | 1.00 | 0.75 | 31.07222 |
| Honey-comb | FALSE | P | cold | 110 | 1 | 0 | 180 | 0.0 | 14.0 | 11 | 35 | 25 | 1 | 1.00 | 1.33 | 28.74241 |
| Just Right Crunchy Nuggets | FALSE | K | cold | 110 | 2 | 1 | 170 | 1.0 | 17.0 | 6 | 60 | 100 | 3 | 1.00 | 1.00 | 36.52368 |
| Just Right Fruit & Nut | FALSE | K | cold | 140 | 3 | 1 | 170 | 2.0 | 20.0 | 9 | 95 | 100 | 3 | 1.30 | 0.75 | 36.47151 |
| Kix | FALSE | G | cold | 110 | 2 | 1 | 260 | 0.0 | 21.0 | 3 | 40 | 25 | 2 | 1.00 | 1.50 | 39.24111 |
| Life | FALSE | Q | cold | 100 | 4 | 2 | 150 | 2.0 | 12.0 | 6 | 95 | 25 | 2 | 1.00 | 0.67 | 45.32807 |
| Lucky Charms | FALSE | G | cold | 110 | 2 | 1 | 180 | 0.0 | 12.0 | 12 | 55 | 25 | 2 | 1.00 | 1.00 | 26.73451 |
| Maypo | FALSE | A | hot | 100 | 4 | 1 | 0 | 0.0 | 16.0 | 3 | 95 | 25 | 2 | 1.00 | 1.00 | 54.85092 |
| Muesli Raisins; Dates; & Almonds | FALSE | R | cold | 150 | 4 | 3 | 95 | 3.0 | 16.0 | 11 | 170 | 25 | 3 | 1.00 | 1.00 | 37.13686 |
| Muesli Raisins; Peaches; & Pecans | FALSE | R | cold | 150 | 4 | 3 | 150 | 3.0 | 16.0 | 11 | 170 | 25 | 3 | 1.00 | 1.00 | 34.13976 |
| Mueslix Crispy Blend | FALSE | K | cold | 160 | 3 | 2 | 150 | 3.0 | 17.0 | 13 | 160 | 25 | 3 | 1.50 | 0.67 | 30.31335 |
| Multi-Grain Cheerios | FALSE | G | cold | 100 | 2 | 1 | 220 | 2.0 | 15.0 | 6 | 90 | 25 | 1 | 1.00 | 1.00 | 40.10596 |
| Nut&Honey Crunch | FALSE | K | cold | 120 | 2 | 1 | 190 | 0.0 | 15.0 | 9 | 40 | 25 | 2 | 1.00 | 0.67 | 29.92429 |
| Nutri-Grain Almond-Raisin | FALSE | K | cold | 140 | 3 | 2 | 220 | 3.0 | 21.0 | 7 | 130 | 25 | 3 | 1.33 | 0.67 | 40.69232 |
| Nutri-grain Wheat | FALSE | K | cold | 90 | 3 | 0 | 170 | 3.0 | 18.0 | 2 | 90 | 25 | 3 | 1.00 | 1.00 | 59.64284 |
| Oatmeal Raisin Crisp | FALSE | G | cold | 130 | 3 | 2 | 170 | 1.5 | 13.5 | 10 | 120 | 25 | 3 | 1.25 | 0.50 | 30.45084 |
| Post Nat. Raisin Bran | TRUE | P | cold | 120 | 3 | 1 | 200 | 6.0 | 11.0 | 14 | 260 | 25 | 3 | 1.33 | 0.67 | 37.84059 |
| Product 19 | FALSE | K | cold | 100 | 3 | 0 | 320 | 1.0 | 20.0 | 3 | 45 | 100 | 3 | 1.00 | 1.00 | 41.50354 |
| Puffed Rice | FALSE | Q | cold | 50 | 1 | 0 | 0 | 0.0 | 13.0 | 0 | 15 | 0 | 3 | 0.50 | 1.00 | 60.75611 |
| Puffed Wheat | FALSE | Q | cold | 50 | 2 | 0 | 0 | 1.0 | 10.0 | 0 | 50 | 0 | 3 | 0.50 | 1.00 | 63.00565 |
| Quaker Oat Squares | FALSE | Q | cold | 100 | 4 | 1 | 135 | 2.0 | 14.0 | 6 | 110 | 25 | 3 | 1.00 | 0.50 | 49.51187 |
| Quaker Oatmeal | FALSE | Q | hot | 100 | 5 | 2 | 0 | 2.7 | -1.0 | -1 | 110 | 0 | 1 | 1.00 | 0.67 | 50.82839 |
| Raisin Bran | TRUE | K | cold | 120 | 3 | 1 | 210 | 5.0 | 14.0 | 12 | 240 | 25 | 2 | 1.33 | 0.75 | 39.25920 |
| Raisin Nut Bran | TRUE | G | cold | 100 | 3 | 2 | 140 | 2.5 | 10.5 | 8 | 140 | 25 | 3 | 1.00 | 0.50 | 39.70340 |
| Raisin Squares | FALSE | K | cold | 90 | 2 | 0 | 0 | 2.0 | 15.0 | 6 | 110 | 25 | 3 | 1.00 | 0.50 | 55.33314 |
| Rice Chex | FALSE | R | cold | 110 | 1 | 0 | 240 | 0.0 | 23.0 | 2 | 30 | 25 | 1 | 1.00 | 1.13 | 41.99893 |
| Rice Krispies | FALSE | K | cold | 110 | 2 | 0 | 290 | 0.0 | 22.0 | 3 | 35 | 25 | 1 | 1.00 | 1.00 | 40.56016 |
| Shredded Wheat | FALSE | N | cold | 80 | 2 | 0 | 0 | 3.0 | 16.0 | 0 | 95 | 0 | 1 | 0.83 | 1.00 | 68.23588 |
| Shredded Wheat 'n'Bran | TRUE | N | cold | 90 | 3 | 0 | 0 | 4.0 | 19.0 | 0 | 140 | 0 | 1 | 1.00 | 0.67 | 74.47295 |
| Shredded Wheat spoon size | FALSE | N | cold | 90 | 3 | 0 | 0 | 3.0 | 20.0 | 0 | 120 | 0 | 1 | 1.00 | 0.67 | 72.80179 |
| Smacks | FALSE | K | cold | 110 | 2 | 1 | 70 | 1.0 | 9.0 | 15 | 40 | 25 | 2 | 1.00 | 0.75 | 31.23005 |
| Special K | FALSE | K | cold | 110 | 6 | 0 | 230 | 1.0 | 16.0 | 3 | 55 | 25 | 1 | 1.00 | 1.00 | 53.13132 |
| Strawberry Fruit Wheats | FALSE | N | cold | 90 | 2 | 0 | 15 | 3.0 | 15.0 | 5 | 90 | 25 | 2 | 1.00 | 1.00 | 59.36399 |
| Total Corn Flakes | FALSE | G | cold | 110 | 2 | 1 | 200 | 0.0 | 21.0 | 3 | 35 | 100 | 3 | 1.00 | 1.00 | 38.83975 |
| Total Raisin Bran | TRUE | G | cold | 140 | 3 | 1 | 190 | 4.0 | 15.0 | 14 | 230 | 100 | 3 | 1.50 | 1.00 | 28.59278 |
| Total Whole Grain | FALSE | G | cold | 100 | 3 | 1 | 200 | 3.0 | 16.0 | 3 | 110 | 100 | 3 | 1.00 | 1.00 | 46.65884 |
| Triples | FALSE | G | cold | 110 | 2 | 1 | 250 | 0.0 | 21.0 | 3 | 60 | 25 | 3 | 1.00 | 0.75 | 39.10617 |
| Trix | FALSE | G | cold | 110 | 1 | 1 | 140 | 0.0 | 13.0 | 12 | 25 | 25 | 2 | 1.00 | 1.00 | 27.75330 |
| Wheat Chex | FALSE | R | cold | 100 | 3 | 1 | 230 | 3.0 | 17.0 | 3 | 115 | 25 | 1 | 1.00 | 0.67 | 49.78744 |
| Wheaties | FALSE | G | cold | 100 | 3 | 1 | 200 | 3.0 | 17.0 | 3 | 110 | 25 | 1 | 1.00 | 1.00 | 51.59219 |
| Wheaties Honey Gold | FALSE | G | cold | 110 | 2 | 1 | 200 | 1.0 | 16.0 | 8 | 60 | 25 | 1 | 1.00 | 0.75 | 36.18756 |
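The `is_bran` column above is TRUE exactly when `name` contains "Bran", which suggests it was built with `str_detect()` inside `mutate()`. A sketch under that assumption, using a few rows copied from the table into a hypothetical `cereal_small` tibble:

```r
library(stringr)
library(dplyr)

# A few rows from the table; `cereal_small` is a hypothetical name
cereal_small <- tibble(
  name     = c("100% Bran", "All-Bran", "Cheerios", "Raisin Nut Bran", "Trix"),
  calories = c(70, 70, 110, 100, 110)
)

# str_detect() works on the `name` column, not on the whole dataset
cereal_flagged <- cereal_small |>
  mutate(is_bran = str_detect(name, pattern = "Bran"))
cereal_flagged  # is_bran: TRUE TRUE FALSE TRUE FALSE

# Keep only the bran cereals
cereal_flagged |>
  filter(is_bran)
```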
The real power of these str_xxx functions comes when you specify the pattern using regular expressions!
“Regexps are a very terse language that allow you to describe patterns in strings.”
R for Data Science
Use str_xxx functions + regular expressions!
Tip
You might encounter gsub(), grep(), etc. from base R, but I would highly recommend using functions from the stringr package instead.
…are tricky!
This web app for testing R regular expressions might be handy!
There is a set of characters that have a specific meaning when using regex.
The stringr package does not read these as normal characters:
. ^ $ \ | * + ? { } [ ] ( )
.
This character matches any single character (except a newline).
[1] "sells" "seashells"
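The code behind the output above isn't shown, so here is a separate sketch of `.` on a hypothetical `candidates` vector: the pattern "c.t" needs a "c", then exactly one character of any kind, then a "t".

```r
library(stringr)

candidates <- c("cat", "cut", "ct", "coat")

# "ct" has no character in between; "coat" has two
str_subset(candidates, pattern = "c.t")  # "cat" "cut"
```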
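^ $
^ – matches the start of a string.
$ – matches the end of a string.

? + *
? – matches when the preceding character occurs 0 or 1 times in a row.
+ – matches when the preceding character occurs 1 or more times in a row.
* – matches when the preceding character occurs 0 or more times in a row.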
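A sketch of the anchors and quantifiers on a hypothetical `spellings` vector. Each quantifier applies only to the character right before it (here, the "u").

```r
library(stringr)

spellings <- c("color", "colour", "colouur", "bond", "james bond")

str_subset(spellings, pattern = "^bond")   # starts with "bond": "bond"
str_subset(spellings, pattern = "bond$")   # ends with "bond": "bond" "james bond"

str_subset(spellings, pattern = "colou?r") # 0 or 1 u: "color" "colour"
str_subset(spellings, pattern = "colou+r") # 1 or more u: "colour" "colouur"
str_subset(spellings, pattern = "colou*r") # 0 or more u: "color" "colour" "colouur"
```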
{ }
{n} – matches when the preceding character occurs exactly n times in a row.
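A small sketch of `{n}` on hypothetical strings. Note the second call: without surrounding characters to pin it down, "o{2}" still matches inside a longer run of o's.

```r
library(stringr)

spellings <- c("cool", "col", "cooool")

# Exactly two o's between a c and an l
str_subset(spellings, pattern = "co{2}l")  # "cool"

# Two o's in a row are found inside "oooo", so this is TRUE
str_detect("cooool", pattern = "o{2}")
```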
[ ]
[ ] – specifies a set of characters to match on (any one of them).
[^ ] – specifies characters not to match on (think except).
But remember that ^ outside of brackets matches the start of a string.
Warning
Why do “Peter” and “Piper” still match "^[^p]"?
Capitalization matters! [^p] excludes only a lowercase p, so strings that start with an uppercase P still match.
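A sketch of the warning above, assuming the words of the "Peter Piper" tongue twister as a hypothetical vector. Either list both cases inside the brackets or ask `regex()` to ignore case.

```r
library(stringr)

tongue_twister <- c("Peter", "Piper", "picked", "a", "peck", "of", "pickled", "peppers")

# Only lowercase p is excluded, so "Peter" and "Piper" slip through
str_subset(tongue_twister, pattern = "^[^p]")   # "Peter" "Piper" "a" "of"

# Exclude both cases explicitly
str_subset(tongue_twister, pattern = "^[^pP]")  # "a" "of"

# Or make the whole pattern case-insensitive
str_subset(tongue_twister, pattern = regex("^[^p]", ignore_case = TRUE))  # "a" "of"
```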
[ ]
[ - ] – specifies a range of characters (e.g., [a-z] or [0-9]).
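A short sketch of ranges inside brackets, on a hypothetical `codes` vector.

```r
library(stringr)

codes <- c("A7", "b2", "C", "9")

str_subset(codes, pattern = "[a-z]")  # contains a lowercase letter: "b2"
str_subset(codes, pattern = "[0-9]")  # contains a digit: "A7" "b2" "9"
```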
\\w – matches any “word” character: a letter, digit, or underscore (\\W matches any non-“word” character)
\\d – matches any digit (\\D matches not digit)
\\s – matches any whitespace (\\S matches not whitespace)
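A sketch of these shorthand classes on an example string. The `\\w+` pattern combines `\\w` with the `+` quantifier to grab whole runs of word characters.

```r
library(stringr)

x <- "Agent 007, licensed!"

str_extract_all(x, pattern = "\\d")[[1]]   # each digit: "0" "0" "7"
str_extract_all(x, pattern = "\\w+")[[1]]  # runs of word characters: "Agent" "007" "licensed"
str_count(x, pattern = "\\s")              # number of whitespace characters: 2
str_remove_all(x, pattern = "\\W")         # drop everything that isn't a word character: "Agent007licensed"
```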
( )
Groups are created with ( ).

|
| means “or”: it matches either the pattern on its left or the pattern on its right.
This matches strings that contain either “peck” or “pick”.
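The exact pattern from the slide isn't shown; one pattern that matches "peck" or "pick" is a group containing `|`. A sketch on the hypothetical tongue-twister vector:

```r
library(stringr)

tongue_twister <- c("Peter", "Piper", "picked", "a", "peck", "of", "pickled", "peppers")

# A "p", then either "e" or "i", then "ck"
str_subset(tongue_twister, pattern = "p(e|i)ck")   # "picked" "peck" "pickled"

# Spelling out both alternatives gives the same result
str_subset(tongue_twister, pattern = "peck|pick")  # "picked" "peck" "pickled"
```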
( )
Use backreferences (e.g., \\1) to specify that certain groupings repeat.
[1] "hannah" "race car"
This matches strings that start and end with the same character.
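The pattern used on the slide isn't shown; `"^(.).*\\1$"` is one pattern that produces the output above. It captures the first character with `(.)`, allows anything in between with `.*`, and then requires the same character (`\\1`) at the end. The input vector here is hypothetical.

```r
library(stringr)

candidates <- c("hannah", "race car", "bond", "james")

str_subset(candidates, pattern = "^(.).*\\1$")  # "hannah" "race car"
```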
( )
What regular expressions would match words that…
\\
To match a special character, you need to escape it.
Use \\ to escape the ? – it is now read as a normal character.
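A sketch of escaping on a hypothetical vector. In the R string `"\\?"`, the first backslash escapes the second, so the regex engine receives `\?` and reads `?` as a literal question mark. `fixed()` is an alternative that turns off regex interpretation for the whole pattern.

```r
library(stringr)

x <- c("Who?", "Me.", "Bond")

str_subset(x, pattern = "\\?")        # a literal ?: "Who?"
str_subset(x, pattern = "\\.")        # a literal .: "Me."
str_subset(x, pattern = ".")          # unescaped . matches any character: all three
str_subset(x, pattern = fixed("?"))   # treat the pattern as plain text: "Who?"
```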
Use the web app to test R regular expressions.
Refer to the stringr cheatsheet.

I want to join two datasets that have a county variable:
Practice
What stringr function will help me join the county_pop and county_loc by county?
What if I want to pull out only the area code in a phone number?
Practice
You will need a stringr function and to use regular expressions!
What if I want just the numbers in the area code?
| awards |
|---|
| Beyonce: 35G, 0A, 0E |
| Kendrick Lamar: 22G, 0A, 1E |
| Charli XCX: 2G, 0A, 0E |
| Cynthia Erivo: 1G, 0A, 1E |
| Viola Davis: 1G, 1A, 1E |
| Elton John: 6G, 2A, 1E |
That’s annoying…
Create a variable with just the artist name and a variable with the number of Grammys won.
In this activity, you will use functions from the stringr package and regex to decode a message.
