my_string <- "Hi, my name is Bond!"
my_string
[1] "Hi, my name is Bond!"
stringr to Work with Strings
Today we will…
Follow along
Remember to download, save, and open up the starter notes for this week!
stringr
lubridate
dplyr
+ stringr
+ lubridate
git
A string is a bunch of characters.
There is a difference between…
…a string (many characters, one object)…
and
…a character vector (vector of strings).
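A minimal sketch of that difference, assuming only that stringr is installed. The vector `my_vector` is a made-up illustration and does not come from the starter notes.

```r
library(stringr)

# One string: a single object made of many characters
my_string <- "Hi, my name is Bond!"

# A character vector: several strings stored together (hypothetical example)
my_vector <- c("Hi", "my", "name", "is", "Bond!")

length(my_string)      # how many strings the object holds
str_length(my_string)  # how many characters the string has
length(my_vector)      # how many strings the vector holds
str_length(my_vector)  # characters in each string of the vector
```

`length()` counts strings, while `str_length()` counts the characters inside each string.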
stringr
Common tasks
Note
The stringr package loads with tidyverse.
Functions follow the form str_xxx().
pattern =
The pattern argument appears in many stringr functions.
Let’s explore these functions!
str_detect()
Returns a logical vector indicating whether the pattern was found in each element of the supplied vector.
Useful inside filter().
Useful inside summarise() with sum (to get total matches) or mean (to get proportion of matches).
Related Function
str_which() returns the indexes of the strings that contain a match.
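A short sketch of str_detect() and str_which(), using a hypothetical vector `x` that is not from the slides:

```r
library(stringr)

# Hypothetical example vector
x <- c("shaken", "not stirred", "martini")

str_detect(x, pattern = "st")       # one TRUE/FALSE per element
str_which(x, pattern = "st")        # positions of the elements that match

sum(str_detect(x, pattern = "t"))   # total number of elements that match
mean(str_detect(x, pattern = "t"))  # proportion of elements that match
```

Because the result of str_detect() is logical, `sum()` and `mean()` treat TRUE as 1 and FALSE as 0.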
str_match()
Returns a character matrix containing either NA or the pattern, depending on whether the pattern was found.
str_extract()
Returns a character vector with either NA or the pattern, depending on whether the pattern was found.
Warning
str_extract() only returns the first pattern match.
Use str_extract_all() to return every pattern match.
Suppose we had a slightly different vector…
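As an illustration only, here is a hypothetical vector in which one element contains the pattern twice. It is a stand-in, not the vector used in the original slides.

```r
library(stringr)

# Hypothetical vector: the second element contains "Bond" twice
x <- c("James", "Bond, James Bond", "007")

str_extract(x, pattern = "Bond")      # first match per element, NA if none
str_extract_all(x, pattern = "Bond")  # a list holding every match per element
str_match(x, pattern = "Bond")        # first match again, as a one-column matrix
```

str_extract_all() returns a list because each element can have a different number of matches.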
str_locate()
Returns an integer matrix with two columns – the starting and ending location of the pattern. The values are NA if the pattern is not found.
Related Function
str_sub() extracts values based on a starting and ending location.
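A small sketch of how str_locate() and str_sub() fit together, reusing the example string from the start of the slides:

```r
library(stringr)

x <- "Hi, my name is Bond!"

# Start and end position of the first match
loc <- str_locate(x, pattern = "Bond")
loc

# Pull the same characters back out by position
str_sub(x, start = loc[1, "start"], end = loc[1, "end"])
```

Here the match starts at character 16 and ends at character 19, so str_sub() returns "Bond".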
str_subset()
Returns a character vector containing a subset of the original character vector consisting of the elements where the pattern was found anywhere in the element.
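A brief sketch with a hypothetical vector, showing that str_subset() keeps the matching elements themselves:

```r
library(stringr)

# Hypothetical example vector
x <- c("shaken", "not stirred", "martini")

str_subset(x, pattern = "t")     # the elements that contain "t"
x[str_detect(x, pattern = "t")]  # the same result built from str_detect()
```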
Note
For each of these functions, write down:
Replace the first matched pattern in each string.
Useful inside mutate().
Related Function
str_replace_all() replaces all matched patterns in each string.
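A minimal sketch comparing str_replace() with str_replace_all(). The vector and the replacement text are made up for illustration.

```r
library(stringr)

# Hypothetical vector: the first element contains "Bond" twice
x <- c("Bond, James Bond", "Moneypenny")

# Only the first match in each string changes
str_replace(x, pattern = "Bond", replacement = "Blond")

# Every match in each string changes
str_replace_all(x, pattern = "Bond", replacement = "Blond")
```

Strings without a match, such as "Moneypenny", are returned unchanged.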
Convert letters in a string to a specific capitalization format.
str_to_lower() converts all letters in a string to lowercase.
str_to_upper() converts all letters in a string to uppercase.
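A quick sketch of both functions on the example string used earlier:

```r
library(stringr)

x <- "Hi, my name is Bond!"

str_to_lower(x)  # "hi, my name is bond!"
str_to_upper(x)  # "HI, MY NAME IS BOND!"
```

Converting to one case before matching is a common way to avoid capitalization mismatches.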
Join multiple strings into a single character vector.
prompt <- "Hello, my name is"
first <- "James"
last <- "Bond"
str_c(prompt, last, ",", first, last, sep = " ")
[1] "Hello, my name is Bond , James Bond"
Note
Similar to paste() and paste0().
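A short sketch of how str_c() lines up with paste() and paste0(), including one difference worth knowing: how missing values are handled.

```r
library(stringr)

prompt <- "Hello, my name is"
first <- "James"
last <- "Bond"

str_c(prompt, last, sep = " ")  # "Hello, my name is Bond"
paste(prompt, last)             # same result; paste() separates with a space by default

str_c(first, last)              # "JamesBond"
paste0(first, last)             # same result; paste0() uses no separator

str_c(first, NA)                # NA: a missing piece makes the result missing
paste(first, NA)                # "James NA": paste() turns NA into text
```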
Combine a vector of strings into a single string.
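This description matches stringr's str_flatten(). A minimal sketch, assuming that is the function meant here and using a made-up vector:

```r
library(stringr)

# Hypothetical character vector with five strings
x <- c("Hello,", "my", "name", "is", "Bond")

# Collapse the five strings into one, separated by spaces
flat <- str_flatten(x, collapse = " ")
flat
length(flat)  # 1: the result is a single string
```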
Use variables in the environment to create a string based on {expressions}.
My name is Bond, James Bond
Tip
For more details, I would recommend looking up the glue R package!
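A sketch of building a string from {expressions} with str_glue(), assuming that is the function this slide describes. It reproduces the output shown above.

```r
library(stringr)

first <- "James"
last <- "Bond"

# Anything inside { } is evaluated as R code and inserted into the string
str_glue("My name is {last}, {first} {last}")
```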
Refer to the stringr cheatsheet.
Remember that str_xxx functions need the first argument to be a vector of strings, not a dataset!
Use them inside dplyr verbs like filter() or mutate().
name | is_bran | manuf | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
100% Bran | TRUE | N | cold | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.00 | 0.33 | 68.40297 |
100% Natural Bran | TRUE | Q | cold | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.00 | 1.00 | 33.98368 |
All-Bran | TRUE | K | cold | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.00 | 0.33 | 59.42551 |
All-Bran with Extra Fiber | TRUE | K | cold | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.00 | 0.50 | 93.70491 |
Almond Delight | FALSE | R | cold | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.00 | 0.75 | 34.38484 |
Apple Cinnamon Cheerios | FALSE | G | cold | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1.00 | 0.75 | 29.50954 |
Apple Jacks | FALSE | K | cold | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.00 | 1.00 | 33.17409 |
Basic 4 | FALSE | G | cold | 130 | 3 | 2 | 210 | 2.0 | 18.0 | 8 | 100 | 25 | 3 | 1.33 | 0.75 | 37.03856 |
Bran Chex | TRUE | R | cold | 90 | 2 | 1 | 200 | 4.0 | 15.0 | 6 | 125 | 25 | 1 | 1.00 | 0.67 | 49.12025 |
Bran Flakes | TRUE | P | cold | 90 | 3 | 0 | 210 | 5.0 | 13.0 | 5 | 190 | 25 | 3 | 1.00 | 0.67 | 53.31381 |
Cap'n'Crunch | FALSE | Q | cold | 120 | 1 | 2 | 220 | 0.0 | 12.0 | 12 | 35 | 25 | 2 | 1.00 | 0.75 | 18.04285 |
Cheerios | FALSE | G | cold | 110 | 6 | 2 | 290 | 2.0 | 17.0 | 1 | 105 | 25 | 1 | 1.00 | 1.25 | 50.76500 |
Cinnamon Toast Crunch | FALSE | G | cold | 120 | 1 | 3 | 210 | 0.0 | 13.0 | 9 | 45 | 25 | 2 | 1.00 | 0.75 | 19.82357 |
Clusters | FALSE | G | cold | 110 | 3 | 2 | 140 | 2.0 | 13.0 | 7 | 105 | 25 | 3 | 1.00 | 0.50 | 40.40021 |
Cocoa Puffs | FALSE | G | cold | 110 | 1 | 1 | 180 | 0.0 | 12.0 | 13 | 55 | 25 | 2 | 1.00 | 1.00 | 22.73645 |
Corn Chex | FALSE | R | cold | 110 | 2 | 0 | 280 | 0.0 | 22.0 | 3 | 25 | 25 | 1 | 1.00 | 1.00 | 41.44502 |
Corn Flakes | FALSE | K | cold | 100 | 2 | 0 | 290 | 1.0 | 21.0 | 2 | 35 | 25 | 1 | 1.00 | 1.00 | 45.86332 |
Corn Pops | FALSE | K | cold | 110 | 1 | 0 | 90 | 1.0 | 13.0 | 12 | 20 | 25 | 2 | 1.00 | 1.00 | 35.78279 |
Count Chocula | FALSE | G | cold | 110 | 1 | 1 | 180 | 0.0 | 12.0 | 13 | 65 | 25 | 2 | 1.00 | 1.00 | 22.39651 |
Cracklin' Oat Bran | TRUE | K | cold | 110 | 3 | 3 | 140 | 4.0 | 10.0 | 7 | 160 | 25 | 3 | 1.00 | 0.50 | 40.44877 |
Cream of Wheat (Quick) | FALSE | N | hot | 100 | 3 | 0 | 80 | 1.0 | 21.0 | 0 | -1 | 0 | 2 | 1.00 | 1.00 | 64.53382 |
Crispix | FALSE | K | cold | 110 | 2 | 0 | 220 | 1.0 | 21.0 | 3 | 30 | 25 | 3 | 1.00 | 1.00 | 46.89564 |
Crispy Wheat & Raisins | FALSE | G | cold | 100 | 2 | 1 | 140 | 2.0 | 11.0 | 10 | 120 | 25 | 3 | 1.00 | 0.75 | 36.17620 |
Double Chex | FALSE | R | cold | 100 | 2 | 0 | 190 | 1.0 | 18.0 | 5 | 80 | 25 | 3 | 1.00 | 0.75 | 44.33086 |
Froot Loops | FALSE | K | cold | 110 | 2 | 1 | 125 | 1.0 | 11.0 | 13 | 30 | 25 | 2 | 1.00 | 1.00 | 32.20758 |
Frosted Flakes | FALSE | K | cold | 110 | 1 | 0 | 200 | 1.0 | 14.0 | 11 | 25 | 25 | 1 | 1.00 | 0.75 | 31.43597 |
Frosted Mini-Wheats | FALSE | K | cold | 100 | 3 | 0 | 0 | 3.0 | 14.0 | 7 | 100 | 25 | 2 | 1.00 | 0.80 | 58.34514 |
Fruit & Fibre Dates; Walnuts; and Oats | FALSE | P | cold | 120 | 3 | 2 | 160 | 5.0 | 12.0 | 10 | 200 | 25 | 3 | 1.25 | 0.67 | 40.91705 |
Fruitful Bran | TRUE | K | cold | 120 | 3 | 0 | 240 | 5.0 | 14.0 | 12 | 190 | 25 | 3 | 1.33 | 0.67 | 41.01549 |
Fruity Pebbles | FALSE | P | cold | 110 | 1 | 1 | 135 | 0.0 | 13.0 | 12 | 25 | 25 | 2 | 1.00 | 0.75 | 28.02576 |
Golden Crisp | FALSE | P | cold | 100 | 2 | 0 | 45 | 0.0 | 11.0 | 15 | 40 | 25 | 1 | 1.00 | 0.88 | 35.25244 |
Golden Grahams | FALSE | G | cold | 110 | 1 | 1 | 280 | 0.0 | 15.0 | 9 | 45 | 25 | 2 | 1.00 | 0.75 | 23.80404 |
Grape Nuts Flakes | FALSE | P | cold | 100 | 3 | 1 | 140 | 3.0 | 15.0 | 5 | 85 | 25 | 3 | 1.00 | 0.88 | 52.07690 |
Grape-Nuts | FALSE | P | cold | 110 | 3 | 0 | 170 | 3.0 | 17.0 | 3 | 90 | 25 | 3 | 1.00 | 0.25 | 53.37101 |
Great Grains Pecan | FALSE | P | cold | 120 | 3 | 3 | 75 | 3.0 | 13.0 | 4 | 100 | 25 | 3 | 1.00 | 0.33 | 45.81172 |
Honey Graham Ohs | FALSE | Q | cold | 120 | 1 | 2 | 220 | 1.0 | 12.0 | 11 | 45 | 25 | 2 | 1.00 | 1.00 | 21.87129 |
Honey Nut Cheerios | FALSE | G | cold | 110 | 3 | 1 | 250 | 1.5 | 11.5 | 10 | 90 | 25 | 1 | 1.00 | 0.75 | 31.07222 |
Honey-comb | FALSE | P | cold | 110 | 1 | 0 | 180 | 0.0 | 14.0 | 11 | 35 | 25 | 1 | 1.00 | 1.33 | 28.74241 |
Just Right Crunchy Nuggets | FALSE | K | cold | 110 | 2 | 1 | 170 | 1.0 | 17.0 | 6 | 60 | 100 | 3 | 1.00 | 1.00 | 36.52368 |
Just Right Fruit & Nut | FALSE | K | cold | 140 | 3 | 1 | 170 | 2.0 | 20.0 | 9 | 95 | 100 | 3 | 1.30 | 0.75 | 36.47151 |
Kix | FALSE | G | cold | 110 | 2 | 1 | 260 | 0.0 | 21.0 | 3 | 40 | 25 | 2 | 1.00 | 1.50 | 39.24111 |
Life | FALSE | Q | cold | 100 | 4 | 2 | 150 | 2.0 | 12.0 | 6 | 95 | 25 | 2 | 1.00 | 0.67 | 45.32807 |
Lucky Charms | FALSE | G | cold | 110 | 2 | 1 | 180 | 0.0 | 12.0 | 12 | 55 | 25 | 2 | 1.00 | 1.00 | 26.73451 |
Maypo | FALSE | A | hot | 100 | 4 | 1 | 0 | 0.0 | 16.0 | 3 | 95 | 25 | 2 | 1.00 | 1.00 | 54.85092 |
Muesli Raisins; Dates; & Almonds | FALSE | R | cold | 150 | 4 | 3 | 95 | 3.0 | 16.0 | 11 | 170 | 25 | 3 | 1.00 | 1.00 | 37.13686 |
Muesli Raisins; Peaches; & Pecans | FALSE | R | cold | 150 | 4 | 3 | 150 | 3.0 | 16.0 | 11 | 170 | 25 | 3 | 1.00 | 1.00 | 34.13976 |
Mueslix Crispy Blend | FALSE | K | cold | 160 | 3 | 2 | 150 | 3.0 | 17.0 | 13 | 160 | 25 | 3 | 1.50 | 0.67 | 30.31335 |
Multi-Grain Cheerios | FALSE | G | cold | 100 | 2 | 1 | 220 | 2.0 | 15.0 | 6 | 90 | 25 | 1 | 1.00 | 1.00 | 40.10596 |
Nut&Honey Crunch | FALSE | K | cold | 120 | 2 | 1 | 190 | 0.0 | 15.0 | 9 | 40 | 25 | 2 | 1.00 | 0.67 | 29.92429 |
Nutri-Grain Almond-Raisin | FALSE | K | cold | 140 | 3 | 2 | 220 | 3.0 | 21.0 | 7 | 130 | 25 | 3 | 1.33 | 0.67 | 40.69232 |
Nutri-grain Wheat | FALSE | K | cold | 90 | 3 | 0 | 170 | 3.0 | 18.0 | 2 | 90 | 25 | 3 | 1.00 | 1.00 | 59.64284 |
Oatmeal Raisin Crisp | FALSE | G | cold | 130 | 3 | 2 | 170 | 1.5 | 13.5 | 10 | 120 | 25 | 3 | 1.25 | 0.50 | 30.45084 |
Post Nat. Raisin Bran | TRUE | P | cold | 120 | 3 | 1 | 200 | 6.0 | 11.0 | 14 | 260 | 25 | 3 | 1.33 | 0.67 | 37.84059 |
Product 19 | FALSE | K | cold | 100 | 3 | 0 | 320 | 1.0 | 20.0 | 3 | 45 | 100 | 3 | 1.00 | 1.00 | 41.50354 |
Puffed Rice | FALSE | Q | cold | 50 | 1 | 0 | 0 | 0.0 | 13.0 | 0 | 15 | 0 | 3 | 0.50 | 1.00 | 60.75611 |
Puffed Wheat | FALSE | Q | cold | 50 | 2 | 0 | 0 | 1.0 | 10.0 | 0 | 50 | 0 | 3 | 0.50 | 1.00 | 63.00565 |
Quaker Oat Squares | FALSE | Q | cold | 100 | 4 | 1 | 135 | 2.0 | 14.0 | 6 | 110 | 25 | 3 | 1.00 | 0.50 | 49.51187 |
Quaker Oatmeal | FALSE | Q | hot | 100 | 5 | 2 | 0 | 2.7 | -1.0 | -1 | 110 | 0 | 1 | 1.00 | 0.67 | 50.82839 |
Raisin Bran | TRUE | K | cold | 120 | 3 | 1 | 210 | 5.0 | 14.0 | 12 | 240 | 25 | 2 | 1.33 | 0.75 | 39.25920 |
Raisin Nut Bran | TRUE | G | cold | 100 | 3 | 2 | 140 | 2.5 | 10.5 | 8 | 140 | 25 | 3 | 1.00 | 0.50 | 39.70340 |
Raisin Squares | FALSE | K | cold | 90 | 2 | 0 | 0 | 2.0 | 15.0 | 6 | 110 | 25 | 3 | 1.00 | 0.50 | 55.33314 |
Rice Chex | FALSE | R | cold | 110 | 1 | 0 | 240 | 0.0 | 23.0 | 2 | 30 | 25 | 1 | 1.00 | 1.13 | 41.99893 |
Rice Krispies | FALSE | K | cold | 110 | 2 | 0 | 290 | 0.0 | 22.0 | 3 | 35 | 25 | 1 | 1.00 | 1.00 | 40.56016 |
Shredded Wheat | FALSE | N | cold | 80 | 2 | 0 | 0 | 3.0 | 16.0 | 0 | 95 | 0 | 1 | 0.83 | 1.00 | 68.23588 |
Shredded Wheat 'n'Bran | TRUE | N | cold | 90 | 3 | 0 | 0 | 4.0 | 19.0 | 0 | 140 | 0 | 1 | 1.00 | 0.67 | 74.47295 |
Shredded Wheat spoon size | FALSE | N | cold | 90 | 3 | 0 | 0 | 3.0 | 20.0 | 0 | 120 | 0 | 1 | 1.00 | 0.67 | 72.80179 |
Smacks | FALSE | K | cold | 110 | 2 | 1 | 70 | 1.0 | 9.0 | 15 | 40 | 25 | 2 | 1.00 | 0.75 | 31.23005 |
Special K | FALSE | K | cold | 110 | 6 | 0 | 230 | 1.0 | 16.0 | 3 | 55 | 25 | 1 | 1.00 | 1.00 | 53.13132 |
Strawberry Fruit Wheats | FALSE | N | cold | 90 | 2 | 0 | 15 | 3.0 | 15.0 | 5 | 90 | 25 | 2 | 1.00 | 1.00 | 59.36399 |
Total Corn Flakes | FALSE | G | cold | 110 | 2 | 1 | 200 | 0.0 | 21.0 | 3 | 35 | 100 | 3 | 1.00 | 1.00 | 38.83975 |
Total Raisin Bran | TRUE | G | cold | 140 | 3 | 1 | 190 | 4.0 | 15.0 | 14 | 230 | 100 | 3 | 1.50 | 1.00 | 28.59278 |
Total Whole Grain | FALSE | G | cold | 100 | 3 | 1 | 200 | 3.0 | 16.0 | 3 | 110 | 100 | 3 | 1.00 | 1.00 | 46.65884 |
Triples | FALSE | G | cold | 110 | 2 | 1 | 250 | 0.0 | 21.0 | 3 | 60 | 25 | 3 | 1.00 | 0.75 | 39.10617 |
Trix | FALSE | G | cold | 110 | 1 | 1 | 140 | 0.0 | 13.0 | 12 | 25 | 25 | 2 | 1.00 | 1.00 | 27.75330 |
Wheat Chex | FALSE | R | cold | 100 | 3 | 1 | 230 | 3.0 | 17.0 | 3 | 115 | 25 | 1 | 1.00 | 0.67 | 49.78744 |
Wheaties | FALSE | G | cold | 100 | 3 | 1 | 200 | 3.0 | 17.0 | 3 | 110 | 25 | 1 | 1.00 | 1.00 | 51.59219 |
Wheaties Honey Gold | FALSE | G | cold | 110 | 2 | 1 | 200 | 1.0 | 16.0 | 8 | 60 | 25 | 1 | 1.00 | 0.75 | 36.18756 |
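The is_bran column in the table looks consistent with flagging names that contain "Bran". As a sketch under that assumption, using a small hypothetical data frame with four of the cereal names rather than the full dataset:

```r
library(dplyr)
library(stringr)

# Hypothetical subset of the cereal names shown in the table
cereal <- data.frame(
  name = c("100% Bran", "Almond Delight", "Bran Chex", "Cheerios")
)

# str_detect() receives the name column (a character vector), not the dataset
cereal_flagged <- cereal |>
  mutate(is_bran = str_detect(name, pattern = "Bran"))

# Keep only the rows whose name contains "Bran"
cereal_bran <- cereal |>
  filter(str_detect(name, pattern = "Bran"))

cereal_flagged
cereal_bran
```

Passing the data frame itself to str_detect() would not work as intended, which is why the call sits inside mutate() or filter().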
The real power of these str_xxx functions comes when you specify the pattern using regular expressions!
“Regexps are a very terse language that allow you to describe patterns in strings.”
R for Data Science
Use str_xxx functions + regular expressions!
Tip
You might encounter gsub(), grep(), etc. from Base R, but I would highly recommend using functions from the stringr package instead.
…are tricky!
This web app for testing R regular expressions might be handy!
There is a set of characters that have a specific meaning when using regex.
The stringr package does not read these as normal characters.
^
$
\
|
*
+
?
{
}
[
]
(
)
.
This character can match any character.
[1] "sells" "seashells"
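One way to get output like the line above, sketched with a hypothetical vector and pattern (the original code is not shown here):

```r
library(stringr)

# Hypothetical vector, chosen so the result matches the output above
x <- c("She", "sells", "seashells", "by", "the", "seashore!")

# "." stands for any single character, so ".ells" needs one character before "ells"
str_subset(x, pattern = ".ells")
```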
^ $
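A brief sketch of the two anchors, using a hypothetical vector: ^ ties the pattern to the start of a string and $ ties it to the end.

```r
library(stringr)

# Hypothetical example vector
x <- c("She", "sells", "seashells", "by", "the", "seashore!")

str_subset(x, pattern = "^s")  # strings that start with a lowercase "s"
str_subset(x, pattern = "s$")  # strings that end with a lowercase "s"
```

"She" is not returned by the first call because its first letter is an uppercase "S".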
? + *
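? – matches when the preceding character occurs 0 or 1 times in a row.
+ – matches when the preceding character occurs 1 or more times in a row.
* – matches when the preceding character occurs 0 or more times in a row.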
{}
{n} – matches when the preceding character occurs exactly n times in a row.
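A small sketch of the quantifiers on a made-up vector, where the only thing that varies is how many times "a" repeats:

```r
library(stringr)

# Hypothetical vector: "c", then zero to three "a"s, then "t"
x <- c("ct", "cat", "caat", "caaat")

str_subset(x, pattern = "ca?t")    # 0 or 1 "a":   "ct" "cat"
str_subset(x, pattern = "ca+t")    # 1 or more:    "cat" "caat" "caaat"
str_subset(x, pattern = "ca*t")    # 0 or more:    all four strings
str_subset(x, pattern = "ca{2}t")  # exactly 2:    "caat"
```

Each quantifier applies only to the character (or group) directly before it, here the "a".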
[]
[^ ] – specifies characters not to match on (think except)
But remember that ^ outside of brackets anchors the match to the start of a string.
Warning
Why do “Peter” and “Piper” match "^[^p]"?
Capitalization matters!
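To see how capitalization affects this pattern, here is a sketch with a hypothetical tongue-twister vector (assumed for illustration, not taken from the slides):

```r
library(stringr)

# Hypothetical example vector
x <- c("Peter", "Piper", "picked", "a", "peck", "of", "pickled", "peppers!")

# First character is anything except a lowercase "p"
str_subset(x, pattern = "^[^p]")

# First character is anything except "P" or "p"
str_subset(x, pattern = "^[^Pp]")

# First character is "P" or "p"
str_subset(x, pattern = "^[Pp]")
```

The first call returns "Peter", "Piper", "a", and "of": an uppercase "P" is a different character from the lowercase "p" that the pattern excludes.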
[]
[ - ] – specifies a range of characters.
\\w – matches any “word” character (\\W matches any non-“word” character)
\\d – matches any digit (\\D matches any non-digit)
\\s – matches any whitespace (\\S matches any non-whitespace)
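A sketch of ranges and the shortcut classes, using a hypothetical vector:

```r
library(stringr)

# Hypothetical example vector
x <- c("agent 007", "Q", "MI6", "top secret")

str_subset(x, pattern = "[0-9]")     # contains a digit: "agent 007" "MI6"
str_subset(x, pattern = "\\d")       # same result using the digit shortcut
str_subset(x, pattern = "\\s")       # contains whitespace: "agent 007" "top secret"
str_subset(x, pattern = "^[A-Z]+$")  # only uppercase letters: "Q"
str_extract(x, pattern = "\\d+")     # first run of digits in each string, NA if none
```

The backslash is doubled because R reads the string first and the regular expression engine reads it second.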
()
Groups are created with ( ).
Alternatives within a group are separated with |.
This matches strings that contain either “peck” or “pick”.
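One pattern that behaves this way, sketched with a hypothetical vector (the slide's own code is not shown here):

```r
library(stringr)

# Hypothetical example vector
x <- c("Peter", "Piper", "picked", "a", "peck", "of", "pickled", "peppers!")

# "p", then either "e" or "i", then "ck"
str_subset(x, pattern = "p(e|i)ck")
```

This returns "picked", "peck", and "pickled", since each contains "pick" or "peck" somewhere in the string.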
()
Groups can be referenced in order (\\1) to specify that certain groupings repeat.
[1] "hannah" "race car"
This matches strings that start and end with the same character.
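A sketch of a back-reference that gives the output shown above. The vector and the pattern are assumptions made for illustration.

```r
library(stringr)

# Hypothetical vector, chosen so the result matches the output above
x <- c("hannah", "had", "a", "ball", "on", "a", "race car")

# (.) captures the first character, .* allows anything in between,
# and \\1 requires the captured character again at the end
str_subset(x, pattern = "^(.).*\\1$")
```

The single-letter string "a" is not returned, because the pattern needs two positions: one for the group and one for \\1.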
()
What regular expressions would match words that…
\\
To match a special character, you need to escape it.
\\
Use \\ to escape the ? – it is now read as a normal character.
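A minimal sketch of escaping, with a hypothetical vector. stringr's fixed() is shown as an alternative that treats the whole pattern as literal text.

```r
library(stringr)

# Hypothetical example vector
x <- c("How are you?", "Fine.", "Really?!")

str_subset(x, pattern = "\\?")        # a literal question mark
str_subset(x, pattern = "\\.")        # a literal period, not "any character"
str_subset(x, pattern = fixed("?"))   # fixed() turns off regex interpretation
```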
Use the web app to test R regular expressions.
Refer to the stringr cheatsheet.
I want to join two datasets that have a county variable:
Practice
What stringr function will help me join the county_pop and county_loc by county?
What if I want to pull out only the area code in a phone number?
Practice
You will need a stringr function and to use regular expressions!
What if I want just the numbers in the area code?
awards |
---|
Beyonce: 35G, 0A, 0E |
Kendrick Lamar: 22G, 0A, 1E |
Charli XCX: 2G, 0A, 0E |
Cynthia Erivo: 1G, 0A, 1E |
Viola Davis: 1G, 1A, 1E |
Elton John: 6G, 2A, 1E |
That’s annoying…
Create a variable with just the artist name and a variable with the number of Grammys won.
In this activity, you will use functions from the stringr package and regex to decode a message.