Today we will…
This week, we’re writing functions that take a data frame and variable names as arguments.
These functions can be incredibly powerful, but they require us to learn some interesting details about how some of the functions we’ve grown very accustomed to (e.g., select(), mutate(), group_by()) work “behind the scenes.”
We want to take in a vector of numbers and standardize it. One form of standardization is ensuring that the mean is 0 and standard deviation is 1.
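A minimal sketch of such a function. The name std_vec() matches the helper mentioned later in these slides; the input check and the choice to ignore missing values are assumptions.

```r
# Standardize a numeric vector to mean 0 and standard deviation 1.
# Assumption: missing values are ignored when computing the mean and SD,
# and remain NA in the result.
std_vec <- function(x) {
  stopifnot(is.numeric(x))
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

z <- std_vec(c(2, 4, 6, NA, 8))
mean(z, na.rm = TRUE)  # numerically zero
sd(z, na.rm = TRUE)    # one, up to floating-point error
```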
Is it a good idea to standardize (scale) variables in a data analysis?
Why standardize?
Why not standardize?
dplyr
Let’s standardize penguin measurements.
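One way this could look, assuming the penguins data come from the palmerpenguins package and that std_vec() is defined as a mean-0, SD-1 standardizer that ignores missing values. The choice of columns is only for illustration.

```r
library(dplyr)

# Assumed helper: standardize a numeric vector, ignoring NAs.
std_vec <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Standardize two measurement columns in place.
penguins_std <- palmerpenguins::penguins |>
  mutate(
    bill_length_mm = std_vec(bill_length_mm),
    body_mass_g    = std_vec(body_mass_g)
  )

# Each standardized column now has mean 0 and SD 1 (ignoring NAs).
penguins_std |>
  summarize(
    mean_mass = mean(body_mass_g, na.rm = TRUE),
    sd_mass   = sd(body_mass_g, na.rm = TRUE)
  )
```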
That’s nice, but our function must be combined with mutate() now to work…
dplyr function?
Note
I used the existing function std_vec() inside the new function for clarity!
Functions using unquoted variable names as arguments are said to use nonstandard evaluation or tidy evaluation.
Out of the box, tidy evaluation does not carry over to functions you write yourself.
Don’t use tidy evaluation in your own functions.
rlang
Use the rlang package!
tidyverse pipelines.
rlang
The tidyverse functions use either “tidy selection” or “data masking.” Both of these features make common tasks easier at the cost of making less common tasks harder.
Data masking blurs the line between the two different meanings of the word “variable”:
env-variables – “programming” variables that live in an environment, usually created with <-.
data-variables – “statistical” variables that live in a data frame.
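A small sketch of the distinction, using a made-up data frame. The names df, x, and n are hypothetical; the .data and .env pronouns are available inside data-masking functions such as mutate().

```r
library(dplyr)

df <- data.frame(x = 1:3)  # df is an env-variable; x is a data-variable
n  <- 10                   # n is an env-variable

# Data masking lets us mix both kinds of variable without any prefix.
out <- mutate(df, y = x + n)

# The pronouns make the source of each variable explicit.
out_explicit <- mutate(df, y = .data$x + .env$n)

identical(out, out_explicit)
```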
In the case of our function, the name of the column we want to use is stored in an intermediate variable (e.g., var = bill_length_mm).
If you want to create a data-variable whose name is a user-provided function argument, you need to use the walrus operator (:=).
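A sketch of what goes wrong without these operators. The function name std_column_naive() is hypothetical; the argument name var is taken from the error message that follows. The call is wrapped in tryCatch() so the script keeps running.

```r
library(dplyr)

# Assumed helper: standardize a numeric vector, ignoring NAs.
std_vec <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Hypothetical naive attempt: `var` is used as if it were a column name.
# On the left of `=`, mutate() would create a column literally called "var";
# on the right, `var` is an env-variable that points at body_mass_g,
# which does not exist outside the data frame.
std_column_naive <- function(data, var) {
  data |> mutate(var = std_vec(var))
}

result <- tryCatch(
  std_column_naive(palmerpenguins::penguins, body_mass_g),
  error = function(e) e
)

inherits(result, "error")  # TRUE: the call fails
```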
Error in `mutate()`:
ℹ In argument: `var = std_vec(var)`.
Caused by error:
! object 'body_mass_g' not found
mutate() doesn’t know what body_mass_g is.
We need to mark var so that mutate() knows to look for body_mass_g as a data-variable.
Use the embrace operator ({{ }}):
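A minimal sketch of a version that embraces the argument and uses the walrus operator so the standardized values overwrite the original column. The function name std_column(), the input check, and the use of palmerpenguins::penguins with slice_head() are assumptions.

```r
library(dplyr)

# Assumed helper: standardize a numeric vector, ignoring NAs.
std_vec <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# {{ var }} tells mutate() to look up the user's column in the data frame;
# := is needed because the column name on the left comes from an argument.
std_column <- function(data, var) {
  stopifnot(is.data.frame(data))
  data |> mutate({{ var }} := std_vec({{ var }}))
}

palmerpenguins::penguins |>
  std_column(body_mass_g) |>
  slice_head(n = 5)
```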
# A tibble: 5 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <dbl>
1 Adelie Torgersen 39.1 18.7 181 -0.563
2 Adelie Torgersen 39.5 17.4 186 -0.501
3 Adelie Torgersen 40.3 18 195 -1.19
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 -0.937
# ℹ 2 more variables: sex <fct>, year <int>
You will write a tidy function for a contingency table like the table() function in base R.
Allison Horst
During your collaboration, your group will alternate between three roles:
Starting Roles Today
The person who lives closest to campus starts as the coder; the second closest starts as the project manager.