Today we will…
They are coming along very nicely!
Final submission should be a polished report.
Think about the readability of the numbers you are presenting.
Include units on your plots, and note any transformations you applied.
Don’t display the raw R lm() output
Even if a model is “good” according to our metric with the data we have, how do we know if it will still work well with other data?
If the model is overfit…
If the model is underfit …
If the model is neither over nor underfit …
Choose a value for \(k\)
Split data into \(k\) folds
For fold \(i\) from 1 … \(k\):
Average the \(k\) performance metrics across folds
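The steps above can be sketched in base R. This is a minimal illustration, not the code used in class. The data are simulated stand-ins (the names `dat`, `x`, `y` are hypothetical), and the body of the loop follows the usual k-fold recipe, which is an assumption here: fit on the other \(k - 1\) folds, then compute the metric (here \(R^2\)) on the held-out fold.

```r
# Simulated stand-in data (hypothetical; NOT the NC births data)
set.seed(1)
n   <- 200
dat <- data.frame(x = runif(n, 30, 42))
dat$y <- -6 + 0.35 * dat$x + rnorm(n, sd = 1)

# Choose a value for k
k <- 5

# Split data into k folds: shuffle balanced fold labels 1..k across rows
fold_id <- sample(rep(1:k, length.out = n))

# For fold i from 1 ... k: fit on the other folds, evaluate on fold i
cv_r2 <- numeric(k)
for (i in 1:k) {
  train <- dat[fold_id != i, ]
  test  <- dat[fold_id == i, ]
  fit   <- lm(y ~ x, data = train)
  pred  <- predict(fit, newdata = test)
  # Out-of-sample R^2 on the held-out fold
  cv_r2[i] <- 1 - sum((test$y - pred)^2) / sum((test$y - mean(test$y))^2)
}

# Average the k performance metrics across folds
mean_cv_r2 <- mean(cv_r2)

# For comparison: R^2 from fitting the model on the full dataset
full_r2 <- summary(lm(y ~ x, data = dat))$r.squared
```

If `mean_cv_r2` is much lower than `full_r2`, that suggests overfitting. If both are low, that suggests underfitting.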
We implemented 5-fold CV for the model of birth weight on gestation weeks with the NC births data…
The \(R^2\) from fitting the model on the full dataset was 0.449, so it appears the model is neither overfitting nor underfitting.
Plotting geospatial data can uncover patterns that would be hard to determine through other analyses …
… It can also help make grouping of observations in your analysis clear!
R to plot geospatial data:
maps / mapdata + geom_polygon()
sf
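As a rough sketch of the maps + geom_polygon() route, assuming the ggplot2 and maps packages are installed: the choice of North Carolina counties and the styling below are illustrative assumptions, not taken from the slides.

```r
library(ggplot2)

# map_data() is a ggplot2 helper that turns outlines from the maps package
# into a data frame with long, lat, and group columns.
# The region shown is an illustrative choice.
nc <- map_data("county", region = "north carolina")

p <- ggplot(nc, aes(x = long, y = lat, group = group)) +
  # group = group keeps each county's outline as its own polygon
  geom_polygon(fill = "white", color = "black") +
  # keep a sensible aspect ratio for longitude/latitude data
  coord_quickmap() +
  labs(x = "Longitude (degrees)", y = "Latitude (degrees)")

print(p)
```

Without `group = group`, the points of different counties get joined into one polygon and the map comes out scrambled.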
You are implementing CV and animated plots in your project, so we’ll take this time to practice making nice maps with ggplot!