Chapter 5 Training predictive models using tidymodels
R provides a very large number of packages with functions for fitting predictive models. While there is some consistency in how models are trained, there are also many differences. For example, some models can only be trained through a matrix interface and so cannot easily be used with a formula. All of this can be overwhelming for new users. It would be nice if there were a consistent interface to all models.
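The difference between the two calling conventions can be seen without leaving base R. The sketch below fits the same linear regression twice on the built-in mtcars data: once with lm() and a formula, and once with lm.fit(), which expects a numeric design matrix. The dataset and predictors are chosen only for illustration.

```r
# Formula interface: the model is described symbolically and
# lm() builds the design matrix (including the intercept) itself.
fit_formula <- lm(mpg ~ wt + hp, data = mtcars)

# Matrix interface: the caller constructs the design matrix,
# adds an explicit intercept column, and passes the outcome separately.
x <- cbind(Intercept = 1, as.matrix(mtcars[, c("wt", "hp")]))
y <- mtcars$mpg
fit_matrix <- lm.fit(x = x, y = y)

# Both fits estimate the same coefficients; only the calling convention differs.
coef(fit_formula)
fit_matrix$coefficients
```

Functions that only offer the second form leave the handling of factors, interactions and intercepts to the user, which is one source of the inconsistency described above.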
There are various packages that address this. Here is a small selection.
- tidymodels (a collection of packages that share a common design philosophy with tidyverse and are designed to work together)
- caret (Classification And REgression Training)
- modelr (part of the tidyverse, but less powerful than tidymodels)
- mlr3
- h2o.ai (in contrast to the other packages, this is a commercial product that supports the full machine learning workflow from development to deployment)
In this course, we will focus on tidymodels.
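As a rough illustration of what a consistent interface looks like in tidymodels, the sketch below specifies one linear regression model and fits it with two different computational engines and with both the formula and the x/y interface. It assumes the tidymodels package is installed; the mtcars data and the chosen predictors are only an example.

```r
library(tidymodels)

# One model type, two engines: only set_engine() changes.
spec_lm  <- linear_reg() %>% set_engine("lm")
spec_glm <- linear_reg() %>% set_engine("glm")

# Formula interface
fit_lm  <- fit(spec_lm,  mpg ~ wt + hp, data = mtcars)
fit_glm <- fit(spec_glm, mpg ~ wt + hp, data = mtcars)

# x/y interface for the same specification
fit_lm_xy <- fit_xy(spec_lm, x = mtcars[, c("wt", "hp")], y = mtcars$mpg)

# predict() always returns a tibble with a .pred column,
# regardless of the engine behind the model.
pred_lm  <- predict(fit_lm,  new_data = head(mtcars))
pred_glm <- predict(fit_glm, new_data = head(mtcars))
pred_lm
pred_glm
```

The point of the design is that the code for fitting and predicting stays the same when the underlying implementation is swapped.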
5.1 What is tidymodels?
The tidymodels package is developed by Max Kuhn, who now works at Posit (formerly RStudio). It was first released in 2018 and is still under active development. It is an ecosystem of packages that share a common design philosophy and are designed to work together. The packages include:
- parsnip for model specification
- recipes for data preprocessing
- rsample for resampling
- yardstick for model evaluation
- tune for hyperparameter tuning
- workflows for modeling workflows
- tidyposterior for Bayesian comparison of models based on resampling results
The tidymodels packages are designed to work with the tidyverse and tidy data principles. The packages are designed to be modular and extensible. They are loaded with the command library(tidymodels), which prints the following startup message:
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────── tidymodels 1.2.0 ──
## ✔ broom 1.0.5 ✔ recipes 1.0.10
## ✔ dials 1.2.1 ✔ rsample 1.2.1
## ✔ dplyr 1.1.4 ✔ tibble 3.2.1
## ✔ ggplot2 3.5.1 ✔ tidyr 1.3.1
## ✔ infer 1.0.7 ✔ tune 1.2.1
## ✔ modeldata 1.3.0 ✔ workflows 1.1.4
## ✔ parsnip 1.2.1 ✔ workflowsets 1.1.0
## ✔ purrr 1.0.2 ✔ yardstick 1.3.1
## ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
## ✖ purrr::discard() masks scales::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ recipes::step() masks stats::step()
## • Dig deeper into tidy modeling with R at https://www.tmwr.org
The packages were developed with the aim of making it easy to follow best practices.
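To give an impression of how the individual packages fit together, here is a minimal sketch of a complete modeling run. It assumes tidymodels is installed and uses the built-in mtcars data with an arbitrary choice of predictors, split proportion and seed; it is meant as an illustration, not as a recommended analysis.

```r
library(tidymodels)

set.seed(123)

# rsample: hold out part of the data for evaluation
mtcars_split <- initial_split(mtcars, prop = 0.8)
mtcars_train <- training(mtcars_split)
mtcars_test  <- testing(mtcars_split)

# recipes: preprocessing steps are estimated on the training data only
rec <- recipe(mpg ~ wt + hp, data = mtcars_train) %>%
  step_normalize(all_numeric_predictors())

# parsnip: model specification, independent of the data
spec <- linear_reg() %>% set_engine("lm")

# workflows: bundle preprocessing and model into one object
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(spec)

wf_fit <- fit(wf, data = mtcars_train)

# yardstick: evaluate predictions on the held-out data
preds <- predict(wf_fit, new_data = mtcars_test) %>%
  bind_cols(mtcars_test["mpg"])
results <- metrics(preds, truth = mpg, estimate = .pred)
results
```

Keeping the split, the recipe and the model in separate objects is what makes it straightforward to evaluate on data that was not used for training or preprocessing.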
Further information:
- Go to https://www.tidymodels.org/ to learn more about tidymodels
- Links to all tidymodels packages https://www.tidymodels.org/packages/
- Example code for all supported models https://parsnip.tidymodels.org/articles/Examples.html
- Tidy Modeling with R by Max Kuhn and Julia Silge https://www.tmwr.org