5  Training predictive models

R provides a very large number of packages with functions for fitting predictive models. While there is some consistency in how models are trained, there are also many differences. For example, some models can only be trained through a matrix interface and so cannot easily be used with a formula. All of this can be overwhelming for new users. It would be nice if there were a consistent interface to all models.
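To make the difference between the two interfaces concrete, here is a minimal sketch using only base R and the built-in mtcars data. The choice of lm() and lm.fit() is ours, purely for illustration: both fit the same least-squares model, but one takes a formula and a data frame while the other expects a design matrix and a response vector.

```r
# Formula interface: the model is described symbolically and the
# variables are looked up in a data frame.
fit_formula <- lm(mpg ~ wt + hp, data = mtcars)

# Matrix interface: the caller has to build the design matrix
# (including the intercept column) and the response vector.
x <- cbind("(Intercept)" = 1, as.matrix(mtcars[, c("wt", "hp")]))
y <- mtcars$mpg
fit_matrix <- lm.fit(x, y)

# Compare the estimated coefficients of the two fits.
all.equal(unname(coef(fit_formula)), unname(fit_matrix$coefficients))
```

With a matrix-only function, steps such as creating dummy variables or interaction terms are left to the user, which is one reason a unified interface is attractive.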

Various packages address this problem.

In this course, we will focus on tidymodels.

5.1 What is tidymodels?

The tidymodels package is developed by a team led by Max Kuhn, who now works at Posit (formerly RStudio). It was first released in 2018 and is still under active development. It is an ecosystem of packages that share a common design philosophy and are designed to work together. The packages include

  • parsnip for model specification
  • recipes for data preprocessing
  • rsample for resampling
  • yardstick for model evaluation
  • tune for hyperparameter tuning
  • workflows for modeling workflows
  • tidyposterior for Bayesian comparison of models based on resampling results
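As a small illustration of the first item, the sketch below uses parsnip to specify a linear regression once and then fit it in different ways. The mtcars data, the predictors and the object names are our own choices for illustration; the "lm" and "glm" engines both rely on base R's stats package, so no further packages should be needed.

```r
library(parsnip)

# One model type, two computational engines.
spec_lm  <- set_engine(linear_reg(), "lm")
spec_glm <- set_engine(linear_reg(), "glm")

# The same specification can be fitted through a formula ...
fit_f <- fit(spec_lm, mpg ~ wt + hp, data = mtcars)

# ... or through a predictors/outcome ("xy") interface.
fit_x <- fit_xy(spec_lm, x = mtcars[, c("wt", "hp")], y = mtcars$mpg)

# Switching the engine does not change how the model is fitted or used.
fit_g <- fit(spec_glm, mpg ~ wt + hp, data = mtcars)

# predict() always takes `new_data` and returns a tibble.
predict(fit_f, new_data = head(mtcars))
```

The point of this design is that the model specification is separate from the fitting call, so changing the engine or the interface leaves the rest of the code unchanged.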

The tidymodels packages are designed to work with the tidyverse and follow tidy data principles. They are also designed to be modular and extensible. The core packages are loaded using the following command:

library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.3.0 ──
✔ broom        1.0.8     ✔ recipes      1.3.1
✔ dials        1.4.0     ✔ rsample      1.3.0
✔ dplyr        1.1.4     ✔ tibble       3.3.0
✔ ggplot2      3.5.2     ✔ tidyr        1.3.1
✔ infer        1.0.8     ✔ tune         1.3.0
✔ modeldata    1.4.0     ✔ workflows    1.2.0
✔ parsnip      1.3.1     ✔ workflowsets 1.1.0
✔ purrr        1.0.4     ✔ yardstick    1.3.2
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ purrr::discard() masks scales::discard()
✖ dplyr::filter()  masks stats::filter()
✖ dplyr::lag()     masks stats::lag()
✖ recipes::step()  masks stats::step()

The packages were developed with the aim of making it easy to follow best practices.
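To give a first impression of how the packages listed above fit together, here is a minimal sketch of a complete modeling run. The data set (mtcars), the predictors, the seed and all object names are assumptions made for illustration only; this is not a recommended analysis of these data.

```r
library(tidymodels)

set.seed(1)

# rsample: hold out part of the data for testing.
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# recipes: declare the preprocessing; it is estimated on the training data only.
car_rec <- recipe(mpg ~ wt + hp, data = car_train) %>%
  step_normalize(all_numeric_predictors())

# parsnip: specify the model independently of the data.
lm_spec <- linear_reg() %>%
  set_engine("lm")

# workflows: bundle preprocessing and model into one object.
car_wf <- workflow() %>%
  add_recipe(car_rec) %>%
  add_model(lm_spec)

car_fit <- fit(car_wf, data = car_train)

# yardstick: evaluate the predictions on the held-out data.
test_rmse <- predict(car_fit, new_data = car_test) %>%
  bind_cols(car_test) %>%
  rmse(truth = mpg, estimate = .pred)
test_rmse
```

Because the recipe is part of the workflow, the normalization learned on the training set is applied automatically when predicting on the test set, which is one of the best practices the packages try to make easy.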

Code

The code of this chapter is summarized here.

knitr::opts_chunk$set(echo = TRUE, cache = TRUE, autodep = TRUE,
  fig.align = "center")
library(tidymodels)