Machine Learning with Tidymodels

R for exploratory data analysis and statistical modeling

Author

Peter Gedeck

Published

January 26, 2026

Introduction

In the DS-6030 Statistical Learning course, we will use

  • the tidyverse packages for data loading and processing
  • the tidymodels packages for model building and validation

Compared with the classical base-R packages covered in An Introduction to Statistical Learning (James et al. 2021), these packages share a consistent interface that makes working with data easier and more streamlined.

Tidyverse

The tidyverse is a collection of packages that share a common design philosophy and are designed to work together. Hadley Wickham outlined the tidy data principles behind the tidyverse in the paper "Tidy Data", published in 2014 in the Journal of Statistical Software 59(10), 1–23.

To load the tidyverse, use the following command:

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

You will see that this loads a number of packages. The most important ones are:

  • ggplot2 for plotting
  • dplyr for data manipulation
  • readr for data import
  • tibble for improved data frames
  • tidyr for getting data into tidy form
  • purrr for functional programming
  • stringr for string manipulation
  • forcats for categorical/factor data
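
As a minimal sketch of how these packages fit together, the example below uses the built-in mtcars data set (an assumption chosen for illustration, not a course data set) and the native pipe |>, which requires R 4.1 or later. The object names cars, mpg_by_cyl and p are hypothetical.

```r
library(tidyverse)

# tibble: convert the built-in mtcars data frame to a tibble,
# keeping the car names that base R stores as row names.
cars <- as_tibble(mtcars, rownames = "model")

# dplyr: filter rows, convert a column, and summarise by group.
# dplyr::filter() is written out in full because it masks stats::filter().
mpg_by_cyl <- cars |>
  dplyr::filter(hp > 100) |>
  mutate(cyl = factor(cyl)) |>
  group_by(cyl) |>
  summarise(n = n(), mean_mpg = mean(mpg), .groups = "drop")

print(mpg_by_cyl)

# ggplot2: scatter plot of fuel economy against weight, coloured by cylinders.
p <- ggplot(cars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

print(p)
```

Writing dplyr::filter() with its package prefix is optional once the tidyverse is loaded, but it makes explicit which of the two conflicting functions is meant.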

Tidymodels

The tidymodels package was first released in 2018. It is an open source project maintained by the company Posit and is still under active development. Like the tidyverse, it is an ecosystem of packages that share a common design philosophy and are designed to work together. The packages include

  • parsnip for model specification
  • recipes for data preprocessing
  • rsample for resampling
  • yardstick for model evaluation
  • tune for hyperparameter tuning
  • workflows for modeling workflows
  • tidyposterior for Bayesian comparison of models based on resampling results

The tidymodels packages follow the tidyverse and tidy data principles. They are modular and extensible.
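
To illustrate the modular design, here is a minimal sketch of a tidymodels workflow. It assumes the built-in mtcars data set, a linear regression of mpg on wt and hp, and a 75/25 split; these choices and all object names (car_split, car_recipe, lm_spec, car_fit, car_metrics) are hypothetical and only serve the example. Hyperparameter tuning with tune is not shown.

```r
library(tidymodels)

set.seed(123)  # make the random split reproducible

# rsample: hold out 25% of the rows for testing
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# recipes: declare outcome and predictors, then centre and scale the predictors
car_recipe <- recipe(mpg ~ wt + hp, data = car_train) |>
  step_normalize(all_numeric_predictors())

# parsnip: specify a linear regression fitted with the lm engine
lm_spec <- linear_reg() |>
  set_engine("lm")

# workflows: bundle preprocessing and model, then fit on the training set
car_fit <- workflow() |>
  add_recipe(car_recipe) |>
  add_model(lm_spec) |>
  fit(data = car_train)

# yardstick: compute default regression metrics on the held-out rows
car_metrics <- augment(car_fit, new_data = car_test) |>
  metrics(truth = mpg, estimate = .pred)

print(car_metrics)
```

Each step uses a different package, and each piece can be swapped independently: for example, a different model specification can replace lm_spec without changing the recipe or the evaluation code.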

Getting Help

RStudio

  • Install R and RStudio
  • Make use of Projects in RStudio