Machine Learning with Tidymodels

R for exploratory data analysis and statistical modeling

Author

Peter Gedeck

Published

January 26, 2026

Introduction

In the DS-6030 Statistical Learning course, we will use:

the tidyverse packages for data loading and processing
the tidymodels packages for model building and validation

Compared to the classical base-R packages covered in An Introduction to Statistical Learning (James et al. 2021), these packages offer many advantages that will make working with data easier and more streamlined.

Tidyverse

The tidyverse is a collection of packages that share a common design philosophy and are designed to work together. Hadley Wickham laid out the tidy data principles that underpin the tidyverse in the paper Tidy Data, published in 2014 in the Journal of Statistical Software 59(10), 1–23.

To load the tidyverse, use the following command:

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

You will see that this loads a number of packages. The most important ones are:

ggplot2 for plotting
dplyr for data manipulation
readr for data import
tibble for improved data frames
tidyr for getting data into tidy form
purrr for functional programming
stringr for string manipulation
forcats for categorical/factor data
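As a minimal sketch of how these packages fit together, the following example uses the built-in mtcars data set (an illustrative choice, not one prescribed by the course). It converts the data to a tibble, summarizes it with dplyr, and prepares a plot with ggplot2. The object names cars, mpg_by_cyl, and p are arbitrary.

```r
library(tidyverse)

# mtcars ships with R; keep the car names as a column and convert to a tibble
cars <- mtcars |>
  rownames_to_column("model") |>
  as_tibble()

# dplyr: average fuel economy by number of cylinders
mpg_by_cyl <- cars |>
  dplyr::filter(!is.na(mpg)) |>  # explicit prefix sidesteps the stats::filter() conflict
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg), n = n()) |>
  arrange(cyl)

print(mpg_by_cyl)

# ggplot2: scatter plot of weight against fuel economy, colored by cylinders
p <- ggplot(cars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders")
```

Typing p at the console displays the plot. The native pipe |> assumes R 4.1 or later; the magrittr pipe %>% works the same way here.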

Tidymodels

The tidymodels package was first released in 2018. It is an open-source project maintained by the company Posit and is still under active development. It is an ecosystem of packages that share a common design philosophy and are designed to work together. The packages include:

parsnip for model specification
recipes for data preprocessing
rsample for resampling
yardstick for model evaluation
tune for hyperparameter tuning
workflows for modeling workflows
tidyposterior for Bayesian comparison of models based on resampling results

The tidymodels packages are designed to work with the tidyverse and follow tidy data principles. They are also modular and extensible.
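The sketch below shows one way several of these packages can be combined in a single analysis. It assumes the built-in mtcars data set and a plain linear regression, both chosen only for illustration. The object names (car_split, car_recipe, lm_spec, car_fit, car_metrics) are arbitrary. The tune package is not used because this model has no hyperparameters to tune.

```r
library(tidymodels)

set.seed(123)  # make the random split reproducible

# rsample: hold out 25% of the rows for testing
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# recipes: declare outcome and predictors, then standardize the predictors
car_recipe <- recipe(mpg ~ wt + hp, data = car_train) |>
  step_normalize(all_numeric_predictors())

# parsnip: specify a linear regression fitted with the lm engine
lm_spec <- linear_reg() |>
  set_engine("lm")

# workflows: bundle preprocessing and model, then fit on the training set
car_fit <- workflow() |>
  add_recipe(car_recipe) |>
  add_model(lm_spec) |>
  fit(data = car_train)

# yardstick: compute regression metrics on the held-out data
car_metrics <- predict(car_fit, new_data = car_test) |>
  bind_cols(car_test) |>
  metrics(truth = mpg, estimate = .pred)

print(car_metrics)
```

Because the recipe is part of the workflow, the normalization learned on the training data is applied automatically when predicting on the test data.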

Getting Help

The free book R for Data Science (2e) (Wickham, Çetinkaya-Rundel, and Grolemund 2023) is a good introduction to basic data analysis with R.
Web search, especially stackoverflow.com for programming questions and stats.stackexchange.com for statistics questions
Troubleshooting/Debugging:
- Run and check one line of code at a time.
- Search the web for your error message.
- Use scripts instead of typing commands into the console, so that you can rerun your code.
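Checking one line at a time can look like the following sketch: each step of a pipeline is stored in its own object and inspected before the next step is added. The mtcars data set and the names step1 and step2 are illustrative assumptions.

```r
library(dplyr)

# Step 1: run only the first operation and inspect the result
step1 <- mtcars |> filter(cyl == 4)
glimpse(step1)

# Step 2: add the next operation once step 1 looks right
step2 <- step1 |> summarize(mean_mpg = mean(mpg))
print(step2)

# When a function behaves unexpectedly, open its help page, for example:
# ?dplyr::filter
```

Once every step has been verified, the steps can be chained back into a single pipeline.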

RStudio

Install R and RStudio
Make use of Projects in RStudio