Machine Learning with Tidymodels

R for exploratory data analysis and statistical modeling

Author

Peter Gedeck

Published

April 13, 2026

Introduction

This book is an introduction to machine learning with R and the tidymodels ecosystem of packages. It is designed for readers who have some experience with R and want to learn how to use it for machine learning. The book is organized into parts that cover exploratory data analysis, training models, regression and classification models, model validation and tuning, unsupervised learning, and deeper dives into specific model types. Each chapter combines explanations of the underlying concepts with worked examples, and ends with a “Code” section that summarizes all the R code used in the chapter so you can reproduce the results yourself. A final “Examples” part pulls several of these ideas together into complete modeling workflows.

Tidyverse

The tidyverse is a collection of packages that share a common design philosophy and are designed to work together. Hadley Wickham laid out the tidy data principles that underpin it in the 2014 paper “Tidy Data”, published in the Journal of Statistical Software 59(10), 1–23.

To load the tidyverse, use the following command:

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

You will see that this loads a number of packages. The most important ones are:

  • ggplot2 for plotting
  • dplyr for data manipulation
  • readr for data import
  • tibble for improved data frames
  • tidyr for getting data into tidy form
  • purrr for functional programming
  • stringr for string manipulation
  • forcats for categorical/factor data
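
As a rough illustration of how these packages fit together, the following sketch filters, summarizes, and plots a small dataset. It assumes R 4.1 or later (for the native pipe `|>`) and uses the built-in `mtcars` data purely as a stand-in; the object names `mpg_by_cyl` and `p` are made up for this example. `dplyr::filter()` is written with its namespace to make clear which of the two conflicting `filter()` functions is meant.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# mtcars ships with base R; it is only a placeholder dataset here
mpg_by_cyl <- mtcars |>
  as_tibble(rownames = "model") |>   # tibble: keep the row names as a column
  dplyr::filter(hp > 100) |>         # dplyr: keep cars with more than 100 hp
  group_by(cyl) |>                   # dplyr: one group per cylinder count
  summarise(mean_mpg = mean(mpg), n = n())

# ggplot2: bar chart of the mean fuel economy per cylinder count
p <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean miles per gallon")

print(mpg_by_cyl)
```

Calling `print(p)` draws the chart in an interactive session.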

Tidymodels

The tidymodels package was first released in 2018. It is maintained by Posit as an open-source project and remains under active development. Like the tidyverse, it is an ecosystem of packages that share a common design philosophy and are designed to work together. The packages include:

  • parsnip for model specification
  • recipes for data preprocessing
  • rsample for resampling
  • yardstick for model evaluation
  • tune for hyperparameter tuning
  • workflows for modeling workflows
  • tidyposterior for Bayesian analysis of resampling results to compare models
  • probably for postprocessing of predictions (e.g. thresholding for binary classification models)
  • tailor for postprocessing of predictions as part of workflows

The tidymodels packages are modular and extensible, and they follow tidyverse and tidy data principles.
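
To give a feel for how the pieces combine, here is a minimal sketch of a regression workflow. The dataset (`mtcars`), the predictors, the seed, and all object names are assumptions chosen for illustration and are not taken from later chapters. It uses rsample for the split, recipes for preprocessing, parsnip for the model specification, workflows to bundle the two, and yardstick for evaluation.

```r
library(tidymodels)

set.seed(123)  # arbitrary seed so the random split is reproducible

# rsample: hold out 25% of the rows for testing
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test <- testing(car_split)

# recipes: declare outcome and predictors, then center and scale the predictors
car_recipe <- recipe(mpg ~ wt + hp, data = car_train) |>
  step_normalize(all_numeric_predictors())

# parsnip: an ordinary linear regression fitted with stats::lm()
lm_spec <- linear_reg() |>
  set_engine("lm")

# workflows: bundle preprocessing and model into a single object
car_workflow <- workflow() |>
  add_recipe(car_recipe) |>
  add_model(lm_spec)

car_fit <- fit(car_workflow, data = car_train)

# predict on the held-out data and attach the observed outcome
car_preds <- predict(car_fit, new_data = car_test) |>
  bind_cols(car_test |> select(mpg))

# yardstick: default regression metrics (rmse, rsq, mae)
car_metrics <- metrics(car_preds, truth = mpg, estimate = .pred)
print(car_metrics)
```

Because the recipe is part of the workflow, the normalization learned on the training data is applied automatically when predicting on the test data.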

Getting Help

  • A good introduction to basic data analysis with R is the free book R for Data Science (2e) (Wickham et al. 2023).
  • Web search, especially stackoverflow.com and stats.stackexchange.com
  • Troubleshooting/Debugging:
    • Check one line of code at a time.
    • Search the web for your error message.
    • Write your code in scripts so that you can rerun it step by step.

RStudio

  • Install R and RStudio
  • Make use of Projects in RStudio