R offers a very large number of packages for fitting predictive models. While there is some consistency in how these models are trained, there are also many differences. For example, some models can only be trained through a matrix interface and therefore cannot easily be used with a formula. These differences can be overwhelming for new users, so a consistent interface to all models would be helpful.
Several packages address this problem by providing a unified modeling interface. Here is a small selection:
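To make the difference between the two interfaces concrete, here is a minimal sketch using only base R. The choice of `lm()`, `lm.fit()`, and the built-in `mtcars` data is an assumption made for illustration; the same contrast appears in many modeling packages.

```r
# Formula interface: the model is described symbolically and
# the variables are looked up in the data frame passed as `data`
fit_formula <- lm(mpg ~ wt + hp, data = mtcars)

# Matrix interface: the caller has to build the design matrix,
# including the intercept column, and pass the outcome separately
x <- cbind(Intercept = 1, as.matrix(mtcars[, c("wt", "hp")]))
y <- mtcars$mpg
fit_matrix <- lm.fit(x = x, y = y)

# Both calls estimate the same coefficients,
# but the calling conventions and the returned objects differ
coef(fit_formula)
fit_matrix$coefficients
```

Even in this small case the two calls take different arguments and return different kinds of objects, which is the sort of inconsistency a unified interface is meant to hide.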
tidymodels (a collection of packages that share a common design philosophy with tidyverse and are designed to work together)
h2o (in contrast to the other packages, this is developed by a company, H2O.ai; the core platform is open source and supports the full machine learning workflow from development to deployment)
In this course, we will focus on tidymodels.
5.1 What is tidymodels?
tidymodels is developed by Max Kuhn, who now works at Posit (formerly RStudio). It was first released in 2018 and is still under active development. It is an ecosystem of packages that share a common design philosophy and are designed to work together. The packages include:
parsnip for model specification
recipes for data preprocessing
rsample for resampling
yardstick for model evaluation
tune for hyperparameter tuning
workflows for modeling workflows
tidyposterior for Bayesian comparison of models based on resampling results
The tidymodels packages follow tidyverse conventions and tidy data principles, and they are designed to be modular and extensible. The core packages are loaded together with a single command, library(tidymodels).
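As a minimal sketch of loading the packages and of how they fit together, the following example assumes that tidymodels is installed. The `mtcars` data, the choice of predictors, and all object names (`car_split`, `car_recipe`, `lm_spec`, `car_fit`) are illustrative assumptions, not part of the course material.

```r
# Attaches the core packages (parsnip, recipes, rsample, yardstick,
# tune, workflows, among others)
library(tidymodels)

set.seed(123)  # make the random split reproducible

# rsample: split the built-in mtcars data into training and test sets
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# recipes: declare the preprocessing (here: centre and scale the predictors)
car_recipe <- recipe(mpg ~ wt + hp, data = car_train) %>%
  step_normalize(all_numeric_predictors())

# parsnip: specify the model type separately from the fitting engine
lm_spec <- linear_reg() %>%
  set_engine("lm")

# workflows: bundle preprocessing and model, then fit on the training data
car_fit <- workflow() %>%
  add_recipe(car_recipe) %>%
  add_model(lm_spec) %>%
  fit(data = car_train)

# yardstick: compute default regression metrics on the held-out test set
car_pred <- augment(car_fit, new_data = car_test)
car_metrics <- metrics(car_pred, truth = mpg, estimate = .pred)
car_metrics
```

Each step uses a different package from the list, but the objects pass from one step to the next without conversion. The tune package is not used here because this model has no hyperparameters to tune.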