mlS3: Streamlining Machine Learning Workflows with a Unified S3 Interface in R

Apr 12, 2026 · 5 min read
[This article was first published on T. Moudiki's Webpage - R, and kindly contributed to R-bloggers.]

Overview

Building on the R6-based unifiedml package introduced last week, this post covers mlS3 — a new R package that provides a unified, consistent S3 interface for training and generating predictions from a range of popular machine learning models. Instead of adapting to the unique API conventions of each underlying package (you'll still want to consult their documentation for parameter details), mlS3 wraps them behind a shared wrap_*() / predict() pattern.

What You'll Learn

  • How to install and load mlS3 (currently available from GitHub)
  • How to apply a consistent API across multiple ML algorithms for both classification and regression tasks

Models Covered

Wrapper          Underlying Package                  Task(s)
wrap_glmnet()    glmnet — generalized linear models  Classification, Regression
wrap_lightgbm()  lightgbm — gradient boosting        Classification, Regression
wrap_ranger()    ranger — random forest              Classification, Regression
wrap_svm()       e1071 — support vector machines     Classification, Regression
wrap_caret()     caret — 200+ models                 Classification, Regression

Datasets Used

  • iris — binary and multiclass classification (setosa vs. versicolor; all three species)
  • mtcars — regression to predict miles per gallon (mpg)

Key Design Principle

Every model in mlS3 follows the same two-step workflow:

mod  <- wrap_*(X_train, y_train, ...)        # Train
pred <- predict(mod, newx = X_test, ...)     # Predict

This uniform interface makes it straightforward to swap algorithms and compare results without restructuring your pipeline.
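To make that concrete, here is a minimal sketch of a generic benchmarking helper built on that convention. `compare_rmse` is a hypothetical name, not part of mlS3; `fits` is a named list of functions following the shared pattern — `fit(X, y)` returns an object whose `predict()` method accepts `newx`. With mlS3 loaded you could pass, for example, `list(glmnet = wrap_glmnet, ranger = wrap_ranger)`.

```r
# Hypothetical helper: score any set of models that follow the
# shared wrap_*() / predict() convention on a held-out test set.
compare_rmse <- function(fits, X_train, y_train, X_test, y_test) {
  sapply(fits, function(fit) {
    mod  <- fit(X_train, y_train)        # Train
    pred <- predict(mod, newx = X_test)  # Predict
    sqrt(mean((pred - y_test)^2))        # Score
  })
}
```

Because every wrapper shares the same signature, adding a model to the comparison is one more entry in the list rather than a rewrite of the pipeline.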

Prerequisites

  • R with the following packages: remotes, caret, randomForest, ggplot2 (the wrapped packages — glmnet, lightgbm, ranger, e1071 — are also needed to run the corresponding examples)
  • mlS3 installed from GitHub via remotes::install_github("Techtonique/mlS3")

Code

Installing Packages

install.packages(c("remotes", "caret", "randomForest"))
remotes::install_github("Techtonique/mlS3")

Predefined Wrappers

# Classification
library(mlS3)
# =============================================================================
# Classification examples (no leakage)
# =============================================================================
set.seed(123)
# --- Binary classification: iris setosa vs versicolor ---
iris_bin <- iris[iris$Species != "virginica", ]
X_bin <- iris_bin[, 1:4]
y_bin <- droplevels(iris_bin$Species)
# Split into train/test
idx_bin <- sample(nrow(X_bin), 0.7 * nrow(X_bin))
X_bin_train <- X_bin[idx_bin, ]
y_bin_train <- y_bin[idx_bin]
X_bin_test  <- X_bin[-idx_bin, ]
y_bin_test  <- y_bin[-idx_bin]
# glmnet
mod <- wrap_glmnet(X_bin_train, y_bin_train, family = "binomial")
pred_bin_glmnet <- predict(mod, newx = X_bin_test, type = "class")
acc_glmnet <- mean(pred_bin_glmnet == y_bin_test)
cat("Accuracy (glmnet): ", acc_glmnet, "\n")
# --- Multiclass classification: iris all species ---
X_multi <- iris[, 1:4]
y_multi <- iris$Species
# Split into train/test
idx_multi <- sample(nrow(X_multi), 0.7 * nrow(X_multi))
X_multi_train <- X_multi[idx_multi, ]
y_multi_train <- y_multi[idx_multi]
X_multi_test  <- X_multi[-idx_multi, ]
y_multi_test  <- y_multi[-idx_multi]
# lightgbm
mod <- wrap_lightgbm(X_multi_train, y_multi_train,
                     params = list(objective = "multiclass",
                                   num_class = 3, verbose = -1),
                     nrounds = 150)
pred_multi_lightgbm <- predict(mod, newx = X_multi_test, type = "class")
acc_lightgbm <- mean(pred_multi_lightgbm == y_multi_test)
# ranger
mod <- wrap_ranger(X_multi_train, y_multi_train, num.trees = 100L)
pred_multi_ranger <- predict(mod, newx = X_multi_test, type = "class")
acc_ranger <- mean(pred_multi_ranger == y_multi_test)
# svm
mod <- wrap_svm(X_multi_train, y_multi_train, kernel = "radial")
pred_multi_svm <- predict(mod, newx = X_multi_test, type = "class")
acc_svm <- mean(pred_multi_svm == y_multi_test)
cat("Accuracy (lightgbm): ", acc_lightgbm, "\n")
cat("Accuracy (ranger):   ", acc_ranger, "\n")
cat("Accuracy (svm):      ", acc_svm, "\n")
# =============================================================================
# Regression examples (mtcars)
# =============================================================================
X_reg <- mtcars[, -1]
y_reg <- mtcars$mpg
# Split into train/test
set.seed(123)
idx_reg <- sample(nrow(X_reg), 0.7 * nrow(X_reg))
X_reg_train <- X_reg[idx_reg, ];  y_reg_train <- y_reg[idx_reg]
X_reg_test  <- X_reg[-idx_reg, ]; y_reg_test  <- y_reg[-idx_reg]
# lightgbm
mod <- wrap_lightgbm(X_reg_train, y_reg_train,
                     params = list(objective = "regression", verbose = -1),
                     nrounds = 50)
pred_reg_lightgbm <- predict(mod, newx = X_reg_test)
rmse_lightgbm <- sqrt(mean((pred_reg_lightgbm - y_reg_test)^2))
# glmnet
mod <- wrap_glmnet(X_reg_train, y_reg_train, alpha = 0)
pred_reg_glmnet <- predict(mod, newx = X_reg_test)
rmse_glmnet <- sqrt(mean((pred_reg_glmnet - y_reg_test)^2))
# svm
mod <- wrap_svm(X_reg_train, y_reg_train)
pred_reg_svm <- predict(mod, newx = X_reg_test)
rmse_svm <- sqrt(mean((pred_reg_svm - y_reg_test)^2))
# ranger
mod <- wrap_ranger(X_reg_train, y_reg_train, num.trees = 100L)
pred_reg_ranger <- predict(mod, newx = X_reg_test)
rmse_ranger <- sqrt(mean((pred_reg_ranger - y_reg_test)^2))
cat("RMSE (lightgbm): ", rmse_lightgbm, "\n")
cat("RMSE (glmnet):   ", rmse_glmnet, "\n")
cat("RMSE (svm):      ", rmse_svm, "\n")
cat("RMSE (ranger):   ", rmse_ranger, "\n")
Accuracy (glmnet):   1
Accuracy (lightgbm): 0.2444444 # I'm probably doing something wrong here
Accuracy (ranger):   0.9333333
Accuracy (svm):      0.9333333
RMSE (lightgbm): 4.713678
RMSE (glmnet):   2.972557
RMSE (svm):      2.275837
RMSE (ranger):   2.067692
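One plausible explanation for the low lightgbm multiclass accuracy flagged above is a label-encoding mismatch, a common lightgbm pitfall: lightgbm works internally with 0-based integer class indices, so if the wrapper's predictions come back as integer codes rather than factor labels, comparing them directly to y_multi_test yields near-chance accuracy. Whether that is what happens here would need checking against the wrapper's output, but the remap itself is simple — a self-contained toy demonstration:

```r
# Toy demonstration: map 0-based class indices back to factor labels
# before scoring (only needed if predictions are integer codes).
y_test   <- factor(c("setosa", "versicolor", "virginica", "setosa"))
pred_idx <- c(0L, 1L, 2L, 0L)  # lightgbm-style 0-based class indices
pred_labels <- levels(y_test)[pred_idx + 1]
mean(pred_labels == y_test)    # → 1
```

In the article's pipeline the equivalent would be `levels(y_multi_test)[as.integer(pred_multi_lightgbm) + 1]` — assuming the wrapper does return integer codes, which is worth verifying first.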

The caret Wrapper

This section requires the caret and randomForest packages. The full list of models available through wrap_caret() can be found here: https://topepo.github.io/caret/available-models.html
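Before picking a method string for wrap_caret(), you can also query caret's model registry directly from R — these are standard caret functions, independent of mlS3:

```r
library(caret)
length(getModelInfo())       # how many model definitions are registered
head(names(getModelInfo()))  # method strings usable as `method = ...`
modelLookup("rf")            # tuning parameters for random forest (mtry)
```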

library(mlS3)
library(caret)
# =============================================================================
# Regression with mtcars dataset
# =============================================================================
data(mtcars)
X_reg <- mtcars[, -1]  # All columns except mpg
y_reg <- mtcars$mpg    # Target variable
# Split into train/test
set.seed(123)
idx_reg <- sample(nrow(X_reg), 0.7 * nrow(X_reg))
X_reg_train <- X_reg[idx_reg, ]
y_reg_train <- y_reg[idx_reg]
X_reg_test  <- X_reg[-idx_reg, ]
y_reg_test  <- y_reg[-idx_reg]
# ----------------------------------------------------------------------------
# Example 1: Random Forest Regression
# ----------------------------------------------------------------------------
cat("\n=== Example 1: Random Forest Regression ===\n")
mod_rf <- wrap_caret(X_reg_train, y_reg_train,
                     method = "rf",
                     mtry = 3)  # Variables sampled at each split
print(mod_rf)
# Predictions and evaluation
pred_rf <- predict(mod_rf, newx = X_reg_test)
rmse_rf <- sqrt(mean((pred_rf - y_reg_test)^2))
r2_rf   <- 1 - sum((y_reg_test - pred_rf)^2) / sum((y_reg_test - mean(y_reg_test))^2)
cat("RMSE:", round(rmse_rf, 3), "\n")
cat("R-squared:", round(r2_rf, 3), "\n")
=== Example 1: Random Forest Regression ===
$model
Random Forest
22 samples
10 predictors
No pre-processing
Resampling: None
$task
[1] "regression"
$method
[1] "rf"
$parameters
$parameters$mtry
[1] 3
attr(,"class")
[1] "wrap_caret"
RMSE: 2.007
R-squared: 0.681
library(ggplot2)
df <- data.frame(
  pred   = pred_rf,
  actual = y_reg_test
)
ggplot(df, aes(x = pred, y = actual)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = "red") +
  theme_minimal() +
  labs(x = "Predicted", y = "Actual")

[Figure: predicted vs. actual mpg on the test set; the red identity line marks perfect predictions]

To leave a comment for the author, please follow the link and comment on their blog: T. Moudiki's Webpage - R.
