Building a Simple Neural Network in R with torch
The torch package brings deep learning to R by providing bindings to the popular PyTorch library. This tutorial demonstrates how to build and train a simple neural network using torch in R.
Installation
# install.packages("torch") library(torch) # torch::install_torch()
A Simple Neural Network
This section walks through building a neural network for a simple regression task.
1. Sample Data
# Set seeds for reproducibility
# Note: set.seed() alone does not control torch's random number generator,
# so torch_manual_seed() is also needed for torch_randn()
set.seed(42)
torch_manual_seed(42)

# Generate training data: y = 3x + 2 + noise
x <- torch_randn(100, 1)
y <- 3 * x + 2 + torch_randn(100, 1) * 0.3

# Display the first few data points
head(data.frame(
  x = as.numeric(x$squeeze()),
  y = as.numeric(y$squeeze())
))
            x         y
1 -0.02329975  1.861628
2  1.92341769  7.555232
3  0.11041667  2.613283
4 -2.55959392 -5.931758
5  0.36482519  3.099005
6  0.97125226  4.551073
2. Neural Network Module
The next step involves defining the neural network architecture using torch's module system:
# Define a simple feedforward neural network
nnet <- nn_module(
  initialize = function() {
    # Define layers
    self$layer1 <- nn_linear(1, 8)  # Input layer to hidden layer (1 -> 8 neurons)
    self$layer2 <- nn_linear(8, 1)  # Hidden layer to output layer (8 -> 1 neuron)
  },
  forward = function(x) {
    # Define forward pass
    x %>%
      self$layer1() %>%  # First linear transformation
      nnf_relu() %>%     # ReLU activation function
      self$layer2()      # Second linear transformation
  }
)

# Instantiate the model
model <- nnet()

# Display model structure
print(model)
An `nn_module` containing 25 parameters.

── Modules ─────────────────────────────────────────────────────────────────────
• layer1: <nn_linear> #16 parameters
• layer2: <nn_linear> #9 parameters
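Before training, it can be worth confirming that a forward pass produces output of the expected shape. A minimal check on a hypothetical batch of five random inputs:

# Optional shape check: a batch of 5 inputs should give a 5 x 1 output
test_batch <- torch_randn(5, 1)
test_output <- model(test_batch)
print(test_output$shape)  # expected: [5, 1]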
3. Set Up the Optimizer and Loss Function
Training requires an optimizer to update the parameters and a loss function to measure the prediction error:
# Set up optimizer (Adam optimizer with learning rate 0.02)
optimizer <- optim_adam(model$parameters, lr = 0.02)

# Define loss function (mean squared error for regression)
loss_fn <- nnf_mse_loss
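Nothing in the training loop below is specific to Adam; other torch optimizers can be dropped in through the same interface. A sketch with plain stochastic gradient descent, kept commented out so the Adam setup above remains the one used in the rest of the post (lr = 0.1 is an illustrative guess, not a tuned value):

# Alternative optimizer: plain SGD (illustrative, untuned learning rate)
# optimizer <- optim_sgd(model$parameters, lr = 0.1)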
4. Training Loop
The neural network training process proceeds as follows:
# Store loss values for plotting
loss_history <- numeric(300)

# Training loop
for (epoch in 1:300) {
  # Set model to training mode
  model$train()

  # Reset gradients
  optimizer$zero_grad()

  # Forward pass
  y_pred <- model(x)

  # Calculate loss
  loss <- loss_fn(y_pred, y)

  # Backward pass
  loss$backward()

  # Update parameters
  optimizer$step()

  # Store loss for plotting
  loss_history[epoch] <- loss$item()
}
5. Visualize the Training Progress
The following visualization demonstrates how the loss decreased during training:
# Create a data frame for plotting
training_df <- data.frame(
  epoch = 1:300,
  loss = loss_history
)

# Plot training loss
ggplot(training_df, aes(x = epoch, y = loss)) +
  geom_line(color = "#2c3e50", size = 1) +
  labs(
    title = "Training Loss Over Time",
    subtitle = "Neural Network Learning Progress",
    x = "Epoch",
    y = "Mean Squared Error Loss"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12, color = "gray60")
  )
6. Visualize the Results
The following plot compares the trained model's predictions with the actual data:
# Set model to evaluation mode
model$eval()

# Generate predictions without tracking gradients
with_no_grad({
  y_pred <- model(x)
})

# Convert to R vectors for plotting
x_np <- as.numeric(x$squeeze())
y_np <- as.numeric(y$squeeze())
y_pred_np <- as.numeric(y_pred$squeeze())

# Create data frame for ggplot
plot_df <- data.frame(
  x = x_np,
  y_actual = y_np,
  y_predicted = y_pred_np
)

# Create the plot
ggplot(plot_df, aes(x = x)) +
  geom_point(aes(y = y_actual, color = "Actual"), alpha = 0.7, size = 2) +
  geom_point(aes(y = y_predicted, color = "Predicted"), alpha = 0.7, size = 2) +
  geom_smooth(aes(y = y_predicted), method = "loess", se = FALSE,
              color = "#e74c3c", linetype = "dashed") +
  labs(
    title = "Neural Network Regression Results",
    subtitle = "Comparing actual vs predicted values",
    x = "Input (x)",
    y = "Output (y)",
    color = "Data Type"
  ) +
  scale_color_manual(values = c("Actual" = "#3498db", "Predicted" = "#e74c3c")) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12, color = "gray60"),
    legend.position = "top"
  )
7. Model Performance Analysis
The following analysis examines how well the model learned the underlying pattern:
# Calculate performance metrics
mse <- mean((y_pred_np - y_np)^2)
rmse <- sqrt(mse)
mae <- mean(abs(y_pred_np - y_np))
r_squared <- cor(y_pred_np, y_np)^2

# Create performance summary
performance_summary <- data.frame(
  Metric = c("Mean Squared Error", "Root Mean Squared Error",
             "Mean Absolute Error", "R-squared"),
  Value = c(mse, rmse, mae, r_squared)
)

print(performance_summary)
                   Metric      Value
1      Mean Squared Error 0.09061213
2 Root Mean Squared Error 0.30101848
3     Mean Absolute Error 0.23722124
4               R-squared 0.99000990
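Since the data were generated from a linear relationship, an ordinary least-squares fit makes a natural baseline. A short sketch comparing lm()'s estimates with the true slope of 3 and intercept of 2 (ols_fit is just an illustrative name):

# Baseline: ordinary least squares on the same data
ols_fit <- lm(y_np ~ x_np)
coef(ols_fit)               # slope should be close to 3, intercept close to 2
summary(ols_fit)$r.squared  # comparable to the network's R-squared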
# Compare with the true relationship (y = 3x + 2)
# Generate predictions on a grid for comparison
x_grid <- torch_linspace(-3, 3, 100)$unsqueeze(2)

with_no_grad({
  y_grid_pred <- model(x_grid)
})

x_grid_np <- as.numeric(x_grid$squeeze())
y_grid_pred_np <- as.numeric(y_grid_pred$squeeze())
y_grid_true <- 3 * x_grid_np + 2

# Plot comparison
comparison_df <- data.frame(
  x = x_grid_np,
  y_true = y_grid_true,
  y_predicted = y_grid_pred_np
)

ggplot(comparison_df, aes(x = x)) +
  geom_line(aes(y = y_true, color = "True Function"), size = 2) +
  geom_line(aes(y = y_predicted, color = "Neural Network"), size = 2, linetype = "dashed") +
  geom_point(data = plot_df, aes(y = y_actual), alpha = 0.3, color = "gray50") +
  labs(
    title = "Neural Network vs True Function",
    subtitle = "Model learning assessment against the underlying pattern",
    x = "Input (x)",
    y = "Output (y)",
    color = "Function Type"
  ) +
  scale_color_manual(values = c("True Function" = "#2c3e50", "Neural Network" = "#e74c3c")) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12, color = "gray60"),
    legend.position = "top"
  )
Understanding the Neural Network
The following examination reveals what the network learned by analyzing its parameters:
# Extract learned parameters
layer1_weight <- as.matrix(model$layer1$weight$detach())
layer1_bias <- as.numeric(model$layer1$bias$detach())
layer2_weight <- as.matrix(model$layer2$weight$detach())
layer2_bias <- as.numeric(model$layer2$bias$detach())

cat("First layer (layer1) parameters:\n")
First layer (layer1) parameters:
cat("Weight matrix shape:", dim(layer1_weight), "\n")
Weight matrix shape: 8 1
cat("Bias vector length:", length(layer1_bias), "\n\n")
Bias vector length: 8
cat("Second layer (fc2) parameters:\n")
Second layer (layer2) parameters:
cat("Weight matrix shape:", dim(layer2_weight), "\n")
Weight matrix shape: 1 8
cat("Bias value:", layer2_bias, "\n\n")
Bias value: 0.701076 -0.8832566 -1.28852 0.4193589 0.8179439 -0.4608558 0.6640872 0.2222885
# Display first layer weights and biases
cat("First layer weights:\n")
First layer weights:
print(round(layer1_weight, 4))
        [,1]
[1,]  1.2292
[2,] -2.0338
[3,]  0.3231
[4,]  1.4845
[5,]  1.2861
[6,] -0.0174
[7,]  0.1889
[8,] -0.5916
cat("\nFirst layer biases:\n")
First layer biases:
print(round(layer1_bias, 4))
[1] 0.7011 -0.8833 -1.2885 0.4194 0.8179 -0.4609 0.6641 0.2223
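These weights and biases fully determine the model, so its predictions can be reproduced with ordinary matrix algebra. A sketch, using the parameter objects extracted above and the prediction grid from the previous section, that recomputes the grid predictions by hand and compares them with the network's own output:

# Reproduce the network's predictions with plain matrix algebra:
#   output = relu(X %*% t(W1) + b1) %*% t(W2) + b2
X <- matrix(x_grid_np, ncol = 1)
hidden <- pmax(sweep(X %*% t(layer1_weight), 2, layer1_bias, "+"), 0)  # ReLU
manual_pred <- as.numeric(hidden %*% t(layer2_weight) + layer2_bias)
max(abs(manual_pred - y_grid_pred_np))  # should be essentially zero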
Experimenting with Different Architectures
The following experiment compares the simple network above with several alternative architectures:
# Define different network architectures
create_network <- function(hidden_sizes) {
  nn_module(
    initialize = function(hidden_sizes) {
      self$layers <- nn_module_list()

      # Input and hidden layers
      prev_size <- 1
      for (i in seq_along(hidden_sizes)) {
        self$layers$append(nn_linear(prev_size, hidden_sizes[i]))
        prev_size <- hidden_sizes[i]
      }

      # Output layer
      self$layers$append(nn_linear(prev_size, 1))
    },
    forward = function(x) {
      for (i in 1:(length(self$layers) - 1)) {
        x <- nnf_relu(self$layers[[i]](x))
      }
      # No activation on the output layer
      self$layers[[length(self$layers)]](x)
    }
  )
}

# Train different architectures
architectures <- list(
  "Simple (8)" = c(8),
  "Deep (16-8)" = c(16, 8),
  "Wide (32)" = c(32),
  "Very Deep (16-16-8)" = c(16, 16, 8)
)

results <- list()

for (arch_name in names(architectures)) {
  # Create and train model
  net_class <- create_network(architectures[[arch_name]])
  model_temp <- net_class(architectures[[arch_name]])
  optimizer_temp <- optim_adam(model_temp$parameters, lr = 0.01)

  # Quick training (fewer epochs for comparison)
  for (epoch in 1:200) {
    model_temp$train()
    optimizer_temp$zero_grad()
    y_pred_temp <- model_temp(x)
    loss_temp <- loss_fn(y_pred_temp, y)
    loss_temp$backward()
    optimizer_temp$step()
  }

  # Generate predictions
  model_temp$eval()
  with_no_grad({
    y_pred_arch <- model_temp(x_grid)
  })

  results[[arch_name]] <- data.frame(
    x = x_grid_np,
    y_pred = as.numeric(y_pred_arch$squeeze()),
    architecture = arch_name
  )
}

# Combine results
all_results <- do.call(rbind, results)

# Plot comparison
ggplot(all_results, aes(x = x, y = y_pred, color = architecture)) +
  geom_line(size = 1.2) +
  geom_line(data = comparison_df, aes(y = y_true, color = "True Function"),
            size = 2, linetype = "solid") +
  geom_point(data = plot_df, aes(x = x, y = y_actual),
             color = "gray50", alpha = 0.3, inherit.aes = FALSE) +
  labs(
    title = "Comparison of Different Neural Network Architectures",
    subtitle = "Effects of network depth and width on learning performance",
    x = "Input (x)",
    y = "Output (y)",
    color = "Architecture"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12, color = "gray60"),
    legend.position = "top"
  )
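One way to make these architectures easier to compare is to count their trainable parameters. A small sketch that instantiates each one just to tally parameter sizes (no training involved):

# Count trainable parameters per architecture (fresh, untrained instances)
param_counts <- sapply(names(architectures), function(arch_name) {
  m <- create_network(architectures[[arch_name]])(architectures[[arch_name]])
  sum(sapply(m$parameters, function(p) prod(dim(p))))
})
print(param_counts)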
Key Takeaways
- Simple Architecture: A small two-layer network is enough to recover the underlying relationship in this data
- Training Process: A complete training loop pairs a forward pass, a loss, backpropagation, and an optimizer step
- Visualization: Plotting the loss curve and the predictions makes both training progress and model quality easy to assess
- Model Evaluation: Reporting several metrics (MSE, RMSE, MAE, R-squared) gives a fuller picture of performance
- Architecture Comparison: Comparing structures shows how network depth and width affect what can be learned
The torch package provides a straightforward approach to building and experimenting with neural networks in R, bringing the power of deep learning to the R ecosystem. This approach can be extended to more complex datasets and deeper architectures as needed.
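As a practical closing note, a trained module can be serialized with torch_save() and restored later with torch_load(). A minimal sketch (the file name is arbitrary):

# Save the trained model to disk and reload it later
torch_save(model, "simple_nnet.pt")
model_reloaded <- torch_load("simple_nnet.pt")
model_reloaded$eval()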