Keras for R
We are excited to announce that the keras package is now available on CRAN. The package provides an R interface to Keras, a high-level neural networks API developed with a focus on enabling fast experimentation. Keras has the following key features:
Allows the same code to run on CPU or on GPU, seamlessly.
User-friendly API which makes it easy to quickly prototype deep learning models.
Built-in support for convolutional networks (for computer vision), recurrent networks (for sequence processing), and any combination of both.
Supports arbitrary network architectures: multi-input or multi-output models, layer sharing, model sharing, etc. This means that Keras is appropriate for building essentially any deep learning model, from a memory network to a neural Turing machine.
Is capable of running on top of multiple back-ends including TensorFlow, CNTK, or Theano.
If you are already familiar with Keras and want to jump right in, check out https://tensorflow.rstudio.com/keras which has everything you need to get started including over 20 complete examples to learn from.
To learn a bit more about Keras and why we’re so excited to announce the Keras interface for R, read on!
Keras and Deep Learning
Interest in deep learning has been accelerating rapidly over the past few years, and several deep learning frameworks have emerged over the same time frame. Of all the available frameworks, Keras has stood out for its productivity, flexibility and user-friendly API. At the same time, TensorFlow has emerged as a next-generation machine learning platform that is both extremely flexible and well-suited to production deployment.
Not surprisingly, Keras and TensorFlow have of late been pulling away from other deep learning frameworks:
Google web search interest around deep learning frameworks over time. If you remember Q4 2015 and Q1-2 2016 as confusing, you weren’t alone. pic.twitter.com/1f1VQVGr8n
— François Chollet (@fchollet) June 3, 2017
The good news about Keras and TensorFlow is that you don’t need to choose between them! The default backend for Keras is TensorFlow and Keras can be integrated seamlessly with TensorFlow workflows. There is also a pure-TensorFlow implementation of Keras with deeper integration on the roadmap for later this year.
Keras and TensorFlow are the state of the art in deep learning tools and with the keras package you can now access both with a fluent R interface.
Getting Started
Installation
To begin, install the keras R package from CRAN as follows:
install.packages("keras")
The Keras R interface uses the TensorFlow backend engine by default. To install both the core Keras library and the TensorFlow backend, use the install_keras() function:
library(keras)
install_keras()
This will provide you with default CPU-based installations of Keras and TensorFlow. If you want a more customized installation, e.g. to take advantage of NVIDIA GPUs, see the documentation for install_keras().
MNIST Example
We can learn the basics of Keras by walking through a simple example: recognizing handwritten digits from the MNIST dataset. MNIST consists of 28 x 28 grayscale images of handwritten digits like these:
The dataset also includes labels for each image, telling us which digit it is. For example, the labels for the above images are 5, 0, 4, and 1.
Preparing the Data
The MNIST dataset is included with Keras and can be accessed using the dataset_mnist() function. Here we load the dataset then create variables for our test and training data:
library(keras)
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
The x data is a 3-d array (images, width, height) of grayscale values. To prepare the data for training we convert the 3-d arrays into matrices by reshaping width and height into a single dimension (28 x 28 images are flattened into length 784 vectors). Then, we convert the grayscale values from integers ranging between 0 and 255 into floating point values ranging between 0 and 1:
# reshape
dim(x_train) <- c(nrow(x_train), 784)
dim(x_test) <- c(nrow(x_test), 784)
# rescale
x_train <- x_train / 255
x_test <- x_test / 255
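To see what the reshape and rescale steps do without downloading MNIST, here is a minimal base-R sketch on a made-up array of two 3 x 3 "images". The toy data and the name first_image are assumptions for illustration only; the mechanics are the same as for the real 28 x 28 images.

```r
# Toy stand-in for MNIST: 2 "images" of 3 x 3 pixels with values in 0..255.
# (Hypothetical data; the real training array holds 28 x 28 images.)
x <- array(seq(0, 255, length.out = 18), dim = c(2, 3, 3))
first_image <- x[1, , ]   # keep a copy of image 1 as a 3 x 3 matrix

# Reshape: assigning to dim() reinterprets the same values as a 2 x 9 matrix,
# one row per image.
dim(x) <- c(nrow(x), 3 * 3)

# Rescale from [0, 255] to [0, 1].
x <- x / 255

dim(x)     # 2 9
range(x)   # 0 1

# Row 1 holds the pixels of image 1, in R's column-major order.
all.equal(x[1, ], as.vector(first_image) / 255)   # TRUE
```

Because the training and test arrays are flattened the same way, the pixel order within each row is consistent across both, which is all a fully connected network needs.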
The y data is an integer vector with values ranging from 0 to 9. To prepare this data for training we one-hot encode the vectors into binary class matrices using the Keras to_categorical() function:
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
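If one-hot encoding is new to you, the following base-R sketch shows the idea on the four example labels mentioned earlier. Note that one_hot is a hypothetical helper written for this illustration, not a function from the keras package, and it is not meant to reproduce to_categorical() exactly.

```r
# Minimal base-R one-hot encoder (illustrative helper, not part of keras).
one_hot <- function(y, num_classes) {
  out <- matrix(0, nrow = length(y), ncol = num_classes)
  # Label k (0-based) sets column k + 1, because R indexes from 1.
  out[cbind(seq_along(y), y + 1)] <- 1
  out
}

labels  <- c(5, 0, 4, 1)   # the example labels from the text
encoded <- one_hot(labels, 10)

dim(encoded)                   # 4 10
which(encoded[1, ] == 1) - 1   # 5
```

Each row contains a single 1 in the column corresponding to its digit, which is the format the softmax output layer and the categorical cross-entropy loss expect.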
Defining the Model
The core data structure of Keras is a model, a way to organize layers. The simplest type of model is the sequential model, a linear stack of layers.
We begin by creating a sequential model and then adding layers using the pipe (%>%) operator:
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = "softmax")
The input_shape argument to the first layer specifies the shape of the input data (a length 784 numeric vector representing a grayscale image). The final layer outputs a length 10 numeric vector (probabilities for each digit) using a softmax activation function.
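To make the data flow concrete, here is a rough base-R sketch of what a dense layer with a relu activation followed by a dense layer with a softmax activation computes for a single input. The weights are random, the hidden layer has only 16 units to keep the toy small, and all names (relu, softmax, W1, b1, W2, b2) are assumptions for illustration; this is not how Keras implements its layers internally.

```r
set.seed(1)

relu    <- function(z) pmax(z, 0)
softmax <- function(z) {
  e <- exp(z - max(z))   # subtract the max for numerical stability
  e / sum(e)
}

# Hypothetical random weights: 784 inputs -> 16 hidden units -> 10 outputs.
W1 <- matrix(rnorm(784 * 16, sd = 0.05), nrow = 784)
b1 <- rep(0, 16)
W2 <- matrix(rnorm(16 * 10, sd = 0.05), nrow = 16)
b2 <- rep(0, 10)

x <- runif(784)   # one fake flattened image with values in [0, 1]

hidden <- relu(as.vector(x %*% W1) + b1)             # dense layer + relu
probs  <- softmax(as.vector(hidden %*% W2) + b2)     # dense layer + softmax

length(probs)   # 10
sum(probs)      # 1, up to floating-point rounding
```

The softmax step is what turns ten arbitrary scores into non-negative values that sum to 1, so they can be read as per-digit probabilities.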
Use the summary() function to print the details of the model:
summary(model)
Model
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #
================================================================================
dense_1 (Dense)                     (None, 256)                     200960
________________________________________________________________________________
dropout_1 (Dropout)                 (None, 256)                     0
________________________________________________________________________________
dense_2 (Dense)                     (None, 128)                     32896
________________________________________________________________________________
dropout_2 (Dropout)                 (None, 128)                     0
________________________________________________________________________________
dense_3 (Dense)                     (None, 10)                      1290
================================================================================
Total params: 235,146
Trainable params: 235,146
Non-trainable params: 0
________________________________________________________________________________
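The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and dropout layers have no parameters. A small sketch, where dense_params is a hypothetical helper introduced only for this check:

```r
# Parameters of a dense layer = weights (inputs x units) + biases (units).
dense_params <- function(n_inputs, n_units) n_inputs * n_units + n_units

p1 <- dense_params(784, 256)   # 200960
p2 <- dense_params(256, 128)   # 32896
p3 <- dense_params(128, 10)    # 1290

total <- p1 + p2 + p3          # 235146
```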
Next, compile the model with an appropriate loss function, optimizer, and metrics:
model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(),
  metrics = c("accuracy")
)
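For intuition about the loss chosen above, categorical cross-entropy averages the negative log of the probability the model assigned to the correct class. The base-R sketch below is a simplified version under that definition; the function name, the eps clipping value, and the toy matrices are assumptions for illustration, and the Keras implementation may differ in details.

```r
# Simplified categorical cross-entropy for one-hot targets (illustrative only).
categorical_crossentropy <- function(y_true, y_pred, eps = 1e-7) {
  y_pred <- pmin(pmax(y_pred, eps), 1 - eps)   # clip to avoid log(0)
  -mean(rowSums(y_true * log(y_pred)))
}

# Two made-up samples with three classes each.
y_true <- rbind(c(1, 0, 0),
                c(0, 1, 0))
y_pred <- rbind(c(0.7, 0.2, 0.1),
                c(0.1, 0.8, 0.1))

loss <- categorical_crossentropy(y_true, y_pred)
loss   # about 0.29, i.e. -(log(0.7) + log(0.8)) / 2
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is the quantity the optimizer tries to minimize during training.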
Training and Evaluation
Use the fit() function to train the model for 30 epochs using batches of 128 images:
history <- model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
)
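It can help to work out what these settings mean in terms of data and weight updates. The sketch below assumes the standard MNIST training set size of 60,000 images, which is not stated in this post, and rounds the split for simplicity; all variable names are illustrative.

```r
n_samples        <- 60000   # MNIST training set size (assumption, not from the post)
validation_split <- 0.2
batch_size       <- 128
epochs           <- 30

n_val   <- round(n_samples * validation_split)   # 12000 images held out for validation
n_train <- n_samples - n_val                     # 48000 images used for weight updates

steps_per_epoch <- ceiling(n_train / batch_size)   # 375 batches per epoch
total_updates   <- steps_per_epoch * epochs        # 11250 weight updates overall
```

The held-out images are never used for weight updates; their loss and accuracy are reported after each epoch so you can spot overfitting.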
The history object returned by fit() includes loss and accuracy metrics which we can plot:
plot(history)
Evaluate the model’s performance on the test data:
model %>% evaluate(x_test, y_test, verbose = 0)
$loss
[1] 0.1149
$acc
[1] 0.9807
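The accuracy metric is simply the fraction of images whose predicted digit matches the true label. Here is a minimal sketch with made-up labels and predictions; the accuracy helper and the toy vectors are illustrative assumptions. It also translates the reported 0.9807 into counts for the 10,000 test images.

```r
# Fraction of predictions that match the true labels (illustrative helper).
accuracy <- function(y_true, y_pred) mean(y_true == y_pred)

truth <- c(7, 2, 1, 0, 4)   # made-up true labels
pred  <- c(7, 2, 1, 0, 9)   # made-up predictions, one of them wrong
accuracy(truth, pred)       # 0.8

# Reading the reported test accuracy as counts.
n_test    <- 10000
n_correct <- round(0.9807 * n_test)   # 9807 images classified correctly
n_wrong   <- n_test - n_correct       # 193 images misclassified
```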
Generate predictions on new data:
model %>% predict_classes(x_test)
[1] 7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2
[40] 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2
[79] 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6 9
[ reached getOption("max.print") -- omitted 9900 entries ]
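Conceptually, a predicted class is the position of the largest probability in each row of the softmax output, shifted to 0-based digit labels. The base-R sketch below shows that step on a made-up probability matrix (the values and variable names are assumptions). The same approach is a common substitute in later Keras releases, where predict_classes() was deprecated.

```r
# Made-up softmax outputs for three samples and four classes (0 to 3).
probs <- rbind(
  c(0.01, 0.02, 0.90, 0.07),   # most likely class: 2
  c(0.70, 0.10, 0.10, 0.10),   # most likely class: 0
  c(0.05, 0.05, 0.10, 0.80)    # most likely class: 3
)

# which.max() returns a 1-based column index, so subtract 1 for 0-based labels.
pred <- apply(probs, 1, which.max) - 1
pred   # 2 0 3
```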
Keras provides a vocabulary for building deep learning models that is simple, elegant, and intuitive. Building a question answering system, an image classification model, a neural Turing machine, or any other model is just as straightforward.
The Guide to the Sequential Model article describes the basics of Keras sequential models in more depth.
Examples
Over 20 complete examples are available (special thanks to [@dfalbel](https://github.com/dfalbel) for his work on these!). The examples cover image classification, text generation with stacked LSTMs, question-answering with memory networks, transfer learning, variational encoding, and more.
Example | Description |
---|---|
addition_rnn | Implementation of sequence to sequence learning for performing addition of two numbers (as strings). |
babi_memnn | Trains a memory network on the bAbI dataset for reading comprehension. |
babi_rnn | Trains a two-branch recurrent network on the bAbI dataset for reading comprehension. |
cifar10_cnn | Trains a simple deep CNN on the CIFAR10 small images dataset. |
conv_lstm | Demonstrates the use of a convolutional LSTM network. |
deep_dream | Deep Dreams in Keras. |
imdb_bidirectional_lstm | Trains a Bidirectional LSTM on the IMDB sentiment classification task. |
imdb_cnn | Demonstrates the use of Convolution1D for text classification. |
imdb_cnn_lstm | Trains a convolutional stack followed by a recurrent stack network on the IMDB sentiment classification task. |
imdb_fasttext | Trains a FastText model on the IMDB sentiment classification task. |
imdb_lstm | Trains an LSTM on the IMDB sentiment classification task. |
lstm_text_generation | Generates text from Nietzsche’s writings. |
mnist_acgan | Implementation of AC-GAN (Auxiliary Classifier GAN) on the MNIST dataset. |
mnist_antirectifier | Demonstrates how to write custom layers for Keras. |
mnist_cnn | Trains a simple convnet on the MNIST dataset. |
mnist_irnn | Reproduction of the IRNN experiment with pixel-by-pixel sequential MNIST in “A Simple Way to Initialize Recurrent Networks of Rectified Linear Units” by Le et al. |
mnist_mlp | Trains a simple deep multi-layer perceptron on the MNIST dataset. |
mnist_hierarchical_rnn | Trains a Hierarchical RNN (HRNN) to classify MNIST digits. |
mnist_transfer_cnn | Transfer learning toy example. |
neural_style_transfer | Neural style transfer (generating an image with the same “content” as a base image, but with the “style” of a different picture). |
reuters_mlp | Trains and evaluates a simple MLP on the Reuters newswire topic classification task. |
stateful_lstm | Demonstrates how to use stateful RNNs to model long sequences efficiently. |
variational_autoencoder | Demonstrates how to build a variational autoencoder. |
variational_autoencoder_deconv | Demonstrates how to build a variational autoencoder with Keras using deconvolution layers. |
Learning More
After you’ve become familiar with the basics, these articles are a good next step:
Guide to the Sequential Model. The sequential model is a linear stack of layers and is the API most users should start with.
Guide to the Functional API. The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.
Training Visualization. There are a wide variety of tools available for visualizing training. These include plotting of training metrics, real time display of metrics within the RStudio IDE, and integration with the TensorBoard visualization tool included with TensorFlow.
Using Pre-Trained Models. Keras includes a number of deep learning models (Xception, VGG16, VGG19, ResNet50, InceptionV3, and MobileNet) that are made available alongside pre-trained weights. These models can be used for prediction, feature extraction, and fine-tuning.
Frequently Asked Questions. Covers many additional topics including streaming training data, saving models, training on GPUs, and more.
Keras provides a productive, highly flexible framework for developing deep learning models. We can’t wait to see what the R community will do with these tools!