[This article was first published on r – Appsilon Data Science | End to End Data Science Solutions, and kindly contributed to R-bloggers].

Many news reports scare us with machines taking over our jobs in the not too distant future. Common examples of take-over targets include truck drivers, lawyers and accountants. In this article we will explore how far machines are from replacing us (R programmers) in writing Shiny code. Spoiler alert: you should not be worried about your obsolescence right now. You will see in a minute that we’re not quite there yet. I’m just hoping to show you, in an entertaining way, an easy application of a simple recurrent neural network implemented with the R interface to Keras.

Let’s formulate our problem once again precisely: we want to generate Shiny code character by character with a neural network.

## Background

To achieve that we need a recurrent neural network (RNN). Such networks are designed for sequential data, which is why they do a pretty good job with time series. Right now you might be asking yourself: what? We defined our problem as a text mining issue, so where is the temporal dependency here?! Well, imagine a programmer typing characters on their keyboard, one by one, one character per time step. It would also be nice if our network captured long-range dependencies such as, for instance, a curly bracket in the 1021st line of code that closes a “for” loop opened in line 352 (that would be a long loop though). Fortunately, RNNs are well suited to that because they can (in theory) retain the influence of a signal from the distant past on the present data sample.

I will not get into the details of how recurrent neural networks work here, as there are a lot of fantastic resources online. Let me just briefly mention that plain recurrent networks suffer from the vanishing gradient problem: the gradient shrinks as it is propagated back through many time steps, so distant inputs barely affect the weight updates. As a result, networks with such architectures are notoriously difficult to train. That’s why machine learning researchers started looking for more robust solutions. These are provided by gating mechanisms that help a network learn long-term dependencies.

The first such solution was introduced in 1997 as the Long Short-Term Memory (LSTM) unit. In its now-standard form it has three gates (input, forget and output) that together help keep the gradient from vanishing over many time steps. A simplified version of the LSTM that still achieves good performance is the Gated Recurrent Unit (GRU), introduced in 2014. In this solution, the forget and input gates are merged into one update gate. In our implementation we will use a layer of GRU units.
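To make the gating idea a bit more concrete, here is a minimal base R sketch of a single GRU time step. Treat it as an illustration only: the weight names (`Wz`, `Uz` and so on), the tiny dimensions and the random initialisation are my assumptions, bias terms are left out, and it follows one common formulation of the GRU rather than what Keras computes internally.

```r
# One GRU time step in base R (illustrative sketch, biases omitted).
sigmoid <- function(x) 1 / (1 + exp(-x))

gru_step <- function(x, h_prev, p) {
  z <- sigmoid(p$Wz %*% x + p$Uz %*% h_prev)          # update gate
  r <- sigmoid(p$Wr %*% x + p$Ur %*% h_prev)          # reset gate
  h_cand <- tanh(p$Wh %*% x + p$Uh %*% (r * h_prev))  # candidate state
  drop((1 - z) * h_prev + z * h_cand)                 # blend old and new state
}

set.seed(1)
n_in <- 4      # hypothetical alphabet size
n_hidden <- 3  # hypothetical number of GRU units
rand_mat <- function(nr, nc) matrix(rnorm(nr * nc, sd = 0.1), nr, nc)
params <- list(
  Wz = rand_mat(n_hidden, n_in), Uz = rand_mat(n_hidden, n_hidden),
  Wr = rand_mat(n_hidden, n_in), Ur = rand_mat(n_hidden, n_hidden),
  Wh = rand_mat(n_hidden, n_in), Uh = rand_mat(n_hidden, n_hidden)
)

x <- c(1, 0, 0, 0)     # a one-hot encoded character
h <- rep(0, n_hidden)  # initial hidden state
h <- gru_step(x, h, params)
```

When the update gate `z` is close to 0, the previous state passes through almost unchanged, which is how the unit can carry information across many time steps.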

Most of my code relies on an excellent example from Chapter 8 of Deep Learning with R by François Chollet with J. J. Allaire. I recommend this book wholeheartedly to everyone interested in the practical basics of neural networks. Since I think that François can explain his implementation to you better than I could, I’ll just leave you with it and get to the parts I modified or added.

## Experiment

Before we get to the model, we need some training data. As we don’t want to generate just any code, but specifically Shiny code, we need to find enough training samples. For that, I scraped the data mainly from this official shiny examples repository and added some of our semantic examples. As a result, I ended up with 1300 lines of Shiny code.
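Before a corpus like this can be fed to a character-level model, it has to be cut into fixed-length input windows, each paired with the character that follows it. The base R sketch below shows the idea on a toy string. The sample text and the `maxlen` and `step` values are assumptions for illustration, not the data or settings used in this experiment.

```r
# Toy stand-in for the scraped corpus (hypothetical sample, not the real data).
text <- "ui <- fluidPage(titlePanel(\"Hello Shiny\"))"

maxlen <- 10  # length of each input window (assumed value)
step   <- 3   # offset between consecutive windows (assumed value)

chars    <- strsplit(text, "")[[1]]
alphabet <- sort(unique(chars))

# Start positions of all windows that still have a following character.
starts <- seq(1, length(chars) - maxlen, by = step)

# x[i, t, k] is 1 when position t of window i holds the k-th alphabet character.
x <- array(0L, dim = c(length(starts), maxlen, length(alphabet)))
# y[i, k] is 1 when the k-th alphabet character follows window i.
y <- array(0L, dim = c(length(starts), length(alphabet)))

for (i in seq_along(starts)) {
  window <- chars[starts[i]:(starts[i] + maxlen - 1)]
  x[cbind(i, seq_len(maxlen), match(window, alphabet))] <- 1L
  y[i, match(chars[starts[i] + maxlen], alphabet)] <- 1L
}

dim(x)  # number of windows x maxlen x alphabet size
```

Each row of `y` is the one-hot target the network is asked to predict from the corresponding window in `x`.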

Second, I played with several network architectures and looked for a balance between speed of training, accuracy and model complexity. After some experiments, I found a suitable network for our purposes:

(By the way, if you want to find out more about Keras in R, I invite you to take a look at a nice introduction by Michał.)

I trained the above model for 50 epochs with a learning rate of 0.02. I also experimented with different values of the temperature parameter. Temperature controls the randomness of a prediction: the logits (the outputs of the last layer) are divided by it before the softmax function is applied, so low temperatures make the predictions more deterministic and high temperatures make them more random. To illustrate, let’s have a look at the output of the network predictions with temperature = 0.07.

and with temperature = 1:

I think that both examples are already quite impressive, given the limited training data we had. In the first case, the network is more confident about its choices but also quite prone to repetitions (many spaces follow spaces, letters follow letters and so on). The latter, from a long, loooong distance, looks way closer to Shiny code. Obviously, it’s still gibberish, but look! There is a nice function call `heag(heig= x(input$obr))`, an object property `input$obr`, a comment `# goith` and even a variable assignment `filectinput <- ren({`. Isn’t that cool?
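The effect of temperature described above can be reproduced with a few lines of base R. This is a sketch under assumptions, not the code used for the experiment: it assumes the network outputs a probability vector over the alphabet, and the function name `sample_with_temperature` and the three-character distribution are made up for illustration. Dividing the log-probabilities by the temperature and renormalising is equivalent to scaling the logits before the softmax.

```r
# Reweight a probability vector by a temperature and draw one index from it.
sample_with_temperature <- function(probs, temperature = 1) {
  logits <- log(probs) / temperature            # scale the log-probabilities
  logits <- logits - max(logits)                # keep exp() numerically stable
  reweighted <- exp(logits) / sum(exp(logits))  # softmax
  list(
    probs = reweighted,
    index = sample(seq_along(probs), size = 1, prob = reweighted)
  )
}

probs <- c(a = 0.5, b = 0.3, c = 0.2)  # hypothetical model output

low  <- sample_with_temperature(probs, temperature = 0.07)
high <- sample_with_temperature(probs, temperature = 1)

round(low$probs, 3)   # almost all mass on the most likely character
round(high$probs, 3)  # identical to the original distribution
```

With a temperature of 0.07 the sampler nearly always picks the single most likely character, which is consistent with the repetitive output seen in the first example. With a temperature of 1 it samples from the model's distribution as is.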

Let’s have a look now at the evolution of training after 5 epochs:

As you can see, as training progresses the generated text becomes increasingly structured.

## Final Thoughts

I appreciate that some of you might not be as impressed as I was. Frankly speaking, I can almost hear all of these Shiny programmers saying: “Phew… my job is secure then!” Yeah, yeah, sure it is… For now! Remember that these models will probably improve over time. I challenge you to play with different architectures and train some better models based on this example.

And for completeness, here’s the code I used to generate the fake Shiny code above:

You can find me on Twitter @doktaox
