July 2020

Posts

July 19, 2020 | R Blogs

[This article was first published on R Blogs, and kindly contributed to R-bloggers]. (You can report an issue about the content on this page here) Want to share your content on [Read more...]

Building A Neural Net from Scratch Using R – Part 1

July 19, 2020 | R Views

Akshaj is a budding deep learning researcher who loves to work with R. He has worked as a Research Associate at the Indian Institute of Science and as a Data Scientist at KPMG India. Many deep learning frameworks abstract away the mechanics behind training a neural network. ...
[Read more...]

Le Monde puzzle [#1152]

July 19, 2020 | xi'an

The weekly puzzle from Le Monde is a tournament classic: An even number of teams play one another once a week with no tie allowed and have played all other teams. Four weeks into the tournament, A has won all its games, B, C, and D have won three games, ...
[Read more...]

Time series prediction with FNN-LSTM

July 18, 2020 | Sigrid Keydana

On to what we'll use as a baseline for comparison.

#### Vanilla LSTM

Here is the vanilla LSTM, stacking two layers, each, again, of size 32. Dropout and recurrent dropout were chosen individually
per dataset, as was the learning rate.



### Data preparation

For all experiments, data were prepared in the same way.

In every case, we used the first 10000 measurements available in the respective `.pkl` files [provided by Gilpin in his GitHub
repository](https://github.com/williamgilpin/fnn/tree/master/datasets). To save on file size and not depend on an external
data source, we extracted those first 10000 entries to `.csv` files downloadable directly from this blog's repo:



Should you want to access the complete time series (of considerably greater lengths), just download them from Gilpin's repo
and load them using `reticulate`:



Here is the data preparation code for the first dataset, `geyser` - all other datasets were treated the same way.



Now we're ready to look at how forecasting goes on our four datasets.

## Experiments

### Geyser dataset

People working with time series may have heard of [Old Faithful](https://en.wikipedia.org/wiki/Old_Faithful), a geyser in
Wyoming, US that has continually been erupting every 44 minutes to two hours since the year 2004. For the subset of data
Gilpin extracted[^3],

[^3]: see dataset descriptions in the [repository\'s README](https://github.com/williamgilpin/fnn)

> `geyser_train_test.pkl` corresponds to detrended temperature readings from the main runoff pool of the Old Faithful geyser
> in Yellowstone National Park, downloaded from the [GeyserTimes database](https://geysertimes.org/). Temperature measurements
> start on April 13, 2015 and occur in one-minute increments.

Like we said above, `geyser.csv` is a subset of these measurements, comprising the first 10000 data points. To choose an
adequate timestep for the LSTMs, we inspect the series at various resolutions:

<div class="figure">
<img src="images/geyser_ts.png" alt="Geyser dataset. Top: First 1000 observations. Bottom: Zooming in on the first 200." width="600" />
<p class="caption">(\#fig:unnamed-chunk-5)Geyser dataset. Top: First 1000 observations. Bottom: Zooming in on the first 200.</p>
</div>

It seems like the behavior is periodic with a period of about 40-50; a timestep of 60 thus seemed like a good try.
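To make the role of the timestep concrete, here is a minimal base-R sketch of how a series could be cut into overlapping windows of length 60. The helper name `make_windows`, the choice of a target made up of the 60 values that follow each input window, and the synthetic sine series are all assumptions for illustration; this is not the post's own preparation code.

```r
# Hypothetical helper (not from the original post): cut a univariate series
# into overlapping input windows and the windows that immediately follow them.
make_windows <- function(series, n_timesteps) {
  # Each sample needs n_timesteps inputs plus n_timesteps targets.
  n_windows <- length(series) - 2 * n_timesteps + 1
  stopifnot(n_windows >= 1)

  # Row i of `idx` holds the positions i, i + 1, ..., i + n_timesteps - 1.
  idx <- outer(seq_len(n_windows), seq_len(n_timesteps) - 1, "+")

  list(
    x = matrix(series[idx], nrow = n_windows),               # inputs
    y = matrix(series[idx + n_timesteps], nrow = n_windows)  # targets
  )
}

# Synthetic stand-in for geyser.csv: 10000 points with a period of 45.
series <- sin(2 * pi * seq_len(10000) / 45)
windows <- make_windows(series, n_timesteps = 60)
dim(windows$x)  # 9881 rows, 60 columns
```

With a window of 60 and a period of roughly 40-50, every input row covers at least one full cycle, which is the rationale for picking a timestep slightly larger than the apparent period.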

Having trained both FNN-LSTM and the vanilla LSTM for 200 epochs, we first inspect the variances of the latent variables on
the test set. The value of `fnn_multiplier` corresponding to this run was `0.7`.



```{}
   V1     V2        V3          V4       V5       V6       V7       V8       V9      V10
0.258 0.0262 0.0000627 0.000000600 0.000533 0.000362 0.000238 0.000121 0.000518 0.000365
```

There is a drop in importance between the first two variables and the rest; however, unlike in the Lorenz system, V1 and V2 variances also differ by an order of magnitude. Now, it’s interesting to compare prediction errors ...
[Read more...]

Riddler: Can You Beat MLB Records?

July 18, 2020 | Posts | Joshua Cook

FiveThirtyEight’s Riddler Express: From Taylor Firman comes an opportunity to make baseball history: This year, Major League Baseball announced it will play a shortened 60-game season, as opposed to the typical 162-game season. Baseball is a sport of numbers and statistics, and so Taylor wondered about the impact ...
[Read more...]

drat 0.1.8: Minor test fix

July 18, 2020 | Thinking inside the box

A new version of drat arrived on CRAN today. This is a follow-up release to 0.1.7 from a week ago. It contains a quick follow-up by Felix Ernst to correct one of the tests which misbehaved under the old release of R still being tested at CRAN. drat s... [Read more...]

New Packages: GetDFPdata = GetDFPData2 + GetFREData

July 17, 2020 | R | msperlin

Back in 2017 I wrote the first version of package GetDFPData, along with a paper describing the code and providing an empirical application. However, maintaining the package over the years has been frustrating. The code is becoming increasingly complex, largely because it handles FRE and DFP data ... [Read more...]

SIMD Revisited

July 17, 2020 | HighlandR

SIMD data without maps - The Scottish Index of Multiple Deprivation updated for 2020 I have blogged about the SIMD previously. The last time was using data from 2016. Earlier this year, the data was refreshed, and my friend David Henderson ...
[Read more...]

RcppArmadillo 0.9.900.2.0

July 17, 2020 | Thinking inside the box

Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and... [Read more...]

Free vtreat Tutorial Videos

July 17, 2020 | jmount

I would like to re-share links to our free vtreat data preparation system introduction videos, which show you what sort of machine learning problems vtreat can help you with. Python vtreat introduction video (PyData LA 2019), slides here. R vtreat introduction video (Why R? Foundation). The idea is: instead of […] [Read more...]

SIMD Revisited

July 16, 2020 | HighlandR

The Scottish Index of Multiple Deprivation updated for 2020 I have blogged about the SIMD previously. The last time was using data from 2016. Earlier this year, the data was refreshed, and my friend David Henderson was hot off the press wit...
[Read more...]