On to what we'll use as a baseline for comparison.

#### Vanilla LSTM

The vanilla LSTM stacks two layers, each, again, of size 32. Dropout and recurrent dropout were chosen individually per dataset, as was the learning rate. (A sketch of such a model appears at the end of this post.)

### Data preparation

For all experiments, data were prepared in the same way. In every case, we used the first 10000 measurements available in the respective `.pkl` files [provided by Gilpin in his GitHub repository](https://github.com/williamgilpin/fnn/tree/master/datasets). To save on file size and not depend on an external data source, we extracted those first 10000 entries to `.csv` files downloadable directly from this blog's repo.

Should you want to access the complete time series (of considerably greater length), just download them from Gilpin's repo and load them using `reticulate` (see the sketch at the end of this post).

Data preparation for the first dataset, `geyser`, is likewise sketched at the end of this post; all other datasets were treated the same way.

Now we're ready to look at how forecasting goes on our four datasets.

## Experiments

### Geyser dataset

People working with time series may have heard of [Old Faithful](https://en.wikipedia.org/wiki/Old_Faithful), a geyser in Wyoming, US that has continually been erupting every 44 minutes to two hours since the year 2000. For the subset of data Gilpin extracted[^3],

[^3]: See dataset descriptions in the [repository's README](https://github.com/williamgilpin/fnn).

> `geyser_train_test.pkl` corresponds to detrended temperature readings from the main runoff pool of the Old Faithful geyser
> in Yellowstone National Park, downloaded from the [GeyserTimes database](https://geysertimes.org/). Temperature measurements
> start on April 13, 2015 and occur in one-minute increments.

Like we said above, `geyser.csv` is a subset of these measurements, comprising the first 10000 data points. To choose an adequate timestep for the LSTMs, we inspect the series at various resolutions:

![Geyser dataset. Top: first 1000 observations. Bottom: zooming in on the first 200.](images/geyser_ts.png)

It seems like the behavior is periodic with a period of about 40-50; a timestep of 60 thus seemed like a good try.

Having trained both FNN-LSTM and the vanilla LSTM for 200 epochs, we first inspect the variances of the latent variables on the test set. The value of `fnn_multiplier` corresponding to this run was `0.7`.

```{}
   V1     V2        V3          V4       V5       V6       V7       V8       V9      V10
0.258 0.0262 0.0000627 0.000000600 0.000533 0.000362 0.000238 0.000121 0.000518 0.000365
```

There is a drop in importance between the first two variables and the rest; however, unlike in the Lorenz system, V1 and V2 variances also differ by an order of magnitude.

Now, it’s interesting to compare prediction errors ...
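As promised, here is a sketch of what the two-layer vanilla LSTM baseline could look like, using `keras` for R. The dropout, recurrent dropout, and learning rate values below are placeholders (they were tuned per dataset), and the single-step output head is a simplification:

```r
library(keras)

n_timesteps <- 60  # matches the timestep chosen for geyser

# Two stacked LSTM layers of size 32; dropout, recurrent dropout and
# the learning rate are placeholders -- they were chosen per dataset.
vanilla_lstm <- keras_model_sequential() %>%
  layer_lstm(
    units = 32,
    input_shape = c(n_timesteps, 1),
    dropout = 0.2,
    recurrent_dropout = 0.2,
    return_sequences = TRUE  # pass the full sequence on to the second layer
  ) %>%
  layer_lstm(
    units = 32,
    dropout = 0.2,
    recurrent_dropout = 0.2
  ) %>%
  layer_dense(units = 1)

vanilla_lstm %>% compile(
  loss = "mse",
  optimizer = optimizer_adam(learning_rate = 1e-3)
)
```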
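To load the complete series from the `.pkl` files, one possibility is `reticulate`'s pickle helper; a minimal sketch, with the file name following the layout of Gilpin's repo:

```r
library(reticulate)

# py_load_object() reads a Python pickle file into R.
geyser_full <- py_load_object("geyser_train_test.pkl")
str(geyser_full)
```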
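Finally, a sketch of the kind of preparation applied to `geyser.csv` (and, analogously, to the other datasets). The column name `x` is an assumption, and the split into training and test sets is omitted for brevity:

```r
n_timesteps <- 60

# Read the 10000-point extract and normalize it.
# The column name `x` is an assumption.
series <- scale(read.csv("geyser.csv")$x)

# Build overlapping windows: n_timesteps inputs, the following value as target.
idx <- seq_len(length(series) - n_timesteps)
x <- t(vapply(idx, function(i) series[i:(i + n_timesteps - 1)],
              numeric(n_timesteps)))
y <- series[idx + n_timesteps]

# Add the feature dimension keras expects: (samples, timesteps, features).
dim(x) <- c(dim(x), 1)
```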