Time series cross-validation using crossval

This article was first published on T. Moudiki's Webpage - R, and kindly contributed to R-bloggers.

Time series cross-validation is now available in crossval, through the function crossval::crossval_ts. Its main parameters are:

  • fixed_window: indicates whether the training set’s size is fixed or increases through cross-validation iterations (both cases are described in sections 1 and 2 below)
  • initial_window: the number of points in the rolling training set (its initial size when fixed_window = FALSE)
  • horizon: the number of points in the rolling testing set
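To make the roles of these three parameters concrete, here is a minimal base R sketch that enumerates the training and testing indices of each iteration. The helper `rolling_splits` is hypothetical and is not part of crossval; it also assumes that the window moves forward by one observation per iteration, which may differ from what crossval_ts does internally.

```r
# Hypothetical helper (not part of crossval): enumerate rolling-origin splits.
# Assumption: the origin advances by one observation at each iteration.
rolling_splits <- function(n, initial_window, horizon, fixed_window = TRUE) {
  stopifnot(initial_window + horizon <= n)
  # last training index of each iteration; the testing set must fit within 1:n
  train_ends <- seq(from = initial_window, to = n - horizon, by = 1)
  lapply(train_ends, function(end) {
    # fixed window: drop the oldest point; increasing window: always start at 1
    start <- if (fixed_window) end - initial_window + 1 else 1
    list(train = start:end, test = (end + 1):(end + horizon))
  })
}

splits_fixed   <- rolling_splits(n = 20, initial_window = 10, horizon = 3,
                                 fixed_window = TRUE)
splits_growing <- rolling_splits(n = 20, initial_window = 10, horizon = 3,
                                 fixed_window = FALSE)

# second iteration: the fixed window slides, the increasing window keeps index 1
print(splits_fixed[[2]]$train)
print(splits_growing[[2]]$train)
print(splits_fixed[[2]]$test)
```

In both cases the testing set always has `horizon` points and immediately follows the training set; only the start of the training set differs.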

Yes, this type of functionality exists in packages such as caret or forecast, but with different flavours. We start by installing crossval from its online repository (in R’s console):

    library(devtools)
    devtools::install_github("thierrymoudiki/crossval")
    library(crossval)

1 – Calling crossval_ts with option fixed_window = TRUE

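[Figure: rolling training set of fixed length (blue) followed by the testing set (orange), across cross-validation iterations]

initial_window is the length of the training set, depicted in blue, which is fixed through cross-validation iterations. horizon is the length of the testing set, in orange.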

1 – 1 Using statistical learning functions

    # regressors including trend
    xreg <- cbind(1, 1:length(AirPassengers))

    # cross validation with least squares regression
    res <- crossval_ts(y = AirPassengers, x = xreg,
                       fit_func = crossval::fit_lm,
                       predict_func = crossval::predict_lm,
                       initial_window = 10,
                       horizon = 3,
                       fixed_window = TRUE)

    # print results
    print(colMeans(res))

             ME        RMSE         MAE         MPE        MAPE
     0.16473829 71.42382836 67.01472299  0.02345201  0.22106607
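For readers who want to see what such a procedure involves, the following base R sketch runs a fixed-window least squares evaluation by hand and computes the five error measures on each testing set. It is not crossval's implementation: the helper `error_measures` is hypothetical, the sign convention (actual minus predicted) and the one-observation step between iterations are assumptions, and MPE and MAPE are left as fractions because the values printed above look unscaled. The numbers it prints are therefore not guaranteed to match the output above.

```r
# Hypothetical helper: common forecast error measures, as fractions (not percentages)
error_measures <- function(actual, predicted) {
  err <- actual - predicted  # sign convention is an assumption
  c(ME   = mean(err),
    RMSE = sqrt(mean(err^2)),
    MAE  = mean(abs(err)),
    MPE  = mean(err / actual),
    MAPE = mean(abs(err / actual)))
}

y <- as.numeric(AirPassengers)
x <- cbind(1, seq_along(y))  # intercept and linear trend
initial_window <- 10
horizon <- 3

# assumption: the training window moves forward one observation at a time
train_ends <- initial_window:(length(y) - horizon)

res_manual <- t(sapply(train_ends, function(end) {
  train <- (end - initial_window + 1):end  # fixed-length training window
  test  <- (end + 1):(end + horizon)       # the next `horizon` points
  beta  <- stats::lm.fit(x = x[train, , drop = FALSE], y = y[train])$coefficients
  preds <- drop(x[test, , drop = FALSE] %*% beta)
  error_measures(y[test], preds)
}))

# one row per iteration, one column per measure; average over iterations
print(colMeans(res_manual))
```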

1 – 2 Using time series functions from package forecast

    res <- crossval_ts(y = AirPassengers, initial_window = 10,
                       horizon = 3,
                       fcast_func = forecast::thetaf,
                       fixed_window = TRUE)
    print(colMeans(res))

              ME         RMSE          MAE          MPE         MAPE
     2.657082195 51.427170382 46.511874693  0.003423843  0.155428590
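Forecasting functions such as forecast::thetaf need no regressors: they take a series and a horizon `h`, and return an object whose `mean` element holds the point forecasts. The sketch below mimics that convention with a hypothetical stand-in, `drift_forecast` (a random walk with drift), so that it runs in base R without the forecast package. How crossval_ts calls fcast_func internally is not shown here, so the loop is only an illustration of the idea, with a one-observation step between iterations as an assumption.

```r
# Hypothetical stand-in for a forecasting function: random walk with drift.
# It takes a series and a horizon `h`, and returns a list whose `mean`
# element holds the point forecasts.
drift_forecast <- function(y, h) {
  n <- length(y)
  slope <- (y[n] - y[1]) / (n - 1)  # average change per period in the window
  list(mean = y[n] + slope * seq_len(h))
}

y <- as.numeric(AirPassengers)
initial_window <- 10
horizon <- 3

# assumption: the training window moves forward one observation at a time
train_ends <- initial_window:(length(y) - horizon)

rmse_by_fold <- sapply(train_ends, function(end) {
  train <- (end - initial_window + 1):end  # fixed-length training window
  test  <- (end + 1):(end + horizon)
  fc <- drift_forecast(y[train], h = horizon)$mean
  sqrt(mean((y[test] - fc)^2))
})

# average RMSE over all iterations
print(mean(rmse_by_fold))
```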

2 – Calling crossval_ts with option fixed_window = FALSE

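[Figure: training set of increasing length (blue) followed by the testing set (orange), across cross-validation iterations]

initial_window is the initial length of the training set, in blue, which increases through cross-validation iterations. horizon is the length of the testing set, depicted in orange.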

2 – 1 Using statistical learning functions

    # regressors including trend
    xreg <- cbind(1, 1:length(AirPassengers))

    # cross validation with least squares regression
    res <- crossval_ts(y = AirPassengers, x = xreg,
                       fit_func = crossval::fit_lm,
                       predict_func = crossval::predict_lm,
                       initial_window = 10,
                       horizon = 3,
                       fixed_window = FALSE)

    # print results
    print(colMeans(res))

             ME        RMSE         MAE         MPE        MAPE
    11.35159629 40.54895772 36.07794747 -0.01723816  0.11825111

2 – 2 Using time series functions from package forecast

    res <- crossval_ts(y = AirPassengers, initial_window = 10,
                       horizon = 3,
                       fcast_func = forecast::thetaf,
                       fixed_window = FALSE)
    print(colMeans(res))

              ME         RMSE          MAE          MPE         MAPE
     2.670281455 44.758106487 40.284267136  0.002183707  0.135572333


Licensed under Creative Commons Attribution 4.0 International.
