Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This guest post from
Marcos Arancibia describes how to use Oracle R Enterprise to analyze Time Series data.

In this article, we give an overview of how to use Time Series Analysis against data stored in Oracle
Database, using the Embedded R Execution capability to send time series computations
to the Oracle Database server instead processing at the client. We will also learn how to retrieve the final
series or forecasts and retrieve them to the client for plotting, forecasting, and
diagnosing.

One key thing to keep in mind when
using Time Series techniques with data that is stored in Oracle Database is the
order of the rows, or records. Because
of the parallel capabilities of Oracle Database, when queried for records, one
might end up receiving records out of order if an option for order is not
specified.

Simple Example using Stock Data

Series example. First we will need to connect to our Oracle Database using ORE.
Then, using the package TTR, we will access Oracle Stock data from YahooData
service, from January 1, 2008 to January 1, 2013 and push it to the database.

#
Load the ORE library and connect to Oracle Database

library(ORE)

ore.connect(“myuser”,”mysid”,”myserver”,”mypass”,port=1521,all=TRUE)

library(TTR)

#
Get data in XTS format

xts.orcl
<- getYahooData("ORCL", 20080101, 20130101)

#
Convert it to a data frame and gets the date

#
Makes the date the Index

df.orcl
<- data.frame(xts.orcl)

df.orcl$date <- (data.frame(date=index(xts.orcl))$date)

#
Create/overwrite data in Oracle Database

#
to a Table called ORCLSTOCK

ore.drop(table=”ORCLSTOCK”)

ore.create(df.orcl,table=”ORCLSTOCK”)

#
IMPORTANT STEP!!!

#
Ensure indexing is kept by date

rownames(ORCLSTOCK)
<- ORCLSTOCK$date # Ensure the data is in the DB ore.ls() # Review column names, data statistics and # print a sample of the data names(ORCLSTOCK) >names(ORCLSTOCK) [1] “Open” “High” “Low” “Close” “Volume” [6] “Unadj.Close” “Div” “Split” “Adj.Div” “date” summary(ORCLSTOCK$Close)

>summary(ORCLSTOCK$Close) Min. 1st Qu. Median Mean 3rd Qu. Max. 13.36 20.53 24.22 24.79 29.70 35.73 head(ORCLSTOCK) >head(ORCLSTOCK) Open High Low Close Volume 2008-01-02 01:00:00 21.74414 22.00449 21.58022 21.68629 44360179 2008-01-03 01:00:00 21.62843 22.28413 21.62843 22.28413 43600532 2008-01-04 01:00:00 21.95628 22.06235 21.01130 21.24272 46391263 2008-01-07 01:00:00 21.17523 21.67664 21.01130 21.45486 41527032 2008-01-08 01:00:00 21.44522 21.52236 20.38453 20.39417 45155398 2008-01-09 01:00:00 20.57738 20.91487 20.39417 20.83773 49750304 Unadj.Close Div Split Adj.Div date 2008-01-02 01:00:00 22.49 NA NA NA 2008-01-02 2008-01-03 01:00:00 23.11 NA NA NA 2008-01-03 2008-01-04 01:00:00 22.03 NA NA NA 2008-01-04 2008-01-07 01:00:00 22.25 NA NA NA 2008-01-07 2008-01-08 01:00:00 21.15 NA NA NA 2008-01-08 2008-01-09 01:00:00 21.61 NA NA NA 2008-01-09 Pull data from the database for a simple plot # Pull data from Oracle Database (only the necessary columns) orcl <- ore.pull(ORCLSTOCK[,c("date","Close","Open","Low","High")]) # Simple plot with base libraries – Closing plot(orcl$date,orcl$Close,type=”l”,col=”red”,xlab=”Date”,ylab=”US$”,

main=”Base plot:Daily ORACLE Stock
Closing points”)

#
Simple plot with base libraries – Other Series

plot(orcl$date,orcl$Open,type=”l”,col=”blue”,xlab=”Date”,ylab=”US$”, main=”Base plot:Daily ORACLE Stock: Open/High/Low points”) lines(orcl$date,orcl$High,col=”green”) lines(orcl$date,orcl$Low,col=”orange”) legend(“topleft”, c(“Opening”,”High”,”Low”), col=c(“blue”,”green”,”orange”),lwd=2,title = “Series”,bty=”n”) A different plot option, using the package xts library(xts) # Pull data from Oracle Database (only the necessary columns) orcl <- ore.pull(ORCLSTOCK[,c("date","Close","Open","Low","High")]) # Convert data to Time Series format orcl.xts <- as.xts(orcl,order.by=orcl$date,dateFormat="POSIXct")

#
Plot original series

plot(orcl.xts$Close,major.ticks=’months’,minor.ticks=FALSE, main=”Time Series plot:Daily ORACLE Stock Closing points”,col=”red”) Simple Time Series: Moving Average Smoothing We might be tempted to call functions like the Smoothing Moving Average from open-source CRAN packages against Oracle Database Tables, but those packages do not know what to do with an “ore.frame”. For that process to work correctly, we can either load the data locally or send the process for remote execution on the Database Server by using Embedded R Execution. We will also explore the built-in Moving Average process from ore.rollmean() as a third alternative. ALTERNATIVE 1 – The first example is pulling the data from Oracle Database into a ts (time series) object first, for a Client-side smoothing Process. library(TTR) # Pull part of the database table into a local data.frame sm.orcl <- ore.pull(ORCLSTOCK[,c("date","Close")]) # Convert “Close” attribute into a Time Series (ts) ts.orcl <- ts(sm.orcl$Close)

#
Use SMA – Smoothing Moving Average algorithm from package TTR

ts.sm.orcl
<-ts(SMA(ts.orcl,n=30),frequency=365, start=c(2008,1) )

#
Plot both Series together

plot(sm.orcl$date,sm.orcl$Close,type=”l”,col=”red”,xlab=”Date”,ylab=”US$”, main=”ORCL Stock Close CLIENT-side Smoothed Series n=30 days”) lines(sm.orcl$date,ts.sm.orcl,col=”blue”)

legend(“topleft”,
c(“Closing”,”MA(30) of Closing”),

col=c(“red”,”blue”),lwd=2,title = “Series”,bty=”n”)

ALTERNATIVE 2 – In
this alternative, we will use a Server-side example for running the Smoothing
via Moving Average, without bringing all data to the client. Only the result is brought locally for
plotting. Remember that the TTR package
has to be installed on the Server in order to be called.

#
Server execution call using ore.tableApply

#
Result is an ore.list that remains in the database until needed

sv.orcl.ma30
<-

ore.tableApply(ORCLSTOCK[,c(“date”,”Close”)],ore.connect = TRUE,

function(dat) {

library(TTR)

ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")),] list(res1 <- ts(ordered$Close,frequency=365, start=c(2008,1)),

res2 <- ts(SMA(res1,n=30),frequency=365, start=c(2008,1)),

res3 <- ordered$date) } ); # Bring the results locally for plotting local.orcl.ma30 <- ore.pull(sv.orcl.ma30) # Plot two series side by side # (the third element of the list is the date) plot(local.orcl.ma30[[3]],local.orcl.ma30[[1]],type=”l”, col=”red”,xlab=”Date”,ylab=”US$”,

main=”ORCL Stock Close SERVER-side
Smoothed Series n=30 days”)

#

lines(local.orcl.ma30[[3]],

local.orcl.ma30[[2]],col=”blue”,type=”l”)

#

legend(“topleft”,
c(“Closing”,”Server MA(30) of Closing”),

col=c(“red”,”blue”), lwd=2,title = “Series”, bty=”n”)

ALTERNATIVE 3 – In this alternative we will use a Server-side example with
the computation of Moving Averages using the native ORE in-Database functions
without bringing data to the client. Only the result is brought locally for
plotting.

Just one line of code is needed to generate an in-Database
Computation of Moving averages and the creation of a new VIRTUAL column in the Oracle
Database. We will call this new column rollmean30.

We will use the function ore.rollmean(). The option align=”right” makes the
MA look at only the past k days (30 in this case), or less, depending on the
point in time. This creates a small
difference between this method and the previous methods in the beginning of the
series, since ore.rollmean() can actually calculate the first sets of days
using smaller sets of data available, while other methods discard this data.

#
Moving Average done directly in Oracle Database

ORCLSTOCK$rollmean30 <- ore.rollmean(ORCLSTOCK$Close, k = 30, align="right")

#
Check that new variable is in the database

Open High Low Close Volume

2008-01-02 01:00:00 21.74414 22.00449 21.58022 21.68629 44360179

2008-01-03 01:00:00 21.62843 22.28413 21.62843 22.28413 43600532

2008-01-04 01:00:00 21.95628 22.06235 21.01130 21.24272 46391263

2008-01-07 01:00:00 21.17523 21.67664 21.01130 21.45486 41527032

2008-01-08 01:00:00 21.44522 21.52236 20.38453 20.39417 45155398

2008-01-09 01:00:00 20.57738 20.91487 20.39417 20.83773 49750304

2008-01-02 01:00:00 22.49 NA NA NA 2008-01-02 21.68629

2008-01-03 01:00:00 23.11 NA NA NA 2008-01-03 21.98521

2008-01-04 01:00:00 22.03 NA NA NA 2008-01-04 21.73771

2008-01-07 01:00:00 22.25 NA NA NA 2008-01-07 21.66700

2008-01-08 01:00:00 21.15 NA NA NA 2008-01-08 21.41243

2008-01-09 01:00:00 21.61 NA NA NA 2008-01-09 21.31665

#
Get results locally for plotting

local.orcl
<- ore.pull(ORCLSTOCK[,c("date","Close", "rollmean30")])

sub.orcl
<- subset(local.orcl,local.orcl$date> as.Date(“2011-12-16”)) # Plot the two series side by side # First plot original series plot(local.orcl$date, local.orcl$Close,type=”l”, col=”red”,xlab=”Date”,ylab=”US$”,

main=”ORCL Stock Close ORE
Computation of Smoothed Series n=30 days”)

#

lines(local.orcl$date,local.orcl$rollmean30,col=”blue”,type=”l”)

#

legend(“topleft”,
c(“Closing”,”ORE MA(30) of Closing”),

col=c(“red”,”blue”),lwd=2,title = “Series”,bty=”n”)

Seasonal Decomposition for Time
Series Diagnostics

Now that we have learned how to
execute these processes using Embedded R, we can start using other
methodologies required for Time Series using the same Server-side computation
and local plotting.

It is typical for an analyst to try
to understand a Time Series better by looking at some of the basic diagnostics
like the Seasonal Decomposition of Time Series by Loess. These can be achieved by using the stl()
command in the following process:

#
Server execution

sv.orcl.dcom
<-

ore.tableApply(ORCLSTOCK[,c(“date”,”Close”)],ore.connect = TRUE,

function(dat) {

ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")),] ts.orcl <- ts(ordered$Close,frequency=365, start=c(2008,1))

res <- stl(ts.orcl,s.window="periodic")

}

);

#
Get result for plotting

local.orcl.dcom
<- ore.pull(sv.orcl.dcom)

plot(local.orcl.dcom,
main=”Server-side Decomposition of ORCL Time-Series”,col=”blue”)

Another typical set of diagnostic
charts includes Autocorrelation and Partial Autocorrelation function plots.
These can be achieved by using the acf() command with the proper options in
Embedded R Execution, so computations happen at the Oracle Database server:

#
Server-side ACF and PACF computation

#
Use function acf() and save result as a list

sv.orcl.acf
<-

ore.tableApply(ORCLSTOCK[,c(“date”,”Close”)],ore.connect=TRUE,

function(dat){

ts.orcl <- ts(dat$Close,frequency=365, start=c(2008,1)) list(res1 <- acf(ts.orcl,lag.max=120,type="correlation"),res2 <- acf(ts.orcl,lag.max=30, type="partial")) } ); # Get results for plotting # ACF and PACF as members of the list pulled local.orcl.acf <- ore.pull(sv.orcl.acf) plot(local.orcl.acf[[1]],main=”Server-side ACF Analysis for Series ORCL”,col=”blue”,lwd=2) plot(local.orcl.acf[[2]],main=”Server-side PACF Analysis for Series ORCL”,col=”blue”,lwd=5) Simple Exponential Smoothing Using the popular package “forecast”, we will use the ses() function to calculate a 90 days horizon (h=90) into the future, using the option criterion=MSE for the model. The package forecast needs to be installed on the Oracle Database server R engine. Then, we will bring the resulting model locally for plotting. Remember to load the library “forecast” locally as well, to be able to interpret the meaning of the ses() output when it’s brought locally. # Execute ses() call in the server sv.orcl.ses <- ore.tableApply(ORCLSTOCK[,c(“date”,”Close”)], ore.connect=TRUE, function(dat) { library(forecast) ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")),]

ts.orcl <- ts(ordered$Close,frequency=365, start=c(2008,1) ) res <- ses(ts.orcl, h=90, alpha=0.1, initial="simple") } ); # Get SES result locally for plotting # Since remote object contains a SES model from package forecast, # load package locally as well library(forecast) plot.orcl.ses <- ore.pull(sv.orcl.ses) plot(plot.orcl.ses,col=”blue”,fcol=”red”, main=”ORCL with Server-side SES – Simple Exponential Smoothing Forecast”) Holt Exponential Smoothing Using the popular package “forecast”, we will use the holt() function to calculate a 90 days horizon (h=90) into the future, requesting the Intervals of confidence of 80 and 95%. Again. the package “forecast” needs to be installed on the Oracle Database server R engine. Then, we will bring the resulting model locally for plotting. Remember to load the library forecast locally as well, to be able to interpret the meaning of the holt() output when it’s brought locally. # Execute holt() call in the server sv.orcl.ets <- ore.tableApply(ORCLSTOCK[,c(“date”,”Close”)], ore.connect=TRUE, function(dat) { library(forecast) ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")),]

ts.orcl <- ts(ordered$Close,frequency=365, start=c(2008,1)) res <- holt(ts.orcl, h=90, level=c(80,95), initial="optimal") } ); # Get resulting model from the server # Since remote object contains a Holt Exponential Smoothing # model from package forecast, load package locally as well library(forecast) local.orcl.ets <- ore.pull(sv.orcl.ets) plot(local.orcl.ets,col=”blue”,fcol=”red”, main=”ORCL Original Series Stock Close with Server-side Holt Forecast”) ARIMA – Auto-Regressive Interactive Moving Average There are at least two options for fitting an ARIMA model into a Time Series. One option is to use the package “forecast”, that allows for an automatic arima fitting (auto.arima) to find the best parameters possible based on the series. For more advanced users, the arima() function in the “stats” package itself allows for choosing the model parameters. # ARIMA models on the server using auto.arima() from package forecast arimaModel <- ore.tableApply(ORCLSTOCK[,c(“date”,”Close”)], ore.connect=TRUE, FUN = function(dat){ # load forecast library to use auto.arima library(forecast) # sort the table into a temp file by date ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")),]

# convert column into
a Time Series

# format ts(…) and
request creation of an automatic

# ARIMA model auto.arima(…)

res <- auto.arima(ts(ordered$Close,frequency=365, start=c(2008,1)), stepwise=TRUE, seasonal=TRUE) }) # Alternative using the arima() from package “stats”. arimaModel <- ore.tableApply(ORCLSTOCK[,c(“date”,”Close”)],ore.connect=TRUE, FUN = function(dat){ # sort table into a temp file by date ordered <- dat[order(as.Date(dat$date, format="%Y-%m-%d")),]

# convert column
into a Time Series

# format ts(…) and
request creation of a specific

# ARIMA model
using arima(), for example an ARIMA(2,1,2)

res <- arima(ts(ordered$Close,frequency=365, start=c(2008,1)), order = c(2,1,2)) }) # Load forecast package locally to use the model # for plotting and producing forecasts library(forecast) # Show remote resulting Time Series model >arimaModel Series: ts(ordered$Close, frequency = 365, start = c(2008, 1))

ARIMA(2,1,0)

Coefficients:

ar1 ar2

-0.0935 -0.0192

s.e. 0.0282 0.0282

sigma^2 estimated as 0.2323: log likelihood=-866.77

AIC=1739.55 AICc=1739.57 BIC=1754.96

#
Get remote model using ore.pull for local prediction and plotting

local.arimaModel
<- ore.pull(arimaModel)

#
Generate forecasts for the next 15 days

fore.arimaModel
<- forecast(local.arimaModel, h=15)

#
Use the following option if you need to remove scientific notation of

#
numbers that are too large in charts

options(scipen=10)

#
Generate the plot of forecasts, including interval of confidence

#
Main title is generated automatically indicating the type of model

#
chosen by the Auto ARIMA process

plot(fore.arimaModel,type=”l”, col=”blue”, xlab=”Date”,

ylab=”Closing value
(US$)”, cex.axis=0.75, font.lab=”serif EUC”, sub=”Auto-generated ARIMA for ORCL Stock Closing” ) # Generate and print forecasted data points plus standard errors # of the next 15 days forecasts <- predict(local.arimaModel, n.ahead = 15) >forecasts$pred

Time Series:

Start = c(2011, 165)

End = c(2011, 179)

Frequency = 365

[1] 33.29677 33.29317 33.29395
33.29395 33.29393 33.29393 33.29393 33.29393 33.29393 33.29393 33.29393

[12] 33.29393 33.29393 33.29393 33.29393

\$se

Time Series:

Start = c(2011, 165)

End = c(2011, 179)

Frequency = 365

[1] 0.4819417 0.6504925
0.7807798 0.8928901 0.9924032 1.0827998 1.1662115 1.2440430 1.3172839 1.3866617

[11] 1.4527300 1.5159216 1.5765824 1.6349941 1.6913898