**Oracle R Enterprise**, and kindly contributed to R-bloggers)

*This guest post from
Marcos Arancibia describes how to use Oracle R Enterprise to analyze Time Series data.*

In this article, we give an overview of how to use Time Series Analysis against data stored in Oracle

Database, using the Embedded R Execution capability to send time series computations

to the Oracle Database server instead processing at the client. We will also learn how to retrieve the final

series or forecasts and retrieve them to the client for plotting, forecasting, and

diagnosing.

One key thing to keep in mind when

using Time Series techniques with data that is stored in Oracle Database is the

order of the rows, or records. Because

of the parallel capabilities of Oracle Database, when queried for records, one

might end up receiving records out of order if an option for order is not

specified.

**Simple Example using Stock Data **

Let’s start with a simple Time

Series example. First we will need to connect to our Oracle Database using ORE.

Then, using the package TTR, we will access Oracle Stock data from YahooData

service, from January 1, 2008 to January 1, 2013 and push it to the database.

#

Load the ORE library and connect to Oracle Database

library(ORE)

ore.connect("myuser","mysid","myserver","mypass",port=1521,all=TRUE)

library(TTR)

#

Get data in XTS format

xts.orcl

<- getYahooData("ORCL", 20080101, 20130101)

#

Convert it to a data frame and gets the date

#

Makes the date the Index

df.orcl

<- data.frame(xts.orcl)

df.orcl$date

<- (data.frame(date=index(xts.orcl))$date)

#

Create/overwrite data in Oracle Database

#

to a Table called ORCLSTOCK

ore.drop(table="ORCLSTOCK")

ore.create(df.orcl,table="ORCLSTOCK")

#

IMPORTANT STEP!!!

#

Ensure indexing is kept by date

rownames(ORCLSTOCK)

<- ORCLSTOCK$date

#

Ensure the data is in the DB

ore.ls()

#

Review column names, data statistics and

#

print a sample of the data

names(ORCLSTOCK)

>names(ORCLSTOCK)

[1] "Open" "High" "Low" "Close" "Volume"

[6] "Unadj.Close"

"Div" "Split" "Adj.Div" "date"

summary(ORCLSTOCK$Close)

>summary(ORCLSTOCK$Close)

Min. 1st Qu. Median Mean 3rd Qu. Max.

13.36 20.53 24.22 24.79 29.70 35.73

head(ORCLSTOCK)

>head(ORCLSTOCK)

Open High Low Close Volume

2008-01-02 01:00:00 21.74414 22.00449 21.58022 21.68629 44360179

2008-01-03 01:00:00 21.62843 22.28413 21.62843 22.28413 43600532

2008-01-04 01:00:00 21.95628 22.06235 21.01130 21.24272 46391263

2008-01-07 01:00:00 21.17523 21.67664 21.01130 21.45486 41527032

2008-01-08 01:00:00 21.44522 21.52236 20.38453 20.39417 45155398

2008-01-09 01:00:00 20.57738 20.91487 20.39417 20.83773 49750304

Unadj.Close

Div Split Adj.Div date

2008-01-02 01:00:00 22.49 NA NA NA 2008-01-02

2008-01-03 01:00:00 23.11 NA NA NA 2008-01-03

2008-01-04 01:00:00 22.03 NA NA NA 2008-01-04

2008-01-07 01:00:00 22.25 NA NA NA 2008-01-07

2008-01-08 01:00:00 21.15 NA NA NA 2008-01-08

2008-01-09 01:00:00 21.61 NA NA NA 2008-01-09

**Pull data from the database for a simple
plot**

#

Pull data from Oracle Database (only the necessary columns)

orcl

<-

ore.pull(ORCLSTOCK[,c("date","Close","Open","Low","High")])

#

Simple plot with base libraries – Closing

plot(orcl$date,orcl$Close,type="l",col="red",xlab="Date",ylab="US$",

main="Base plot:Daily ORACLE Stock

Closing points")

#

Simple plot with base libraries – Other Series

plot(orcl$date,orcl$Open,type="l",col="blue",xlab="Date",ylab="US$",

main="Base plot:Daily ORACLE Stock:

Open/High/Low points")

lines(orcl$date,orcl$High,col="green")

lines(orcl$date,orcl$Low,col="orange")

legend("topleft",

c("Opening","High","Low"),

col=c("blue","green","orange"),lwd=2,title = "Series",bty="n")

**A different plot option, using the
package xts**

library(xts)

#

Pull data from Oracle Database (only the necessary columns)

orcl

<-

ore.pull(ORCLSTOCK[,c("date","Close","Open","Low","High")])

#

Convert data to Time Series format

orcl.xts

<- as.xts(orcl,order.by=orcl$date,dateFormat="POSIXct")

#

Plot original series

plot(orcl.xts$Close,major.ticks=’months’,minor.ticks=FALSE,

main="Time Series plot:Daily ORACLE

Stock Closing points",col="red")

**Simple Time Series: Moving Average
Smoothing**

We might be tempted to call

functions like the Smoothing Moving Average from open-source CRAN packages

against Oracle Database Tables, but those packages do not know what to do with

an “ore.frame”. For that process to work

correctly, we can either load the data locally or send the process for remote

execution on the Database Server by using Embedded R Execution.

We will also explore the built-in

Moving Average process from ore.rollmean() as a third alternative.

**ALTERNATIVE 1 –**

The first example is pulling the data from Oracle Database into a ts (time series) object first, for a Client-side

smoothing Process.

library(TTR)

#

Pull part of the database table into a local data.frame

sm.orcl

<- ore.pull(ORCLSTOCK[,c("date","Close")])

#

Convert "Close" attribute into a Time Series (ts)

ts.orcl

<- ts(sm.orcl$Close)

#

Use SMA – Smoothing Moving Average algorithm from package TTR

ts.sm.orcl

<-ts(SMA(ts.orcl,n=30),frequency=365, start=c(2008,1) )

#

Plot both Series together

plot(sm.orcl$date,sm.orcl$Close,type="l",col="red",xlab="Date",ylab="US$",

main="ORCL Stock Close CLIENT-side

Smoothed Series n=30 days")

lines(sm.orcl$date,ts.sm.orcl,col="blue")

legend("topleft",

c("Closing","MA(30) of Closing"),

col=c("red","blue"),lwd=2,title = "Series",bty="n")

**ALTERNATIVE 2 –** In

this alternative, we will use a Server-side example for running the Smoothing

via Moving Average, without bringing all data to the client. Only the result is brought locally for

plotting. Remember that the TTR package

has to be installed on the Server in order to be called.

#

Server execution call using ore.tableApply

#

Result is an ore.list that remains in the database until needed

sv.orcl.ma30

<-

ore.tableApply(ORCLSTOCK[,c("date","Close")],ore.connect = TRUE,

function(dat) {

library(TTR)

ordered <- dat[order(as.Date(dat$date,

format="%Y-%m-%d")),]

list(res1 <-

ts(ordered$Close,frequency=365, start=c(2008,1)),

res2 <-

ts(SMA(res1,n=30),frequency=365, start=c(2008,1)),

res3 <- ordered$date)

}

);

#

Bring the results locally for plotting

local.orcl.ma30

<- ore.pull(sv.orcl.ma30)

#

Plot two series side by side

#

(the third element of the list is the date)

plot(local.orcl.ma30[[3]],local.orcl.ma30[[1]],type="l",

col="red",xlab="Date",ylab="US$",

main="ORCL Stock Close SERVER-side

Smoothed Series n=30 days")

#

Add smoothed series

lines(local.orcl.ma30[[3]],

local.orcl.ma30[[2]],col="blue",type="l")

#

Add legend

legend("topleft",

c("Closing","Server MA(30) of Closing"),

col=c("red","blue"), lwd=2,title = "Series", bty="n")

**ALTERNATIVE 3 –** In this alternative we will use a Server-side example with

the computation of Moving Averages using the native ORE in-Database functions

without bringing data to the client. Only the result is brought locally for

plotting.

Just one line of code is needed to generate an in-Database

Computation of Moving averages and the creation of a new VIRTUAL column in the Oracle

Database. We will call this new column rollmean30.

We will use the function ore.rollmean(). The option align="right" makes the

MA look at only the past k days (30 in this case), or less, depending on the

point in time. This creates a small

difference between this method and the previous methods in the beginning of the

series, since ore.rollmean() can actually calculate the first sets of days

using smaller sets of data available, while other methods discard this data.

#

Moving Average done directly in Oracle Database

ORCLSTOCK$rollmean30

<- ore.rollmean(ORCLSTOCK$Close, k = 30, align="right")

#

Check that new variable is in the database

head(ORCLSTOCK)

>head(ORCLSTOCK)

Open High Low Close Volume

2008-01-02 01:00:00 21.74414 22.00449 21.58022 21.68629 44360179

2008-01-03 01:00:00 21.62843 22.28413 21.62843 22.28413 43600532

2008-01-04 01:00:00 21.95628 22.06235 21.01130 21.24272 46391263

2008-01-07 01:00:00 21.17523 21.67664 21.01130 21.45486 41527032

2008-01-08 01:00:00 21.44522 21.52236 20.38453 20.39417 45155398

2008-01-09 01:00:00 20.57738 20.91487 20.39417 20.83773 49750304

Unadj.Close

Div Split Adj.Div date rollmean30

2008-01-02 01:00:00 22.49 NA NA NA 2008-01-02 21.68629

2008-01-03 01:00:00 23.11 NA NA NA 2008-01-03 21.98521

2008-01-04 01:00:00 22.03 NA NA NA 2008-01-04 21.73771

2008-01-07 01:00:00 22.25 NA NA NA 2008-01-07 21.66700

2008-01-08 01:00:00 21.15 NA NA NA 2008-01-08 21.41243

2008-01-09 01:00:00 21.61 NA NA NA 2008-01-09 21.31665

#

Get results locally for plotting

local.orcl

<- ore.pull(ORCLSTOCK[,c("date","Close",

"rollmean30")])

sub.orcl

<- subset(local.orcl,local.orcl$date> as.Date("2011-12-16"))

#

Plot the two series side by side

#

First plot original series

plot(local.orcl$date, local.orcl$Close,type="l", col="red",xlab="Date",ylab="US$",

main="ORCL Stock Close ORE

Computation of Smoothed Series n=30 days")

#

Add smoothed series

lines(local.orcl$date,local.orcl$rollmean30,col="blue",type="l")

#

Add legend

legend("topleft",

c("Closing","ORE MA(30) of Closing"),

col=c("red","blue"),lwd=2,title = "Series",bty="n")

**Seasonal Decomposition for Time
Series Diagnostics**

Now that we have learned how to

execute these processes using Embedded R, we can start using other

methodologies required for Time Series using the same Server-side computation

and local plotting.

It is typical for an analyst to try

to understand a Time Series better by looking at some of the basic diagnostics

like the Seasonal Decomposition of Time Series by Loess. These can be achieved by using the stl()

command in the following process:

#

Server execution

sv.orcl.dcom

<-

ore.tableApply(ORCLSTOCK[,c("date","Close")],ore.connect = TRUE,

function(dat) {

ordered <- dat[order(as.Date(dat$date,

format="%Y-%m-%d")),]

ts.orcl <- ts(ordered$Close,frequency=365,

start=c(2008,1))

res <-

stl(ts.orcl,s.window="periodic")

}

);

#

Get result for plotting

local.orcl.dcom

<- ore.pull(sv.orcl.dcom)

plot(local.orcl.dcom,

main="Server-side Decomposition of ORCL Time-Series",col="blue")

Another typical set of diagnostic

charts includes Autocorrelation and Partial Autocorrelation function plots.

These can be achieved by using the acf() command with the proper options in

Embedded R Execution, so computations happen at the Oracle Database server:

#

Server-side ACF and PACF computation

#

Use function acf() and save result as a list

sv.orcl.acf

<-

ore.tableApply(ORCLSTOCK[,c("date","Close")],ore.connect=TRUE,

function(dat){

ts.orcl <- ts(dat$Close,frequency=365,

start=c(2008,1))

list(res1 <-

acf(ts.orcl,lag.max=120,type="correlation"),res2 <- acf(ts.orcl,lag.max=30,

type="partial"))

}

);

#

Get results for plotting

#

ACF and PACF as members of the list pulled

local.orcl.acf

<- ore.pull(sv.orcl.acf)

plot(local.orcl.acf[[1]],main="Server-side ACF Analysis for

Series ORCL",col="blue",lwd=2)

plot(local.orcl.acf[[2]],main="Server-side PACF Analysis for

Series ORCL",col="blue",lwd=5)

**Simple Exponential Smoothing**

Using the popular package “forecast”,

we will use the ses() function to calculate a 90 days horizon (h=90) into the

future, using the option criterion=MSE for the model. The package forecast needs

to be installed on the Oracle Database server R engine.

Then, we will bring the resulting

model locally for plotting. Remember to load the library “forecast” locally as

well, to be able to interpret the meaning of the ses() output when it’s brought

locally.

#

Execute ses() call in the server

sv.orcl.ses

<-

ore.tableApply(ORCLSTOCK[,c("date","Close")], ore.connect=TRUE,

function(dat) {

library(forecast)

ordered <- dat[order(as.Date(dat$date,

format="%Y-%m-%d")),]

ts.orcl <-

ts(ordered$Close,frequency=365, start=c(2008,1) )

res <- ses(ts.orcl, h=90, alpha=0.1,

initial="simple")

}

);

#

Get SES result locally for plotting

#

Since remote object contains a SES model from package forecast,

#

load package locally as well

library(forecast)

plot.orcl.ses

<- ore.pull(sv.orcl.ses)

plot(plot.orcl.ses,col="blue",fcol="red",

main="ORCL with Server-side SES –

Simple Exponential Smoothing Forecast")

**Holt Exponential Smoothing**

Using the popular package “forecast”,

we will use the holt() function to calculate a 90 days horizon (h=90) into the

future, requesting the Intervals of confidence of 80 and 95%. Again. the

package “forecast” needs to be installed on the Oracle Database server R engine.

Then, we will bring the resulting

model locally for plotting. Remember to load the library forecast locally as

well, to be able to interpret the meaning of the holt() output when it’s

brought locally.

#

Execute holt() call in the server

sv.orcl.ets

<-

ore.tableApply(ORCLSTOCK[,c("date","Close")], ore.connect=TRUE,

function(dat) {

library(forecast)

ordered <- dat[order(as.Date(dat$date,

format="%Y-%m-%d")),]

ts.orcl <-

ts(ordered$Close,frequency=365, start=c(2008,1))

res

<- holt(ts.orcl, h=90, level=c(80,95), initial="optimal")

}

);

#

Get resulting model from the server

#

Since remote object contains a Holt Exponential Smoothing

#

model from package forecast, load package locally as well

library(forecast)

local.orcl.ets

<- ore.pull(sv.orcl.ets)

plot(local.orcl.ets,col="blue",fcol="red",

main="ORCL Original Series Stock

Close with Server-side Holt Forecast")

**ARIMA – Auto-Regressive Interactive
Moving Average**

There are at least two options for

fitting an ARIMA model into a Time Series. One option is to use the package “forecast”, that allows for an

automatic arima fitting (auto.arima) to find the best parameters possible based

on the series.

For more advanced users, the arima()

function in the “stats” package itself allows for choosing the model

parameters.

#

ARIMA models on the server using auto.arima() from package forecast

arimaModel

<-

ore.tableApply(ORCLSTOCK[,c("date","Close")], ore.connect=TRUE,

FUN = function(dat){

# load forecast

library to use auto.arima

library(forecast)

# sort the table into

a temp file by date

ordered <-

dat[order(as.Date(dat$date, format="%Y-%m-%d")),]

# convert column into

a Time Series

# format ts(…) and

request creation of an automatic

# ARIMA model auto.arima(…)

res <- auto.arima(ts(ordered$Close,frequency=365,

start=c(2008,1)),

stepwise=TRUE, seasonal=TRUE)

})

#

Alternative using the arima() from package “stats”.

arimaModel

<-

ore.tableApply(ORCLSTOCK[,c("date","Close")],ore.connect=TRUE,

FUN =

function(dat){

# sort table into

a temp file by date

ordered <-

dat[order(as.Date(dat$date, format="%Y-%m-%d")),]

# convert column

into a Time Series

# format ts(…) and

request creation of a specific

# ARIMA model

using arima(), for example an ARIMA(2,1,2)

res <- arima(ts(ordered$Close,frequency=365,

start=c(2008,1)),

order = c(2,1,2))

})

#

Load forecast package locally to use the model

#

for plotting and producing forecasts

library(forecast)

#

Show remote resulting Time Series model

>arimaModel

Series: ts(ordered$Close, frequency = 365, start = c(2008, 1))

ARIMA(2,1,0)

Coefficients:

ar1 ar2

-0.0935 -0.0192

s.e. 0.0282 0.0282

sigma^2 estimated as 0.2323: log likelihood=-866.77

AIC=1739.55 AICc=1739.57 BIC=1754.96

#

Get remote model using ore.pull for local prediction and plotting

local.arimaModel

<- ore.pull(arimaModel)

#

Generate forecasts for the next 15 days

fore.arimaModel

<- forecast(local.arimaModel, h=15)

#

Use the following option if you need to remove scientific notation of

#

numbers that are too large in charts

options(scipen=10)

#

Generate the plot of forecasts, including interval of confidence

#

Main title is generated automatically indicating the type of model

#

chosen by the Auto ARIMA process

plot(fore.arimaModel,type="l", col="blue", xlab="Date",

ylab="Closing value

(US$)", cex.axis=0.75, font.lab="serif EUC",

sub="Auto-generated ARIMA for ORCL

Stock Closing"

)

#

Generate and print forecasted data points plus standard errors

#

of the next 15 days

forecasts

<- predict(local.arimaModel, n.ahead = 15)

>forecasts

$pred

Time Series:

Start = c(2011, 165)

End = c(2011, 179)

Frequency = 365

[1] 33.29677 33.29317 33.29395

33.29395 33.29393 33.29393 33.29393 33.29393 33.29393 33.29393 33.29393

[12] 33.29393 33.29393 33.29393 33.29393

$se

Time Series:

Start = c(2011, 165)

End = c(2011, 179)

Frequency = 365

[1] 0.4819417 0.6504925

0.7807798 0.8928901 0.9924032 1.0827998 1.1662115 1.2440430 1.3172839 1.3866617

[11] 1.4527300 1.5159216 1.5765824 1.6349941 1.6913898

**leave a comment**for the author, please follow the link and comment on his blog:

**Oracle R Enterprise**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...