An Introduction to Time Series with JSON Data

[This article was first published on DataScience+, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

For this post, I wanted to take the data analysis process in a different direction. Normally, an R analysis starts with data from a comma-separated Excel file (.csv) or a tab-separated file (.txt).

However, online data is often formatted in JSON, which stands for JavaScript Online Notation. JSON has different forms, but for this data, it consists of nested arrays in two main parts. One part is the meta-data header, and the other is the observations themselves. You can see that by looking at the file online here.

This data is LA city revenue, broken down by fiscal month and department. I want to get the total monthly revenue for the entire city and looks for seasonal trends.

To do this, we need to load in our JSON file, manipulate it, and apply time series analysis. Let’s get started.

JSON Manipulation

We load in the JSON file just as we would any other file type. However, R cannot read it, and that’s where “rjson” package comes into play. Once we extract the JSON file, we obtain the observations in the data section and create a variable for the number of observations.

#Install package
install.packages("rjson")
library(rjson)
jsonfile <- "yourpath/losangelesrevenue.json" 

#Extract JSON data using rjson
rev <- fromJSON(txt=jsonfile)
datalength <- length(rev$data)

As I mentioned, let’s looks at the observations data, which is in arrays but can be manipulated like a matrix. We first create arrays to store the values, then loop over the arrays to get each value by its index. In JSON, we first get to the correct observation with the [[x]] parameter, then choose the value within the vector with the [9] parameter. We now have arrays for the fiscal year, month, department, revenue, and fiscal period.

datalength <- length(rev$data)

yeardata <- {}
monthdata <- {}
department <- {}
revenue <- {}
fiscalperiod <- {}

for (x in 1:datalength) {
  yeardata <- c(yeardata,as.integer(noquote(rev$data[[x]][9]))) 
  monthdata <- c(monthdata,toString(noquote(rev$data[[x]][10])))
  department <- c(department,toString(noquote(rev$data[[x]][12])))
  revenue <- c(revenue,as.double(noquote(rev$data[[x]][13])))
  fiscalperiod <- c(fiscalperiod,toString(noquote(rev$data[[x]][19])))

Data Frame Manipulation

First we need to combine our vectors into a data frame to combine monthly revenue.

#Bind columns and convert it to dataframe
revdata <- as.data.frame(cbind(department, yeardata, monthdata, revenue,fiscalperiod))
revdata[,4] <- as.double(revenue)
head(revdata)
                     department yeardata monthdata   revenue fiscalperiod
1       FIRE INSURANCE PROCEEDS     2016  DECEMBER 182775.00       201606
2 PROP A LOCAL TRANSIT REL COST     2016  DECEMBER  35656.06       201606
3     INTERFD OPER TRANS-PROP A     2016  DECEMBER  88446.86       201606
4     OLDER AMERICAN ACTS GRANT     2016  DECEMBER    224.00       201606
5     OLDER AMERICAN ACTS GRANT     2016  DECEMBER 114946.00       201606
6         INTEREST INCOME-OTHER     2016  DECEMBER   2473.45       201606

With our dataframe ready, it would be great to have some pivot table style manipulation to get monthly totals, right? Well, the plyr package and ddply command come to the rescue.

install.packages("plyr")
library(plyr)
revtotal <- ddply(revdata,.(fiscalperiod,monthdata), summarize, 
     monthly_revenue = sum(revenue))
head(revtotal)
  fiscalperiod monthdata monthly_revenue
1       201201      JULY      3698329178
2       201202    AUGUST      1343848647
3       201203 SEPTEMBER      1328504564
4       201204   OCTOBER      1314401233
5       201205  NOVEMBER      1399582184
6       201206  DECEMBER      2178251781

Time Series Analysis

To do a very simple seasonal analysis, we can use the “forecast” package. The stl command allows us to see an underlying trend, a seasonal adjustment, and a residual error. I’ll plot just my time series data with seasonal model (fit).

install.packages('forecast')
library(forecast)
mts <- ts(revtotal$monthly_revenue, 
     start=c(2012,1),end=c(2016,6),frequency=12) 
fit <- stl(mts, s.window="period")
plot(fit)

Here is the plot:
Revenue seasonal jpeg

Hope this tutorial has encouraged you to play around with new sources of data.

To leave a comment for the author, please follow the link and comment on their blog: DataScience+.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)