Streaming Cloud Data to R

January 4, 2017

(This article was first published on R Language in Datazar on Medium, and kindly contributed to R-bloggers)

Saving your data in the cloud means that when you send your scripts to colleagues, you don’t have to send the data or any additional files along with them. When the data location is a URL rather than “C://…” or “/home/…”, your script always points to the same address, whichever machine it runs on. In this post, we’ll go through how to use a cloud dataset in an R script. We’ll be using a dataset containing the population of Earth from 5000 BC until 2016.

There are several ways and packages to access a URL from R. The Datazar SDK uses the “httr” package to make the request and the “jsonlite” package to parse the response. Let’s go ahead and grab the Datazar SDK for R. I’ve also included the R code here so you can just copy and paste it into your script.

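# Fetch an object from the Datazar API and return the parsed JSON.
datazar <- function(username, token, objectType, objectId, option) {
  library(httr)
  library(jsonlite)
  url <- paste0("https://api.datazar.com/", objectType, "/", objectId, "/", option)
  response <- GET(url, authenticate(username, token, type = "basic"))
  # Read the body as text so that fromJSON() receives a JSON string.
  return(fromJSON(content(response, as = "text", encoding = "UTF-8"), flatten = TRUE))
}

# Convenience wrapper: fetch the data of a single file.
datazarData <- function(username, token, fileId) {
  return(datazar(username, token, "files", fileId, "data"))
}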
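
To see what the parsing step inside the datazar function does in isolation, here is a minimal sketch that runs fromJSON on a small hand-written JSON string. The payload shape and its values are made-up placeholders, not the actual Datazar response, which may look different.

```r
library(jsonlite)

# Hypothetical payload: an array of records with placeholder values.
sampleJson <- '[{"Year": 1, "Population": 10}, {"Year": 2, "Population": 20}]'

# An array of JSON objects is simplified into a data frame,
# with one column per key and one row per object.
sample <- fromJSON(sampleJson, flatten = TRUE)
str(sample)
```

If the real response has this kind of shape, the value returned by datazarData can be used like any other data frame.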

We’ll be using the datazarData function.

Here are the parameters we need:

  • username
  • token
  • fileId

myUsername <- "aman"
myToken <- "mysupersecrettokenthaticantshow"
fileId <- "f7cb0a20c-2f1c-4ad5-9d05-900d7af97a9c"
data <- datazarData(myUsername, myToken, fileId)

That’s it! All done. There’s no need to parse the JSON since the datazarData function takes care of it. Let’s go ahead and plot it so we can see what it looks like.

plot(data, xlab = "Year", ylab = "Population")

R Plot of the streamed dataset.

Conclusion

We went over how to stream datasets directly from the cloud. This method authenticates with HTTP “Basic Authentication”, and because the Datazar API URL uses HTTPS, your credentials and data are encrypted in transit while you’re streaming your datasets.
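
To make Basic Authentication less of a black box, here is a rough sketch of how the Authorization header is built. It uses base64_enc and base64_dec from jsonlite, and the credentials are placeholders. In the SDK, httr's authenticate builds this header for you, so this is for illustration only.

```r
library(jsonlite)  # provides base64_enc() and base64_dec()

# Placeholder credentials, not real ones.
username <- "aman"
token <- "not-a-real-token"

# Basic Authentication joins the two with a colon and base64-encodes the result.
encoded <- base64_enc(charToRaw(paste0(username, ":", token)))
authHeader <- paste("Basic", encoded)

# Base64 is an encoding, not encryption: decoding recovers the credentials.
decoded <- rawToChar(base64_dec(encoded))
print(decoded)
```

Since the header can be decoded by anyone who sees it, it matters that the request goes to an https:// address.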

I have also included the R script in a project, so you can use that one if you want to.

R Script link.

Just replace the parameters with your own Datazar username and token. Adopting this as a best practice keeps your data in one location, so you and your colleagues never have to change dataset paths in your scripts.
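
If you share the script itself, you may not want your token written inside it. One possible approach, sketched here, is to read the credentials from environment variables. The helper readCredential and the variable names DATAZAR_USERNAME and DATAZAR_TOKEN are hypothetical choices for this example and are not part of the Datazar SDK.

```r
# Hypothetical helper: read a credential from an environment variable
# and fail with a clear message if it is missing or empty.
readCredential <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# Usage, assuming both variables are set (for example in ~/.Renviron)
# and datazarData is defined as above:
# myUsername <- readCredential("DATAZAR_USERNAME")
# myToken <- readCredential("DATAZAR_TOKEN")
# data <- datazarData(myUsername, myToken, fileId)
```

Each colleague then sets their own values once, and the same script runs unchanged for everyone.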

Hope you enjoyed this! Feel free to ask questions if you’re stuck somewhere.

Note: there’s a related post on how to do the exact same thing with Mathematica.


Streaming Cloud Data to R was originally published in Datazar on Medium, where people are continuing the conversation by highlighting and responding to this story.
