Web data acquisition: the structure of an RCurl request (Part 2)

March 2, 2017

(This article was first published on R-posts.com, and kindly contributed to R-bloggers)

The acquisition of JSON data presented in Part 1 showed how the client-server connection works and how the data of interest can be collected. However, the JSON output is raw data packed into a single JSON string, which must be structured and stored in a form suitable for data processing and statistical analysis.

For this reason, it makes sense to develop the entire process in #R, so that the data are queried, collected, parsed, structured and made usable in a single environment. Naturally, this is the same environment used for the "last mile" of the process, i.e. data analysis.

The curl tool used from the command line in the previous post has its R counterpart in the RCurl package. Together with jsonlite, which handles the translation between R objects and JSON, these are the packages needed to build the request shown in the following code.

# install the packages once before loading them: install.packages(c('RCurl', 'jsonlite'))
library(RCurl)     # postForm(): HTTP requests through libcurl
library(jsonlite)  # toJSON(): conversion of R objects to JSON

# save the url of the request in an object (the endpoint of the curl command)
# replace {SERVER_KEY} with your own API key
url <- 'https://www.googleapis.com/qpxExpress/v1/trips/search?key={SERVER_KEY}&alt=json'
# headers (same as -H); the charset is a parameter of the Content-Type header
headers <- c('Accept' = 'application/json', 'Content-Type' = 'application/json; charset=UTF-8')

# R structure of the input for the request (same as -d + JSON)
x <- list(
  request = list(
    slice = list(
      list(origin = 'FCO', destination = 'LHR', date = '2017-06-30')),
    passengers = list(adultCount = 1, infantInLapCount = 0, infantInSeatCount = 0, childCount = 0, seniorCount = 0),
    solutions = 500,
    refundable = FALSE))

# url, headers and x are the parameters used by the R functions to send the request
# and save the output data in the datajson object
# postForm is the RCurl function that sends the request with the POST method (same as -X POST)
# toJSON is the jsonlite function that converts the R structure of the request into JSON input;
# auto_unbox = TRUE writes length-1 values as JSON scalars ("FCO") instead of arrays (["FCO"])

datajson <- postForm(url, .opts = list(postfields = toJSON(x, auto_unbox = TRUE), httpheader = headers))
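The request body is simply the JSON translation of x. The sketch below needs only jsonlite and no network access; it prints two translations of the same list to show one detail of toJSON(): by default every length-1 vector is written as a one-element array (["FCO"]), while auto_unbox = TRUE writes it as a plain scalar ("FCO"). Since fields such as origin or adultCount hold single values, the unboxed form is presumably the shape the API expects; this is an assumption based on the meaning of the fields, not on the API documentation.

```r
library(jsonlite)

# same request structure used for the POST request
x <- list(
  request = list(
    slice = list(
      list(origin = 'FCO', destination = 'LHR', date = '2017-06-30')),
    passengers = list(adultCount = 1, infantInLapCount = 0, infantInSeatCount = 0,
                      childCount = 0, seniorCount = 0),
    solutions = 500,
    refundable = FALSE))

# default: each length-1 atomic vector becomes a one-element JSON array
boxed <- toJSON(x)
# auto_unbox = TRUE: length-1 atomic vectors become JSON scalars
unboxed <- toJSON(x, auto_unbox = TRUE)

cat(boxed, "\n")
cat(unboxed, "\n")

# round trip: parse the unboxed JSON back into a nested R list
back <- fromJSON(unboxed, simplifyVector = FALSE)
back$request$slice[[1]]$origin
```

Note that slice stays an array in both variants: auto_unbox only affects atomic vectors, and slice is built as an unnamed list of trips, which jsonlite always writes as a JSON array.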

A few seconds after the POST request is sent, the response arrives and all the information on flights with origin FCO (Fiumicino, Rome) and destination LHR (London Heathrow) is stored in the datajson object, just as in the command line procedure. The JSON string holds, still hidden, all the observations and variables of interest for the statistical analysis, including the most important one: the flight prices.
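Before the parsing step, a quick check on the raw response can save time, for example when a request fails and the server returns something other than the expected data. The helper below is a hypothetical sketch built on jsonlite's validate() and fromJSON(): it converts the response to plain text, stops if the string is not valid JSON, and returns the names of the top-level fields. The toy string only stands in for datajson and is not a real QPX Express response.

```r
library(jsonlite)

# Hypothetical helper: sanity-check a raw JSON response before parsing it.
check_json_response <- function(txt) {
  # in case the response arrives as raw bytes, convert it to text first
  if (is.raw(txt)) txt <- rawToChar(txt)
  # drop any attributes attached to the returned string
  txt <- as.character(txt)
  # validate() returns FALSE with an "err" attribute for malformed JSON
  ok <- validate(txt)
  if (!ok) stop("invalid JSON: ", attr(ok, "err"))
  # parse without simplification and report the top-level keys
  names(fromJSON(txt, simplifyVector = FALSE))
}

# toy string standing in for datajson (NOT a real QPX Express response)
toy <- '{"kind": "example#response", "trips": {"tripOption": []}}'
check_json_response(toy)
```

With the real response, the call would simply be check_json_response(datajson).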

The next post will explain how to parse the JSON object and structure the information in a data frame suitable for analysis, using the powerful #tidyjson package.

#R #rstats #maRche #json #curl #qpxexpress #Rbloggers

This post is also shared on www.r-bloggers.com and LinkedIn.
