Function to Read NDJSON (Newline Delimited JSON) Files

July 3, 2017

(This article was first published on RLang.io | R Language Programming, and kindly contributed to R-bloggers)

Notice

jsonlite has a stream_in() function that works much better and faster than the read_ndjson() function described in this post. Use stream_in() instead.
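For reference, a minimal sketch of the stream_in() approach. The sample records and the temporary file are made up for illustration; only stream_in() and file() come from jsonlite and base R.

```r
library(jsonlite)

# Hypothetical sample data, written to a temporary file for illustration
tmp <- tempfile(fileext = ".json")
writeLines(c(
    '{"artist":"A","album":"First","year":2001}',
    '{"artist":"B","album":"Second","year":2005}'
), tmp)

# stream_in() parses one JSON record per line and returns a data frame;
# it opens and closes the connection itself
discography <- stream_in(file(tmp), verbose = FALSE)
str(discography)
```

Passing verbose = FALSE only silences the progress messages; the parsing itself is unchanged.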

I wrote this function while working on a web scraper for a lyrics website and thought it might be useful to some people. It is a generic solution, but it probably won’t help unless you are working with NDJSON files and don’t want to rely on additional libraries. The only library it needs is jsonlite, which is a fantastic library.

library("jsonlite")

The function itself is pretty simple. It has no way of knowing how you want nested values handled, so you may have to modify it to fit your needs.

# The function
read_ndjson <- function(filename) {
    # Open the file passed in and make sure it is closed again, even on error
    con <- file(filename, "r")
    on.exit(close(con))
    # Collect one parsed row per line; binding once at the end avoids
    # copying the whole matrix on every iteration
    rows <- list()
    while (TRUE) {
        # Read a single line
        json <- readLines(con, n = 1)
        # No lines left
        if (length(json) == 0) {
            break
        }
        # Skip blank lines, such as an empty line at the end of the file
        if (!nzchar(trimws(json))) {
            next
        }
        # Parse one record; a flat JSON object becomes a named list,
        # one element per field
        row <- fromJSON(json, simplifyVector = TRUE)
        rows[[length(rows) + 1]] <- row
    }
    # One matrix row per JSON line (NULL if the file had no records)
    ndjson <- do.call(rbind, rows)
    return(ndjson)
}
#Calling the function
discography <- data.frame(read_ndjson("./log.json"))
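For the nested values mentioned earlier, one possible approach is sketched here. It uses jsonlite's stream_in() and flatten() rather than read_ndjson(), and the sample records and the field names title and artist are hypothetical.

```r
library(jsonlite)

# Hypothetical records in which "artist" is a nested object
tmp <- tempfile(fileext = ".json")
writeLines(c(
    '{"title":"Song One","artist":{"name":"A","country":"US"}}',
    '{"title":"Song Two","artist":{"name":"B","country":"UK"}}'
), tmp)

# stream_in() keeps the nested object as a data frame column named "artist"
songs <- stream_in(file(tmp), verbose = FALSE)

# flatten() expands that column into artist.name and artist.country;
# the jsonlite:: prefix avoids clashes with other packages' flatten()
flat <- jsonlite::flatten(songs)
names(flat)
```

Whether flattened columns are what you want depends on the data; deeply nested or ragged records may still need custom handling.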

While this might not come in handy as often as some other functions, I am keeping it in my toolchain. I am still working on converting the lyrics scraper to R (it is currently written in Node), but if there is interest I can post that as well.
