June 2017

New rOpenSci Packages for Text Processing in R

June 13, 2017 | Jeroen Ooms

Textual data and natural language processing are still a niche domain within the R ecosystem. The NLP task view gives an overview of existing work; however, a lot of basic infrastructure is still missing. At the rOpenSci text workshop in April we discussed many ideas for improving text processing in ... [Read more...]

Joining Tables in SparkR

June 12, 2017 | statcompute

[This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers].

    library(SparkR, lib.loc = paste(Sys.getenv("SPARK_HOME"), "/R/lib", sep = ""))
    sc <- sparkR.session(master = "local")
    df1 <- read.df("nycflights13.csv", source = "csv", header = "true", inferSchema = "true")

    grp1 <- groupBy(filter(df1, "month in (1, 2, 3)"), "month")
    sum1 <- withColumnRenamed(agg(grp1, min_dep = min(df1$dep_delay)), "month", "month1")

    grp2 <- groupBy(filter(df1, "month in (2, 3, 4)"), "month")
    sum2 <- withColumnRenamed(agg(grp2, max_dep = max(df1$dep_delay)), "month", "month2")

    # INNER JOIN
    showDF(merge(sum1, sum2, by.x = "month1", by.y = "month2", all = FALSE))
    showDF(join(sum1, sum2, sum1$month1 == sum2$month2, "inner"))
    #+------+-------+------+-------+
    #|month1|min_dep|month2|max_dep|
    #+------+-------+------+-------+
    #|     3|    -25|     3|    911|
    #|     2|    -33|     2|    853|
    #+------+-------+------+-------+

    # LEFT JOIN
    showDF(merge(sum1, sum2, by.x = "month1", by.y = "month2", all.x = TRUE))
    showDF(join(sum1, sum2, sum1$month1 == sum2$month2, "left"))
    #+------+-------+------+-------+
    #|month1|min_dep|month2|max_dep|
    #+------+-------+------+-------+
    #|     1|    -30|  null|   null|
    #|     3|    -25|     3|    911|
    #|     2|    -33|     2|    853|
    #+------+-------+------+-------+

    # RIGHT JOIN
    showDF(merge(sum1, sum2, by.x = "month1", by.y = "month2", all.y = TRUE))
    showDF(join(sum1, sum2, sum1$month1 == sum2$month2, [...]

[Read more...]

RcppMsgPack 0.1.1

June 12, 2017 | Thinking inside the box

A new package! Or at least new on CRAN, as the very initial version 0.1.0 had been available via the ghrr drat for over a year. But now we have version 0.1.1 to announce as a CRAN package. RcppMsgPack provides R with MessagePack header files for use v... [Read more...]

thinning a Markov chain, statistically

June 12, 2017 | xi'an

Art Owen has arXived a new version of his thinning MCMC paper, where he studies how thinning or subsampling can improve computing time in MCMC chains. I remember quite well the message sent by Mark Berliner and Steve MacEachern in an early 1990s paper that subsampling was always increasing the ... [Read more...]

Interfacing with APIs using R: the basics

June 12, 2017 | David Smith

While R (and its package ecosystem) provides a wealth of functions for querying and analyzing data, in our cloud-enabled world there's now a plethora of online services with APIs you can use to augment R's capabilities. Many of these APIs use a RESTful interface, which means you will typically send/... [Read more...]

Workshop on Monetizing R Packages

June 12, 2017 | Ari Lamstein

Last week I gave a talk at the San Francisco EARL Conference about monetizing R packages. The talk was well received, so this Thursday at 9am... The post Workshop on Monetizing R Packages appeared first on AriLamstein.com.
[Read more...]

Clustering

June 12, 2017 | realdataweb

Hello, everyone! I’ve been meaning to get a new blog post out for the past couple of weeks. During that time I’ve been messing around with clustering. Clustering, or cluster analysis, is a method of data mining that groups similar observations together. Classification and clustering are quite alike, ...
[Read more...]

LASSO regression in R exercises

June 12, 2017 | Bassalat Sajjad

Least Absolute Shrinkage and Selection Operator (LASSO) performs regularization and variable selection on a given model. Depending on the size of the penalty term, LASSO shrinks less relevant predictors to (possibly) zero. Thus, it enables us to consider a more parsimonious model. In this exercise set we will use the ... [Read more...]

RStudio meets MilanoR! June 29th, Milan

June 12, 2017 | MilanoR

Hello R-Users, we have great news! We are going to host Nathan Stephens and Joseph Rickert from RStudio and the R Consortium: they are coming to Milano just for us (from the USA) to meet the MilanoR community and talk about the latest news from RStudio and the R Consortium. The post ... [Read more...]

Superstorm Sandy at Barnegat Bay Revisited

June 11, 2017 | AdventuresInData

Animations of continuous data in GIF format offer some portability advantages over video files.  A few years ago, shortly after Superstorm Sandy, a colleague and I developed a video of animated water surface elevations from USGS gages in Barnegat Bay, NJ as the eye of the storm approached.  That version ... [Read more...]

Exit poll for June 2017 election (UK)

June 11, 2017 | David Firth

It has been a while since I posted anything here, but I can’t resist this one. Let me just give three numbers. The first two are: 314, the number of seats predicted for the largest party (Conservatives) in the UK House of Commons, at 10pm on Thursday (i.e., before ...
[Read more...]