This post provides technical details about the making of my book “Displaying time series, spatial, and space-time data with R”, …

It has been several months since my last post on classification tree models, because two things have been consuming all of my spare time. The first is that I taught a night class for the University of Connecticut’s Graduate School of Business, introducing R to students with little or no prior exposure to either R or programming. My hope...

The tf-idf statistic (“term frequency – inverse document frequency”) is a common tool for extracting keywords from a document by considering not just that document but all documents in the corpus. In terms of tf-idf a word …
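To make the idea concrete, here is a minimal sketch of one common tf-idf variant, written in Python purely for illustration. It assumes term frequency is the within-document share of a term and inverse document frequency is `log(N / df)`; the excerpt is cut off before it gives the exact weighting, so this may differ from the original post. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: score} dict per document in `corpus` (a list of token lists).

    Assumed variant: tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(corpus)
    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, already tokenized.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(corpus)
# The highest-scoring term of the first document is its keyword candidate.
top_keyword = max(scores[0], key=scores[0].get)
```

In this toy corpus, "the" occurs in every document and therefore scores zero, while a term unique to one document ("mat" in the first) scores highest there. That is the sense in which tf-idf looks at the whole corpus rather than a single document.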

We are excited to announce the general availability of RStudio Shiny Server Pro. Shiny Server Pro is the simplest way for data scientists and R users in the enterprise to share their work with colleagues. With Shiny Server Pro you can: Secure access to Shiny applications with authentication systems such as LDAP and Active Directory

In most languages return is a statement, but in R it is a function (in fact R does not really have statements, it only has expressions). This function-like behavior of return is useful for figuring out the order in which operations are performed, e.g., the value returned by return(1)+return(2) tells us that binary operators are

In the previous post, I introduced the logic of Bayes factors for one-sample designs by means of a simple example. In this post, I will give more detail about the models and assumptions used by the BayesFactor package, and also how to do simple analyses of two-sample designs. See the previous posts for background: What is a...

by Oliver Vagner, Cloud Solutions Lead Architect at Revolution Analytics
Today, I am pleased to announce our new offering in the Amazon Web Services Big Data Marketplace – Revolution R Enterprise 7 for AWS. Of course, if you follow this blog, then you are quite familiar with Revolution R Enterprise (RRE) and what it brings to the table with...

As a data scientist I have seen variations of principal component analysis and factor analysis so often blindly misapplied and abused that I have come to think of the technique as unprincipled component analysis. PCA is a good technique often used to reduce sensitivity to overfitting. But this stated design intent leads many to (falsely)

The microbenchmark package is a popular way of comparing the time it takes to evaluate different R expressions — perhaps more popular than the alternative of just using system.time to see how long it takes to execute a loop that evaluates an expression many times. Unfortunately, when used in the usual way, microbenchmark can give inaccurate

In two weeks I am presenting a workshop at the University of Granada (Spain) on Automatic Time Series Forecasting. Unlike most of my talks, this is not intended to be primarily about my own research. Rather it is to provide a state-of-the-art overview of the topic (at a level suitable for Masters students in Computer Science). I thought I’d provide...