459 search results for "evaluation"

How Do You Know if Your Data Has Signal?

August 10, 2015

Image by Liz Sullivan, Creative Commons. Source: Wikimedia.

An all too common approach to modeling in data science is to throw all possible variables at a modeling procedure and “let the algorithm sort it out.” This is tempting when you are not sure what the true causes or predictors are of the phenomenon you are …

Predicting Titanic deaths on Kaggle III: Bagging

August 9, 2015

This is the third post on predicting the deaths. The first one used random forest, the second boosting (gbm). The aim of the third post was to use bagging. In contrast to the former posts, I abandoned dplyr in this post. It gave some now you see now you ...

Sensemaking in R: A Plenitude of Models Makes for Good Storytelling

August 3, 2015

"Sensemaking is a motivated, continuous effort to understand connections (which can be among people, places, and events) in order to anticipate their trajectories and act effectively." - Gary Klein, Brian Moon & Robert Hoffman, Making Sense of Sensema...

Hockey Elbow and Other Response Time Injuries

July 29, 2015

You've heard of tennis elbow. Well, there's a non-sports, performance injury that I like to call hockey elbow. An example of such an "injury" is shown in Figure 1, which appeared in a recent computer performance analysis presentation. It's a reminder of how easy it is to become complacent when doing performance analysis and possibly end up reaching...

The complete catalog of argument variations of select() in dplyr

July 28, 2015

When I read the dplyr vignette, I found a convenient way to select sequential columns, such as select(data, year:day). Because I had only ever passed individual column names to the select() function, I was struck by how convenient this was. On closer inspection, I found that the select() function accepts many types of input. Here, I will enumerate the variety...

Call for participation: AusDM 2015, Sydney, 8-9 August

July 23, 2015

The 13th Australasian Data Mining Conference (AusDM 2015)
Sydney, Australia, 8–9 August 2015
URL: http://ausdm15.ausdm.org/

The Australasian Data Mining Conference is devoted to the art and science of intelligent data mining: the meaningful analysis of (usually large) data …

A Tutorial on Loops in R – Usage and Alternatives

July 21, 2015

Introduction: In this easy-to-follow R tutorial on loops, we will examine the constructs available in R for looping, and how to make use of R’s vectorization feature to perform your looping tasks more efficiently. We will present a few looping examples, then criticize and deprecate these in favor of the most popular vectorized alternatives (amongst ...

New package "SparkRext" – SparkR extension for closer to dplyr

July 11, 2015

Apache Spark is one of the hottest products in data science. Spark 1.4.0 has formally adopted the SparkR package, which makes it possible to handle Spark DataFrames in R. SparkR is very useful and powerful. One of the reasons is that SparkR DataFrames present an API sim...

Stress testing

July 6, 2015

Lately, we've been spending a lot of time "stress-testing" our method for the computation of the Expected Value of Partial Perfect Information (EVPPI - I know: the terminology is a bit strange and possibly not very helpful, as "perfect" information d...

Google Summer of Code: Midterm Re-cap

July 5, 2015

These have been the fastest 6 weeks that I have ever experienced, but also the most productive in the development of my project, GEO-AWS: Gene Expression Omnibus Analysis for the R Project for Statistical Computing. I have been exposed to some really great lessons in open-source software development that I either wouldn’t have known about …
