The Rise of the Samurai Pitcher

October 2, 2014
By

Masahiro Tanaka stands on the mound, rubbing the ball vigorously between his hands. It's a crisp, cool night in the Bronx. Stepping back, he digs his right foot into the rubber, winds up and, with a seven-foot stretch, steps towards the catcher, unleashing a blistering four-seam, 95 mph fastball. Less than half a second later, it explodes into the catcher's...

Read more »

R and Data Science Webinar

October 2, 2014
By

by Joseph Rickert Recently, I had the opportunity to present a webinar on R and Data Science. The challenge with attempting this sort of thing is to say something interesting that does justice to the subject while being suitable for an audience that may include both experienced R users and curious beginners. The approach I settled on had three...

Read more »

Announcing the Publication of Practical Data Science Cookbook

October 2, 2014
By

Four of DC2's board members have published a new book! Tony Ojeda, Sean Murphy, Benjamin Bengfort, and Abhijit Dasgupta are proud to announce the arrival of Practical Data Science Cookbook (Packt, $10 ebook or $49.99 print+ebook). Practical Data Science Cookbook is perfect for … Continue reading → The post Announcing the Publication of Practical Data Science Cookbook appeared first on...

Read more »

devtools 1.6

October 2, 2014
By

Devtools 1.6 is now available on CRAN. Devtools makes it so easy to build a package that it becomes your default way to organise code, data and documentation. Learn more at http://r-pkgs.had.co.nz/. You can get the latest version with: install.packages("devtools") We’ve made a lot of improvements to the install and release process: Installation functions now

Read more »

Building a DGA Classifier: Part 2, Feature Engineering

October 2, 2014
By

This is part two of a three-part blog series on building a DGA classifier, split into the three phases of building a classifier: 1) Data preparation 2) Feature engineering and 3) Model selection. Back in part 1, we prepared the data and we are starting with a nice clean list of domains labeled as either legitimate (“legit”) or generated by...

Read more »

R Crash Course, 4/11 October 2014

October 2, 2014
By

(This article was first published on Rmetrics blogs, and kindly contributed to R-bloggers)

Read more »

Society for Judgment and Decision Making: Who Are We (Part 1)

October 1, 2014
By

An analysis of the academic departments of SJDM society members. The post Society for Judgment and Decision Making: Who Are We (Part 1) appeared first on Decision Science News.

Read more »

Vector Search vs. Binary Search

October 1, 2014
By

Read more »

Cross Validation for Kernel Density Estimation

October 1, 2014
By

In a post published in July, I mentioned the so-called Goldilocks principle in the context of kernel density estimation and bandwidth selection. The bandwidth should not be too small (the variance would be too large) and it should not be too large (the bias would be too large). Another standard method to select the bandwidth, as mentioned...
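
To make the trade-off concrete, here is a minimal sketch of choosing a bandwidth by cross-validation. The excerpt does not say which criterion the post uses, so this assumes leave-one-out likelihood cross-validation with a Gaussian kernel; the simulated sample `x`, the grid `h_grid`, and the helper `loo_loglik` are hypothetical names introduced only for illustration.

```r
# Leave-one-out likelihood cross-validation for a Gaussian kernel
# density estimate (illustrative sketch, not the post's own code).
set.seed(1)
x <- rnorm(200)  # hypothetical sample

# Sum of log densities, where each point is scored by the estimate
# built from all the other points.
loo_loglik <- function(h, x) {
  n <- length(x)
  k <- dnorm(outer(x, x, "-"), sd = h)  # kernel matrix K_h(x_i - x_j)
  diag(k) <- 0                          # drop each point's own contribution
  sum(log(rowSums(k) / (n - 1)))
}

h_grid <- seq(0.05, 1.5, by = 0.05)     # candidate bandwidths
scores <- sapply(h_grid, loo_loglik, x = x)
h_cv   <- h_grid[which.max(scores)]     # bandwidth with the best held-out fit

# Built-in selectors from the stats package, for comparison.
h_ucv <- bw.ucv(x)    # unbiased (least-squares) cross-validation
h_rot <- bw.nrd0(x)   # Silverman's rule of thumb

plot(density(x, bw = h_cv), main = "KDE with cross-validated bandwidth")
```

Very small values of `h` score poorly because held-out points fall between narrow spikes (high variance), and very large values score poorly because the estimate is oversmoothed (high bias), so the maximum tends to sit in between.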

Read more »

New York Times approachably describes Bayesian Statistics

October 1, 2014
By

The New York Times published an article of interest to statisticians the other day: "The Odds, Continually Updated". Surprisingly for a general-audience newspaper, this article goes into the distinctions between Bayesian and frequentist statistics, and does so in a very approachable way. Here's an excerpt: The essence of the frequentist technique is to apply probability to data. If...

Read more »

Got a ticket for the runoff?

October 1, 2014
By

This is one of the very last postings before the election next Sunday. So far, the only certainty is that the incumbent candidate, Dilma Rousseff (PT), has a ticket to the runoff. The runners-up, the environmentalist Marina Silva (PSB) and the Social Democrat Aecio Neves, are heading into a neck-and-neck race down the final stretch. Although … Read More...

Read more »

Working with NIfTI images in R

October 1, 2014
By

The oro.nifti package is awesome for NeuRoimaging (couldn't help myself). It has functions to read/write images, introduces the S4 nifti class, and has useful plotting functions. There are some limitations and some gotchas that are important to discuss if you are working with these objects in R. Dataset Creation We'll read in some data (a

Read more »

Transparent hurricane paths in R

October 1, 2014
By

Arthur Charpentier has written a really nice blog post about obtaining hurricane tracks and plotting them. He then goes on to do other clever Markov process models, but as a dataviz guy who knows almost nothing about meteorology, I want to … Continue reading →

Read more »

New fiscal sponsorship agreement with NumFocus foundation

October 1, 2014
By

I’m very pleased to announce that rOpenSci has signed a comprehensive fiscal sponsorship agreement with the NumFocus foundation, a 501(c)3 nonprofit that supports R&D for open source scientific software projects. We are delighted to be in the company of esteemed projects such as IPython and Julia that share our goal of promoting reproducible research practices...

Read more »

Complex Domain Coloring

September 30, 2014
By

Why don’t you stop doodling and start writing serious posts in your blog? (Cecilia, my beautiful wife) Choose a function, apply it to a set of complex numbers, paint the result using the HSV technique and be ready to be impressed, because the images can be absolutely amazing. You only need the ggplot2 package and your imagination. This is what happens

Read more »

Structured simulation of regression models – simReg package.

September 30, 2014
By

I'd like to introduce a package that simulates regression models. This includes both single level and multilevel (i.e. hierarchical or linear mixed) models up to two levels of nesting. The package produces a unified framework to simulate all types of c...

Read more »

Install R in Android, via GNURoot -no root required!

September 30, 2014
By

Playing with my tablet some time ago, I wondered if installing R could be possible. You know, a small Android device “to the power of R”… After searching on Google from time to time, I came across some interesting possibilities: … Continue reading →

Read more »

Interactive Visualisation of the Profitable Amount of Waste to Dispose Illegally

September 30, 2014
By

"Wow!" - I said to myself after reading the R Helps With Employee Churn post - "I can create interactive plots in R?!!! I have to try it out!" I quickly came up with the idea of creating an interactive plot for my simple model for assessing the profitable ratio between the volume of waste that could be illegally...

Read more »

Generating Hurricanes with a Markov Spatial Process

September 30, 2014
By

The National Hurricane Center (NHC) maintains a dataset of all storms in the North Atlantic, the North Atlantic Hurricane Database (HURDAT). For every storm, we have its location every six hours (at midnight, six a.m., noon and six p.m.). Note that we also have the date, the maximal wind speed – on a 6 hour window – and...

Read more »

Meet us at R Day and at the Strata+Hadoop World NYC Oct 15-17, 2014

September 30, 2014
By

Are you headed to Strata? It’s just around the corner! We particularly hope to see you at R Day on October 15, where we will cover a raft of current topics that analysts and R users need to pay attention to. The R Day tutorials come from Hadley Wickham, Winston Chang, Garrett Grolemund, J.J. Allaire, and

Read more »

Additional tips for structuring an individual-based model in R

September 30, 2014
By

 I had a reader ask me recently to help understand how to modify the code of an individual-based model (IBM) that I posted a while back. It was my first attempt at an IBM in R, and I realized that I have made some significant changes to the way th...

Read more »

Why are we still teaching T-tests?

September 30, 2014
By

The following post by Norm Matloff originally appeared on his blog, Mad(Data)Scientist, on September 15th. We rarely republish posts that have appeared on other blogs, however, the questions that Norm raises both with respect to the teaching of statistics, and his assertion that "R's statistical procedures are centered far too much on significance testing" deserve a second look. Moreover,...

Read more »

Building a DGA Classifier: Part 1, Data Preparation

September 30, 2014
By

This will be a three-part blog series on building a DGA classifier and will be split into three logical phases of building a classifier: 1) Data preparation (this) 2) Feature engineering and 3) Model selection. And before I get too far into this, I want to give a huge thank you to Click Security for releasing a DGA classifier in python as part of...

Read more »

Example 2014.11: Contrasts the basic way for R

September 30, 2014
By

As we discuss in section 6.1.4 of the second edition, R and SAS handle categorical variables and their parameterization in models quite differently. SAS treats them on a procedure-by-procedure basis, which leads to some odd differences in capabilities and default parameterizations. For example, in the logistic procedure, the default is effect cell coding, while in the genmod...

Read more »

Structural Arb Analysis and Portfolio Management Functionality in R

September 30, 2014
By

I want to use this post to replicate an article I found on SeekingAlpha, along with demonstrating PerformanceAnalytics’s ability to … Continue reading →

Read more »

Syrian Refugee Settlement Clinic Locations

September 30, 2014
By

Previously I posted about the location of refugee settlements and how that had grown in density over time as well as in numbers.  As many NGOs and non-profits work in the area, they are providing much needed assistance to the people living around ...

Read more »

Running RStudio via Docker in the Cloud

September 30, 2014
By

Deploying applications via Docker containers is the current talk of the town. I have heard about Docker and played around with it a little, but when Dirk Eddelbuettel posted his R and Docker talk last Friday I got really excited and had to have a go myself....

Read more »

seeking altruistic social scientists, demographers, survey researchers

September 30, 2014
By

hi everyone, please share this: if you are an experienced user of a publicly-available survey data set from any country or international organization, let's work together on some user-friendly code and a short blog post for http://asdfree.com. ...

Read more »

Rcpp 0.11.3

September 29, 2014
By

A new release 0.11.3 of Rcpp is now on the CRAN network for GNU R, and an updated Debian package has been uploaded too. Rcpp has become the most popular way of enhancing GNU R with C++ code. As of today, 273 packages on CRAN depend on Rcpp for making...

Read more »