Using Metadata to find Paul Revere

June 8, 2013

London, 1772. I have been asked by my superiors to give a brief demonstration of the surprising effectiveness of even the simplest techniques of the new-fangled Social Networke Analysis in the pursuit of those who would seek to undermine the liberty enjoyed by His Majesty’s subjects. This is in connection with the discussion of the role of “metadata” in

Read more »

Bulk search for domain names using R

June 8, 2013

# There are several domain name servers that allow
# for bulk searching of domain names.
# http://www.godaddy.com/bulk-domain-search.aspx
# http://www.namestation.com/bulk-domain-search
# However, they do not provide any wildcard support
# and instead exp...

Read more »

Matrix Operations

June 8, 2013

Matrix manipulation in R is very useful in linear algebra. Below is a list of common yet important functions for matrix operations:
Transpose - t
Multiplication - %*%
Determinant - det
Inverse - solve, or ginv from the MASS library
Eigenvalues and ...

Read more »

R and MongoDB

June 7, 2013

MongoDB is a document-based NoSQL database. Unlike relational databases, which store data in tables with rigid schemas, MongoDB stores data in documents with dynamic schemas. In the demonstration below, I am going to show how to extract data from MongoDB with R. Before starting the R session, we need to install the MongoDB

Read more »

Hey, I Just did a Significance Test!

June 7, 2013

I’ve seen it happen quite often. The sig test. Somebody simply needs to know the p-value, and that one number will provide all of the information about the study that they need to know. The dataset is presented and the client/boss/colleague/etc. invariably asks the questions “is it significant?” and “what’s the correlation?”. To quote R.A.

Read more »

Robust logistic regression

June 7, 2013

Corey Yanofsky writes: In your work, you’ve robustificated logistic regression by having the logit function saturate at, e.g., 0.01 and 0.99, instead of 0 and 1. Do you have any thoughts on a sensible setting for the saturation values? My intuition suggests that it has something to do with proportion of outliers expected in the ...

Read more »

Crayfish or crawdad? Mapping US dialect variations with R

June 7, 2013

I grew up in Australia, where I learned to speak English. Or so I thought: when I moved overseas to the UK, and especially when I moved to the States, I soon learned these are distinct cultures separated by a common language. Words which I previously had no context for being different anywhere else, such as "runners" ("sneakers"), "lemonade"...

Read more »

The Rcpp Book is now shipping

My book about Rcpp (and its R and C++ integration) is now available from Springer. Amazon still lists it as not-yet-released; I expect this to change in the next few days.

Read more »

Happy Birthday rasterVis!

Two years ago the first version of rasterVis was submitted to R-Forge, and some weeks later the first stable version was ...

Read more »

A Shiny App Goes Viral

June 7, 2013

I am not sure how many of you have seen this Business Insider article. It is basically about a Shiny app created by Joshua Katz at NC State. It is really fun playing with the Shiny app. With nearly a million Facebook likes this web app buil...

Read more »

Income Distribution in London

June 7, 2013

Inspired by the Institute for Fiscal Studies' "Where do you fit in" application, where people can find out their position in the UK's income distribution, I wanted to find out what the picture in London looks like. Quite different. If you are in a very high percentile nationwide, high incomes of mainly financial sector employees in London...

Read more »

Symmetric set differences in R

June 7, 2013

My .Rprofile contains a collection of convenience functions and function abbreviations. These are either functions I use dozens of times a day and prefer not to type in full:## my abbreviation of head() h Or problems that I'd rather figure out once, and only once: ## example: ## between( 1:10, 5.5, 6.5 ) between = low & x low & x...

Read more »

Comrades Marathon Attrition Rate

June 7, 2013

It is a bit of a mission to get the complete data set for this year’s Comrades Marathon. The full results are easily accessible, but come as an HTML file. Embedded in this file are links to the splits for individual athletes. So with a bit of scripting wizardry it is also possible to download

Read more »

Creating Catch Data from Individual Length Measurements

June 6, 2013

This example has been updated in this post. I came across a “problem” today where I needed to create catch data for individual nets from length measurements made on individual fish in those nets. In other words, I had data ...

Read more »

Data Class Conversion

June 6, 2013

Data in R can be converted from one class to another. The conversion function is named as. followed by the name of the data class that we wish to convert to. The data classes in R are the following:
numeric - as.numeric
vector - as.vector
character - as.cha...

Read more »

How likely is the NSA PRISM program to catch a terrorist?

June 6, 2013

Recent revelations about PRISM, the NSA’s massive program of surveillance of civilian communications, have caused quite a stir. And rightfully so, as it appears that the agency has been granted warrantless direct access to just about any form of digital communication engaged in by American citizens, and that their access to such data has been

Read more »

Feature Selection 3 – Swarm Mentality

June 6, 2013

"Bees don't swarm in a mango grove for nothing. Where can you see a wisp of smoke without a fire?" - Hla Stavhana In the last two posts, genetic algorithms were used as feature wrappers to search for more effective subsets of predictors. Here, I will do the same with another type of search algorithm: particle swarm optimization....

Read more »

Intro to Parallel Random Number Generation with RevoScaleR

June 6, 2013

by Joseph Rickert

Random number generation is fundamental to doing computational statistics. As you might expect, R is very rich in random number resources. The R base code provides several high-quality random number generators including: Wichmann-Hill, Marsaglia-Multicarry, Super-Duper, Mersenne-Twister, Knuth-TAOCP-2002 and L’Ecuyer-CMRG. (See Random for details.) And, there are at least three packages, rsprng, rlecuyer, and rstream for...

Read more »

Box-plot with R – Tutorial

June 6, 2013

Uncertain Demand Forecasting and Inventory Optimizing for Short-life-cycle Products

June 6, 2013

For short-life-cycle products such as newspapers and fashion, it is important to match supply with demand. However, because demand is uncertain, sometimes we order too little from the supplier and sometimes we order too much. We would lose sales and customers would be unsatisfied if we ordered too little, or we would let the

Read more »

Inputting Data in Matrix Format

June 6, 2013

A matrix in R is formed using the matrix, rbind, or cbind function. These functions have the following descriptions:
matrix - used to transform concatenated data into matrix form of compatible dimensions.
rbind - short for row bind, that binds a conca...

Read more »

At what sample size do correlations stabilize?

June 6, 2013

Maybe you have encountered this situation: you run a large-scale study over the internet, and out of curiosity, you frequently check the correlation between two variables. My experience with this practice is usually frustrating, as in small sample sizes (and we will see what “small” means in this context) correlations go up and down, change sign,

Read more »

Hillslope Position by Soil Series

June 5, 2013

Soil survey data are typically built upon a foundation of soil-landscape relationships that have been verified in the field. SSURGO data contain several geomorphic descriptions of landscape, landform, hillslope position, and surface shape for each...

Read more »

KDNuggets 2013 software poll results

June 5, 2013

The results of the 2013 KDNuggets software poll are in, with RapidMiner and R in a near-tie for first place. Of a record 1880 respondents, 737 reported using Rapid-I RapidMiner/RapidAnalytics, and 704 reported using R. Excel came in third: with 527 respondents, it was the lone commercial tool in the top 5. You can see the top 10 responses...

Read more »

Running R Scripts Directly From Dropbox

June 5, 2013

I have written a little function that allows users to run R scripts out of Dropbox directly from any location. It was aided by this post on biobucket. The reason I am particularly interested in this feature is that I am often using a ser...

Read more »

Oracle R Distribution for R 2.15.3 is released

June 5, 2013

We are pleased to announce that Oracle R Distribution (ORD) for R 2.15.3 is available for download today. This update consists of mostly minor bug fixes, and is the final release of the R 2.x series. Oracle recommends using yum to install ORD from our public yum server.  To install...

Read more »

The Frisch–Waugh–Lovell Theorem for Both OLS and 2SLS

June 5, 2013

The Frisch–Waugh–Lovell (FWL) theorem is of great practical importance for econometrics. FWL establishes that it is possible to re-specify a linear regression model in terms of orthogonal complements. In other words, it permits econometricians to partial out right-hand-side, or control, variables. This is useful in a variety of settings. For example, there may be cases

Read more »
