340 search results for "hadoop"

In case you missed it: July 2012 Roundup

August 10, 2012

In case you missed them, here are some articles from July of particular interest to R users. The Environmental Performance Index website uses R to rank countries by measures like environmental health and ecosystem vitality. A log-linear regression in R predicted the gold-winning Olympic 100m sprint time to be 9.68 seconds (it was actually 9.63 seconds). Some R-related talks...


Adventures at My First JSM (Joint Statistical Meetings) #JSM2012

August 6, 2012

During the past few decades that I have been in graduate school (no, not literally) I have boycotted JSM on the notion that “I am not a statistician.” Ok, I am a renegade statistician, a statistician by training. JSM 2012 was held in San Diego, CA, one of the best places to spend a week during the summer. This...


Surveys continue to rank R #1 for Data Mining

August 3, 2012

KDnuggets recently posted its annual poll on data mining software, and the R language retains its #1 ranking as the most commonly-used software for data mining: R is now used by 52.5% of poll respondents, compared with 45% last year. Donnie Berkholz provides an analysis of the year-on-year trends for Redmonk. He provides the chart below, and notes "the...


Data Parallelism Using Oracle R Enterprise

August 2, 2012

Modern computer processors are adequately optimized for many statistical calculations, but large data operations may require hours or days to return a result.  Oracle R Enterprise (ORE), a set of R packages designed to process large data computations in Oracle Database, can run many R operations in parallel, significantly reducing processing time. ORE supports parallelism through the transparency layer,...


Edge Prediction in a Social Graph: My Solution to Facebook’s User Recommendation Contest on Kaggle

July 31, 2012

A couple weeks ago, Facebook launched a link prediction contest on Kaggle, with the goal of recommending missing edges in a social graph. I love investigating social networks, so I dug around a little, and since I did well enough to score one of the coveted prizes, I’ll share my approach here. (For some background, the contest provided...


Big data, big analytics, big opportunity

July 30, 2012

"Data, data, every where / Nor any byte to think." The world today is awash with data. Corporations, governments, and individuals are busy generating petabytes of data on culture, economy, environment, religion, and society. While data has become abundant and ubiquitous, data analysts needed to turn raw data into knowledge are in fact in short...


Community Detection in Networks with R

I mainly post this visualization because I think it’s pretty. It reminds me a little of the work by the famous Dutch painter Mondrian. The complete matrix can be found here. The plot is a heatmap of an adjacency matrix generated by a weighted dir...


Revolution Newsletter: July 2012

July 25, 2012

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full July edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Quick Start Program for Hadoop: Revolution Analytics makes it easy for data analysts and...


Learning R has really made me appreciate SAS

July 25, 2012

For the past 18 months, it seems like all I’ve heard about in the digital marketing industry is “big data”, and with that, mentions of using Hadoop and R to solve these sorts of problems. Why are these tools the … "Learning R has really made me appreciate SAS" is an article from randyzwitch.com,...

