591 search results for "SQL"

Build a search engine in 20 minutes or less

March 27, 2013
By
Build a search engine in 20 minutes or less

…or your money back. author = "Ben Ogorek"Twitter = "@baogorek"email = paste0(sub("@", "", Twitter), "@gmail.com") Setup Pretend this is Big Data: doc1 <- "Stray cats are running all over the place. I see 10 a day!"doc2 <- "Cats are killers. They...

Read more »

Python vs R vs SPSS … Can’t All Programmers Just Get Along?

March 26, 2013
By
Python vs R vs SPSS … Can’t All Programmers Just Get Along?

Programmers have long been very proud and loyal with their tools, and often very vocal. This has led to well-contested rivalries and “fights” about which tool is better: emacs or vi; Java or C++; Perl or Python; Django or Rails; … Continue reading →The post Python vs R vs SPSS … Can’t All Programmers Just Get Along?...

Read more »

Massive online data stream mining with R

Massive online data stream mining with R

A few weeks ago, the stream package has been released on CRAN. It allows to do real time analytics on data streams. This can be very usefull if you are working with large datasets which are already hard to put in RAM completely, let alone to build some statistical model on it without getting into RAM problems. Most of...

Read more »

Baseball Statistics with R – Batting Average

March 18, 2013
By
Baseball Statistics with R – Batting Average

I'm working on a new book about the R programming language. R is a language that is designed for use with statistics and data. I use it to analyze sports and social networking. I thought that it would be fun to write the book focusing on baseball statistics using data from Major League Baseball. This post...

Read more »

column-store R or: how i learned to stop worrying and love monetdb

March 18, 2013
By

"Combining R's sophisticated calculations and MonetDB's excellent data access performance is a no-brainer. One gets the best of two (open source) worlds with minimal hassle." - Dr. Hannes Mühleisen"oh wow that was fast like a cheetah with a jetpack or something" - anthony damicowhy try monetdb + ra speed test of four analysis commands on sixty-seven million...

Read more »

Using maps and ggplot2 to visualize college hockey championships

March 13, 2013
By
Using maps and ggplot2 to visualize college hockey championships

Short: I plot the frequency of college hockey championships by state using the maps package, and ggplot2 Note: this example is based heavily on the example provided athttp://www.dataincolour.com/2011/07/maps-with-ggplot2/ data reference:http://en.wikipedia.org/wiki/NCAA_Men%27s_Ice_Hockey_Championship Question of interestAs a good Minnesotan, I've believed for quite some time that the colder, Northern states enjoy a competitive advantage when it...

Read more »

Getting flexible with SAP HANA

Getting flexible with SAP HANA

Most of you might not be aware of a feature introduced on SAP HANA SPS5. This new feature is called "Flexible Tables", which means that you can define a table that will grow depending on your needs. Let's see an example...You define a table with ID, NA...

Read more »

ddply in action

March 7, 2013
By
ddply in action

Top Batting Averages Over Time Top Batting Averages Over Time reference:http://www.baseball-databank.org/ ShortI'm going to use plyr and ggplot2 to look at how top batting averages have changed over time First load the data: options(width = 100)library(ggplot2) ## Warning message: package 'ggplot2' was built under R version 2.14.2 library(plyr)data(baseball)head(baseball) ## ...

Read more »

Shading and Points with xtsExtra plot.xts

February 28, 2013
By
Shading and Points with xtsExtra plot.xts

For some reason, I feel like have much better control with plot.xts function from the xtsExtra package described here over some of the other more refined R graphical packages. Maybe, it is just my simple mind, but recently I wanted to shade holding per...

Read more »

New ways to Hadoop with R

February 26, 2013
By

Today, there are two main ways to use Hadoop with R and big data: 1. Use the open-source rmr package to write map-reduce tasks in R (running within the Hadoop cluster - great for data distillation!) 2. Import data from Hadoop to a server running Revolution R Enterprise, via Hbase, ODBC (for high-performance Hadoop/SQL interfaces), or streaming data direct...

Read more »