561 search results for "SQL"

The myth of the missing Data Scientist

January 7, 2013

Much has been said about the dire shortage of Data Scientists looming on the horizon. With the spectre of Big …

analyze the medical expenditure panel survey (meps) with r

January 7, 2013

the meps household component leads the pack for examining individual-level medical expenditures by payor and type of service.  total expenditures captured by the survey tend to be low, but unbiased across the board and can be adjusted to match the...

Batch forecasting in R

January 6, 2013

I sometimes get asked about forecasting many time series automatically. Here is a recent email, for example: I have looked but cannot find any info on generating forecasts on multiple data sets in sequence. I have been using analysis services for sql server to generate fitted time series but it is too much of a black box (or I...

Search and replace: Are you tired of nested `ifelse`?

January 6, 2013

It happens all the time: you have a vector of fruits and you want to replace all bananas with apples, all oranges with pineapples, and leave all the other fruits as-is, or maybe change them all to figs. The usual solution? A big old nested `ifelse`: ...
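One common alternative to a nested `ifelse` is a named lookup vector. The sketch below illustrates that idea and is not necessarily the approach the linked post proposes; the vectors `fruits` and `lookup` are hypothetical example data.

```r
# Hypothetical example data: a vector of fruits to recode
fruits <- c("banana", "orange", "kiwi", "banana", "grape")

# Hypothetical lookup table: names are the values to find,
# elements are their replacements
lookup <- c(banana = "apple", orange = "pineapple")

# Subsetting by name returns NA for fruits not in the lookup table
replaced <- unname(lookup[fruits])
matched <- !is.na(replaced)

# Option 1: leave unmatched fruits as-is
kept <- fruits
kept[matched] <- replaced[matched]

# Option 2: change every unmatched fruit to a fig
figged <- replaced
figged[!matched] <- "fig"

print(kept)
print(figged)
```

Adding another replacement rule means adding one entry to `lookup` rather than another level of nesting.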

100 most read R posts in 2012 (stats from R-bloggers) – big data, visualization, data manipulation, and other languages

January 2, 2013

R-bloggers.com is now three years young. The site is an (unofficial) online journal of the R statistical programming environment, written by bloggers who agreed to contribute their R articles to the site. Last year, I posted on the top 24...

The (near) Future of Data Analysis – A Review

January 2, 2013

Sean Murphy co-organizes Data Business DC, among many other things. Hadley Wickham, having just taught workshops in DC for RStudio, shared with the DC R Meetup his view on the future, or at least the near future, of Data Analysis. …

Efficiency of Extracting Rows from a Data Frame in R

January 1, 2013

In the example below, 552 rows are extracted from a data frame with 10 million rows using six different methods. Results show a significant disparity in CPU time between the least and the most efficient methods. Similar to the finding in my previous post, the method using the data.table package is the most efficient

Software engineer’s guide to getting started with data science

December 30, 2012

Many of my software engineer friends ask me about learning data science. There are many articles on this subject from renowned data scientists (Dataspora, Gigaom, Quora, Hilary Mason). This post captures my journey (a software engin...

Opening Large CSV Files in R

December 26, 2012

Before heading home for the holidays, I had a large data set (1.6 GB with over 1.25 million rows) with columns of text and integers ripped out of the company (Kwelia) database and put into a .csv file, since I was going to be offline a lot over the break. I tried opening the CSV file

Aggregation by Group in R

December 23, 2012

Efficiency Comparison among 4 Methods above
