566 search results for "SQL"

The (near) Future of Data Analysis – A Review

January 2, 2013

Sean Murphy co-organizes Data Business DC, among many other things. Hadley Wickham, having just taught workshops in DC for RStudio, shared with the DC R Meetup his view on the future, or at least the near future, of Data Analysis. …

Efficiency of Extracting Rows from A Data Frame in R

January 1, 2013

In the example below, 552 rows are extracted from a data frame with 10 million rows using six different methods. The results show a large disparity in CPU time between the least and the most efficient methods. As in my previous post, the method using the data.table package is the most efficient

Software engineer’s guide to getting started with data science

December 30, 2012

Many of my software engineer friends ask me about learning data science. There are many articles on this subject from renowned data scientists (Dataspora, Gigaom, Quora, Hilary Mason). This post captures my journey (a software engin...

Opening Large CSV Files in R

December 26, 2012

Before heading home for the holidays, I had a large data set (1.6 GB with over 1.25 million rows) with columns of text and integers ripped out of the company (Kwelia) database and put into a .csv file, since I was going to be offline a lot over the break. I tried opening the CSV file

Aggregation by Group in R

December 23, 2012

Efficiency Comparison among 4 Methods above

Data Import Efficiency – A Case in R

December 23, 2012

Below is an R snippet comparing data import efficiency among CSV, SQLite, and HDF5. As in the Python case posted yesterday, HDF5 shows the highest efficiency.

Chocolate and Nobel prize – a true story?

December 22, 2012

Few of us can resist chocolate, but the real question is: should we even try to resist it? The image is CC by Tasumi1968. As a dark chocolate addict I was relieved to see Messerli's ecological study on chocolate consumption and the...

Removing Records by Duplicate Values in R – An Efficiency Comparison

December 20, 2012

After posting “Removing Records by Duplicate Values” yesterday, I had an interesting exchange with my friend Jeffrey Allard tonight about how to code this in R: either a combination of order() and duplicated(), or sqldf(). Afterward, I did a simple efficiency comparison between the two methods, shown below. The result is pretty self-explanatory. In terms

Querying, parsimony and golden hammers

December 20, 2012

I love it when things are easy. I love it so much that I’ll spend a great deal of time and effort to keep things simple. At the same time, though, I think there’s some value in expending effort in pursuit of something. If you want to understand a thing, you have to spend time

analyze the behavioral risk factor surveillance system (brfss) with r and monetdb

December 17, 2012

experimental.  the behavioral risk factor surveillance system (brfss) aggregates behavioral health data from 400,000 adults via telephone every year.  it's um *clears throat* the largest telephone survey in the world and it's gotta lotta uses...