582 search results for "SQL"

Aggregation by Group in R

December 23, 2012

Efficiency Comparison among 4 Methods above

Data Import Efficiency – A Case in R

December 23, 2012

Below is an R snippet comparing data import efficiency among CSV, SQLite, and HDF5. As in the Python case posted yesterday, HDF5 shows the highest efficiency.

Chocolate and nobel prize – a true story?

December 22, 2012

Few of us can resist chocolate, but the real question is: should we even try to resist it? The image is CC by Tasumi1968. As a dark chocolate addict I was relieved to see Messerli's ecological study on chocolate consumption and the...

Removing Records by Duplicate Values in R – An Efficiency Comparison

December 20, 2012

After posting “Removing Records by Duplicate Values” yesterday, I had an interesting exchange with my friend Jeffrey Allard tonight about how to code this in R: either a combination of order() and duplicated(), or sqldf(). Afterward, I ran a simple efficiency comparison between the two methods, as shown below. The comparison result is pretty self-explanatory. In terms

Querying, parsimony and golden hammers

December 20, 2012

I love it when things are easy. I love it so much that I’ll spend a great deal of time and effort to keep things simple. At the same time, though, I think there’s some value in expending effort in pursuit of something. If you want to understand a thing, you have to spend time

analyze the behavioral risk factor surveillance system (brfss) with r and monetdb

December 17, 2012

experimental.  the behavioral risk factor surveillance system (brfss) aggregates behavioral health data from 400,000 adults via telephone every year.  it's um *clears throat* the largest telephone survey in the world and it's gotta lotta uses...

Data Science, Data Analysis, R and Python

The October 2012 issue of Harvard Business Review prominently features the words “Getting Control of Big Data” on the cover, and the magazine includes these three related articles: “Big Data: The Management Revolution,” by Andrew McAfee and Erik Brynjolfsson, pages 61 – 68; “Data Scientist: The Sexiest Job of the 21st Century,” by Thomas H. Davenport and D.J. Patil, pages...

What is Correctness for Statistical Software?

December 14, 2012

Introduction: A few months ago, Drew Conway and I gave a webcast that tried to teach people about the basic principles behind linear and logistic regression. To illustrate logistic regression, we worked through a series of progressively more complex spam detection problems. The simplest data set we used was the following: This data set has

analyze the american community survey (acs) with r and monetdb

December 10, 2012

experimental.  think of the american community survey (acs) as the united states' census for off-years - the ones that don't end in zero.  every year, one percent of all americans respond, making it the largest complex sample administered by ...

Please stop using Excel-like formats to exchange data

December 7, 2012

I know “officially” data scientists always work in “big data” environments with data in a remote database, streaming store or key-value system. But in day-to-day work Excel files and Excel export files get used a lot and cause a disproportionate amount of pain. I would like to make a plea to my
