Blog Archives

Query Pandas DataFrame with SQL

November 1, 2014

Similar to the SQLDF package, which provides a seamless interface between SQL statements and R data.frames, PANDASQL allows Python users to query Pandas DataFrames with SQL. Below are some examples showing how to use PANDASQL for SELECT / AGGREGATE / JOIN operations. More information is available on GitHub (https://github.com/yhat/pandasql).


Flexible Beta Modeling

October 27, 2014

Model Segmentation with Recursive Partitioning

October 26, 2014

Estimating a Beta Regression with The Variable Dispersion in R

October 19, 2014

Fitting Lasso with Julia

October 7, 2014

Julia Code R Code


By-Group Aggregation in Parallel

October 4, 2014

Similar to row search, by-group aggregation is another perfect use case for demonstrating the power of split-and-conquer with parallelism. The example below shows that a homebrew by-group aggregation with the foreach package, albeit inefficiently coded, is still a lot faster than the summarize() function in the Hmisc package.
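The split-and-conquer idea can be sketched in a few lines. The post itself uses R and the foreach package; the Python version below is only an illustration of the same structure (split by group, summarize each group in a worker, combine the results). The function names, the toy data, and the use of a thread pool are assumptions made for this sketch, and it makes no claim about speed.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_group(rows):
    """Split (group, value) pairs into one list of values per group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def summarize_group(item):
    """Aggregate a single group and return (key, count, mean)."""
    key, values = item
    return key, len(values), sum(values) / len(values)


def aggregate_by_group(rows, max_workers=4):
    """Split rows by group, summarize each group in a worker, then combine."""
    groups = split_by_group(rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each group is handed to a worker independently of the others.
        results = list(pool.map(summarize_group, groups.items()))
    return {key: {"n": n, "mean": mean} for key, n, mean in results}


# Hypothetical toy data: (group, value) pairs.
rows = [("a", 1.0), ("b", 10.0), ("a", 3.0), ("b", 20.0), ("c", 5.0)]
summary = aggregate_by_group(rows)
print(summary)
```

Threads keep the sketch portable. For CPU-bound aggregation in Python, a process pool (`concurrent.futures.ProcessPoolExecutor`) would be the closer analogue to a parallel foreach backend.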


Vector Search vs. Binary Search

October 1, 2014

Row Search in Parallel

September 28, 2014

I’ve always wondered whether row search can be made more efficient by splitting the whole data.frame into chunks and then searching the rows within each chunk in parallel. In the R code below, a comparison is done between the standard row search and the parallel row search with the FOREACH
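As a rough illustration of the chunked search described here, the following Python sketch splits a table into chunks, searches each chunk in a worker, and concatenates the matches. It is not the post's R code: the helper names, the list-of-dicts table, and the thread pool are assumptions chosen to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, n_chunks):
    """Split rows into at most n_chunks contiguous slices."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def search_chunk(task):
    """Return the rows in one chunk whose column equals the target."""
    chunk, column, target = task
    return [row for row in chunk if row[column] == target]


def parallel_row_search(rows, column, target, n_chunks=4):
    """Search each chunk in a worker and concatenate the matches in order."""
    chunks = split_into_chunks(rows, n_chunks)
    tasks = [(chunk, column, target) for chunk in chunks]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(search_chunk, tasks))
    return [row for part in parts for row in part]


# Hypothetical table: 100 rows with an id and a repeating key.
table = [{"id": i, "key": i % 7} for i in range(100)]
hits = parallel_row_search(table, "key", 3)
print(len(hits))
```

Because `pool.map` returns results in submission order, the combined matches keep the original row order, so the output can be compared directly with a plain sequential scan.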


Chain Operations: An Interesting Feature in dplyr Package

July 28, 2014

Efficiency of Importing Large CSV Files in R

February 10, 2014