Summary of My First Trip to Strata #strataconf

February 28, 2013

In this post I am going to summarize some of the things that I learned at Strata Santa Clara 2013. For now, I will discuss only the conference sessions, as I have a much longer post about the tutorial sessions that I am still working on and will post at a ...

Review of 2011 Data Scientist Summit

May 13, 2011

Sometime in the past six weeks I happened to see a tweet announcing the “Data Scientist Summit”, and just below it I saw that it would be held in Las Vegas at the Venetian. Being a Data Scientist myself is reason enough not to pass up this opportunity, but Vegas definitely ...

EC2 Trials and Tribulations, Part 1 (Web Crawling)

May 11, 2011

Elastic Compute Cloud (EC2) is a service provided by Amazon Web Services that allows users to leverage computing power without the need to build and maintain servers or spend money on special hardware. The idea is simple: the user “boots” up one or more machines and then accesses those machines ...

Location Tracking on Android, too!

April 23, 2011

This week it was revealed that the iPhone stores users’ locations, and this immediately caused a huge firestorm of commentary from tech geeks, panic among privacy advocates, and delight for data geeks like myself. Even better/worse, it seems that the iPhone caches location traces long-term, possibly back to the ...

My First Few Days with RStudio

March 9, 2011

As most readers are probably aware, the free IDE for R, called RStudio, was recently released for general use, and it immediately made huge waves within the R community. IDE stands for Integrated Development Environment. IDEs typically provide a rich set of tools for developing in some target language. For standard programming ...

40 Fascinating Blogs for the Ultimate Statistics Geek!

January 20, 2011

I am happy to report that ByteMining is listed on “40 Fascinating Blogs for the Ultimate Statistics Geek”! Some of the ones that I frequently read, or that are written by Twitter friends/followers (in no particular order): R-bloggers, an aggregator site containing blog posts tagged as being about R. High quality ...

My Day at ACM Data Mining Camp III

November 13, 2010

My first time at ACM Data Mining Camp was so awesome that I was thrilled to make the trip up to San Jose for the November 2010 edition. In July, I gave a talk at the Emerging Technologies for Online Learning Symposium with a faculty member in the Department of ...

UCLA Statistics: Analyzing Thesis/Dissertation Lengths

September 29, 2010

As I am working on my dissertation and piecing together a mess of notes, code and output, I am wondering to myself “how long is this thing supposed to be?” I am definitely not into this to win the prize for longest dissertation. I just want to say my piece, ...

Taking R to the Limit, Part II – Large Datasets in R

August 20, 2010

For Part I, Parallelism in R, click here. Tuesday night I again had the opportunity to present on high performance computing in R at the Los Angeles R Users’ Group. This was the second part of a two-part series called “Taking R to the Limit: High Performance Computing in ...

Hitting the Big Data Ceiling in R

May 16, 2010

As a true R fan, I like to believe that R can do anything, no matter how big, how small or how complicated: there is some way to do it in R. I decided to approach my large, sparse matrix problem with this attitude. But here I sit a broken ...

Opening Statements on Markov Chain Monte Carlo

April 1, 2010

This quarter I am TAing UCLA’s Statistics 102C, Introduction to Monte Carlo Methods, for Professor Qing Zhou. This course did not exist when I was an undergraduate, and I think it is pretty rare to teach Monte Carlo (minus the bootstrap, if you count that) or MCMC to undergrads. ...

My Experience at ACM Data Mining Camp #DMcamp

March 21, 2010

My parents and I made plans to visit San Jose and Saratoga on my grandmother’s birthday, March 19, since that is where she grew up. I randomly saw someone tweet about the ACM Data Mining Camp unconference that happened to be the next day, March 20, only a couple of miles ...

Exact Complexity of Mergesort, and an R Regression Oddity

February 13, 2010

It’s nice to be back after a pretty crazy two weeks or so. Let me start off by stating that this blog post is simply me pondering and may not be correct. Feel free to comment on inaccuracies or improvements! In preparation for an exam and my natural tendencies ...

Advanced Graphics in R

January 27, 2010

Each quarter the UCLA Statistical Consulting Center hosts minicourses twice per week in R and LaTeX. Tonight was my turn to present. I presented Advanced Graphics in R. This was the same presentation I gave at the LA R Users’ Group in August with a fellow consultant. She and I ...

