890 search results for "SQL"

The Geometry of Classifiers

December 18, 2014

As John mentioned in his last post, we have been quite interested in the recent study by Fernandez-Delgado et al., “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?” (the “DWN study” for short), which evaluated 179 popular implementations of common classification algorithms over 120 or so data sets, mostly from the UCI …

How to analyze a new dataset (or, analyzing ‘supercar’ data, part 1)

December 16, 2014

I love cars. The way they sound. The engineering. The craftsmanship. And let’s be honest: fast cars are just fun. Given my love of cars, I frequently watch Top Gear clips on YouTube. A couple of weeks ago, I stumbled across this: Watching the video, I’m thinking, “253 miles per hour? You’ve got to …

Parallelism via “parSapply”

December 13, 2014

In an earlier post, I used mclapply to kick off parallel R processes and to demonstrate inter-process synchronization via the flock package. Although I have been using this approach to parallelism for a few years now, I admit, it has certain important disadvantages. It works only on a single machine, and also, it doesn’t work

O’Reilly Data Scientist Salary and Tools Survey, November 2014

December 10, 2014

The O'Reilly Data Scientist Survey for 2014 is out, with fresh data on the salaries and tools used by data scientists. Jon King has a summary of the results, but not much has changed since last year: median income is down very slightly ($100k in 2013 vs $98k in 2014), and the most popular analysis tools (excluding operating systems)...

Identifying Position Change Groupings in Rank Ordered Lists

December 9, 2014

The title says it all, doesn’t it?! Take the following example – it happens to show race positions by driver for each lap of a particular F1 grand prix, but it could be the evolution over time of any rank-based population. The question I had in mind was – how can I identify positions that

dplyr – some more reflections

December 4, 2014

Yesterday, I published a post here on DataScience.LA with a simple/basic benchmark for dplyr. It...

The World We Live In #3: Breastfeeding

December 1, 2014

“Facts are stubborn, but statistics are more pliable” (Mark Twain). According to the World Health Organization, exclusive breastfeeding is recommended up to 6 months of age, with continued breastfeeding along with appropriate complementary foods up to two years of age or beyond. Thus, the defining characteristic of continued breastfeeding is that the infant between 6 months and …

Storing Forecasts in a Database

November 29, 2014

In my last post I mentioned that I started using RSQLite to store computed results. No rocket science here, but my feeling is that this might be useful to others, hence, this post. This can be done using any database, but I will use (R)SQLite as an illustration. Let’s assume we are running a long

R, an Integrated Statistical Programming Environment and GIS

November 27, 2014

This article was originally published in Geoinformatics magazine. R is well known as a powerful, extensible and relatively fast statistical programming language and open software project with a command line interface (CLI). What is less well known is that R also has cutting edge spatial packages that allow it to behave as a fully featured Geographical Information System...

Synchronization for R with the flock Package

November 20, 2014

Have you tried synchronizing R processes? I did, and it wasn’t straightforward. In fact, I ended up creating a new package: flock. One of the improvements I made not too long ago to my R back-testing infrastructure was to start using a database to store the results. This way I can compute all interesting


