561 search results for "SQL"

Choosing an SQL Engine for Analytics

March 9, 2009

I’ve been struggling for a while over which database to use for my working data. I used to use MS Access quite a lot. The problems with MS Access include but are not limited to: 2 GB file size limit, at least historically; versions change with each edition of MS Office; sort of tough to write SQL scripts; Very

Competitive balance and home court advantage in the NBA

July 6, 2014

Two years ago, the entire NBA season went into lockout, mostly for financial reasons. However, one central point was also about keeping a competitive balance within the NBA, so that large- and small-market teams alike would have a chance to compete for a championship. This brings us to the obvious question: “Is there competitive

How To: 20 Minute Guide to Get Started with PivotalR

July 1, 2014

In this article, Pivotal engineer and predictive analytics expert Hai Qian explains how someone new to R can get started performing statistical analysis on data stores in Greenplum Database, Pivotal HD and PostgreSQL in just 20 minutes using PivotalR. First, there is some background on R’s popularity, then the article dives into important topics such as installation, data loading,...

Maybe I Don’t Really Know R After All

June 26, 2014

Lately, I’ve been feeling that I’m spreading myself too thin in terms of programming languages. At work, I spend most of my time in Hive/SQL, with the occasional Python for my smaller data. I really prefer Julia, but I’m alone at work on that one. And since I maintain a package on CRAN (RSiteCatalyst), I frequently spend

Tailoring univariate probability distributions

June 26, 2014

This post shows how to build a custom univariate distribution in R from scratch, so that you end up with the essential functions: a probability density function, cumulative distribution function, quantile function and random number generator. In the beginning all you need is an equation of the probability density function, …

New Version of RStudio: R Markdown v2 and More

June 18, 2014

Today we’re very pleased to announce a new version of RStudio (v0.98.932) which is available for download now. New features in this release include: A next generation implementation of R Markdown with a raft of new features including support for HTML, PDF, and Word output, many new options for customizing document appearance, and the ability to create presentations

analyze the american housing survey (ahs) with r

June 17, 2014

plenty of nationwide surveys collect information at the household-level, but only the american housing survey (ahs) focuses on the physical structure rather than the inhabitants. when asked to pick their favorite public-use file, urban planners, real...

R and Vertica

June 14, 2014

I’ve spent the last few months working my way through the integration of R and Vertica, and will try to collect here the things that I find handy. I’m quite sad to see there is not much about this Vertica feature on the web; that’s a little disappointing. But, it didn’t stop us from creating a

Five Hard-Won Lessons Using Hive

June 12, 2014

I’ve been spending a ton of time lately on the data engineering side of ‘data science’, so I’ve been writing a lot of Hive queries. Hive is a great tool for querying large amounts of data, without having to know very much about the underpinnings of Hadoop. Unfortunately, there are a lot of things about...

AlienVault Longitudinal Study Part 4

In Part 1 we looked at acquiring raw data, and wrangling it into a time series dataset. In Part 2 we looked at types of threats in the time series. In Part 3 we looked at countries. Now we will examine countries and types in combination in the AlienVault reputation database. Just as we shaped our dataset for better understanding in previous...
