Blog Archives

Five Hard-Won Lessons Using Hive

June 12, 2014

I’ve been spending a ton of time lately on the data engineering side of ‘data science’, so I’ve been writing a lot of Hive queries. Hive is a great tool for querying large amounts of data without having to know very much about the underpinnings of Hadoop. Unfortunately, there are a lot of things about...

Read more »

Building JSON in R: Three Methods

May 13, 2014

When I set out to build RSiteCatalyst, I had a few major goals: learn R, build a CRAN-worthy package and learn the Adobe Analytics API. As I reflect on how the package has evolved over the past two years and what I’ve learned, I think my greatest lesson was how to deal with JSON...

Read more »

Real-time Reporting with the Adobe Analytics API

March 10, 2014

Starting with version 1.3.1 of RSiteCatalyst, you can access the real-time reporting capabilities of the Adobe Analytics API through a familiar R interface. Here’s how to get started… GetRealTimeConfiguration: Before using the real-time reporting capabilities of Adobe Analytics, you first need to indicate which metrics and elements you are interested in seeing in real-time. To...

Read more »

RSiteCatalyst Version 1.3 Release Notes

February 4, 2014

Version 1.3 of the RSiteCatalyst package to access the Adobe Analytics API is now available on CRAN! Changes include: search via regex functionality in QueueRanked/QueueTrended functions; support for Realtime API reports (Overtime and one-element Ranked report); allow for variable API request timing in Queue functions; fixed validate flag in JSON request to work correctly; deprecated...

Read more »

Quickly Create Dummy Variables in a Data Frame

January 2, 2014

On Quora, a question was asked about how to fix the error of the randomForest package in R not being able to handle more than 32 levels in a categorical variable. Since I’ve seen this question asked on Kaggle forums, StackOverflow and elsewhere, here’s the answer: code your own dummy variables instead of...

Read more »

Adobe Analytics Implementation Documentation in 60 seconds

December 9, 2013

When I was working as a digital analytics consultant, no question had quite the ability to cause belly laughs AND angst as “Can you send me an updated copy of your implementation documentation?” I saw companies that were spending six or seven figures annually on their analytics infrastructure, multi-millions in salary for employees and yet the only way...

Read more »

RSiteCatalyst Version 1.2 Release Notes

November 5, 2013

Version 1.2 of the RSiteCatalyst package to access the Adobe Analytics API is now available on CRAN! Changes include: removed RCurl package dependency; changed argument order for GetAdminConsoleLog to avoid error when date not passed; return proper numeric type for metric columns; fixed bug in GetEVars function; added validate:true flag to API to improve error reporting; removed remaining...

Read more »

Clustering Search Keywords Using K-Means Clustering

September 17, 2013

One of the key tenets of impactful digital analysis is understanding what your visitors are trying to accomplish. One of the easiest ways to do this is to analyze the words your visitors use to arrive on site (search keywords) and the words they use while on the site (on-site search). Although Google has...

Read more »

Fun With Just-In-Time Compiling: Julia, Python, R and pqR

September 2, 2013

Recently I’ve been spending a lot of time trying to learn Julia by doing the problems at Project Euler. What’s great about these problems is that they get me out of my normal design patterns, since I don’t generally think about prime numbers, factorials and other number theory problems during my normal workday. These problems...

Read more »

RSiteCatalyst Version 1.1 Release Notes

August 25, 2013

RSiteCatalyst version 1.1 is now available on CRAN. Changes from version 1 include: support for Correlations/Subrelations in the QueueRanked function; support for Current Data in all ‘Queue’ functions; support for Anomaly Detection in the QueueOvertime and QueueTrended functions (example usage with ggplot2 graph); decrease in wait time for API calls (from 5 seconds to 2 seconds) and extending...

Read more »