Articles by atmathew

Working With SEM Keywords in R

September 20, 2015 | atmathew

The following post is taken from two previous posts on an older blog of mine that is no longer available. They are from several years ago and relate to two critical questions that I encountered. One, how can I automatically generate hundreds of thousands of keywords for a search engine ... [Read more...]

Logistic Regression in R – Part Two

September 2, 2015 | atmathew

My previous post covered the basics of logistic regression. We must now examine the model to understand how well it fits the data and generalizes to other observations. The evaluation process involves the assessment of three distinct areas – goodness of fit, tests of individual predictors, and validation of predicted values – ... [Read more...]

Logistic Regression in R – Part One

September 1, 2015 | atmathew

Please note that an earlier version of this post had to be retracted because it contained some content which was generated at work. I have since chosen to rewrite the document in a series of posts. Please recognize that this may take some time. Apologies for any inconvenience. Logistic regression ... [Read more...]

Examining Email Addresses in R

August 22, 2015 | atmathew

I don’t normally work with personally identifiable information such as emails. However, the recent data dump from Ashley Madison got me thinking about how I’d examine a data set composed of email addresses. What are the characteristics of an email that I’d look to extract? How would ... [Read more...]

Homework during the hiring process…no thanks!

August 17, 2015 | atmathew

For the past four months, I’ve been on the job market looking for work as an applied statistician or data scientist within the online marketing industry. One thing I’ve come to expect with almost every company is some sort of homework assignment or challenge where a spreadsheet ... [Read more...]

Evaluating Logistic Regression Models

August 17, 2015 | atmathew

Logistic regression is a technique that is well suited for examining the relationship between a categorical response variable and one or more categorical or continuous predictor variables. The model is generally presented in the following format, where β refers to the parameters and x represents the independent variables. log(odds)=β0+β1∗x1+...+β... [Read more...]

Wikipedia and the Fashion Weeks: A Look at Usage Patterns

August 3, 2015 | atmathew

Unlike many of the entries on Wikipedia relating to statistics or computer science, fashion-related topics have not been thoroughly documented. For example, the entries on Martin Margiela and Rei Kawakubo pale in comparison to the breadth of content on Thomas Bayes, structural equation modeling, or R. In lieu ... [Read more...]

Turning Data Into Awesome With sqldf and pandasql

April 29, 2015 | atmathew

Both R and Python possess libraries for using SQL statements to interact with data frames. While both languages have native facilities for manipulating data, the sqldf and pandasql packages provide a simple and elegant interface for conducting tasks using an intuitive framework that’s widely used by analysts. R and sqldf ... [Read more...]

I like you and you like me…but what does it all mean. (Part 1)

August 19, 2014 | atmathew

Tinder is a popular matchmaking application that allows users to connect with others with whom they share a physical attraction. New members build their profile by importing their age, gender, geographic information, and photos from their Facebook account. Users are then presented with profiles which meet their search criteria and are ... [Read more...]

R 101: Summarizing Data

March 25, 2014 | atmathew

When working with large amounts of data that is structured in a tabular format, a common operation is to summarize that data in different ways using specific variables. In Microsoft Excel, pivot tables are a nice feature that is used for this purpose. While not as “efficient” in relation to ... [Read more...]

Above Average: Analyzing Self-Rated Qualities in R

March 16, 2014 | atmathew

Numerous psychological studies have demonstrated that people often have an inflated perception of their personal qualities. From work performance to driving skills, people report being above average in relation to others when it comes to many arenas. This extends to how people perceive their own physical attractiveness and intelligence levels. ... [Read more...]

Summarizing Data in R

April 10, 2013 | atmathew

When working with large amounts of data that is structured in a tabular format, a common operation is to summarize that data in different ways using specific variables. In Microsoft Excel, pivot tables are a nice feature that is used for this purpose. Of course, R also has similar calculations ... [Read more...]

Creating ‘Tags’ For PPC Keywords

February 7, 2013 | atmathew

When performing search engine marketing, it is usually beneficial to construct a system for making sense of keywords and their performance. While one could construct Bayesian Belief Networks to model the process of consumers clicking on ads, I have found that using ‘tags’ to categorize keywords is just as useful ... [Read more...]
