Using R Functions in Python

The following post is taken from two previous posts from an older blog of mine that is no longer available. These are from several years ago, and related to two critical questions that I encountered. One, how can I automatically generate hundreds of thousands of keywords for a search engine marketing campaign. Two, how can I

My previous post covered the basics of logistic regression. We must now examine the model to understand how well it fits the data and generalizes to other observations. The evaluation process involves the assessment of three distinct areas – goodness of fit, tests of individual predictors, and validation of predicted values – in order to

Please note that an earlier version of this post had to be retracted because it contained some content which was generated at work. I have since chosen to rewrite the document in a series of posts. Please recognize that this may take some time. Apologies for any inconvenience. Logistic regression is used to analyze the

I don’t normally work with personal identifiable information such as emails. However, the recent data dump from Ashley Madison got me thinking about how I’d examine a data set composed of email addresses. What are the characteristics of an email that I’d look to extract? How would I perform that task in R? Here’s some

For the past four months, I’ve been on the job market looking for work as an applied statistician or data scientist within the the online marketing industry. One thing I’ve come to expect with almost every company is some sort of homework assignment or challenge where a spreadsheet would be presented along with some guidelines

Logistic regression is a technique that is well suited for examining the relationship between a categorical response variable and one or more categorical or continuous predictor variables. The model is generally presented in the following format, where β refers to the parameters and x represents the independent variables. log(odds)=β0+β1∗x1+...+βn∗xn The log(odds), or log-odds ratio, is defined

Unlike many of the entries on Wikipedia relating to statistics or computer science, fashion related topics have not not been thoroughly documented. For example, the entries on Martin Margiela and Rei Kawakubo pale in comparison to the breadth of content on John Bayes, structural equation modeling, or R. In lieu of this, I wanted to investigate whether

Both R and Python possess libraries for using SQL statements to interact with data frames. While both languages have native facilities for manipulating data, the sqldf and pandasql provide a simple and elegant interface for conducting tasks using an intuitive framework that’s widely used by analysts. R and sqldf sqldf("SELECT COUNT(*) FROM df2 WHERE

Tinder is a popular matchmaking application that allows users to connect with others whom they share a physical attraction. New members build their profile by importing their age, gender, geographic information, and photos from their Facebook account. Users are then presented with profiles which meet their search criteria and are able to like or dislike them. Unlike traditional online