1902 search results for "RStudio"

Reverse Engineering with Correlated Features

February 11, 2016

In econometric modeling, I usually have a problem with correlated features. A few weeks ago, I was discussing feature selection when features are correlated. This week, I was wondering about reverse engineering when features might be correlated (not to say very correlated). The way I see reverse engineering is the following: someone has some dataset, and based on that dataset, a...

Clustering French Cities (based on Temperatures)

February 11, 2016

In order to illustrate hierarchical clustering techniques and k-means, I borrowed François Husson’s dataset, with monthly average temperatures in several French cities.

> temp=read.table(
+ "http://freakonometrics.free.fr/FR_temp.txt",
+ header=TRUE,dec=",")

We have 15 cities, with monthly observations

> X=temp
> boxplot(X)

Since the variance seems to be rather stable, we will not ‘normalize’ the variables here,

> apply(X,2,sd)
Janv Fevr Mars...

Plot the new SVG R logo with ggplot2

February 11, 2016

High resolution and SVG versions of the new R logo are finally available. I converted the SVG to WKT (file here) which means we can use it like we would a shapefile in R. That includes plotting! Here’s a short example of how to read that WKT and plot the logo using ggplot2:

library(sp)
library(maptools)

A Tall Drink of Water

February 10, 2016

In a previous post, I used water consumption data from Utah’s Open Data Catalog to explore what kind of users consume the most water in my home here in Salt Lake City, what the annual pattern of water use is, and how the drought of the past few years has affected water...

Craft httr calls cleverly with curlconverter

February 10, 2016

When you visit a site like the LA Times’ NH Primary Live Results site and wish you had the data that they used to make the tables & visualizations on the site, sometimes it’s as simple as opening up your browser’s “Developer Tools” console and looking for XHR (XMLHttpRequest) calls: You can actually

Clusters of Texts

February 10, 2016

Another popular application of classification techniques is text mining (see e.g. an old post on French president speeches). Consider the following example, inspired by Norbert Ryciak’s post, with 12 Wikipedia pages on various topics,

> library(tm)
> library(stringi)
> library(proxy)
> titles = c("Boosting_(machine_learning)",
+ "Random_forest",
+ "K-nearest_neighbors_algorithm",
+ "Logistic_regression",
+ "Boston_Bruins",
+ "Los_Angeles_Lakers",
+ "Game_of_Thrones",
+ "House_of_Cards_(U.S._TV_series)",
+ "True Detective...

Clusters of (French) Regions

February 9, 2016

For the data science course tomorrow, I just wanted to post some functions to illustrate cluster analysis. Consider the dataset of the French 2012 elections

> elections2012=read.table( "http://freakonometrics.free.fr/elections_2012_T1.csv",sep=";",dec=",",header=TRUE)
> voix=which(substr(names(
+ elections2012),1,11)=="X..Voix.Exp")
> elections2012=elections2012
> X=as.matrix(elections2012)
> colnames(X)=c("JOLY","LE PEN","SARKOZY","MÉLENCHON","POUTOU","ARTHAUD","CHEMINADE","BAYROU","DUPONT-AIGNAN","HOLLANDE")
> rownames(X)=elections2012

The hierarchical cluster analysis is obtained using

> cah=hclust(dist(X))
> plot(cah,cex=.6)

To get five groups, we have...

Databases in containers

February 8, 2016

A great number of readers reacted very positively to Nina Zumel’s article Using PostgreSQL in R: A quick how-to. Part of the reason is that she described an incredibly powerful data science pattern: using a formerly expensive permanent system infrastructure as a simple transient tool. In her case the tools were the data manipulation grammars SQL …

Tutorial: Credit Card Fraud Detection with SQL Server 2016 R Services

February 8, 2016

If you have a database of credit-card transactions with a small percentage tagged as fraudulent, how can you create a process that automatically flags likely fraudulent transactions in the future? That's the premise behind the latest Data Science Deep Dive on MSDN. This tutorial provides a step-by-step guide to using the R language and the big-data statistical models...

My favorite tools for helping future me

Reproducible research is a topic that people like to talk about these days. Thinking about reproducible research and learning the important tools is what improved my work more than anything. Not in the sense that my results got better, but in the sense that my feeling about the work got better and my analyses got easier to understand for future...
