Sanitizing data in SAP HANA with R

From April 10 to April 11, my team (Anne, Juergen and I) hosted an InnoJam in Boston. It was a really great event, but the data provided by the City of Boston wasn't exactly in the best shape, so we put a lot of effort (with the help of the SAP Guru...

When R met SAP Gateway

A couple of days ago, I was toying with the idea of doing something with SAP Gateway. I thought of using SUP, but as I recently wrote two blogs about it, I decided to go back to one of my favorite programming languages: R. So, Gateway can be consumed...

Generating all subsets of a set

April 20, 2012

Recently I calculated the Banzhaf power index, which required generating all subsets of a given set. The code given there was a bit complex, so I decided to write a simple function to do it. As an example of its application I reproduce Figur...
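The excerpt does not include the author's R function. As an illustration only, the following minimal Python sketch shows one standard way to enumerate all 2^n subsets of an n-element set: each integer from 0 to 2^n - 1 acts as a bitmask, and bit i decides whether the i-th element is included. The name `all_subsets` is hypothetical.

```python
def all_subsets(items):
    """Return every subset of `items` as a list of lists (2**n subsets in total).

    Bit i of the counter `mask` decides whether items[i] is included.
    """
    items = list(items)
    n = len(items)
    subsets = []
    for mask in range(2 ** n):
        subsets.append([items[i] for i in range(n) if mask & (1 << i)])
    return subsets


if __name__ == "__main__":
    for s in all_subsets(["a", "b", "c"]):
        print(s)
```

For three elements this yields 8 subsets, from the empty set to the full set. If subsets are wanted grouped by size instead, `itertools.combinations(items, k)` for k = 0..n produces the same collection in a different order.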

From the Guardian’s data blog: Visualising risk

April 20, 2012

The Guardian published a nice summary and link collection from an interdisciplinary visualisation workshop hosted by Microsoft and dedicated to visualising probability and risk. Check it out here. [Figure: OECD Better Life Index] The links I found most interesting were ...

PhD week 7: Plotting and NIR spectroscopy

April 19, 2012

Near-infrared (NIR) spectroscopy is a technique that measures how much near-infrared light certain materials absorb. It is used in a variety of applications, but in the agricultural world it is often used to determine the quality and composition of mixed materials such as stock forage. It uses electromagnetic radiation...

PostgreSQL, Excel, R, and a Really Big Data Set!

April 19, 2012

At work I’ve started working with the biggest data set I’ve ever seen! First, let me qualify my use of the term “Big Data”. The number of rows in the resultant data set (after much transformation and manipulation in …

RcppArmadillo 0.3.0.2 released and on CRAN

April 19, 2012

Earlier today, Conrad Sanderson released another bug-fix version 3.0.2 for the still fairly recent 3.0.0 version of his excellent Armadillo C++ template library for linear algebra. The new RcppArmadillo release 0.3.0.2 also appeared on CRAN this morn...

Zurich, Mar 2012 – Rmetrics gldist Package

April 19, 2012

(This article was first published on Rmetrics blogs, and kindly contributed to R-bloggers.)

Getting Historical Weather Data in R and SAP HANA

April 19, 2012

For many of my latest data blogs, I needed historical weather data to perform data mash-ups that help pinpoint causes. For example, for my continued exploration into the airlines/airports historical data using SAP HANA and R, I wanted to find out wh...

Simple tools for building a recommendation engine

April 19, 2012

By Joseph Rickert. Revolution’s resident economist, Saar Golde, is very fond of saying that “90% of what you might [want] from a recommendation engine can be achieved with simple techniques”. To illustrate this point (without doing a lot of work), we downloaded the million-row movie dataset from www.grouplens.org with the idea of just taking the first obvious exploratory step:...

An R Script to Automatically download PubMed Citation Counts By Year of Publication

April 19, 2012

Ever wanted to look at PubMed trends and make elegant graphs of them? Here’s an R script that will do it automatically for you.

Tutorial: Using plot() function

April 19, 2012

Hello Readers! This is my first post as a member of R-bloggers. In this post I'm going to talk about basic two-dimensional plotting in R. This is a tutorial for beginners in R. To begin with, let's define a vector. Say we have vector x, whic...

Dummies for Dummies

April 19, 2012

Most R functions used in econometrics convert factor variables into a set of dummy/binary variables automatically. This is useful when estimating a linear model, saving the user from the laborious activity of manually including the dummy variables as regressors. However, what if you want to reshape your dataframe so that it contains such dummy variables?
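A minimal sketch of the reshaping the excerpt asks about, written in plain Python and not taken from the post: each level of one categorical column becomes its own 0/1 column. The function `add_dummies`, the list-of-dicts layout and the `column_level` naming scheme are all assumptions for illustration.

```python
def add_dummies(rows, column):
    """Return copies of `rows` (a list of dicts) with one 0/1 dummy column per level of `column`.

    Levels are sorted so the new columns appear in a deterministic order.
    The original categorical column is kept alongside the dummies.
    """
    levels = sorted({row[column] for row in rows})
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        out.append(new_row)
    return out


if __name__ == "__main__":
    data = [
        {"city": "Boston", "y": 1.2},
        {"city": "Zurich", "y": 0.7},
        {"city": "Boston", "y": 3.1},
    ]
    for r in add_dummies(data, "city"):
        print(r)
```

All levels are kept here. If the columns feed a regression that also has an intercept, one level is normally dropped to avoid perfect collinearity, which is what R's model-fitting functions do automatically for factors.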

Matrix vs Data Frame in R

April 19, 2012

Today I ran into a two-part question that might be relevant to other R users: why can’t I assign a data frame row to a matrix row? And why won’t my function accept this data frame row as an input argument? A …

Adding a transparent image layer to a plot

April 19, 2012

The following example shows how to add a transparent image-type layer to a plot. The add.alpha function simply adds transparency to a vector of colors, which is then passed to the "col" argument of an image plot.
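The add.alpha code itself is not part of this excerpt. As a hypothetical analogue in Python, the same idea can be sketched by appending an alpha byte to `#RRGGBB` hex colors; R also accepts colors written in the resulting `#RRGGBBAA` form. The name `add_alpha` and its parameters are assumptions, not the post's function.

```python
def add_alpha(colors, alpha=1.0):
    """Append an alpha channel to '#RRGGBB' hex colors, returning '#RRGGBBAA' strings.

    alpha runs from 0 (fully transparent) to 1 (opaque) and is scaled to 0-255.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    a = round(alpha * 255)
    result = []
    for color in colors:
        if not (color.startswith("#") and len(color) == 7):
            raise ValueError(f"expected '#RRGGBB', got {color!r}")
        result.append(f"{color.upper()}{a:02X}")
    return result


if __name__ == "__main__":
    print(add_alpha(["#ff0000", "#00FF00"], alpha=0.5))
```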

User Input in R vs Python

April 18, 2012

Both R and Python let you write a script that asks the user to input some information. In Python 2.6, the main function for this task is raw_input() (in Python 3.0, it’s input()). In R, there are a series of functions that can be used to request input from the user...
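A minimal Python 3 sketch of a script prompting the user, built on the built-in `input()` and re-asking until the reply parses as an integer. The helper `ask_int` and its `read` parameter (which lets the prompt be exercised without a keyboard) are hypothetical.

```python
def ask_int(prompt, read=input):
    """Prompt repeatedly until the user types a valid integer, then return it.

    `read` defaults to the built-in input() (Python 3; raw_input() in Python 2)
    and can be replaced by any callable that takes a prompt and returns a string.
    """
    while True:
        reply = read(prompt)
        try:
            return int(reply.strip())
        except ValueError:
            print(f"{reply!r} is not a whole number, please try again.")
```

In an interactive session, `ask_int("How many samples? ")` waits for the user to type a value and press Enter.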

"Correlation / Covariance" Spectrum (This time with "R")

April 18, 2012

I have treated this topic with other software, and of course you can do the same with "R". Once I have the spectra of my samples with a math treatment applied, I want to draw a correlation spectrum to see which wavelengths correlate best with the constitu...
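One way to compute such a correlation spectrum, sketched in plain Python under the assumption that each sample's spectrum is a list of absorbances at the same wavelengths and that a laboratory reference value is available for each sample: for every wavelength, take the Pearson correlation between that column and the constituent values. The function names are hypothetical; this is not the code from the post.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def correlation_spectrum(spectra, constituent):
    """One correlation coefficient per wavelength.

    spectra: list of samples, each a list of absorbances at the same wavelengths.
    constituent: one reference value per sample (e.g. a laboratory protein result).
    """
    n_wavelengths = len(spectra[0])
    return [
        pearson([sample[j] for sample in spectra], constituent)
        for j in range(n_wavelengths)
    ]
```

Plotting these coefficients against wavelength gives the correlation spectrum; values near +1 or -1 mark the wavelengths most strongly related to the constituent.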

Efficient Frontier of Funds and Allocation Systems

April 18, 2012

I did a very basic experiment in “Efficient Frontier of Buy-Hold and Tactical System”, where I determined the efficient frontier of the S&P 500 with itself transformed by a Mebane Faber 10-month moving average tactical allocation. The result was inter...

Visualizing iOS Text Editors

April 18, 2012

The other day Brett Terpstra posted a gigantic and quite beautifully-executed feature comparison of all of the text editors available for iOS devices. The table is really terrific and also a bit overwhelming, as there's so much data. On the bus home ye...

Small Countries Stabilize by Exporting High-Tech

April 18, 2012

Smaller countries lead the way. When you think of 'high-tech', which countries come to mind? What is 'High-Tech'? Before continuing, what is meant by the term 'high-tech'? As defined by the World Data Bank, high-technology exports ar...

knitr Performance Report-Attempt 2

April 18, 2012

Over the years I have changed my learning process from reading thoroughly before proceeding to reading minimally and then applying immediately. That way I very quickly see the gaps in my knowledge. This method is far more painful but seems to ...

When do you need all the data for Big Analytics?

April 18, 2012

In the 2012 edition of the SAP Sybase Capital Markets Guide, Revolution Analytics' Senior Advisor for Products and Strategy (and former CEO) Norman Nie writes about the "Five Benefits of Big Analytics". (You can also read his article at Enterprise Innovation.) Norman makes the argument that while sampling and aggregation are often useful ways of handling very large data...

Simple Moving Average Strategy with a Volatility Filter

April 18, 2012

I would describe my trading approach as systematic long-term trend following. A trend-following strategy can be mentally difficult to trade after multiple consecutive losses, when a trade reverses because of a volatility spike or because the trend itself reverses. Volatility tends to increase when prices fall. This is not good for a long-only …
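The excerpt stops before the author's actual rules, so the following is only one common way to combine the two ideas, not the strategy from the post: hold a long position when the close is above its simple moving average and recent volatility (the sample standard deviation of simple returns) is at or below a threshold; otherwise stay flat. The window lengths, the 2% threshold and all function names are arbitrary assumptions.

```python
from math import sqrt


def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def rolling_volatility(prices, window):
    """Sample standard deviation of the last `window` simple returns (window >= 2)."""
    returns = [None] + [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]
    out = []
    for i in range(len(prices)):
        if i < window:
            out.append(None)  # not enough returns yet
        else:
            r = returns[i + 1 - window : i + 1]
            mean = sum(r) / window
            out.append(sqrt(sum((x - mean) ** 2 for x in r) / (window - 1)))
    return out


def signals(prices, sma_window=10, vol_window=10, max_vol=0.02):
    """Per-bar position: 1 = long, 0 = flat. Uses only data up to and including each bar."""
    avg = sma(prices, sma_window)
    vol = rolling_volatility(prices, vol_window)
    out = []
    for price, a, v in zip(prices, avg, vol):
        if a is None or v is None:
            out.append(0)  # indicators not defined yet: stay out
        else:
            # Trend filter (price above its SMA) plus volatility filter (calm market).
            out.append(1 if price > a and v <= max_vol else 0)
    return out
```

Because each signal uses that bar's own close, acting on it at the next bar avoids look-ahead bias.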

How to organize an R user group

April 18, 2012

The first thing you have to do is estimate how many users will be interested in a local R group. I would say that out of one million inhabitants you can expect 10-20 users. Based on this rough number, you can tell what challenges are waiting for you. If you expect 100 or more users, you have...

A word cloud where the x and y axes mean something

April 17, 2012

OK, so I have now done two iterations of a better way to visualize term frequencies using R, ggplot2 and plyr. The first was OK but ugly; the second was better but still ugly. How to read it: frequency is segmented into 20% quantiles; the frequency is on the y axis; word size is...
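The "segmented into 20% quantiles" step can be sketched as rank-based binning. This minimal Python version (not the post's R/plyr code) assigns each term a bin from 1 to 5 by its frequency rank; the function name and the tie-handling rule are assumptions.

```python
def quintile_bins(freqs):
    """Map each term to a quintile bin from 1 (least frequent 20%) to 5 (most frequent 20%).

    freqs: dict of term -> count. Terms are ranked by count (ties keep dict order),
    and 0-based rank r out of n terms goes to bin floor(5 * r / n) + 1.
    """
    ranked = sorted(freqs, key=freqs.get)
    n = len(ranked)
    return {term: 5 * r // n + 1 for r, term in enumerate(ranked)}


if __name__ == "__main__":
    counts = {"data": 40, "r": 55, "plot": 12, "model": 7, "hana": 3}
    print(quintile_bins(counts))
```

Because tied counts are split by order rather than by value, two words with the same frequency can land in neighbouring bins; a value-based cutoff would avoid that at the cost of unequal bin sizes.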

Quickly Explore the Penn World Tables in R

April 17, 2012

The Penn World Tables are one of the greatest sources of worldwide macroeconomic data, but their web interface is somewhat cumbersome. Fortunately, the data are also available as an R package on CRAN. Having some tools at hand …

More Spectra patterns (1st derivative)

April 17, 2012

In the first derivative of an absorption band, the maximum becomes a zero crossing. Using Savitzky-Golay (SG) filters, we can calculate it with R and, as in the last posts, look at the corrgram matrix. Corrgram for the first derivative of this band: L...
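The zero-crossing property can be checked with a minimal sketch: a 5-point quadratic Savitzky-Golay first-derivative filter, whose convolution weights are (-2, -1, 0, 1, 2)/10, applied to a synthetic Gaussian absorption band. The band shape, its width and the helper names are assumptions; in Python, `scipy.signal.savgol_filter(y, 5, 2, deriv=1)` offers the same kind of filter.

```python
from math import exp

# Convolution weights of the 5-point quadratic Savitzky-Golay first-derivative filter
# (to be divided by 10 times the sample spacing).
SG5_D1 = (-2, -1, 0, 1, 2)


def sg_first_derivative(y, spacing=1.0):
    """Smoothed first derivative of evenly spaced data; the 2 points at each end stay None."""
    out = [None] * len(y)
    for i in range(2, len(y) - 2):
        window = y[i - 2 : i + 3]
        out[i] = sum(w * v for w, v in zip(SG5_D1, window)) / (10 * spacing)
    return out


def downward_zero_crossings(d):
    """Fractional indices where d changes from positive to non-positive (linear interpolation)."""
    crossings = []
    for i in range(len(d) - 1):
        a, b = d[i], d[i + 1]
        if a is not None and b is not None and a > 0 >= b:
            crossings.append(i + a / (a - b))
    return crossings


if __name__ == "__main__":
    # Synthetic Gaussian absorption band centred at index 20 (assumed shape and width).
    band = [exp(-((i - 20) ** 2) / (2 * 4.0 ** 2)) for i in range(41)]
    print(downward_zero_crossings(sg_first_derivative(band)))
```

For this symmetric band the derivative is positive on the rising side and negative on the falling side, so the single downward crossing falls at the band maximum, index 20.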
