## Next Kölner R User Meeting: 5 October 2012

September 25, 2012
The next Cologne R user group meeting is scheduled for 5 October 2012. All details and the agenda are available on the KölnRUG Meetup site. Please sign up if you would like to come along. Notes from the last Cologne R user group meeting are available ...

## Formula for Kickstarter Success: Copious Planning (just like real life)

September 24, 2012
It seems like the press can’t go two days without another awful, depressing article about how Kickstarter encourages fraud, broken promises, and tears. Today I’d like to take you through a project that shipped and see what we can learn. I interviewed Diana Rodgers of Wonder Threads. Diana sought funding to expand her tech …

## Transition probabilities when adjacent sequence items must be different

September 24, 2012
Generating a random sequence from a fixed set of items is a common requirement; e.g., given the items A, B and C we might generate the sequence BACABCCBABC. Often the randomness is tempered by requirements such as each item appearing a given number of times in a sequence of a given length, …
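A minimal sketch of one simple way to satisfy the "adjacent items must differ" constraint, offered as an illustration rather than the post's method: draw the first item uniformly, then draw each later item uniformly from the items other than its predecessor. With k items, every allowed transition then has probability 1/(k - 1). The helper name `no_repeat_sequence` is hypothetical, and this sketch does not enforce fixed per-item counts.

```r
# Hypothetical helper: random sequence in which no item follows itself.
# Each next item is drawn uniformly from the items that differ from the
# previous one, so every allowed transition has probability 1 / (k - 1).
no_repeat_sequence <- function(items, len) {
  stopifnot(length(items) >= 2, len >= 1)
  out <- character(len)
  out[1] <- sample(items, 1)
  for (i in seq_len(len)[-1]) {
    choices <- setdiff(items, out[i - 1])
    # sample.int avoids sample()'s special case for length-one input
    out[i] <- choices[sample.int(length(choices), 1)]
  }
  out
}

set.seed(42)
s <- no_repeat_sequence(c("A", "B", "C"), 11)
paste(s, collapse = "")
```

Imposing exact item counts as well would need a different scheme, e.g. rejection sampling or count-dependent transition probabilities.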

## Top 20 Data Visualization Tools

September 24, 2012
Every researcher or practitioner of quality (or pretty much any other subject, for that matter) needs a great toolbox packed with flexible visualization tools. I am very happy to see this list

## Learn R and Python, and Have Fun Doing It

September 24, 2012
If you need to catch up on all those years you spent not learning how to code (you need to know how to code), here are a few resources to help you quickly learn R and Python, and have a little fun doing it. First, the free online Coursera course Co...

## From continuous to categorical

September 24, 2012
During data analysis, it is often super useful to turn continuous variables into categorical ones. In Stata you would do something like this:

```stata
gen catvar=0
replace catvar=1 if contvar>0 & contvar<=3
replace catvar=2 if contvar>3 & co...
```
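In R, a common equivalent uses base `cut()`, which bins values into right-closed intervals (lower, upper]. This is a sketch: the values are made up, and because the excerpt cuts off the upper bound of the last category, that category is assumed here to be open-ended.

```r
# Made-up continuous values for illustration
contvar <- c(-1.5, 0, 0.2, 3, 3.1, 7)

# Breaks at 0 and 3 mirror the first two Stata conditions; the last
# category's upper bound is truncated in the excerpt, so Inf is assumed.
# labels = FALSE returns integer codes 1, 2, 3; subtracting 1 gives 0, 1, 2.
catvar <- cut(contvar, breaks = c(-Inf, 0, 3, Inf), labels = FALSE) - 1
catvar
```

Dropping `labels = FALSE` returns a factor with interval labels instead of numeric codes, which is often more convenient for modelling.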

## Data Frames and Transactions

September 24, 2012
Transactions are a very useful tool in data mining: they provide a way to mine itemsets or rules from datasets. In R the data must be in transactions form. If the data are only available as a data.frame, then to create transactions from it (or coerce it to transactions) the researcher may use the …
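The excerpt stops before naming the function, so what follows is an assumption rather than the post's code: one common route is the `as()` coercion to class `transactions` provided by the arules package (assumed to be installed). Columns should be factors (or logicals); each row becomes one transaction and each `column=level` pair becomes an item.

```r
library(arules)  # assumes the arules package is installed

# A small made-up data frame; columns are converted to factors explicitly
df <- data.frame(
  bread = factor(c("yes", "no", "yes")),
  milk  = factor(c("no", "yes", "yes"))
)

# Coerce: one transaction per row, one item per column=level pair
trans <- as(df, "transactions")
summary(trans)
itemLabels(trans)
```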

## Coursera’s free online R course starts today

September 24, 2012
Coursera offers a number of on-line courses, all available for free and taught by experts in their fields. Today, the course Computing for Data Analysis begins. Taught by Johns Hopkins Biostatistics professor (and co-author of the Simply Statistics blog) Roger Peng, the course will teach you how to program in R and use the language for data analysis. Here's...

## An R Users’ Group in Davis

September 24, 2012
I’m excited to share that we’ve started a new R users’ group at UC Davis! Right now our main purpose is to run weekly 2-hour work/hack sessions where R users can get together and work through problems. More info here.

## Example 10.3: Enhanced scatterplot with marginal histograms

September 24, 2012
Back in example 8.41 we showed how to make a graphic combining a scatterplot with histograms of each variable. A commenter suggested we change the R graphic to allow post-hoc plotting of, for example, lowess lines. In addition, there are further refinements to be made. In this R-only entry, we'll make the figure...

## Use GBIF and googleVis to Make Maps with Species Occurrence Data

September 24, 2012
This is a short follow-up on THIS posting. I will briefly show how to use the dismo and googleVis packages to plot species occurrences on an interactive Google map (HERE is the R script).

## Computing kook density in R

September 24, 2012
Do you ever see strange lights in the sky? Do you wonder what really goes on in Area 51? Would you like to use your R hacking skills to get to the bottom of the whole UFO conspiracy? Of course, you would! UFO data from infochimps is the focus of a dat...

## qgraph version 1.1.0 and how to simply make a GUI using ‘rpanel’

September 24, 2012
Last week I updated the ‘qgraph’ package to version 1.1.0, now available on CRAN. Besides some internal changes (in particular, self-loops have been substantially improved), the most important change is the addition of a GUI, which can be …

## The fear-index: is the VIX efficient to be warned about high volatility? (Finance & Systematic Processus)

September 24, 2012

## Simple visually-weighted regression plots

September 24, 2012
There has recently been a lot of discussion of so-called “visually-weighted regression” plots. Folk hero Hadley Wickham suggests that such plots would be easy to implement with ggplot2, and so I have attempted to prove him right. The approa...

## New Zealand school performance: beyond the headlines

September 24, 2012
I like the idea of having data on school performance, not to rank schools directly (hard, to say the least, at this stage) but because we can start looking at the factors influencing test results. I imagine the opportunity in …

## Variance targeting in garch estimation

September 24, 2012
What is variance targeting in garch estimation? And what is its effect? Related posts are “A practical introduction to garch modeling”, “Variability of garch estimates” and “garch estimation on impossibly long series”. The last two of these show the variability of garch estimates on simulated series where we know the right answer. In response to …
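For readers new to the term, here is the standard idea for a GARCH(1,1) model (a sketch of the general technique, not taken from the post). The conditional variance and, when $\alpha + \beta < 1$, its unconditional level are

$$
\sigma_t^2 = \omega + \alpha\,\epsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2,
\qquad
\bar{\sigma}^2 = \frac{\omega}{1 - \alpha - \beta}.
$$

Variance targeting fixes the intercept from the sample variance $\hat{\sigma}^2$ of the data instead of estimating it freely,

$$
\omega = \hat{\sigma}^2\,(1 - \alpha - \beta),
$$

so only $\alpha$ and $\beta$ remain to be estimated, and the fitted model's unconditional variance equals the sample variance by construction.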

## Popularity indicator, with images (NFL)

September 23, 2012
It’s Friday night, there’s nothing good on TV; mmm, conditions are perfect for shaggin’ about in R. So I’m an NFL fan, and (shameless plug) an avid fan of this NFL podcast. They run their own pickem league which unless users …

## Universal portfolio, part 11

September 23, 2012
First, an apology: the links to the Universal Portfolio paper have stopped working. This is because Thomas Cover’s personal webpage at Stanford has been taken down, but fortunately the content has moved elsewhere. The new link is Universal ...

## Minimum Correlation Algorithm Example

September 23, 2012
Today I want to follow up on the Minimum Correlation Algorithm Paper post, show how to incorporate the Minimum Correlation Algorithm into your portfolio construction workflow, and explain why I like the Minimum Correlation Algorithm. First, let’s load the ETFs data set used in the Minimum Correlation Algorithm Paper using the Systematic

## Video: Analyzing Big Data using Oracle R Enterprise

September 23, 2012
Learn how Oracle R Enterprise is used to generate new insight and new value for business, answering not only what happened, but why ...

## Football model; plots and usage

September 23, 2012
After reading data, making a predictions display and building a football data model, it is time to validate it a bit more (regression plots) and put it to use. It appears that the regression plots in the car package were not ...

## Project Euler — problem 20

September 23, 2012
It’s been quite a while since my last post on Euler problems. Today a visitor posted a nice solution to the second problem, which encouraged me to keep solving these problems. Just for fun! 10! = 10 * 9 * … * 3 * 2 * 1 …
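Project Euler problem 20 asks for the sum of the digits of 100!; its statement uses 10! = 3628800, with digit sum 27, as the worked example. Since 100! has 158 digits, far more than a double holds exactly, `factorial(100)` cannot be used directly. A minimal base-R sketch, not the post's solution: store the number as a vector of decimal digits and multiply by 2, 3, ..., n with schoolbook carrying. The helper names are hypothetical.

```r
# Hypothetical helper: multiply a number stored as little-endian decimal
# digits (least significant digit first) by an ordinary integer k
mul_digits <- function(digits, k) {
  carry <- 0
  for (i in seq_along(digits)) {
    p <- digits[i] * k + carry
    digits[i] <- p %% 10
    carry <- p %/% 10
  }
  # Append any remaining carry as new high-order digits
  while (carry > 0) {
    digits <- c(digits, carry %% 10)
    carry <- carry %/% 10
  }
  digits
}

# Sum of the decimal digits of n!
factorial_digit_sum <- function(n) {
  digits <- 1
  for (k in seq_len(n)) digits <- mul_digits(digits, k)
  sum(digits)
}

factorial_digit_sum(10)   # 27
factorial_digit_sum(100)  # 648
```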

## The infamous apply function

September 23, 2012
For R beginners, the apply() function seems like a secret doorway into programming bliss. It seems so powerful, and yet, beyond reach. For those just starting out, examples of how to use apply() can really help with the intuition of how to h...
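A small illustration of the basic pattern (not from the post): `apply(X, MARGIN, FUN)` calls `FUN` on each row (`MARGIN = 1`) or each column (`MARGIN = 2`) of a matrix and collects the results.

```r
# A 2 x 3 matrix, filled column by column: rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)

row_totals <- apply(m, 1, sum)  # MARGIN = 1: sum() over each row
col_totals <- apply(m, 2, sum)  # MARGIN = 2: sum() over each column
row_totals  # 9 12
col_totals  # 3 7 11

# Any function works, including an anonymous one: the range of each row
apply(m, 1, function(r) max(r) - min(r))  # 4 4
```

For plain sums and means, `rowSums()` and `colMeans()` are faster, but `apply()` generalizes to arbitrary functions.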

## Text Analysis Tutorial on Spam Email in R

September 23, 2012
Hi everyone, I just wrote a tutorial on text analysis in R using the tm and wordcloud packages. Thought some of you here might be interested in it.

## Maximum likelihood estimates for multivariate distributions

September 22, 2012
Consider our loss-ALAE dataset and, as in Frees & Valdez (1998), let us fit a parametric model in order to price a reinsurance treaty. The dataset is the following:

```r
> library(evd)
> data(lossalae)
> Z=lossalae
> X=Z;Y=Z ...
```
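To show the general mechanics of maximum likelihood for a multivariate distribution, here is a generic sketch with `optim()` on simulated bivariate normal data. It is not the loss-ALAE model from the post, and the data are made up. The parameters are transformed (log standard deviations, `atanh` of the correlation) so the optimizer works on an unconstrained scale. Because the normal MLEs also have a closed form, they serve as a check.

```r
# Simulated, made-up data (not the loss-ALAE data): correlated normal pairs
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 0.5 * x + rnorm(n, sd = 0.8)

# Negative log-likelihood of a bivariate normal; exp() and tanh() keep
# s1, s2 > 0 and |r| < 1 while par itself is unconstrained
negloglik <- function(par) {
  m1 <- par[1]; m2 <- par[2]
  s1 <- exp(par[3]); s2 <- exp(par[4])
  r  <- tanh(par[5])
  zx <- (x - m1) / s1
  zy <- (y - m2) / s2
  q  <- (zx^2 - 2 * r * zx * zy + zy^2) / (1 - r^2)
  sum(log(2 * pi) + log(s1) + log(s2) + 0.5 * log(1 - r^2) + 0.5 * q)
}

fit <- optim(c(0, 0, 0, 0, 0), negloglik, method = "BFGS",
             control = list(maxit = 500))

# Back-transform to the natural scale
est <- c(m1 = fit$par[1], m2 = fit$par[2],
         s1 = exp(fit$par[3]), s2 = exp(fit$par[4]),
         r = tanh(fit$par[5]))

# Closed-form normal MLEs (note the 1/n variance) to check the optimum
mx <- mean(x); my <- mean(y)
sx <- sqrt(mean((x - mx)^2)); sy <- sqrt(mean((y - my)^2))
closed <- c(mx, my, sx, sy, mean((x - mx) * (y - my)) / (sx * sy))
round(rbind(est, closed), 3)
```

For heavy-tailed insurance data one would swap in other margins and a copula, but the pattern stays the same: write the joint log-density and hand its negative to `optim()`.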

## Spacing measures: heterogeneity in numerical distributions

Numerically-coded data sequences can exhibit a very wide range of distributional characteristics, including near-Gaussian (historically, the most popular working assumption), strongly asymmetric, light- or heavy-tailed, multi-modal, or discrete (e.g., count data).  In addition, numerically coded values can be effectively categorical, either ordered, or unordered.  A specific example that illustrates the range of distributional behavior often seen in a collection...

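Spacings are the gaps between successive order statistics, `diff(sort(x))`. The toy example below uses them to contrast count-like and continuous data. The zero-spacing share it computes is an illustrative heuristic chosen for this sketch, not necessarily one of the measures the post goes on to define. The helper `spacings` is hypothetical.

```r
# Hypothetical helper: gaps between successive order statistics
spacings <- function(x) diff(sort(x))

# Count-like data produce many zero spacings (tied values) ...
counts <- c(3, 1, 2, 2, 0, 1, 2)
# ... while continuous data essentially never do
set.seed(7)
cont <- rnorm(7)

spacings(counts)              # 1 0 1 0 0 1
mean(spacings(counts) == 0)   # 0.5
mean(spacings(cont) == 0)     # 0
```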

## Good programming practices in R

September 22, 2012
I write sloppy R scripts. It is a byproduct of working with a high-level language that allows you to quickly write functional code on the fly (see this post for a nice description of the problem in Python code) and the result of my limited formal training in computer programming. The lack of formal training

## KLEMS (1)

September 22, 2012
This post is actually a homework assignment I did. The data file contains input use, output, quantities, costs, and prices for total U.S. nondurable manufacturing for 1949-2001. The data are defined as follows: inputs corresponding to capital, labor, energy, materials, and purchased services; total output; the respective quantity indexes; ...

