4899 search results for "git"

Overfitted Backtests

October 23, 2013

It has been a while since I discussed testing for overfitting in backtests.  Since then, Marcos López de Prado and coauthors have done some very thoughtful work (see the bottom), and they even started a blog.  Their newest paper builds on discoveries they made in their earlier work, and...

New R package: scholar

October 23, 2013

My new R package, scholar, has just been posted on CRAN. The scholar package provides functions to extract citation data from Google Scholar. In addition to retrieving basic information about a single scholar, the package also allows you to compare multiple scholars and predict future h-index values. There’s a full guide on GitHub (along

My experience of learning R – from basic graphs to performance tuning

October 23, 2013

Background: R, as some of you may know, is a statistical and graphics programming language (see Wikipedia) used by academia and, more recently, by IT professionals in our ever-growing software industry. There is a sudden demand for Data Scientists, Data Analysts and Statisticians with a background in R, among other data- and development-related subjects. I have...

Pre-calculating large tables of values

I'm currently working on a project where we want to know, based on a Euclidean distance measure, the probability that one value is a match to another value, i.e. given an actual value and a theoretical value from calculation, what is the probability that they are the same? This can be calculated...

Build your own Twitter Archive and Analyzing Infrastructure with MongoDB, Java and R [Part 1] [Update]

October 22, 2013

UPDATE: The Java script is now also available with the streaming API. You can find the script on my GitHub account. Hey everybody, you surely know the problems that appear when you want to work with the Twitter API. Twitter created a lot of different restrictions, minimizing the fun of the data mining process. Another …

Review: Kölner R Meeting 18 October 2013

October 22, 2013

The Cologne R user group met last Friday for two talks, on split apply combine in R and on XLConnect, by Bernd Weiß and Günter Faes respectively, before the usual Schnitzel and Kölsch at the Lux. Split apply combine in R: The apply family of functions in R is incredibly powerful, yet for newcomers often somewhat mysterious....

Using R to parse time (and taxon names) with GBIF’s API

October 21, 2013

GBIF has recently made a bunch of handy tools available via their revamped API. These tools include a species name parser, which seems very useful for cleaning long lists of taxon names. Here’s a simple R function that takes a …

Tracking the 2013 Hurricane Season

October 21, 2013

With it being the end of hurricane season, it’s only appropriate to do a brief summary of the activity this year. It’s been a surprisingly low-key season as far as hurricanes are concerned. There have been only a few hurricanes, and the barometric pressure of any hurricane this season has not even come close

analyze the national vital statistics system (nvss) with r and monetdb

October 21, 2013

ever since the dawn of the internet, the centers for disease control and prevention (cdc) has maintained a big data archive called the national vital statistics system (nvss).  the hardworking quants in hyattsville release two major annual microda...

How Do You Write Your Model Definitions?

October 20, 2013

I’m often irritated that when a statistical method such as linear regression is explained, it is characterized by how it can be calculated rather than by what model is assumed and fitted. A typical example is that linear regression is often described as a method that uses ordinary least squares to calculate the best...
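
To make that contrast concrete, here is a sketch in standard textbook notation for simple linear regression. The symbols (y_i, x_i, beta_0, beta_1, sigma, n) are assumptions chosen for illustration and are not taken from the post itself. The first expression characterizes the method by how the coefficients are calculated, the second by what model is assumed:

```latex
% Calculation view: ordinary least squares picks the coefficients
% that minimize the sum of squared residuals.
(\hat\beta_0, \hat\beta_1)
  = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

% Model view: each observation is normally distributed around a line,
% with sigma denoting the standard deviation.
y_i \sim \mathrm{Normal}(\mu_i, \sigma), \qquad \mu_i = \beta_0 + \beta_1 x_i
```

The two views agree on the coefficients: maximizing the likelihood of the model in the second expression yields the same estimates of beta_0 and beta_1 as the least-squares problem in the first.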
