Using data.table for binning

I discovered the impressive data.table package more than a year ago. In order to learn how to use it, I …

garch and the distribution of returns

April 22, 2013

Using garch to learn a little about the distribution of returns. Previously, there have been posts on garch, in particular: "A practical introduction to garch modeling", "The components garch model in the rugarch package", and "garch and long tails". There has also been discussion of the distribution of returns, including a satire called “The distribution of …

Data Analysis for Marketing Research with R Language (1)

April 22, 2013

Data analysis techniques such as the t-test, ANOVA, regression, conjoint analysis, and factor analysis are widely used in marketing research areas such as A/B testing, consumer preference analysis, market segmentation, product pricing, sales driver analysis, and sales forecasting. Traditionally the analysis tools have mainly been SPSS and SAS; however, the open source R language is catching

analyze the medical large claims experience study (mlces) with r

April 21, 2013

not a survey, not even remotely current, the society of actuaries' medical large claims experience study (mlces) might be the best private health insurance claims data available to the public.  this data should be used to calibrate other data sets...

Programmatically Download CORINE Land Cover Seamless Vector Data with R

April 21, 2013

Thanks to a helpful SO answer I was able to download all CLC vector data (43 zip files) programmatically:

require(XML)
path_to_files
dir.create(path_to_files)
setwd(path_to_files)
doc
urls
# function to get zip file names
get_zip_name
# function to plug into sapply
dl_urls
# download all zip-files
sapply(urls, dl_urls)
# function for unzipping
try_unzip
# unzip all files in dir and delete them afterwards
sapply(list.files(pattern = "*.zip"),...

You Can Quote Me on That

April 21, 2013

The other day I came across the Empirical Quotes page on Mark Byran's blog. Some of his quotes related specifically to econometrics, and I thought I'd share a few others. That certainly doesn't mean that I agree with them all! "It is the preparation skill of the econometric chef that catches the professional eye, not the...

Evaluating Event Impact Through Social Media Follower Histories, With Possible Relevance to cMOOC Learning Analytics

April 21, 2013

Last year I sat on a couple of panels organised by I’m a Scientist’s Shane McCracken at various science communication conferences. A couple of days ago, I noticed Shane had popped up a post asking Who are you Twitter?, a quick review of a social media mapping exercise carried out on the followers of the

What Is the Probability of a 16 Seed Beating a 1 Seed?

April 21, 2013

Note: I started this post way back when the NCAA men's basketball tournament was going on, but didn't finish it until now. Since the NCAA Men's Basketball Tournament moved to 64 teams, a 16 seed has never upset a 1 seed. You might be tempted to say ...

Ordinal data, models with observers

April 21, 2013

I recently made three posts regarding the analysis of ordinal data: a post looking at all the methods I could find in R, a post with an additional method, and a post using JAGS. Common to all three was the use of the cheese data, a data set where...

In three months, I’ll be in Vegas (trying to win against the house)

April 20, 2013

In fact, I’m going there with my family and some friends, including two probabilists (I mean professionals, I am merely an amateur), with this incredible challenge: will I be able to convince probabilists to go and play at the casino? Actually, I also want to study them carefully, to understand how we should play optimally. For example, I hope...

What Is the Probability of a 16 Seed Beating a 1 Seed?

April 20, 2013

Note: I started this post way back when the NCAA men's basketball tournament was going on, but didn't finish it until now. Since the NCAA Men's Basketball Tournament moved to 64 teams, a 16 seed has never upset a 1 seed. You might be tempted to say...

Prioritizing project stakeholders using social network metrics

April 20, 2013

Identifying project stakeholders and their requirements is a very important factor in the success of any project. Existing techniques tend to be very ad hoc. In her PhD thesis, Soo Ling Lim came up with a very interesting solution using social network analysis and, what is more, made her raw data available for download. I have

My new forecasting book is finally finished

April 20, 2013

My new online forecasting book (written with George Athanasopoulos) is now completed. I previously described it on this blog nearly a year ago. In reality, an online book is never complete, and we plan to continually update it. But it is now at the point where it is suitable for course work, and contains exercises and references. We hope...

Modeling habitat diversity and species richness

April 20, 2013

How does habitat diversity affect species richness? Perhaps intuition suggests that habitat diversity increases species richness by facilitating niche or resource partitioning among species. But, for a fixed area, as habitat heterogeneity increases, the area that can be allocated to each habitat type decreases. In a recent paper, Allouche and colleagues (2012) provide a theoretical and empirical treatment...

RcppArmadillo 0.3.810.0

April 20, 2013

A new Armadillo release 3.810.0 by Conrad appeared yesterday, and was wrapped up in a new release 0.3.810.0 of RcppArmadillo. Upstream changes bring FFT support as well as more Sparse matrix constructors, and we have an improvement to the sample() fu...

Basic Mathematical Functions

April 20, 2013

R can perform the usual mathematical operations; below are the functions:

Arithmetic
+    - addition
-    - subtraction
*    - multiplication
/    - division

Trigonometry
sin    ...

Agent-based modeling in R – habitat diversity and species richness

April 20, 2013

How does habitat diversity affect species richness? Perhaps intuition suggests that habitat diversity increases species richness by facilitating niche or resource partitioning among species. But, for a fixed area, as habitat heterogeneity increases, the area that can be allocated to each habitat type decreases. In a recent paper, Allouche and colleagues (2012) provide a theoretical and empirical treatment...

Open Source software’s opportunity to reform government

April 19, 2013

The results from the 2013 Future of Open Source Survey are in — thanks to everyone who contributed by completing the survey. You can read an overview of the results here, or see the detailed breakdowns in the slides at the end of this post. For me, one of the most interesting nuggets from the survey is that a...

Popup notification from R on Windows

April 19, 2013

After R is done running a long process, you may need to notify the operator to check the R console and provide the next commands. Without installing any more software or creating any batch files or VBS scripts, here is a simple way to create the popup notice in Windows.

A Course in Data and Computing Fundamentals

April 19, 2013

Daniel Kaplan and Libby Shoop have developed a one-credit class called Data Computation Fundamentals, which was offered this semester at Macalester College. This course is part of a larger research and teaching effort funded by Howard Hughes Medical Institute (HHMI) to help students …

Do the same thing to a bunch of variables with lapply()

April 19, 2013

It is extremely common to have a dataframe containing a bunch of variables, and to do the exact same thing to all of these variables. For instance, let's say we have a dataframe that has a bunch of limb bone measurements of different animals, and we wan...

Using the SVD to find the needle in the haystack

April 19, 2013

Sitting with a data set with too many variables? The SVD can be a valuable...

Third Milano R net meeting: leave a comment

April 19, 2013

The third Milano R net meeting took place on April 18. More than thirty R users and a lot of enthusiasm! You can download presentations and view pictures of the meeting. If you attended the meeting, please leave a comment. Stay …

Photos of the third Milano R net meeting

April 19, 2013

Photos of the third Milano R net meeting. Milano; April 18, 2013

Presentations of the third Milano R net meeting

April 19, 2013

Welcome presentation
Nicola Sturaro, Consultant at Quantide (download PDF, 0.5 MB)

Machine learning
A case of digit recognition based on a shallow neural network implemented in R. Michele Vitali, Statistical consultant (download PDF, 0.2 MB)

Chess betting odds
How to develop …

R: Streets of France

April 19, 2013

I was inspired by Ben Fry's All Streets project, in which he plotted all the streets of the United States of America (about 240 million segments). I tried this first for the countries of Europe (France has about 22 million segments), with the goal of getting an all-streets map of Europe. My data source originates from the OpenStreetMap project and was...

In Praise of Quandl!

April 18, 2013

Data - the econometrician's life-blood! Can't function without it. So, when a new source of data becomes available - especially one that's sophisticated, reliable, and FREE - it's time to sit up and take notice. Quandl is a recent Canadian start-up that delivers economic and financial time-series data, and then some. It's an interesting business model. When you go...

Amazon AWS Summit 2013

April 18, 2013

I was fortunate enough to have been able to attend the Amazon AWS Summit in NYC and to listen to Werner Vogels give the keynote.  I will share a few of my thoughts on the AWS 2013 Summit and some of my take-aways.  I attended sessions that focused on two products in particular: Redshift and

GSoC 2013: At the starting line

April 18, 2013

Google Summer of Code will be open for students on Monday, April 22. The R Project has once again been selected as a mentoring organization, and a variety of mentors have proposed a number of projects for students to work on during this summer. Here’s a bit about the program, and more on the
