Trading volume forecast for an illiquid stock

August 8, 2011

When dealing with transaction cost analysis, a stock’s volume is assumed to be stable or foreseeable. However, the picture is different when we are dealing with an illiquid stock. It is relatively easy to forecast the volume of a liquid stock, because trading volume has high autocorrelation – the volumes at t and t+1 are correlated. For

Read more »
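
As a rough illustration of the autocorrelation claim in the excerpt above, the sketch below compares the lag-1 autocorrelation of two simulated series. Everything here is an assumption made for illustration, not the post's data or method: an AR(1) process on log-volume stands in for a liquid stock, and white noise stands in for an illiquid one.

```r
# Simulated log-volumes (hypothetical data, not taken from the post).
set.seed(42)
n <- 500
liquid_log_vol   <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))  # persistent series
illiquid_log_vol <- rnorm(n)                                              # no persistence

# Lag-1 autocorrelation: correlation between the series at t and at t+1.
# acf() returns lags 0..lag.max, so element 2 is lag 1.
lag1_acf <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]

liquid_r1   <- lag1_acf(liquid_log_vol)
illiquid_r1 <- lag1_acf(illiquid_log_vol)

cat("lag-1 autocorrelation, liquid stand-in:  ", round(liquid_r1, 2), "\n")
cat("lag-1 autocorrelation, illiquid stand-in:", round(illiquid_r1, 2), "\n")
```

A high lag-1 value means yesterday's volume is a usable predictor of today's; a value near zero means a naive "same as yesterday" forecast carries little information.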

R at Wikimania

August 8, 2011

Wikimania 2011 came to a close yesterday. For those of you unfamiliar with Wikimania, it may be described as UseR for Wikipedia, Wikimedia and MediaWiki all rolled into one. The conference brings together staff, volunteer editors, volunteer developers and users of MediaWiki projects. Of specific interest to R Bloggers readers may be the sessions on…

Read more »

RghcnV3 2.0

August 7, 2011

Well, version 2.0 is in the can and I’ll be uploading to CRAN over the next couple of days. Let’s go over the highlights. Prior to version 2.0 we had basically 3 kinds of data flowing around the package: V3 14 column format, zoo objects and mts objects. The 14 column format has always been

Read more »

Meta-analysis

August 7, 2011

Introduction: Effect estimation is an important task in modern research. An example is the identification of risk factors for disease and the qualification of medical treatments. Usually, researchers are interested in estimating the global, common effect. Since actual effects tend to differ across populations, estimates based on a sample of a particular population seldom generalize well.

Read more »

Usability

August 7, 2011

Usability. I am not an expert in Human-Computer Interaction (HCI) at all. Worse, I make the crappiest looking interfaces, typically. So, that's said. Usability. Wikipedia writes that "usability is the ease of use and learnability of a ...

Read more »

R popularity – steady growth and New York Times

August 6, 2011

I have just come up with an idea for how to test the Wikipedia search traffic visualisation functions that I wrote about in my previous post. I decided to check if R is really gaining popularity that fast. ar <- wikiStat("R_(programming_language)", … Continue reading →

Read more »

Fitting mixture distributions with the R package mixtools


My last two posts have been about mixture models, with examples to illustrate what they are and how they can be useful.  Further discussion and more examples can be found in Chapter 10 of Exploring Data in Engineering, the Sciences, and Medicine.  One important topic I haven’t covered is how to fit mixture models to datasets like the Old Faithful geyser...

Read more »

Visualising Wikipedia search statistics with R

August 6, 2011

I have been playing with R to parse HTML. After reading about visualising “fantasy football” search traffic with RGoogleTrends at The Log Cabin blog I decided to write a few functions to do similar things with Wikipedia search statistics. This … Continue reading →

Read more »

Programmers Should Know R

August 6, 2011

Programmers should definitely know how to use R. I don’t mean they should switch from their current language to R, but they should think of R as a handy tool during development. Again and again I find myself working with Java code like the following.

Read more »

Number of components in a mixture

August 5, 2011

I got a paper (unavailable online) to referee about testing for the order (i.e. the number of components) of a normal mixture. Although this is an easily spelled problem, namely estimate k in I came to the conclusion that it is a kind of ill-posed problem. Without a clear definition of what a component is,

Read more »

Upcoming R training classes, live from the experts

August 5, 2011

Revolution Analytics is hosting several hands-on R training classes over the next few months, with in-person instruction from two leading package authors and experts from the R community. Diethelm Würtz from ETH Zurich will give a two-day master class on Portfolio Selection and Optimization in Practice. Prof Würtz leads the Rmetrics project, and will provide in-depth instruction on using...

Read more »

R as a cure for ‘mindless statistics’?

August 5, 2011

Several years ago Gerd Gigerenzer wrote: “Statistical rituals largely eliminate statistical thinking in the social sciences. Rituals are indispensable for identification with social groups, but they should be the subject rather than the procedure of science. Statistical rituals largely eliminate … Continue reading →

Read more »

Positive coefficient regression in R

August 5, 2011

Ever have a regression model where the coefficients don't make sense? I've been trying to predict electricity and gas consumption from daily activity schedules but a simple linear regression kept saying that demands should go down the more an activity is performed. Fortunately I found the nnls package and show here how you can use it to...

Read more »
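
The idea in the excerpt above, forcing regression coefficients to be non-negative, can be sketched without the nnls package. The example below is not the post's code and does not use nnls: it swaps in base R's `optim()` with box constraints as a stand-in, on simulated data where two predictors are nearly collinear (a common reason ordinary least squares returns a coefficient with an implausible sign). All variable names and data are hypothetical.

```r
# Simulated data (hypothetical): x2 is almost a copy of x1, and only x1 drives y.
set.seed(7)
n  <- 200
x1 <- runif(n)
x2 <- x1 + rnorm(n, sd = 0.05)
y  <- 2 * x1 + rnorm(n, sd = 0.5)
X  <- cbind(x1, x2)

# Ordinary least squares without intercept: coefficients are unconstrained
# and may come out negative when predictors are collinear.
ols <- coef(lm(y ~ X - 1))

# Non-negative least squares via bounded optimisation (stand-in for nnls):
# minimise the residual sum of squares subject to beta >= 0.
rss    <- function(beta) sum((y - X %*% beta)^2)
fit    <- optim(par = c(1, 1), fn = rss, method = "L-BFGS-B", lower = 0)
nonneg <- fit$par

print(ols)
print(nonneg)
```

A dedicated solver such as the one in nnls is likely to be faster and more accurate on larger problems; the bounded `optim()` call is only meant to show what the constraint does.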

More on JSM

August 5, 2011

While my time at the 2011 Joint Statistical Meetings was short--I unfortunately missed some presentations I would have liked to attend--it was a great experience. The collection of academics and professionals is very different from the other con...

Read more »

Image Data from ImageJ to R and Vice Versa

August 5, 2011

In recent years many R packages have been developed to enable image analysis in R. As an alternative, the combination of R with powerful image analysis software like ImageJ offers many advanced image analysis interfaces and algorithms not yet available in R. Bio7 integrates both applications in a Rich Client Platform based on Eclipse

Read more »

Outlier Detection with DPM Slides from JSM 2011

August 5, 2011

Here are the 14 slides I used during my talk at the Joint Statistical Meetings 2011: shotwell-jsm-2011.pdf. I'm trying hard to minimize the text in my presentation slides. But this usually requires that I practice more. Hence, you will know which talks I have practiced thoroughly by the amount of text in the slides.

Read more »

Friday Links: R, OpenHelix Bioinformatics Tips, 23andMe, Perl, Python, Next-Gen Sequencing

August 5, 2011

I haven't posted much here recently, but here is a roundup of a few of the links I've shared on Twitter (@genetics_blog) over the last two weeks. Here is a nice tutorial on accessing high-throughput public data (from NCBI) using R and Bioconductor. Cloud...

Read more »

New Rcpp master classes scheduled for New York and San Francisco

August 4, 2011

Together with Revolution Analytics, I will be offering two more one-day classes on the Rcpp package for seamless integration of R and C++. The format will follow the workshop Romain and I gave during the tutorial day preceding this year's R/Financ...

Read more »

Aug 4, 2011 "plunge" headlines are in the air tonight

August 4, 2011

Today's financial headlines are littered with the word 'plunge.' Considering today's (cl-cl) drop on the S&P500 was just about -5%, I don't know that I would exactly call that a plunge...

Read more »

CHCN: Canadian Historical Climate Network

August 4, 2011

A reader asked a question about data from Environment Canada. He wanted to know if that data could somehow be integrated into the RghcnV3 package. That turned out to be a bit more challenging than I expected. In short order I’d found a couple of other people who had done something similar. DrJ of course was

Read more »

Statisticians at JSM consider themselves "Data Scientists"

August 4, 2011

At the JSM 2011 conference in Miami earlier this week, we conducted an informal poll of attendees on their attitudes with respect to Big Data, statistical software, and data science. JSM is the largest gathering of statisticians in North America, and attendees were invited to complete a survey after logging into the Wi-Fi network. Of the 190 respondents to...

Read more »

Lattice-xyplot without Border/Box, with Axes at Bottom & Left Side Only, with Custom Ablines/Grid & Axis-Labelling

August 4, 2011

Here's how you do a lattice-xyplot without border/box, with axes at bottom & left side only, with custom ablines/grid & axis-labelling.

Read more »

Does Jon Skeet have mental powers that make us upvote his answers? (The effect of reputation on upvotes)

August 4, 2011

Of course, since we all know Jon Skeet does have various powers, I will move on to an unanswered question: whether a user’s reputation makes them receive more upvotes for answers. I’ve seen this theory mentioned in multiple places (see any of the comments to Jon Skeet’s answer that are along the lines of “If this was

Read more »

Q-Q Plots for Multi-modal Performance Data

August 3, 2011

I'm in the process of putting together some slides on how to apply Quantile-Quantile plots to performance data. Q-Q plots are a handy tool for visually inspecting how well your data matches a known probability distribution (prob dsn). If the match is g...

Read more »
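
To make the Q-Q idea in the excerpt above concrete, here is a minimal sketch in base R. The data are hypothetical: a two-mode mixture of normals stands in for multi-modal response times, and the reference distribution is assumed to be normal. The correlation between theoretical and sample quantiles is used here only as a crude numeric summary of how straight the Q-Q line is; it is not a method taken from the post.

```r
set.seed(1)

# Hypothetical response times in ms: a fast mode and a slow mode.
bimodal  <- c(rnorm(300, mean = 20, sd = 2), rnorm(100, mean = 60, sd = 5))
# A unimodal sample of the same size for comparison.
unimodal <- rnorm(400, mean = 20, sd = 2)

# qqnorm() returns theoretical normal quantiles (x) and sorted-sample positions (y).
qq_bi  <- qqnorm(bimodal,  plot.it = FALSE)
qq_uni <- qqnorm(unimodal, plot.it = FALSE)

# Points on a straight line give a correlation close to 1.
bi_cor  <- cor(qq_bi$x,  qq_bi$y)
uni_cor <- cor(qq_uni$x, qq_uni$y)

cat("Q-Q correlation, unimodal sample:", round(uni_cor, 3), "\n")
cat("Q-Q correlation, bimodal sample: ", round(bi_cor, 3), "\n")

# Draw the plot only in an interactive session; the kink between the two
# modes is what reveals the multi-modality.
if (interactive()) {
  qqnorm(bimodal)
  qqline(bimodal)
}
```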

Hotness

August 3, 2011

We have an internal image that floated around work several years ago that details network utilization of TCP over a wide variety of configurations. It is a heatmap created in matlab that is just sweet, sweet eye candy. We actually hung it on the outside of a cube for a short while and people couldn't help but stop and...

Read more »

How Google uses R to make online advertising more effective

August 3, 2011

At JSM 2011 today, three Google employees (amongst the more than 20 Google delegates there) gave a little insight into how statistical analysis with R yields better results for companies using Google's various advertising products. Bill Heavlin from Google kicked off the session with a talk about conditional regression models, a statistical technique at Google used to evaluate the...

Read more »

A Bayesian Guessing Game

August 3, 2011

You, the player, must think of some set, e.g. "odd numbers" or "perfect squares," and that'll be your little secret. Now think of some numbers that live in the intersection of your set and the integers {1, 2, ... , 100} -- for example, if you've chosen ...

Read more »

Faster files in R

August 3, 2011

R is fairly slow in reading files. read.table() is slow, scan() a bit faster, and readLines() fastest. But all these are nowhere near as fast as other tools that scan through files. Let us look at an example. I have in front of me a 283M file. (Small update: the timings were off before. First because R hashes strings, one has to...

Read more »
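
The comparison named in the excerpt above can be tried on a small scale with the sketch below. It is not the post's benchmark: the file is a tiny generated stand-in (two tab-separated columns, 50,000 rows) rather than the 283M file mentioned, and the timings it prints will vary by machine and may not reproduce the ordering the post reports.

```r
# Write a small tab-separated stand-in file (hypothetical data).
tmp <- tempfile(fileext = ".txt")
writeLines(sprintf("%d\t%s", 1:50000, "abc"), tmp)

# Three base R ways to read it, each timed with system.time().
t_table <- system.time(
  df <- read.table(tmp, sep = "\t", stringsAsFactors = FALSE)  # parses into a data frame
)
t_scan <- system.time(
  sc <- scan(tmp, what = list(integer(), character()), sep = "\t", quiet = TRUE)  # typed columns
)
t_lines <- system.time(
  ln <- readLines(tmp)  # raw lines, no parsing into fields
)

print(rbind(read.table = t_table, scan = t_scan, readLines = t_lines)[, "elapsed"])

unlink(tmp)  # remove the temporary file
```

Note that readLines() does the least work because it leaves each line unparsed, so the three calls are not returning equivalent objects.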

Tomboy Notes: Personal R Help File

August 3, 2011

When learning R it is helpful to have your own personal help file. One you create for yourself, with the notes, links, and language you understand (sometimes the help files are not very helpful). Let me introduce you to Tomboy Notes. Tomboy Notes is a l...

Read more »