Plotting Math in R
The most popular programming languages in 2008
Working with directories
Something quite annoying to me is when I get an R script and have to change all of the file references in it. I get something like this:
source("C:\\Documents and Settings\\UserName\\Data\\...\\File1.R")
data<-read.table("C:\\Documents and Settings\\UserName\\Data\\...\\SomeData.dat")
Besides being unsightly, these file references are a pain to change. Usually I just delete all but the filename. How can I get away with that? Because R remembers which directory you are working in and resolves relative file references against it. So the above becomes:
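A minimal sketch of the relative form, assuming the working directory has been set to the folder that holds the files. The temporary folder and the file contents below are made up for illustration so that the example runs anywhere; only the file names File1.R and SomeData.dat come from the script above.

```r
# Hypothetical project folder; it stands in for the long path in the original script
project_dir <- file.path(tempdir(), "project")
dir.create(project_dir, showWarnings = FALSE)

# Made-up file contents, written only so the example can run
writeLines("helper <- function(x) x + 1", file.path(project_dir, "File1.R"))
write.table(data.frame(a = 1:3, b = c(2.5, 3.5, 4.5)),
            file.path(project_dir, "SomeData.dat"))

old_dir <- setwd(project_dir)  # setwd() returns the previous working directory

# With the working directory set, bare file names are enough
source("File1.R")
data <- read.table("SomeData.dat")

setwd(old_dir)  # restore the previous working directory
```

If the folder moves, only the one setwd call (or the way R is started) has to change, not every file reference.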
What I’ll be presenting at O’Reilly Money Tech 2009
(April 2009 Update: Unfortunately, The Money Tech Conference was indefinitely postponed, but fortunately I will be presenting a version of this talk in July at OSCON 2009).
I’ve been invited to speak at O’Reilly’s Money Tech conference this coming February 4-6th in New York City and thought I’d share the abstract for my talk here. I’ll likely be in New York for several days. If you’d like to get together to chat about data, drop me a line!
My talk is entitled “Open Source Analytics: Visualization and Predictive Modeling of Big Data with the R Programming Language”
ABSTRACT
Just as the explosion of online data catalyzed the development of
storage technologies such as Hadoop, new challenges in data analytics
(turning terabytes into actionable insights) demand new tools. R,
an open-source language for statistical computing and graphics, is an
extensible, embeddable, and industry-strength solution for analytics.
In this session, I showcase R’s power by building predictive models
for Brazilian soybean harvests and baseball slugger salaries.
DESCRIPTION
The economics of data aggregation and analysis are being disrupted by
falling costs for storage and CPU power, the continuing shift of
business processes online, and the deluge of data that is being
generated as a consequence.
Satellite images, SEC filings, supply chain data (RFID data streams),
online prices, and newsgroup content represent just a few of the data
sources that hold potential for predictive modeling of markets.
Much of this data does not fit within existing paradigms for business
analysis: either its size overwhelms traditional desktop tools such as
Excel, or else its unique dimensions (such as geocodes) prevent its
being pipelined into more powerful, but narrowly designed, analysis
tools. Finally, closed-source tools cannot keep pace with the leading
edge of innovation in statistical and machine-learning algorithms.
Enter the open source programming language R. R has been dubbed the
lingua franca for statistical computing and graphical analysis, with a
pedigree tracing back several decades at Bell Labs. Though its
million-plus users are concentrated within academia, R is gaining
currency within several high-profile quantitative analysis groups,
including Google’s Customer Insights team and Barclays Global
Investors. In addition, R’s extensibility via user-contributed
packages has spawned an active developer community.
In this session, I will focus on applying R’s powerful visualization
tools to guide the construction of predictive models, using the kind
of large, multidimensional data sets that increasingly confront
quantitative analysts. Along the way, I will highlight R’s packages
for inferential statistics, its compact modeling syntax, and its ease
of connectivity with persistent data stores.
The two specific examples I will discuss are:
- an analysis of NASA’s Landsat imagery of Brazil’s center-west
agricultural regions to detect correlates for soybean harvest yields,
and a derived predictor of the Brazilian soybean market based in part
on these correlates.
- a validation of Bill James’ sabermetrics approach to batting
performance using 30 years of Major League Baseball statistics, and a
derived predictor for batters’ salaries.
For all of its strengths, R has an admittedly steep learning curve.
While source code for the examples will be provided online, this talk
will emphasize techniques and working examples over technical details.
The goal of this session is to give quantitative analysts the courage
to invest in learning the R language, by showcasing R’s power,
highlighting its features, and providing examples of its use for
innovative applications.
Time series packages on R
There is now an official CRAN Task View for Time Series. This will replace my earlier list of time series packages for R, and provide a more visible and useful entry point for people wanting to use R for time series analysis. If I have missed anything on the list, please let me know.
R’s working directory
Do you usually start R with a desktop icon or some other shortcut? Are you tired of using setwd and getwd every time you start R to get the working directory right? If so, then your days of suffering might just be coming to an end.
Having the working directory set correctly is very convenient. You can both read and write files in the proper place without typing (on Windows, usually very long) path names. There are a couple of solutions:
- Use setwd in scripts.
One way to achieve this is to put a setwd function call at the top of your scripts, so that it runs every time you do the computations in that script. For example, put the following line at the top of a file:
setwd("c:/path/to/my/directory/")
It is a nice approach, but things get complicated if you move files between computers, say from home to your office, that have different directory structures, disk names, etc. Of course, you can change it every time. Or perhaps keep a couple of versions and have all but one of them commented out, for example:
# setwd("c:/path/to/my/directory/at/home")
setwd("c:/path/to/my/directory/at/work")
This is also OK, but for me it is too much micromanagement. Also, it becomes a problem if the script is not intended for interactive use.
- Use Windows shortcuts.
An alternative might be to work with Windows shortcuts for starting R. In the shortcut’s properties there is a “Start in” field in which you can put the path to the desired folder. If you start R with the modified icon, then R’s working directory will be set correctly. With that approach you can have, say, a couple of R icons on your desktop, each pointing to a different project folder.
This is convenient unless you work on 10 projects. Each time you may have to create yet another shortcut.
- Use the PATH environment variable.
Another approach is to modify the PATH environment variable. If you add the path to R’s executable to it, then you will be able to start R from whatever directory in the system you want.
To modify the PATH variable you need to right-click on “My Computer” and select “Properties”, then go to the “Advanced” tab and click the “Environment variables” button. The directory to add depends on where you installed R. Usually it is something like
c:\program files\R\R 2.7.0\bin
I use this approach myself with Total Commander and its command line. Wherever I am on the disk I can start R in that directory just by typing rgui and pressing Enter. You can also use Windows Console (cmd) for that.
- Make a context menu option.
Yet another way is to add a command to your context menu (the one appearing when you right-click on things). By right-clicking on a folder and choosing the “R” option you can start R with that folder set as the working directory.
To set up such a command you have to modify the Windows Registry, which will require, I believe, administrative privileges. Look here for details on how to do this.
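For the home/office situation described under the setwd option, a script could also pick whichever candidate directory exists on the current machine, instead of keeping commented-out setwd lines. This is only a sketch of that idea. The helper name set_first_existing_wd is hypothetical, the first candidate is the placeholder path used above, and tempdir() stands in for a directory that actually exists so the example can run. It assumes an R version that provides dir.exists (3.2.0 or later).

```r
# Hypothetical helper: switch to the first candidate directory that exists
set_first_existing_wd <- function(candidates) {
  existing <- candidates[dir.exists(candidates)]
  if (length(existing) == 0) {
    stop("none of the candidate directories exist on this machine")
  }
  setwd(existing[1])
  invisible(existing[1])  # report which directory was chosen
}

old_dir <- getwd()
chosen <- set_first_existing_wd(c(
  "c:/path/to/my/directory/at/home",  # placeholder path from the text above
  tempdir()                           # stands in for a directory that exists here
))
setwd(old_dir)  # restore the previous working directory
```

Because the helper stops with an error when no candidate exists, a script run non-interactively fails early instead of reading or writing files in the wrong place.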
Any other ideas or suggestions?

