<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>R-bloggers</title>
	<atom:link href="http://www.r-bloggers.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.r-bloggers.com</link>
	<description>R news and tutorials contributed by (300) R bloggers</description>
	<lastBuildDate>Wed, 22 Feb 2012 18:59:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>The New York Yankees Payroll vs Everyone Else (Major League Baseball)</title>
		<link>http://www.r-bloggers.com/the-new-york-yankees-payroll-vs-everyone-else-major-league-baseball/</link>
		<comments>http://www.r-bloggers.com/the-new-york-yankees-payroll-vs-everyone-else-major-league-baseball/#comments</comments>
		<pubDate>Wed, 22 Feb 2012 18:59:00 +0000</pubDate>
		<dc:creator>Patrick Rhodes</dc:creator>
				<category><![CDATA[R bloggers]]></category>
		<category><![CDATA[baseball]]></category>
		<category><![CDATA[major league baseball]]></category>
		<category><![CDATA[MLB]]></category>
		<category><![CDATA[payroll]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[yankees]]></category>

		<guid isPermaLink="false">http://www.r-bloggers.com/?guid=65bfacf6fb188bf86c634594efa947db</guid>
		<description><![CDATA[(This article was first published on Graph of the Week, and kindly contributed to R-bloggers)]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
(This article was first published on <strong><a href="http://www.graphoftheweek.org/2012/02/new-york-yankees-payroll-vs-everyone.html">Graph of the Week</a></strong>, and kindly contributed to <a href="http://www.r-bloggers.com/">R-bloggers</a>)
</div></p>
<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-QgSTU1ifEmE/T0Uv659DUUI/AAAAAAAAACE/GPRQsxBAjxc/s1600/yankees_vs_mlb_publish.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" ref="nofollow" target="_blank"><img alt="Chart: Major League Baseball Payrolls: New York Yankees vs all other teams (adjusted for inflation)" border="0" height="320" src="http://3.bp.blogspot.com/-QgSTU1ifEmE/T0Uv659DUUI/AAAAAAAAACE/GPRQsxBAjxc/s640/yankees_vs_mlb_publish.png" width="640" /></a></div><b>Description:</b><br />Major League Baseball payrolls for all teams since 1985, adjusted for inflation. The New York Yankees' payroll is highlighted, and the shape of each Yankees point shows that season's result.<br /><br /><b>Data:</b><br /><span style="background-color: white; font-family: 'Courier New', Courier, FreeMono, monospace; font-size: 13px; line-height: 18px; text-align: left;"><a href="http://www.baseball-databank.org/" ref="nofollow" target="_blank">http://www.baseball-databank.org/</a></span><br /><a name='more'></a><br /><b>Analysis:</b><br />For years, fans of <a href="http://mlb.mlb.com/" ref="nofollow" target="_blank">Major League Baseball</a> (MLB) have cried 'foul!' at the <a href="http://newyork.yankees.mlb.com/" ref="nofollow" target="_blank">New York Yankees</a> over their spending habits. The primary complaint is that the Yankees have 'bought' their championships, leaving the other teams to languish in mediocrity.<br /><br />What does it mean to 'buy' a championship? Does the charge imply that the Yankees are the only team that pays its players while the rest rely on volunteers? 
Of course not: all players get paid, and handsomely (see the <a href="http://www.graphoftheweek.org/2012/01/description-year-by-year-total-annual.html" ref="nofollow" target="_blank">previous article on the rising MLB payroll</a>).<br /><br />The graph above makes it clear that the Yankees have been among the top-paying teams since 1985. However, this has <b>not</b> resulted in a championship every year. In fact, during their World Series run in the late 1990s, their payroll was not always the highest.<br /><br />It wasn't until <i>after</i> 2000 that their payroll began to rapidly outpace everyone else's, reaching a high-water mark in 2005 (a season in which they failed to get past the first round of the playoffs). Since then, their payroll has trended slowly downward; perhaps they have realized that throwing money at players isn't the path to World Series rings.<br /><br />"But," you ask, "does a higher payroll mean a better winning percentage?" An analysis of payroll vs winning percentage (not shown) does suggest a relationship, but not a strong one (r = 0.496, r<sup>2</sup> = 0.214). 
This makes sense: a high salary does not necessarily reflect actual skill, since it depends heavily on how player contracts are structured and on the subjective judgments of team management.<br /><br /><b>Questions:</b><br />1) Will the Red Sox catch up in terms of payroll?<br />2) How long can the Yankees sustain their current spending habits?<br />3) What will the low-payroll teams do to close the gap with the high-payroll teams?<br /><br /><b>Code:</b><br />This graph was generated using the '<a href="http://had.co.nz/ggplot2/" ref="nofollow" target="_blank">ggplot2</a>' package for the&nbsp;<a href="http://www.r-project.org/" ref="nofollow" target="_blank">R programming language</a>:<br /><pre><span style="font-size: x-small;">library(ggplot2)
library(scales)

# adjusted.salaries.frame: every team's payroll by year (columns yearID, payroll)
# adjusted.yankees.frame: the Yankees rows only, plus a Result column for each season
ggplot(adjusted.salaries.frame, aes(x = yearID, y = payroll)) +
  geom_point() +
  # Yankees seasons: colour and shape both show the season result
  geom_point(data = adjusted.yankees.frame,
             aes(colour = Result, shape = Result), size = 5) +
  # Yankees payroll trend line (linewidth needs ggplot2 3.4.0+; older versions use size)
  geom_line(data = adjusted.yankees.frame, aes(colour = "Yankees"),
            linewidth = 1.1) +
  ylab("Team Payroll (in U.S. Dollars)") +
  xlab("Year") +
  ggtitle("MLB Payrolls: The New York Yankees vs All Other Teams (adjusted for inflation)") +
  theme(legend.title = element_blank(),
        panel.background = element_blank()) +
  # comma() stands in for mysep, the author's own axis-label formatter (not shown)
  scale_y_continuous(labels = comma)</span></pre>
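<br />The strength of the payroll vs winning percentage relationship mentioned in the analysis can be measured with base R's cor() and lm(). The sketch below uses simulated, hypothetical numbers (not the baseball-databank data), so its output says nothing about MLB; it only shows how r, r<sup>2</sup>, and the adjusted r<sup>2</sup> reported by summary(lm()) relate to one another when there is a single predictor.<br /><pre><span style="font-size: x-small;"># Hypothetical data: 30 team-seasons with payroll (in $ millions) and winning percentage
set.seed(2012)
payroll = runif(30, min = 40, max = 200)
win.pct = 0.42 + 0.0007 * payroll + rnorm(30, sd = 0.05)

# Pearson correlation coefficient r
r = cor(payroll, win.pct)

# Simple linear regression of winning percentage on payroll
fit = summary(lm(win.pct ~ payroll))

# With one predictor, lm's R-squared equals r squared;
# the adjusted value is lower because it accounts for the fitted slope's degree of freedom
round(c(r = r, r.squared = r^2,
        lm.r.squared = fit$r.squared,
        adj.r.squared = fit$adj.r.squared), 3)</span></pre>With a single predictor, squaring the cor() result gives exactly the R-squared that lm() reports, while the adjusted r<sup>2</sup> is always slightly lower.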
<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on his blog: <strong><a href="http://www.graphoftheweek.org/2012/02/new-york-yankees-payroll-vs-everyone.html"> Graph of the Week</a></strong>.</div>
<hr />
<a href="http://www.r-bloggers.com/">R-bloggers.com</a> offers <strong><a href="http://feedburner.google.com/fb/a/mailverify?uri=RBloggers">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="http://www.r-project.org/">R</a> news and <a title="R tutorials" href="http://www.r-bloggers.com/?s=tutorial">tutorials</a> on topics such as: visualization (<a title="ggplot and ggplot2 tutorials" href="http://www.r-bloggers.com/?s=ggplot2">ggplot2</a>, <a title="Boxplots using lattice and ggplot2 tutorials" href="http://www.r-bloggers.com/?s=boxplot">Boxplots</a>, <a title="Maps and gis" href="http://www.r-bloggers.com/?s=map">maps</a>, <a title="Animation in R" href="http://www.r-bloggers.com/?s=animation">animation</a>), programming (<a title="RStudio IDE for R" href="http://www.r-bloggers.com/?s=RStudio">RStudio</a>, <a title="Sweave and literate programming" href="http://www.r-bloggers.com/?s=sweave">Sweave</a>, <a title="LaTeX in R" href="http://www.r-bloggers.com/?s=LaTeX">LaTeX</a>, <a title="SQL and databases" href="http://www.r-bloggers.com/?s=SQL">SQL</a>, <a title="Eclipse IDE for R" href="http://www.r-bloggers.com/?s=eclipse">Eclipse</a>, <a title="git and github, Version Control System" href="http://www.r-bloggers.com/?s=git">git</a>, <a title="Large data in R using Hadoop" href="http://www.r-bloggers.com/?s=hadoop">hadoop</a>, <a title="Web Scraping of google, facebook, yahoo, twitter and more using R" href="http://www.r-bloggers.com/?s=Web+Scraping">Web Scraping</a>) statistics (<a title="Regressions and ANOVA analysis tutorials" href="http://www.r-bloggers.com/?s=regression">regression</a>, <a title="principal component analysis tutorial" href="http://www.r-bloggers.com/?s=PCA">PCA</a>, <a title="Time series" href="http://www.r-bloggers.com/?s=time+series">time series</a>,<a title="ecdf" href="http://www.r-bloggers.com/?s=ecdf">ecdf</a>, <a title="finance trading" 
href="http://www.r-bloggers.com/?s=trading">trading</a>) and more...
</div></p>]]></content:encoded>
			<wfw:commentRss>http://www.r-bloggers.com/the-new-york-yankees-payroll-vs-everyone-else-major-league-baseball/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>Non overlapping labels on a ggplot scatterplot</title>
		<link>http://www.r-bloggers.com/non-overlapping-labels-on-a-ggplot-scatterplot/</link>
		<comments>http://www.r-bloggers.com/non-overlapping-labels-on-a-ggplot-scatterplot/#comments</comments>
		<pubDate>Wed, 22 Feb 2012 11:33:56 +0000</pubDate>
		<dc:creator>simonraper</dc:creator>
				<category><![CDATA[R bloggers]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://drunks-and-lampposts.com/?p=395</guid>
		<description><![CDATA[This is a very quick post just to share a quick tip on how to add non overlapping labels to a scatterplot in ggplot using a great package called directlabels. The trick is to make each point a single member group using an aesthetic like colour and then apply the direct.label function with the first.qp &#8230; <a href="http://drunks-and-lampposts.com/2012/02/22/labelling-a-ggplot-scatterplot/">Continue reading <span>&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
(This article was first published on <strong><a href="http://drunks-and-lampposts.com/2012/02/22/labelling-a-ggplot-scatterplot/">Drunks&amp;Lampposts » R</a></strong>, and kindly contributed to <a href="http://www.r-bloggers.com/">R-bloggers</a>)
</div></p>
<p>This is a quick post to share a tip on adding non-overlapping labels to a ggplot scatterplot using a great package called <a href="http://cran.r-project.org/web/packages/directlabels/index.html" ref="nofollow" target="_blank">directlabels</a>. The trick is to make each point a single-member group by mapping an aesthetic such as colour to a per-point label, and then to call the direct.label function with the first.qp positioning method. Example code and its output are below.</p>
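<p><pre class="brush: r;">

library(ggplot2)
library(directlabels)

# Ten random points, each paired with a county name from ggplot2's midwest data
pts &lt;- data.frame(x = runif(10), y = rnorm(10),
                  z = as.character(midwest$county[1:10]))

# Mapping colour to z makes every point a single-member group
q &lt;- ggplot(pts, aes(x, y, colour = z)) + geom_point()

# first.qp labels each group and shifts the labels so they do not overlap
direct.label(q, first.qp)

</pre></p>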
<p><a href="http://drunksandlampposts.files.wordpress.com/2012/02/labelexample.png" ref="nofollow" target="_blank"><img class="aligncenter size-full wp-image-401" title="labelexample" src="http://drunksandlampposts.files.wordpress.com/2012/02/labelexample.png?w=750" alt=""   /></a></p>
<p>If there are better ways, I&#8217;d love to hear about them, but this works well for me, and it has the added advantage that each label is matched to its point by colour.</p>
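<p>One alternative, not used in this post, is the ggrepel package (a separate CRAN package that appeared after this post was written). Its geom_text_repel() pushes text labels away from each other and from the data points. A minimal sketch with the same kind of random data, keeping the colour mapping so labels still match their points:</p>
<pre class="brush: r;">
library(ggplot2)
library(ggrepel)

# Ten random points, each labelled with a county name from ggplot2's midwest data
set.seed(1)
pts = data.frame(x = runif(10), y = rnorm(10),
                 county = as.character(midwest$county[1:10]))

# geom_text_repel() moves overlapping labels apart; colour ties each label to its point
p = ggplot(pts, aes(x, y, colour = county, label = county)) +
  geom_point() +
  geom_text_repel() +
  theme(legend.position = 'none')

p
</pre>
<p>Because the labels identify the points, the colour legend is hidden here.</p>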
<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on his blog: <strong><a href="http://drunks-and-lampposts.com/2012/02/22/labelling-a-ggplot-scatterplot/">Drunks&amp;Lampposts » R</a></strong>.</div>
</div></p>]]></content:encoded>
			<wfw:commentRss>http://www.r-bloggers.com/non-overlapping-labels-on-a-ggplot-scatterplot/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://0.gravatar.com/avatar/cd329a395c507fe4c19e172a71ede3b4?s=96&amp;d=identicon&amp;r=G" length="" type="" />
<enclosure url="http://drunksandlampposts.files.wordpress.com/2012/02/labelexample.png" length="" type="" />
		</item>
		<item>
		<title>Log File Analysis with R</title>
		<link>http://www.r-bloggers.com/log-file-analysis-with-r/</link>
		<comments>http://www.r-bloggers.com/log-file-analysis-with-r/#comments</comments>
		<pubDate>Wed, 22 Feb 2012 01:56:00 +0000</pubDate>
		<dc:creator>C</dc:creator>
				<category><![CDATA[R bloggers]]></category>

		<guid isPermaLink="false">http://www.r-bloggers.com/?guid=a6dd4a14596b4831193cb08ed84906ea</guid>
		<description><![CDATA[&#160;R often comes up in discussions of heavy duty scientific and statistical analysis (and so it should).&#160; However, it is also incredibly handy for a variety of more routine developer activities.&#160;&#160; And so I give you… log file analysi...]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
(This article was first published on <strong><a href="http://www.r-chart.com/2012/02/log-file-analysis-with-r.html">R-Chart</a></strong>, and kindly contributed to <a href="http://www.r-bloggers.com/">R-bloggers</a>)
</div></p>
<br /><div style="text-align: center;"><a href="http://www.gradesquare.com/" ref="nofollow" target="_blank"><img height="60" src="http://gradesquare.com/css/images/logo.png" width="400" /></a>&nbsp;</div><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><span style="font-family: inherit;"><span style="background-color: white;">R often comes up in discussions of heavy-duty scientific and statistical analysis (and so it should).&nbsp; However, it is also incredibly handy for a variety of more routine developer tasks.&nbsp;&nbsp; And so I give you… log file analysis with R! &nbsp;</span><br style="background-color: white;" /><br style="background-color: white;" /><span style="background-color: white;">I was just involved in the launch of&nbsp;</span><a href="http://gradesquare.com/" rel="nofollow" style="background-color: white; color: #234786; outline-color: initial; outline-style: initial; outline-width: 0px;" ref="nofollow" target="_blank"><span class="yshortcuts" id="lw_1329867226_2" style="color: #366388; cursor: pointer;">gradesquare.com</span></a><span style="background-color: white;">&nbsp;(go ahead – click on the link and check it out.&nbsp; We will still be here later!).&nbsp; With the flurry of recent activity, I needed a way to visualize site activity and communicate it to the rest of the team.&nbsp; It takes only a few lines of R to read in a log file of reasonable size, format the data, and generate some usable charts. &nbsp;Like most good ideas, it is <a href="http://stackoverflow.com/questions/5664997/logfile-analysis-in-r" ref="nofollow" target="_blank">not new</a>. 
&nbsp;Most log files follow a similar format (such as the <a href="https://en.wikipedia.org/wiki/Common_Log_Format" ref="nofollow" target="_blank">common log format</a>),&nbsp;</span><span style="background-color: white;">so your own logs may need minor adjustments to the following exercise.</span><br style="background-color: white;" /></span><br /><span style="font-family: inherit;">The only package used in this example is ggplot2, for the charts. &nbsp;</span><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;"><b>library(ggplot2)</b></span><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><br /><span style="font-size: large;"><b>Read the Log File</b></span><br /><span style="font-family: inherit;">A sample of the log file (miserably wrapped; my apologies):</span><br /><span style="font-family: inherit;"><br /></span><br /><span style="font-size: xx-small;"><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;">66.12.71.25 - - [21/Feb/2012 23:44:11] "GET /course/1894/detail HTTP/1.1" 200 7017 5.0829</span><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;" /><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;">66.12.71.21 - - [21/Feb/2012 23:44:39] "GET /search_by_author?search_learn_exp=Khan+Academy&amp;page=193 HTTP/1.1" 200 8019 0.3288</span><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;" /><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;">66.12.71.25 - - [21/Feb/2012 23:45:21] "GET /course/19/detail HTTP/1.1" 200 6851 0.1213</span><br 
style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;" /><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;">18.4.5.14 - - [21/Feb/2012 23:45:59] "GET /search_by_subject?search_learn_exp=algebra-i-worked-examples HTTP/1.1" 200 7939 0.0370</span></span><br /><span style="font-family: inherit;"><br /></span><br />If you can't make that out, just know that it is a fairly typical log file. Each line records the client's IP address, two identity fields that are empty here (-), the date and time, the HTTP method and URL path, the HTTP status code, the number of bytes returned, and the time the request took to process.<br /><br /><span style="font-family: inherit;">The log file can be read into a data frame as follows.</span><br /><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;"><b>df = read.table('webapp.log')</b></span><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><br />read.table has many other options, and you may want to use them to minimize the cleanup required after loading the file. 
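<br /><br />To illustrate a few of those options, here is a sketch (not the approach used in this post) that reads three of the sample lines above with read.table(text = ...). col.names, colClasses, na.strings and comment.char are standard read.table arguments; the column types chosen here are assumptions based on the sample, and with a real file you would pass 'webapp.log' instead of text = log.lines.<br /><pre># Three of the sample log lines shown above
log.lines = c(
  '66.12.71.25 - - [21/Feb/2012 23:44:11] "GET /course/1894/detail HTTP/1.1" 200 7017 5.0829',
  '66.12.71.25 - - [21/Feb/2012 23:45:21] "GET /course/19/detail HTTP/1.1" 200 6851 0.1213',
  '18.4.5.14 - - [21/Feb/2012 23:45:59] "GET /search_by_subject?search_learn_exp=algebra-i-worked-examples HTTP/1.1" 200 7939 0.0370'
)

logs = read.table(
  text = log.lines,
  # name the columns while reading instead of calling colnames() afterwards
  col.names = c('host', 'ident', 'authuser', 'date', 'time',
                'request', 'status', 'bytes', 'duration'),
  # fix the column types up front: six text fields, then status, bytes, duration
  colClasses = c(rep('character', 6), 'integer', 'integer', 'numeric'),
  na.strings = '-',    # the common log format writes '-' for missing values
  comment.char = ''    # do not treat a '#' inside a URL as the start of a comment
)

str(logs)</pre>Setting na.strings = '-' also keeps the bytes column numeric if a real log writes '-' for responses with no body.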
&nbsp;For details:<br /><br /><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;"><b>help(read.table)</b></span><br /><br /><span style="font-family: inherit;"><span style="font-family: inherit; font-size: large;"><b>Clean Up and Format</b></span>&nbsp;</span><br /><span style="font-family: inherit;">I chose to clean up manually after the fact. &nbsp;To start, we name the columns in the data frame.</span><br /><br /><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;"><b>colnames(df)=c('host','ident','authuser','date','time','request','status','bytes','duration')</b></span><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><br /><br />Because a space separates them in the log, the date and time were read into separate columns. 
&nbsp;I am not concerned with the time at this point, but I do want the date converted to R's Date type. The format string starts with a literal [ because read.table left the opening bracket on the date field.<br /><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;"><b>df$date=as.Date(df$date,"[%d/%b/%Y")</b></span><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><span style="font-family: inherit;">To see the column names and first few rows of the data frame:</span><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><span style="background-color: white;"><span style="font-family: 'Courier New', Courier, monospace;"><b>head(df)</b></span></span><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" />There are many ways to get a quick handle on the data; summary(df) is one. 
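<br /><br />The as.Date() call above discards the time of day. If you later want traffic at a finer grain, one option (a sketch, not part of the original workflow) is to paste the date and time halves back together and parse them with as.POSIXct(). The log lines carry no time zone, so tz = 'UTC' is an assumption, and %b needs an English or C time locale to recognise "Feb".<br /><pre># The date and time fields as read.table() splits them, taken from the sample lines above
raw.date = rep('[21/Feb/2012', 4)
raw.time = c('23:44:11]', '23:44:39]', '23:45:21]', '23:45:59]')

# Re-join the halves and parse them; '[' and ']' are matched as literal characters
stamp = as.POSIXct(paste(raw.date, raw.time),
                   format = '[%d/%b/%Y %H:%M:%S]', tz = 'UTC')

# Count requests per minute
per.minute = table(format(stamp, '%H:%M'))
per.minute</pre>Swapping '%H:%M' for '%H' would give requests per hour instead.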
&nbsp;One useful item is the number of requests for each HTTP status.<br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;"><b>table(df$status)</b></span><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><br />But the item of immediate interest is simply the number of requests. &nbsp;The following gives the number of requests per date.<br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;"><b>reqs=as.data.frame(table(df$date))</b></span><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><br />R is great for these quick summaries, and memorizing a few functions will cover most needs. &nbsp;Beyond a certain point, I find it easier to reason about data problems in SQL, so I use the sqldf package. 
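<br /><br />A small variation on the reqs line above, sketched here with hypothetical dates: naming the table() argument and passing responseName to as.data.frame() yields columns called date and requests instead of Var1 and Freq, and converting the factor back to Date once means chart code can use the column directly. The result goes into a new variable, daily, so the reqs object used in the charts is unchanged.<br /><pre># Hypothetical request dates standing in for df$date
dates = as.Date(c('2012-02-20', '2012-02-21', '2012-02-21',
                  '2012-02-22', '2012-02-22', '2012-02-22'))

# Naming the table() argument and the response gives descriptive column names
daily = as.data.frame(table(date = dates), responseName = 'requests')

# table() stores the dates as factor levels; convert them back to Date
daily$date = as.Date(as.character(daily$date))

daily</pre>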
&nbsp;For now, on to some charts using ggplot2.<br /><br /><span style="font-size: large;"><b>Make Some Charts</b></span><br /><br /><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-bJsu_YOIHcQ/T0RGVGyFcXI/AAAAAAAAApg/iacH2g9ewWQ/s1600/TrafficToSite.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" ref="nofollow" target="_blank"><img border="0" height="400" src="http://1.bp.blogspot.com/-bJsu_YOIHcQ/T0RGVGyFcXI/AAAAAAAAApg/iacH2g9ewWQ/s400/TrafficToSite.png" width="400" /></a></div>One "gotcha" that I hit fairly often with R and ggplot2 is the need to cast a variable so that it is treated as continuous or as discrete, as intended. &nbsp;In the following, Var1 (the dates, which table() stored as a factor) is cast to Date so that ggplot2 treats it as continuous and geom_line() draws a single connected line.<br /><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;"><b>ggplot(data=reqs, aes(x=as.Date(Var1), y=Freq)) + geom_line() + xlab('Date') + ylab('Requests') + ggtitle('Traffic to Site')</b></span><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><br /><br />In the next example, on the other hand, the format() function converts the numeric HTTP status to a character string so that ggplot2 treats it as discrete.<br /><br style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif; font-size: 16px;" /><span style="background-color: white; font-family: 'Courier New', courier, monaco, monospace, sans-serif;"><b>ggplot(data=df, aes(x=format(status))) + geom_bar() + xlab('Status') + ylab('Count') + ggtitle('Status')</b></span><br /><span style="background-color: white; font-family: 'Courier 
New', courier, monaco, monospace, sans-serif; font-size: 16px;"><br /></span><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-hqT0BxkjTrI/T0RGII2ApVI/AAAAAAAAApY/U5kPIVnyazY/s1600/HTTP_Status.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" ref="nofollow" target="_blank"><img border="0" height="400" src="http://1.bp.blogspot.com/-hqT0BxkjTrI/T0RGII2ApVI/AAAAAAAAApY/U5kPIVnyazY/s400/HTTP_Status.png" width="400" /></a></div><span style="font-family: inherit;">By the way, the images for this post were exported as PNG files by assigning each chart to a variable p and printing it to a png device:</span><br /><span style="font-family: inherit;"><br /></span><br /><br /><span style="font-family: 'Courier New', Courier, monospace;"><b>png("imagename.png")</b></span><br /><span style="font-family: 'Courier New', Courier, monospace;"><b>print(p)</b></span><br /><span style="font-family: 'Courier New', Courier, monospace;"><b>dev.off()</b></span><br /><div style="font-family: inherit;"><br /></div><div style="font-family: inherit;">So there you have it: functional, useful R that addresses an everyday need of web developers. &nbsp;It is also a simple, relevant exercise for getting started with R, and it provides immediate value.</div><div style="font-family: inherit;"><br /></div><div style="font-family: inherit;">The next time Google Analytics falls short, pull out R and give it a try!</div><br />
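<br />The status chart above relies on format() to make ggplot2 treat the status code as discrete; factor() is the more common idiom for the same purpose. The sketch below uses a few hypothetical status codes, uses ggtitle() for the title (the older opts(title = ...) form is no longer supported by current ggplot2), and checks the result with layer_data(), which returns the data ggplot2 computed for a layer without drawing anything.<br /><pre>library(ggplot2)

# Hypothetical status codes standing in for df$status
logs = data.frame(status = c(200L, 200L, 404L, 304L, 200L))

# factor() makes the numeric status code discrete: one bar per distinct code
p = ggplot(logs, aes(x = factor(status))) +
  geom_bar() +
  xlab('Status') + ylab('Count') + ggtitle('Status')

# Inspect the computed bar heights without drawing the plot
bars = layer_data(p)
bars[, c('x', 'count')]</pre>Left numeric, the status would be placed on a continuous axis, with gaps between codes such as 200 and 304 and a numeric scale that implies an ordering the codes do not have.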
<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on his blog: <strong><a href="http://www.r-chart.com/2012/02/log-file-analysis-with-r.html"> R-Chart</a></strong>.</div>
</div></p>]]></content:encoded>
			<wfw:commentRss>http://www.r-bloggers.com/log-file-analysis-with-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>Webinar Wednesday: Introduction to Revolution R Enterprise</title>
		<link>http://www.r-bloggers.com/webinar-wednesday-introduction-to-revolution-r-enterprise/</link>
		<comments>http://www.r-bloggers.com/webinar-wednesday-introduction-to-revolution-r-enterprise/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 21:58:48 +0000</pubDate>
		<dc:creator>David Smith</dc:creator>
				<category><![CDATA[R bloggers]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[REvolution]]></category>

		<guid isPermaLink="false">http://www.r-bloggers.com/?guid=7c072564072772515209a00d3c8193bb</guid>
		<description><![CDATA[If you haven't yet had a chance to catch my regularly-scheduled webinar, &#34;Revolution R Enterprise - 100% R and More&#34;, it's a quick 30-minute introduction to the R language and the added features of Revolution R Enterprise. It's also a chance to ask me any questions you might have about R or Revolution Analytics during the live broadcast (starts at 11AM Pacific time). Details and registration info at the link below. Revolution Analytics webinars: Revolution R Enterprise - 100% R and More]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
(This article was first published on  <strong><a href="http://blog.revolutionanalytics.com/2012/02/webinar-wednesday-introduction-to-revolution-r-enterprise.html"> Revolutions</a></strong>, and kindly contributed to <a href="http://www.r-bloggers.com/">R-bloggers)</a>      
</div></p>

<div><p>If you haven&#039;t yet had a chance to catch my regularly-scheduled webinar, &quot;Revolution R Enterprise - 100% R and More&quot;, it&#039;s a quick 30-minute introduction to the <a href="http://www.revolutionanalytics.com/what-is-open-source-r/" ref="nofollow" target="_blank">R language</a> and the added features of Revolution R Enterprise. It&#039;s also a chance to ask me any questions you might have about R or Revolution Analytics during the live broadcast (starts at 11AM Pacific time). Details and registration info at the link below.</p>
<p>Revolution Analytics webinars: <a href="http://www.revolutionanalytics.com/news-events/free-webinars/2012/100-percent-r-and-more/" ref="nofollow" target="_blank">Revolution R Enterprise - 100% R and More</a></p></div>

<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on his blog: <strong><a href="http://blog.revolutionanalytics.com/2012/02/webinar-wednesday-introduction-to-revolution-r-enterprise.html"> Revolutions</a></strong>.</div>
</div></p>]]></content:encoded>
			<wfw:commentRss>http://www.r-bloggers.com/webinar-wednesday-introduction-to-revolution-r-enterprise/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Berkeley Earth Surface Temperature: V1.5</title>
		<link>http://www.r-bloggers.com/berkeley-earth-surface-temperature-v1-5/</link>
		<comments>http://www.r-bloggers.com/berkeley-earth-surface-temperature-v1-5/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 18:57:01 +0000</pubDate>
		<dc:creator>Steven Mosher</dc:creator>
				<category><![CDATA[R bloggers]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://stevemosher.wordpress.com/?p=1062</guid>
		<description><![CDATA[My R package designed to import all of the Berkeley Earth Surface temperature data is officially on CRAN, as BerkeleyEarth.  The version there is 1.3 and I&#8217;ve completed some testing with the help of David Vavra. The result of that is version 1.5 which is available here at the drop box. I&#8217;ll be posting that [...]]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
(This article was first published on  <strong><a href="http://stevemosher.wordpress.com/2012/02/21/berkeley-earth-surface-temperature-v1-5/"> Steven Mosher's Blog</a></strong>, and kindly contributed to <a href="http://www.r-bloggers.com/">R-bloggers)</a>      
</div></p>
<p>My R package for importing all of the Berkeley Earth Surface Temperature data is now officially on CRAN as BerkeleyEarth. The CRAN version is 1.3. David Vavra helped me test it, and the result is version 1.5, which is available from my Dropbox. I&#8217;ll be posting that version to CRAN shortly. For anyone who has worked with temperature data from the various sources, Berkeley Earth is a godsend. For the first time we have a dataset that brings together all the openly available temperature datasets in one consistent format. The following sources are merged and reconciled:</p>
<ol>
<li>Global Historical Climatology Network – Monthly</li>
<li>Global Historical Climatology Network – Daily</li>
<li>US Historical Climatology Network – Monthly</li>
<li>World Monthly Surface Station Climatology</li>
<li>Hadley Centre / Climate Research Unit Data Collection</li>
<li>US Cooperative Summary of the Month</li>
<li>US Cooperative Summary of the Day</li>
<li>US First Order Summary of the Day</li>
<li>Scientific Committee on Antarctic Research</li>
<li>GSN Monthly Summaries from NOAA</li>
<li>Monthly Climatic Data of the World</li>
<li>GCOS Monthly Summaries from DWD</li>
<li>World Weather Records (only those published since 1961)</li>
<li>Colonial Era Weather Archives</li>
</ol>
<p>The data files are available here: <a href="http://berkeleyearth.org/data/" ref="nofollow" target="_blank">http://berkeleyearth.org/data/</a></p>
<p>Let&#8217;s start with a top-level description of how data flows through the system. All the source data is collected and converted to a common format: <a href="http://berkeleyearth.org/source-files/" ref="nofollow" target="_blank">http://berkeleyearth.org/source-files/</a>. Those files are then merged into a single file called the &#8220;multi-value&#8221; file, which contains every series for every station. All the temperature data files share a common format of 7 columns: Station ID, Series Number, Date, Temperature, Uncertainty, Observations, and Time of Observation. As a result, a single station in the &#8220;multi-value&#8221; file can have several series numbers. In the next step of the process, &#8220;single-value&#8221; files are created. There are four versions of these files, depending on whether quality control (QC) is applied and whether seasonality is removed. That gives 5 versions of the data: multi-value, single-value with no QC and no removal of seasonality, single-value with QC and no removal&#8230; you get the idea. In addition, the final files are delivered for each of TMAX, TMIN and TAVG. In other words, there are 15 datasets (5 versions for each of 3 variables).</p>
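<p>The count of 15 is simply the product of the two choices above. As a quick sanity check, the combinations can be enumerated in R. The version labels here are descriptive names chosen for this sketch. They are not Berkeley's file or directory names.</p>

```r
# Enumerate the dataset variants described above.
# Labels are illustrative only; they are not official Berkeley file names.
versions <- c(
  "multi-value",
  "single-value, no QC, seasonality kept",
  "single-value, QC, seasonality kept",
  "single-value, no QC, seasonality removed",
  "single-value, QC, seasonality removed"
)
elements <- c("TAVG", "TMIN", "TMAX")

# One row per (element, version) pair: 3 elements x 5 versions
datasets <- expand.grid(element = elements, version = versions,
                        stringsAsFactors = FALSE)
nrow(datasets)
```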
<p>All 15 datasets can be downloaded with version 1.5 of the package using the function downloadBerkeley(). You pass the function a data.frame of file URLs, and it downloads and unzips the files you selected. In the process, the package creates three top-level directories: TAVG, TMIN, and TMAX. Each file is then downloaded to a subdirectory under the matching directory. It&#8217;s vitally important to keep all directory and file names intact, or the package will not work. The file &#8220;data.txt&#8221; has the same name in all 15 versions, so organizing the files by directory prevents obvious mistakes. The files themselves also provide a safeguard of sorts: each one begins with comments that identify its type (e.g. TMAX, multi-value). The package includes a function, getFileInformation(), that iterates through a directory and writes this information to a local file. The same function also reads all the files and extracts their &#8220;readme&#8221; headers, writing them to a separate &#8220;readme&#8221; subdirectory.</p>
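<p>The header scan that getFileInformation() performs can be approximated in a few lines of base R. This is a hypothetical sketch, not the package's code. The name scanHeaders() is made up, and the sketch assumes header comment lines begin with '%', which you should confirm against your own files.</p>

```r
# Hypothetical sketch: collect the leading comment block of every data.txt
# under a root directory. Assumes header lines start with "%" (an assumption).
scanHeaders <- function(root, maxLines = 100, commentChar = "%") {
  files <- list.files(root, pattern = "^data\\.txt$", recursive = TRUE,
                      full.names = TRUE)
  headers <- lapply(files, function(f) {
    lines <- readLines(f, n = maxLines, warn = FALSE)
    isComment <- startsWith(lines, commentChar)
    firstData <- match(FALSE, isComment)   # first non-comment line, or NA
    if (is.na(firstData)) lines else lines[seq_len(firstData - 1)]
  })
  names(headers) <- files                  # keep the path so types can be checked
  headers
}
```

<p>Comparing each header with the directory it sits in is a cheap way to catch a file that has been moved or renamed.</p>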
<p>The download takes a while, so I suggest you start it and leave your system alone while it downloads and unpacks the data. If you get any warnings or errors, you can patch things up by downloading the affected files manually. Alternatively, call downloadBerkeley() again with the URL data.frame subset to the files that were corrupted. Occasionally you MAY get a warning that the size of a downloaded file doesn&#8217;t match the file description. I suggest fixing these by downloading the files by hand. I could, of course, add code to check and verify the whole download process, so ask if you would like that.</p>
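<p>If you want to automate that check yourself, one approach is to compare each local file's size with the size you expect and re-download only the mismatches. This is a minimal sketch that assumes a hypothetical data.frame with columns url, destfile and expectedBytes. The package's own URL data.frame may be laid out differently.</p>

```r
# Hypothetical sketch: flag downloads that are missing or have the wrong size.
# 'urls' is assumed to have columns url, destfile and expectedBytes.
findBadDownloads <- function(urls) {
  actual <- file.size(urls$destfile)          # NA when the file does not exist
  bad <- is.na(actual) | actual != urls$expectedBytes
  urls[bad, , drop = FALSE]
}

# Re-fetch only the flagged rows; mode = "wb" keeps zip files intact on Windows.
retryDownloads <- function(urls) {
  bad <- findBadDownloads(urls)
  for (i in seq_len(nrow(bad))) {
    download.file(bad$url[i], bad$destfile[i], mode = "wb")
  }
  invisible(bad)
}
```

<p>This follows the same subsetting idea as calling downloadBerkeley() again on part of its URL data.frame. The difference is that the subset is computed from the sizes rather than picked by hand.</p>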
<p>Once all the files are downloaded, you are ready to use the rest of the package. Every folder contains the same set of files: the station-related data and the core file &#8220;data.txt&#8221;. The station metadata comes in several versions, ranging from the bare minimum (station, lat, lon, altitude) to the complete station description. There is a function to read every file, and each one is named after the file it reads: readSiteComplete(), readSiteDetail(), and so on. The file names default to the names Berkeley uses, so you only need to set the directory name and call the function. All functions return a data.frame in which Berkeley&#8217;s own missing-value codes are replaced by standard R NAs. In addition, I&#8217;ve rearranged some of the data columns so that the station inventories can be used with the package RghcnV3.</p>
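<p>Once you know a file's missing-value code, replacing it with R's NA is straightforward. This sketch assumes a numeric sentinel of -999 purely for illustration. Check the readme header of each file to confirm the code it actually uses. The function name is hypothetical.</p>

```r
# Hypothetical sketch: replace a numeric missing-value sentinel with NA in
# every numeric column. The default of -999 is an assumption for illustration.
sentinelToNA <- function(df, sentinel = -999) {
  numericCols <- vapply(df, is.numeric, logical(1))
  df[numericCols] <- lapply(df[numericCols], function(x) {
    x[x %in% sentinel] <- NA   # %in% is NA-safe, unlike ==
    x
  })
  df
}
```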
<p>The big task in all of this is reading the file &#8220;data.txt&#8221;. On the surface it&#8217;s easy: it&#8217;s a 7-column file that can be read as a matrix using read.delim(). There are two challenges, though. The first is the &#8220;sparse&#8221; time format. There are over 44K stations. Some have only a couple of months of data, while others have data from 1701 onward. Some stations have complete records with a report for every month, and others have gaps in their reporting. The Berkeley common format reports only the months that have data, like so:</p>
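<table>
<tr><th>Station</th><th>Series</th><th>Date</th><th>Temperature</th></tr>
<tr><td>1</td><td>1</td><td>1806.042</td><td>23</td></tr>
<tr><td>1</td><td>1</td><td>1925.125</td><td>16</td></tr>
</table>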
<p>If there are no records for any of the months between 1806 and 1925 (either because they are absent or because QC dropped them), those months are simply missing, with no NA rows in their place. This makes storage very compact. However, to do any real work you have to put the data into a structure such as a time series or a matrix, where all times for all stations are aligned. In short, you need to fill in NAs, and you have to do it every time you read the data in. At some point I expect people will realize that storage is cheap and will simply store NAs where they are needed. Reading in sparse data and filling in NAs is simple, but it is time-consuming and prone to boneheaded mistakes. The second challenge is memory. Once the data has been expanded to include NAs, we run the risk of running out of RAM. If we then want to calculate on the data, we may create intermediate copies, which use still more memory. There isn&#8217;t a simple solution to this, but version 1.5 offers three different routines for reading in the data:</p>
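<p>The infilling step can be sketched in base R. Convert each decimal date to a month index, allocate a matrix of NAs, and then place the observed values into it with matrix indexing. This sketch makes two assumptions. First, the date encodes year + (month - 0.5)/12, which matches the example rows above (1806.042 is January and 1925.125 is February). Second, there is one value per station and month, as in the single-value files. The function names are hypothetical and are not part of the BerkeleyEarth package.</p>

```r
# Sketch: expand sparse (station, date, temperature) records into a dense
# month-by-station matrix with NA wherever a station has no report.
# Assumes the decimal date is year + (month - 0.5) / 12.
decimalToYearMonth <- function(date) {
  year <- floor(date)
  month <- floor((date - year) * 12) + 1
  data.frame(year = year, month = month)
}

sparseToDense <- function(station, date, temperature) {
  ym <- decimalToYearMonth(date)
  firstYear <- min(ym$year)
  # continuous month index counted from January of the earliest year
  monthIndex <- (ym$year - firstYear) * 12 + ym$month
  nMonths <- max(monthIndex)
  stations <- sort(unique(station))
  # one row per month, one column per station, pre-filled with NA
  out <- matrix(NA_real_, nrow = nMonths, ncol = length(stations),
                dimnames = list(NULL, as.character(stations)))
  out[cbind(monthIndex, match(station, stations))] <- temperature
  # label rows "YYYY-MM" for readability
  steps <- seq_len(nMonths) - 1
  rownames(out) <- sprintf("%d-%02d", firstYear + steps %/% 12, steps %% 12 + 1)
  out
}
```

<p>The dimensions alone show the scale of the memory problem. 44,000 stations times roughly 3,700 months since 1701 times 8 bytes per double comes to about 1.3 GB for a single dense copy, before any intermediate results.</p>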
<p>readBerkeleyData(): this routine reads all 7 data columns and does no infilling. Its primary purpose is to create a file-backed version of the data. However, you need this function if you want to analyze things like time of observation or number of observations. It is also the one to use if you have your own method of &#8220;infilling NA&#8221; and want all the data in its time-sparse format. On the first read, the function takes about 10 minutes to create a file-backed version of the matrix using the package bigmemory. Every subsequent call gives you immediate access to the data.bin file it creates.</p>
<p>readBerkeleyTemp(): this routine also creates a file-backed matrix. On each call it first checks whether the file temperature.bin exists. On the very first call it does not, so it is created from either &#8220;data.txt&#8221; or &#8220;data.bin&#8221;. Since data.bin is created by readBerkeleyData(), readBerkeleyTemp() calls readBerkeleyData() on its first pass. If readBerkeleyData() hasn&#8217;t been run before, it runs now, creates data.bin and returns it to readBerkeleyTemp(). The function then goes on to create temperature.bin. That file has a column for every station and a row for every time step, with NAs filled in. The column names represent the station IDs and the row names represent time. The Berkeley &#8220;date&#8221; format is converted as well. This process can take over 2 hours. A buffer variable controls how much data is read in before it is flushed to disk. It is set to 500K. At some stage this buffer will be tuned to the RAM actually available. If you have more than 4GB, you can experiment with this number to see whether it speeds things up.</p>
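<p>The buffer idea (read a block, write it out, reuse the memory) looks roughly like this in base R. This is a simplified, hypothetical sketch rather than readBerkeleyTemp() itself. It streams only the temperature column (the 4th of the 7 columns) to a flat binary file instead of building the station-by-time matrix. It also assumes that comment lines start with '%'.</p>

```r
# Hypothetical sketch: read a large whitespace-delimited file in chunks of
# 'bufferSize' lines and append column 4 (temperature) to a binary file, so
# only one chunk is held in RAM at a time.
streamTemperatures <- function(inFile, outFile, bufferSize = 500000L) {
  inCon <- file(inFile, open = "r")
  outCon <- file(outFile, open = "wb")
  on.exit({ close(inCon); close(outCon) })
  total <- 0
  repeat {
    lines <- readLines(inCon, n = bufferSize)
    if (length(lines) == 0) break                          # end of file
    lines <- lines[nzchar(lines) & !startsWith(lines, "%")] # drop blanks/comments
    if (length(lines) == 0) next
    chunk <- read.table(text = lines, header = FALSE)
    writeBin(as.numeric(chunk[[4]]), outCon)                # flush chunk to disk
    total <- total + nrow(chunk)
  }
  total  # number of data rows written
}
```

<p>A larger bufferSize means fewer, bigger reads at the cost of more RAM per chunk. This is the trade-off behind the package's 500K setting.</p>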
<p>Lastly, there is the function readAsArray(). Unlike the other two routines, it does not create a file-backed matrix. Instead, it reads in &#8220;data.txt&#8221; and creates a 3D array of temperatures only. The first dimension is stations, the second is months and the third is years, and dimnames are provided. The analytical functions in RghcnV3 use this data structure.</p>
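<p>An array with stations, months and years as its three dimensions can be built from long-format records using a single matrix-indexed assignment. This is a hypothetical sketch, not readAsArray() itself. It assumes the decimal date encodes year + (month - 0.5)/12, so 1806.042 is January 1806.</p>

```r
# Hypothetical sketch: stations x months x years array of temperatures with
# dimnames, built from long-format records. Not the package's readAsArray().
toStationMonthYear <- function(station, date, temperature) {
  year <- floor(date)
  month <- floor((date - year) * 12) + 1   # assumes year + (month - 0.5) / 12
  stations <- sort(unique(station))
  years <- seq(min(year), max(year))
  arr <- array(NA_real_,
               dim = c(length(stations), 12, length(years)),
               dimnames = list(as.character(stations), month.abb,
                               as.character(years)))
  # one row of (station, month, year) subscripts per record
  arr[cbind(match(station, stations), month, match(year, years))] <- temperature
  arr
}
```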
<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on his blog: <strong><a href="http://stevemosher.wordpress.com/2012/02/21/berkeley-earth-surface-temperature-v1-5/"> Steven Mosher's Blog</a></strong>.</div>
</div></p>]]></content:encoded>
			<wfw:commentRss>http://www.r-bloggers.com/berkeley-earth-surface-temperature-v1-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/72731ded6aa5ea75024ed11fee92ea8d?s=96&amp;amp;d=identicon&amp;amp;r=G" length="" type="" />
		</item>
		<item>
		<title>polar histogram: pretty and useful</title>
		<link>http://www.r-bloggers.com/polar-histogram-pretty-and-useful/</link>
		<comments>http://www.r-bloggers.com/polar-histogram-pretty-and-useful/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 16:41:39 +0000</pubDate>
		<dc:creator>CL</dc:creator>
				<category><![CDATA[R bloggers]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[histogram]]></category>
		<category><![CDATA[plyr]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://chrisladroue.com/?p=803</guid>
		<description><![CDATA[Do you have tens of histograms to show but no room to put them all on the page? As I was reading this paper in Nature Genetics, I came across a simple and clever way of packing all this information &#8230; <a href="http://chrisladroue.com/2012/02/polar-histogram-pretty-and-useful/">Continue reading <span>&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
(This article was first published on  <strong><a href="http://chrisladroue.com/2012/02/polar-histogram-pretty-and-useful/"> Christophe Ladroue » R</a></strong>, and kindly contributed to <a href="http://www.r-bloggers.com/">R-bloggers)</a>      
</div></p>
<p><a href="http://chrisladroue.com/wp-content/uploads/2012/02/g11948.png" ref="nofollow" target="_blank"><img src="http://chrisladroue.com/wp-content/uploads/2012/02/polarHistogramIcon.png" alt="" title="polarHistogramIcon" width="150" height="200" class="alignleft size-full wp-image-804" /></a>Do you have tens of histograms to show but no room to put them all on the page? As I was reading <a href="http://dx.doi.org/10.1038/ng.1073" ref="nofollow" target="_blank">this paper</a> in Nature Genetics, I came across a simple and clever way of packing all this information in a small space: arrange them all around a circle, and add some guides to help their cross-comparison.</p>
<p>Thanks to polar coordinates, it didn&#8217;t look too difficult to implement in <TT>ggplot2</TT>. After a busy Saturday afternoon I ended up with the following image for my data (*), plus a <a href="http://chrisladroue.com/wp-content/uploads/2012/02/polarHistogramFudged.pdf" ref="nofollow" target="_blank">poster-ready pdf</a> after 2 seconds of prettying up with <a href="http://inkscape.org/" ref="nofollow" target="_blank">Inkscape</a>:</p>
<p><a href="http://chrisladroue.com/wp-content/uploads/2012/02/polarHistogramFudged.png" ref="nofollow" target="_blank"><img src="http://chrisladroue.com/wp-content/uploads/2012/02/polarHistogramFudged-300x266.png" alt="" title="polarHistogramFudged" width="300" height="266" class="aligncenter size-medium wp-image-806" /></a></p>
<p>The graph shows the proportion of some SNP scores (&#8216;first&#8217;, &#8216;second&#8217; and &#8216;third&#8217;) for a number of phenotypes, which are grouped by themes. I&#8217;m quite happy with the result. It&#8217;s pretty and useful: it&#8217;s very easy to compare one histogram with any of the other 60. </p>
<p>The code is still a bit rough around the edges; a few things are not terribly elegant or are hard-coded. An improved version will ship with our graphical package next month. In the meantime, <a href='http://chrisladroue.com/wp-content/uploads/2012/02/polarHistogram.R.zip' ref="nofollow" target="_blank">here it is</a>, if you want to try it with your own data.</p>
<p>It returns a <TT>ggplot</TT> object containing the graph. You can display it with <TT>print()</TT>, save it as a PDF with <TT>ggsave(&quot;myPlot.pdf&quot;)</TT>, or modify it with the usual <TT>ggplot2</TT> commands. I&#8217;ve called it a polar histogram, which I think is self-explanatory. If you know what it&#8217;s actually called, please let me know. <small>(No, I will not call it polR histogram.)</small></p>
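<p>The core trick is to wrap the bars around a circle using polar coordinates, and you can try it in a few lines of <TT>ggplot2</TT>. This is a minimal sketch on made-up data, not the author's polarHistogram(). It draws one stacked bar of score proportions per item and leaves out the family grouping and the guide circles.</p>

```r
# Minimal sketch of the layout idea on made-up data (not polarHistogram()).
library(ggplot2)

set.seed(1)
toy <- data.frame(
  item  = rep(sprintf("item%02d", 1:12), each = 3),
  score = rep(c("first", "second", "third"), times = 12),
  value = runif(36)
)

p <- ggplot(toy, aes(x = item, y = value, fill = score)) +
  geom_col(position = "fill") +   # each bar rescaled so its parts sum to 1
  coord_polar() +                 # wrap the x axis around a circle
  theme_minimal()

# Display with print(p), or save by passing the plot explicitly:
out <- file.path(tempdir(), "polarSketch.pdf")
ggsave(out, plot = p, width = 6, height = 6)
```

<p>Passing <TT>plot = p</TT> to <TT>ggsave()</TT> avoids relying on whichever plot was built last.</p>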
<p>And here is some fake data to get you going:</p>

<div class="my_syntax_box"><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># fake data for polarHistogram()</span>
<span style="color: #228B22;"># Christophe Ladroue</span>
<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>plyr<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>ggplot2<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">source</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;polarHistogram.R&quot;</span><span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;"># a little helper that generates random names for families and items.</span>
randomName<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>n<span style="color: #080;">=</span><span style="color: #ff0000;">1</span>,syllables<span style="color: #080;">=</span><span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#123;</span>
  vowels<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;a&quot;</span>,<span style="color: #ff0000;">&quot;e&quot;</span>,<span style="color: #ff0000;">&quot;i&quot;</span>,<span style="color: #ff0000;">&quot;o&quot;</span>,<span style="color: #ff0000;">&quot;u&quot;</span>,<span style="color: #ff0000;">&quot;y&quot;</span><span style="color: #080;">&#41;</span>
  consonants<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">setdiff</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">letters</span>,vowels<span style="color: #080;">&#41;</span>
  <span style="color: #0000FF; font-weight: bold;">replicate</span><span style="color: #080;">&#40;</span>n,
            <span style="color: #0000FF; font-weight: bold;">paste</span><span style="color: #080;">&#40;</span>
              <span style="color: #0000FF; font-weight: bold;">rbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>consonants,syllables,<span style="color: #0000FF; font-weight: bold;">replace</span><span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span>,
                    <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>vowels,syllables,<span style="color: #0000FF; font-weight: bold;">replace</span><span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
              sep<span style="color: #080;">=</span><span style="color: #ff0000;">''</span>,collapse<span style="color: #080;">=</span><span style="color: #ff0000;">''</span><span style="color: #080;">&#41;</span>
            <span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span>
&nbsp;
<span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">42</span><span style="color: #080;">&#41;</span>
&nbsp;
nFamily<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">20</span>
nItemPerFamily<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">6</span>,nFamily,<span style="color: #0000FF; font-weight: bold;">replace</span><span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span>
nValues<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">3</span>
&nbsp;
df<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>
  <span style="color: #0000FF; font-weight: bold;">family</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">rep</span><span style="color: #080;">&#40;</span>randomName<span style="color: #080;">&#40;</span>nFamily<span style="color: #080;">&#41;</span>,nItemPerFamily<span style="color: #080;">&#41;</span>,
  item<span style="color: #080;">=</span>randomName<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>nItemPerFamily<span style="color: #080;">&#41;</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
df<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span>,<span style="color: #0000FF; font-weight: bold;">as.<span style="">data</span>.<span style="">frame</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">matrix</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">runif</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span><span style="color: #080;">*</span>nValues<span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">ncol</span><span style="color: #080;">=</span>nValues<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
&nbsp;
  df<span style="color: #080;">&lt;-</span>melt<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span>,<span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;family&quot;</span>,<span style="color: #ff0000;">&quot;item&quot;</span><span style="color: #080;">&#41;</span>,variable_name<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;score&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># from wide to long</span>
  p<span style="color: #080;">&lt;-</span>polarHistogram<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span>,familyLabel<span style="color: #080;">=</span>FALSE<span style="color: #080;">&#41;</span>
  <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>p<span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>

<p><strong>Options:</strong><br />
Many defaults can already be changed; see the code for the complete list. The two you are most likely to want to change are <TT>familyLabels</TT> (logical), which toggles whether the name of each group is displayed as well, and <TT>direction</TT>, which is either &#8216;inwards&#8217; or &#8216;outwards&#8217;. </p>
<p><strong>Coding notes:</strong><br />
It wasn&#8217;t terribly difficult, but it did take me a bit longer than expected, for a few reasons:
<ol>
<li><TT>coord_polar()</TT> doesn&#8217;t affect the orientation of text drawn with <TT>geom_text()</TT>, so the label angles had to be calculated manually.</li>
<li>You&#8217;ll notice that the label orientation flips between 6 and 9 o&#8217;clock; otherwise the labels would end up upside down and be difficult to read.</li>
<li>There are some scoping issues with <a href="https://github.com/hadley/plyr/issues/3" ref="nofollow" target="_blank"><TT>plyr</TT></a> and <TT>ggplot2</TT> that can be a bit annoying once you encapsulate your code in a function. For example:

<div class="my_syntax_box"><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;">df<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>
  x<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">runif</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">10</span><span style="color: #080;">&#41;</span>,
  y<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">runif</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">10</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
z<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">10</span>
ggplot<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>geom_point<span style="color: #080;">&#40;</span>aes<span style="color: #080;">&#40;</span>x<span style="color: #080;">=</span>x<span style="color: #080;">+</span>z,y<span style="color: #080;">=</span>y<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># works</span>
&nbsp;
<span style="color: #0000FF; font-weight: bold;">rm</span><span style="color: #080;">&#40;</span>z<span style="color: #080;">&#41;</span>
fakeFunction<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#123;</span>
  z<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">10</span>
  ggplot<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>geom_point<span style="color: #080;">&#40;</span>aes<span style="color: #080;">&#40;</span>x<span style="color: #080;">=</span>x<span style="color: #080;">+</span>z,y<span style="color: #080;">=</span>y<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  <span style="color: #080;">&#125;</span>
&nbsp;
fakeFunction<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># error</span></pre></td></tr></table></div></div>

</li>
</ol>
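<p>To make the note on label orientation concrete, here is a minimal sketch of the kind of calculation involved. It is not the code used in this post: <TT>labelAngle</TT> is a hypothetical helper that assumes each label sits at a fraction <TT>pos</TT> of a full clockwise turn starting at 12 o&#8217;clock and is drawn radially. For simplicity it flips every label past 6 o&#8217;clock; the threshold in the actual function may differ. The <TT>angle</TT> and <TT>hjust</TT> columns it returns could be mapped to the aesthetics of the same name in <TT>geom_text()</TT>.</p>
<pre class="rsplus" style="font-family:monospace;"># Hypothetical helper (not from the post): angle and justification for a
# radial label at fraction 'pos' of a clockwise turn from 12 o'clock.
labelAngle = function(pos) {
  # geom_text() angles are counter-clockwise degrees from 3 o'clock,
  # so 12 o'clock is 90 and a full clockwise turn subtracts 360
  angle = 90 - 360 * pos
  # on the left half of the dial the text would read upside down
  flip = pos > 0.5
  angle[flip] = angle[flip] + 180  # rotate half a turn to keep it readable
  # justify flipped labels at their other end so the text still
  # extends outwards from the anchor point
  hjust = ifelse(flip, 1, 0)
  data.frame(pos = pos, angle = angle, hjust = hjust)
}

labelAngle(c(0, 0.25, 0.5, 0.75))</pre>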
<p>Happy plotting!</p>
<p><small>(*) The numbers are fudged, don&#8217;t spend time reverse-engineering them.</small></p>

<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on his blog: <strong><a href="http://chrisladroue.com/2012/02/polar-histogram-pretty-and-useful/"> Christophe Ladroue » R</a></strong>.</div>
<hr />
<a href="http://www.r-bloggers.com/">R-bloggers.com</a> offers <strong><a href="http://feedburner.google.com/fb/a/mailverify?uri=RBloggers">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="http://www.r-project.org/">R</a> news and <a title="R tutorials" href="http://www.r-bloggers.com/?s=tutorial">tutorials</a> on topics such as: visualization (<a title="ggplot and ggplot2 tutorials" href="http://www.r-bloggers.com/?s=ggplot2">ggplot2</a>, <a title="Boxplots using lattice and ggplot2 tutorials" href="http://www.r-bloggers.com/?s=boxplot">Boxplots</a>, <a title="Maps and gis" href="http://www.r-bloggers.com/?s=map">maps</a>, <a title="Animation in R" href="http://www.r-bloggers.com/?s=animation">animation</a>), programming (<a title="RStudio IDE for R" href="http://www.r-bloggers.com/?s=RStudio">RStudio</a>, <a title="Sweave and literate programming" href="http://www.r-bloggers.com/?s=sweave">Sweave</a>, <a title="LaTeX in R" href="http://www.r-bloggers.com/?s=LaTeX">LaTeX</a>, <a title="SQL and databases" href="http://www.r-bloggers.com/?s=SQL">SQL</a>, <a title="Eclipse IDE for R" href="http://www.r-bloggers.com/?s=eclipse">Eclipse</a>, <a title="git and github, Version Control System" href="http://www.r-bloggers.com/?s=git">git</a>, <a title="Large data in R using Hadoop" href="http://www.r-bloggers.com/?s=hadoop">hadoop</a>, <a title="Web Scraping of google, facebook, yahoo, twitter and more using R" href="http://www.r-bloggers.com/?s=Web+Scraping">Web Scraping</a>) statistics (<a title="Regressions and ANOVA analysis tutorials" href="http://www.r-bloggers.com/?s=regression">regression</a>, <a title="principal component analysis tutorial" href="http://www.r-bloggers.com/?s=PCA">PCA</a>, <a title="Time series" href="http://www.r-bloggers.com/?s=time+series">time series</a>,<a title="ecdf" href="http://www.r-bloggers.com/?s=ecdf">ecdf</a>, <a title="finance trading" 
href="http://www.r-bloggers.com/?s=trading">trading</a>) and more...
</div></p>]]></content:encoded>
			<wfw:commentRss>http://www.r-bloggers.com/polar-histogram-pretty-and-useful/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Multiple Factor Model – Building Risk Model</title>
		<link>http://www.r-bloggers.com/multiple-factor-model-%e2%80%93-building-risk-model/</link>
		<comments>http://www.r-bloggers.com/multiple-factor-model-%e2%80%93-building-risk-model/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 04:59:07 +0000</pubDate>
		<dc:creator>systematicinvestor</dc:creator>
				<category><![CDATA[R bloggers]]></category>
		<category><![CDATA[factor model]]></category>
		<category><![CDATA[factors]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Risk Measures]]></category>

		<guid isPermaLink="false">http://systematicinvestor.wordpress.com/?p=832</guid>
		<description><![CDATA[This is the fourth post in the series about Multiple Factor Models. I will build on the code presented in the prior post, Multiple Factor Model – Building CSFB Factors, and I will show how to build a multiple factor risk model. For an example of the multiple factor risk models, please read following references: [...]]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
(This article was first published on  <strong><a href="http://systematicinvestor.wordpress.com/2012/02/21/multiple-factor-model-building-risk-model/"> Systematic Investor » R</a></strong>, and kindly contributed to <a href="http://www.r-bloggers.com/">R-bloggers)</a>      
</div></p>
<p>This is the fourth post in the series about Multiple Factor Models. I will build on the code presented in the prior post, <a href="http://systematicinvestor.wordpress.com/2012/02/13/multiple-factor-model-building-csfb-factors/" ref="nofollow" target="_blank">Multiple Factor Model – Building CSFB Factors</a>, and I will show how to build a multiple factor risk model. For examples of multiple factor risk models, please see the following references:</p>
<ul>
<li><a href="http://www.alacra.com/alacra/help/barra_handbook_US.pdf" ref="nofollow" target="_blank">MSCI Barra United States Equity Multi-Factor Model, page 101</a></li>
<li><a href="http://www.northinfo.com/documents/8.pdf" ref="nofollow" target="_blank">Northfield Fundamental Risk Model</a></li>
</ul>
<p>The outline of this post:</p>
<ul>
<li>Run a cross-sectional regression to estimate factor returns</li>
<li>Compute factor covariance using a shrinkage estimator</li>
<li>Forecast stock-specific variances using GARCH(1,1)</li>
<li>Compute portfolio risk using multiple factor model and compare it to the historical standard deviation of portfolio returns.</li>
</ul>
<p>Let’s start by loading the CSFB factors that we saved at the end of the <a href="http://systematicinvestor.wordpress.com/2012/02/13/multiple-factor-model-building-csfb-factors/" ref="nofollow" target="_blank">prior post</a>. [If you are missing the data.factors.Rdata file, please run the fm.all.factor.test() function first to create and save the CSFB factors.] Next, I will run a cross-sectional regression to estimate factor returns.</p>
<p><pre class="brush: r;">
###############################################################################
# Load Systematic Investor Toolbox (SIT)
# http://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
    source(con)
close(con)
	#*****************************************************************
	# Load factor data that we saved at the end of the fm.all.factor.test functions
	#****************************************************************** 
	load.packages('quantmod,abind')	
		
	load(file='data.factors.Rdata')
		# remove Composite Average factor
		factors.avg = factors.avg[which(names(factors.avg) != 'AVG')]	
		
	#*****************************************************************
	# Run cross sectional regression to estimate factor returns
	#****************************************************************** 
	nperiods = nrow(next.month.ret)
	n = ncol(next.month.ret)
		
	# create sector dummy variables: binary 0/1 values for each sector
	nsectors = len(levels(sectors))	
	sectors.matrix = array(double(), c(nperiods, n, nsectors))
		dimnames(sectors.matrix)[[3]] = levels(sectors)		
	for(j in levels(sectors)) {
		sectors.matrix[,,j] = matrix(sectors == j,  nr=nperiods, nc=n, byrow=T)
	}
	
	# create matrix for each factor
	factors.matrix = abind(factors.avg, along = 3)		
	
	# combine sector dummies and all factors
	all.data = abind(sectors.matrix, factors.matrix)		
	
	# create betas and specific.return
	beta = all.data[,1,] * NA
	specific.return = next.month.ret * NA
		nfactors = ncol(beta)
		
	# append next.month.ret to all.data			
	all.data = abind(next.month.ret, all.data, along = 3)
		dimnames(all.data)[[3]][1] = 'Ret'
			
	# estimate betas (factor returns)
	for(t in 12:(nperiods-1)) {		
		temp = all.data[t:t,,]
		x = temp[,-c(1:2)]
		y = temp[,1]
		b = lm(y~x)$coefficients
		
		b[2:nsectors] = b[1] + b[2:nsectors]
		beta[(t+1),] = b		
		
		specific.return[(t+1),] = y - rowSums(temp[,-1] * matrix(b, n, nfactors, byrow=T), na.rm=T)	
	}
</pre></p>
<p>To estimate factor returns (betas), we solve for the coefficients of the following multiple factor model:
<pre>
Ret = b1 * F1 + b2 * F2 + ... + bn * Fn + e
where
b1...bn are estimated factor returns
F1...Fn are factor exposures, i.e. sector dummies and CSFB factor exposures
e is the stock-specific return, not captured by factors F1...Fn</pre>
<p>Note that we cannot include the first sector dummy variable in the regression, otherwise we will get a linearly dependent relationship of the first sector dummy variable with all other sector dummy variables. The sector effect of the first sector dummy variable is absorbed into the intercept in the regression.</p>
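<p>The intercept trick can be checked on simulated data. The sketch below is a self-contained illustration, not part of SIT: it generates one month of hypothetical returns for stocks in three sectors with a single style factor, drops the first sector dummy, and then adds the intercept back to the remaining sector coefficients, as the b[2:nsectors] = b[1] + b[2:nsectors] line does above. All names and numbers are made up for the example.</p>
<p><pre class="brush: r;">
# Hypothetical simulated data, for illustration only
set.seed(1)
n = 300                                     # number of stocks
sector = factor(sample(c('A', 'B', 'C'), n, replace = TRUE))
style = rnorm(n)                            # one style-factor exposure
true.sector.ret = c(A = 0.01, B = 0.02, C = -0.01)
true.style.ret = 0.005
ret = true.sector.ret[as.character(sector)] + true.style.ret * style + rnorm(n, sd = 0.001)

# sector dummies: one 0/1 column per sector
dummies = sapply(levels(sector), function(s) as.numeric(sector == s))
# drop the first dummy, otherwise the dummies sum to the intercept column
x = cbind(dummies[, -1], style)
b = coef(lm(ret ~ x))

# the intercept is the first sector's return; add it back to the other sectors
nsectors = nlevels(sector)
b[2:nsectors] = b[1] + b[2:nsectors]
names(b) = c(levels(sector), 'style')
round(b, 4)
</pre></p>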
<p>There are a few alternative ways of estimating this regression. For example, a robust linear model can be estimated with the following code (in place of the lm() call above):<br />
<pre class="brush: r;">
	load.packages('MASS')
	temp = rlm(y~x)$coefficients
</pre></p>
<p>A quantile (median) regression can be estimated with the following code:<br />
<pre class="brush: r;">
	load.packages('quantreg')
	temp = rq(y ~ x, tau = 0.5)$coefficients
</pre></p>
<p>Next let’s look at the cumulative factor returns.</p>
<p><pre class="brush: r;">
	#*****************************************************************
	# helper function
	#****************************************************************** 	
	fm.hist.plot &lt;- function(temp, smain=NULL) {			
		ntemp = ncol(temp)		
		cols = plota.colors(ntemp)	
		plota(temp, ylim = range(temp), log='y', main=smain)
		for(i in 1:ntemp) plota.lines(temp[,i], col=cols[i])
		plota.legend(colnames(temp), cols, as.list(temp))
	}

	#*****************************************************************
	# Examine historical cumulative factor returns
	#****************************************************************** 	
	temp = make.xts(beta, index(next.month.ret))
		temp = temp['2000::',]
		temp[] = apply(coredata(temp), 2, function(x) cumprod(1 + ifna(x,0)))
	
	fm.hist.plot(temp[,-c(1:nsectors)], 'Factor Returns')
</pre></p>
<p><a href="http://systematicinvestor.files.wordpress.com/2012/02/plot1-small2.png" ref="nofollow" target="_blank"><img src="http://systematicinvestor.files.wordpress.com/2012/02/plot1-small2.png?w=600&#038;h=500" alt="" title="plot1.png.small" width="600" height="500" class="alignnone size-full wp-image-838" /></a></p>
<p>The Price Reversals (PR) and Small Size (SS) factors have done well.</p>
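<p>The cumulative-return step above relies on SIT helpers (make.xts, ifna). For readers without SIT, here is a minimal base R sketch of the same calculation, assuming the factor returns sit in a plain numeric matrix with one column per factor; the function name and the numbers are hypothetical.</p>
<p><pre class="brush: r;">
# Hypothetical helper: growth of $1 invested in each factor,
# treating missing months as a zero return (as ifna(x,0) does above)
cum.factor.returns = function(ret) {
	ret[is.na(ret)] = 0
	apply(ret, 2, function(x) cumprod(1 + x))
}

# hypothetical monthly factor returns: 3 months x 2 factors
ret = cbind(PR = c(0.10, NA, -0.50), SS = c(0.02, 0.03, 0.01))
cum.factor.returns(ret)
</pre></p>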
<p>Next, let’s estimate the factor covariance matrix over a rolling 24-month window, using the var.shrink.eqcor() shrinkage estimator from the BurStFin package.</p>
<p><pre class="brush: r;">
	load.packages('BurStFin')	
	factor.covariance = array(double(), c(nperiods, nfactors, nfactors))
		dimnames(factor.covariance)[[2]] = colnames(beta)
		dimnames(factor.covariance)[[3]] = colnames(beta)

	# estimate factor covariance
	for(t in 36:(nperiods-1)) {
		factor.covariance[t,,] = var.shrink.eqcor(beta[(t-23):t,])
	}
</pre></p>
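<p>var.shrink.eqcor() shrinks the sample covariance matrix towards an equal-correlation target, Ledoit-Wolf style, and estimates the amount of shrinkage from the data. To show what shrinking towards equal correlation means, the base R sketch below blends the two matrices with a fixed, hypothetical intensity delta instead; it is a simplified illustration, not BurStFin's estimator.</p>
<p><pre class="brush: r;">
# Simplified sketch (not BurStFin's code): blend the sample covariance with a
# constant-correlation target using a FIXED, hypothetical intensity delta.
shrink.eqcor.sketch = function(returns, delta = 0.5) {
	S = cov(returns)
	vols = sqrt(diag(S))
	R = cov2cor(S)
	rbar = mean(R[upper.tri(R)])            # average pairwise correlation
	target = rbar * outer(vols, vols)       # same correlation for every pair
	diag(target) = diag(S)                  # keep the sample variances
	delta * target + (1 - delta) * S
}

set.seed(2)
returns = matrix(rnorm(24 * 4, sd = 0.05), 24, 4)  # 24 months x 4 factors
shrink.eqcor.sketch(returns, delta = 0.3)
</pre></p>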
<p>Next, let’s forecast stock-specific variances using GARCH(1,1). I will use the GARCH estimation routine described in the <a href="http://systematicinvestor.wordpress.com/2012/01/06/trading-using-garch-volatility-forecast/" ref="nofollow" target="_blank">Trading using Garch Volatility Forecast</a> post.</p>
<p><pre class="brush: r;">
	#*****************************************************************
	# Compute stock-specific variance forecasts using GARCH(1,1)
	#****************************************************************** 	
	load.packages('tseries,fGarch')	

	specific.variance = next.month.ret * NA

	for(i in 1:n) {
		specific.variance[,i] = bt.forecast.garch.volatility(specific.return[,i], 24) 
	}
</pre></p>
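<p>The forecasts above come from SIT's bt.forecast.garch.volatility(), which fits a GARCH(1,1) model as described in the linked post. At the heart of any GARCH(1,1) model is the variance recursion sigma2[t] = omega + alpha * e[t-1]^2 + beta * sigma2[t-1]. The sketch below runs only that recursion, with fixed, hypothetical parameter values, to produce a one-step-ahead volatility forecast; unlike the post's routine, it does not estimate omega, alpha and beta.</p>
<p><pre class="brush: r;">
# One-step-ahead GARCH(1,1) volatility forecast with FIXED parameters:
#   sigma2[t] = omega + alpha * e[t-1]^2 + beta * sigma2[t-1]
# omega, alpha and beta are hypothetical; a real model would estimate them.
garch11.forecast.sketch = function(e, omega, alpha, beta) {
	e = e[!is.na(e)]
	sigma2 = var(e)                 # start the recursion at the sample variance
	for (t in seq_along(e)) {
		sigma2 = omega + alpha * e[t]^2 + beta * sigma2
	}
	sqrt(sigma2)                    # volatility forecast for the next period
}

set.seed(3)
e = rnorm(24, sd = 0.08)            # 24 months of hypothetical specific returns
garch11.forecast.sketch(e, omega = 0.0005, alpha = 0.1, beta = 0.8)
</pre></p>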
<p>Now we have all the ingredients to compute portfolio risk:
<pre>
Portfolio Risk = (common factor variance + specific variance)^0.5
	common factor variance = (portfolio factor exposure) * factor covariance matrix * (portfolio factor exposure)'
	specific variance = sum over stocks of (specific.variance)^2 * (portfolio weights)^2
	(specific.variance holds the GARCH volatility forecasts, hence the square)</pre>
<p><pre class="brush: r;">
	#*****************************************************************
	# Compute portfolio risk
	#****************************************************************** 
	portfolio = rep(1/n, n)
		portfolio = matrix(portfolio, n, nfactors)
	
	portfolio.risk = next.month.ret[,1] * NA
	for(t in 36:(nperiods-1)) {	
		portfolio.exposure = colSums(portfolio * all.data[t,,-1], na.rm=T)
		
		portfolio.risk[t] = sqrt(
			portfolio.exposure %*% factor.covariance[t,,] %*% (portfolio.exposure) + 
			sum(specific.variance[t,]^2 * portfolio[,1]^2, na.rm=T)
			)
	}
</pre></p>
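<p>To see the formula at work on numbers small enough to check by hand, here is a toy calculation in base R with two stocks and two factors. All weights, exposures, covariances and specific volatilities are hypothetical; the steps mirror one iteration of the loop above.</p>
<p><pre class="brush: r;">
# Toy example with hypothetical numbers: 2 stocks, 2 factors
w = c(0.5, 0.5)                       # portfolio weights
X = rbind(c(1,  0.2),                 # factor exposures: stocks in rows,
          c(1, -0.4))                 # factors in columns
Fcov = rbind(c(0.0025, 0),            # factor covariance matrix
             c(0,      0.0004))
spec.vol = c(0.08, 0.06)              # specific volatility forecasts

portfolio.exposure = colSums(w * X)   # weighted exposures: (1, -0.1)
common.var = drop(portfolio.exposure %*% Fcov %*% portfolio.exposure)
specific.var = sum(spec.vol^2 * w^2)
portfolio.risk = sqrt(common.var + specific.var)
portfolio.risk
</pre></p>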
<p>Next, let’s compare the portfolio risk estimated with the multiple factor risk model to the portfolio's historical risk (the rolling 24-month standard deviation of portfolio returns).</p>
<p><pre class="brush: r;">
	#*****************************************************************
	# Compute historical portfolio risk
	#****************************************************************** 
	portfolio = rep(1/n, n)
		portfolio = t(matrix(portfolio, n, nperiods))
	
	portfolio.returns = next.month.ret[,1] * NA
		portfolio.returns[] = rowSums(mlag(next.month.ret) * portfolio, na.rm=T)
	
	hist.portfolio.risk = runSD(portfolio.returns, 24)
	
	#*****************************************************************
	# Plot risks
	#****************************************************************** 			
	plota(portfolio.risk['2000::',], type='l')
		plota.lines(hist.portfolio.risk, col='blue')
		plota.legend('Risk,Historical Risk', 'black,blue')
</pre></p>
<p><a href="http://systematicinvestor.files.wordpress.com/2012/02/plot2-small3.png" ref="nofollow" target="_blank"><img src="http://systematicinvestor.files.wordpress.com/2012/02/plot2-small3.png?w=600&#038;h=500" alt="" title="plot2.png.small" width="600" height="500" class="alignnone size-full wp-image-837" /></a></p>
<p>The multiple factor risk model does a decent job of estimating portfolio risk most of the time.</p>
<p>To view the complete source code for this example, please have a look at the <a href="https://github.com/systematicinvestor/SIT/blob/master/R/factor.model.test.r" ref="nofollow" target="_blank">fm.risk.model.test() function in factor.model.test.r at github</a>.</p>
<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on his blog: <strong><a href="http://systematicinvestor.wordpress.com/2012/02/21/multiple-factor-model-building-risk-model/"> Systematic Investor » R</a></strong>.</div>
</div></p>]]></content:encoded>
			<wfw:commentRss>http://www.r-bloggers.com/multiple-factor-model-%e2%80%93-building-risk-model/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/f5676a7cfc17017cd99e8266c814227b?s=96&amp;amp;d=identicon&amp;amp;r=G" length="" type="" />
<enclosure url="http://systematicinvestor.files.wordpress.com/2012/02/plot1-small2.png" length="" type="" />
<enclosure url="http://systematicinvestor.files.wordpress.com/2012/02/plot2-small3.png" length="" type="" />
		</item>
		<item>
		<title>Taking a Ride on the Wild Function –  Introducing the dostats package</title>
		<link>http://www.r-bloggers.com/taking-a-ride-on-the-wild-function-%e2%80%93-introducing-the-dostats-package/</link>
		<comments>http://www.r-bloggers.com/taking-a-ride-on-the-wild-function-%e2%80%93-introducing-the-dostats-package/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 02:51:47 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[R bloggers]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://r.andrewredd.us/?p=64</guid>
		<description><![CDATA[Lately I have been rather productive in my programming and frustrated at the same time. Trying to solve the problems of creating a demographics summary table proved to be a lesson in frustration with R. Since I love R, this was disheartening. I did eventually find the reporttools package which does make a great latex [...]]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
(This article was first published on  <strong><a href="http://r.andrewredd.us/?p=64"> R Blog</a></strong>, and kindly contributed to <a href="http://www.r-bloggers.com/">R-bloggers)</a>      
</div></p>
<p>Lately I have been rather productive in my programming, and frustrated at the same time. Creating a demographics summary table proved to be a lesson in frustration with R, and since I love R, this was disheartening. I did eventually find the <a href="http://cran.r-project.org/package=reporttools" ref="nofollow" target="_blank"><code>reporttools</code></a> package, which makes a great LaTeX table, but only in LaTeX. The <a href="http://cran.r-project.org/?package=tables" ref="nofollow" target="_blank"><code>tables</code></a> package also looks great, but it was not quite what I was looking for either, so I did the first logical thing an R user does when faced with this sort of thing: I created a package to fill in the missing functionality.</p>
<h2 id="the-dostats-packagefunction">The <code>dostats</code> package/function</h2>
<p>The new package is <a href="http://cran.r-project.org/?package=dostats" ref="nofollow" target="_blank"><code>dostats</code></a>. It serves two purposes:</p>
<ol style="list-style-type: decimal">
<li>Create summaries of vectors through the <code>dostats</code> function.</li>
<li>Manipulate functions.</li>
</ol>
<p>The package started out with the <code>dostats</code> function for creating more informative summary tables. It works much like <code>tabular</code> from the <code>tables</code> package, but it is designed to work with <a href="http://cran.r-project.org/?package=plyr" ref="nofollow" target="_blank"><code>plyr</code></a> functions. The idea is to pass a vector as the first argument; the remaining arguments are functions that compute statistics on that vector. For example:</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(dostats)
<span class="kw">set.seed</span>(<span class="dv">20120220</span>)
<span class="kw">dostats</span>(<span class="kw">rnorm</span>(<span class="dv">100</span>), mean, sd, <span class="dt">N =</span> length)</code></pre>
<pre><code>##     mean     sd   N
## 1 0.0775 0.8975 100</code></pre>
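<p>To illustrate the idea behind <code>dostats</code> (not its actual implementation), here is a minimal base R sketch. <code>dostats.sketch</code> is a hypothetical name: it applies each function to the vector and returns a one-row <code>data.frame</code>, naming each column after the function as typed unless a name such as <code>N =</code> is supplied.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Hypothetical sketch of the idea (not the package's code):
# apply each function to x and return a one-row data.frame.
dostats.sketch = function(x, ...) {
  funs = list(...)
  # default column names: the function expressions as typed
  # (assumes each one deparses to a single line)
  labels = vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  given = names(funs)
  if (!is.null(given)) labels[given != ''] = given[given != '']
  res = lapply(funs, function(f) f(x))
  names(res) = labels
  as.data.frame(res)
}

dostats.sketch(c(2, 4, 9), mean, max, N = length)</code></pre>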
<p>There is also a renaming construct built in: naming an argument, as in <code>N = length</code> above, sets the name of the corresponding column. And because the vector comes first, <code>dostats</code> is easy to pass as an argument to <code>ldply</code>, for example:</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(plyr)
<span class="kw">ldply</span>(mtcars, dostats, mean, sd, IQR)</code></pre>
<pre><code>##     .id     mean       sd     IQR
## 1   mpg  20.0906   6.0269   7.375
## 2   cyl   6.1875   1.7859   4.000
## 3  disp 230.7219 123.9387 205.175
## 4    hp 146.6875  68.5629  83.500
## 5  drat   3.5966   0.5347   0.840
## 6    wt   3.2172   0.9785   1.029
## 7  qsec  17.8487   1.7869   2.008
## 8    vs   0.4375   0.5040   1.000
## 9    am   0.4062   0.4990   1.000
## 10 gear   3.6875   0.7378   1.000
## 11 carb   2.8125   1.6152   2.000</code></pre>
<p>This makes for a more logical summary <code>data.frame</code> object with usable columns, each of a single data type. Unfortunately, this does not work for every data set. The example above contains only numerical data; in a data frame with categorical columns, the numeric summaries would be applied to those columns as well. Another limitation is that each function must return a result of the same length for every variable. For this reason I introduced functions that filter by variable class.</p>
<ul>
<li><code>class.stats</code> creates a dostats function for a given class, tested by <code>inherits</code>.</li>
<li><code>integer.stats</code>: predefined class stats for integer variables, defined as <code>class.stats('integer')</code>.</li>
<li><code>numeric.stats</code>: for numeric variables, which also includes integer variables.</li>
<li><code>factor.stats</code>: for factors.</li>
</ul>
<p>When a <code>class.stats</code> function is passed to <code>ldply</code>, variables not matching that class are silently removed.</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">ldply</span>(iris, numeric.stats, mean, sd)</code></pre>
<pre><code>##            .id  mean     sd
## 1 Sepal.Length 5.843 0.8281
## 2  Sepal.Width 3.057 0.4359
## 3 Petal.Length 3.758 1.7653
## 4  Petal.Width 1.199 0.7622</code></pre>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">ldply</span>(iris, factor.stats, <span class="dt">N =</span> length)</code></pre>
<pre><code>##       .id   N
## 1 Species 150</code></pre>
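<p>The class filtering can be sketched in base R as well. This is a hypothetical illustration, not the package's code: <code>filter.stats.sketch</code> takes a predicate such as <code>is.numeric</code> (instead of the <code>inherits</code> test the package uses) and returns <code>NULL</code> for variables that fail it, which <code>rbind</code> then skips, much as <code>ldply</code> drops them above.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Hypothetical sketch (not the package's code): build a summary function that
# only handles variables passing 'test' and returns NULL for the rest.
# is.numeric is TRUE for integer columns too.
filter.stats.sketch = function(test) {
  function(x, ...) {
    if (!test(x)) return(NULL)            # non-matching variables give nothing
    as.data.frame(lapply(list(...), function(f) f(x)))
  }
}

numeric.sketch = filter.stats.sketch(is.numeric)
rows = lapply(iris, numeric.sketch, mean = mean, sd = sd)
do.call(rbind, rows)                      # the NULL for Species is skipped</code></pre>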
<p>You can also chain <code>ddply</code> and <code>ldply</code> together to compute statistics on subsets.</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">ddply</span>(iris, .(Species), ldply, numeric.stats,
    mean, median, sd)</code></pre>
<pre><code>##       Species          .id  mean median     sd
## 1      setosa Sepal.Length 5.006   5.00 0.3525
## 2      setosa  Sepal.Width 3.428   3.40 0.3791
## 3      setosa Petal.Length 1.462   1.50 0.1737
## 4      setosa  Petal.Width 0.246   0.20 0.1054
## 5  versicolor Sepal.Length 5.936   5.90 0.5162
## 6  versicolor  Sepal.Width 2.770   2.80 0.3138
## 7  versicolor Petal.Length 4.260   4.35 0.4699
## 8  versicolor  Petal.Width 1.326   1.30 0.1978
## 9   virginica Sepal.Length 6.588   6.50 0.6359
## 10  virginica  Sepal.Width 2.974   3.00 0.3225
## 11  virginica Petal.Length 5.552   5.55 0.5519
## 12  virginica  Petal.Width 2.026   2.00 0.2747</code></pre>
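<p>For readers without plyr, the chained call above computes roughly the following. This is a base-R sketch using <code>split</code>. It is an illustrative equivalent, not how <code>ddply</code> or <code>dostats</code> work internally. The object names are hypothetical.</p>

```r
# Base-R sketch of ddply(iris, .(Species), ldply, numeric.stats, mean, median, sd):
# split the rows by species, then summarise every numeric column per group.
num_cols <- vapply(iris, is.numeric, logical(1))
pieces <- lapply(split(iris[num_cols], iris$Species), function(d) {
  data.frame(.id    = names(d),
             mean   = vapply(d, mean, numeric(1)),
             median = vapply(d, median, numeric(1)),
             sd     = vapply(d, sd, numeric(1)),
             row.names = NULL)
})
# Stack the per-species tables and record which group each row came from.
by_species <- data.frame(
  Species = rep(names(pieces), vapply(pieces, nrow, integer(1))),
  do.call(rbind, pieces),
  row.names = NULL
)
by_species
```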
<h2 id="function-manipulations">Function manipulations</h2>
<p>Passing all these functions around also calls for some extra functions that manipulate functions. That is a mouthful, but it is something we do often in R.</p>
<h3 id="composition">Composition</h3>
<p>R lacks a function composition function, so I created one. I find myself writing things like <code>function(x)any(is.na(x))</code> far too often, and the word “function” is too long to type and takes up a lot of space. It is much easier to write <code>any%.%is.na</code> or <code>compose(any, is.na)</code>. Either one creates a new function that tests whether there are any missing values. The two forms are</p>
<ol style="list-style-type: decimal">
<li><code>compose(...)</code></li>
<li><code>fun1%.%fun2</code></li>
</ol>
<p><code>compose</code> takes any number of functions and nests them. The rightmost function is the innermost and the leftmost is the outermost. An easy way to remember this is that they read in the same order as the nested call they replace: <code>compose(any, is.na)</code> reads like <code>any(is.na(x))</code>.</p>
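<p>The nesting rule can be sketched in a few lines of R. These are local stand-ins written for illustration. dostats provides its own <code>compose</code> and <code>%.%</code>, and their internals may differ. Like the package versions described here, the composed function passes only a single argument down the chain.</p>

```r
# Local stand-ins for illustration; not the dostats implementation.
compose <- function(...) {
  funs <- lapply(list(...), match.fun)
  function(x) {
    # Apply the rightmost function first, then work leftwards.
    for (f in rev(funs)) x <- f(x)
    x
  }
}
`%.%` <- function(f, g) compose(f, g)

has_na  <- compose(any, is.na)   # behaves like function(x) any(is.na(x))
has_na2 <- any %.% is.na

has_na(c(1, NA, 3))   # TRUE
has_na2(1:3)          # FALSE
(sqrt %.% abs)(-16)   # 4, i.e. sqrt(abs(-16))
```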
<h3 id="argument-manipulations">Argument Manipulations</h3>
<p>Composition and <code>dostats</code> only operate on the first argument, which necessitates functions for manipulating the other arguments.</p>
<ol style="list-style-type: decimal">
<li><code>wargs</code>: creates a new function with changed defaults. For example, <code>wargs(mean, na.rm=TRUE)</code> creates a new function that automatically removes missing values.</li>
<li><code>onarg</code>: specifies which argument of the function becomes the first argument of the new function. For example, <code>onarg(rep,'times')</code> creates a function whose first argument is the number of times to repeat.</li>
</ol>
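<p>Both ideas can be sketched with closures and <code>do.call</code>. These are local stand-ins for illustration only. The dostats package ships its own <code>wargs</code> and <code>onarg</code>, and its argument handling may differ. In this sketch, arguments supplied at call time override the stored defaults.</p>

```r
# Local stand-ins for illustration only; not the dostats implementation.
wargs <- function(f, ...) {
  f <- match.fun(f)
  defaults <- list(...)
  function(...) {
    args <- list(...)
    # Keep only the defaults the caller did not supply explicitly.
    keep <- setdiff(names(defaults), names(args))
    do.call(f, c(args, defaults[keep]))
  }
}

onarg <- function(f, arg) {
  f <- match.fun(f)
  function(x, ...) {
    # Pass the new first argument to f under the name given by `arg`.
    args <- list(...)
    args[[arg]] <- x
    do.call(f, args)
  }
}

mean_na <- wargs(mean, na.rm = TRUE)
mean_na(c(1, NA, 3))                 # 2
mean_na(c(1, NA, 3), na.rm = FALSE)  # NA, the caller's value wins

rep_times <- onarg(rep, "times")
rep_times(3, "a")                    # "a" "a" "a", i.e. rep("a", times = 3)
```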
<p>One example of this included in <code>dostats</code> is <code>contains</code> and <code>%contains%</code>, which take their arguments in the reverse order of <code>%in%</code>.</p>
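<p>The reversal amounts to swapping the operands of <code>%in%</code>. A minimal sketch follows, assuming the signature <code>contains(table, x)</code>. Both definitions are local stand-ins and may not match the package's code.</p>

```r
# Local stand-ins; the argument order table-then-x is an assumption.
contains     <- function(table, x) x %in% table
`%contains%` <- function(table, x) x %in% table

c("a", "b", "c") %contains% "b"          # TRUE, same as "b" %in% c("a", "b", "c")
contains(c("a", "b", "c"), c("a", "z"))  # TRUE FALSE
```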
<h2 id="conclussion">Conclusion</h2>
<p>There will likely be more functions as I come across the need for them. If you have an idea that should be included, submit it to the issue <a href="http://github.com/halpo/dostats/issues" ref="nofollow" target="_blank">tracker</a>.</p>

<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on his blog: <strong><a href="http://r.andrewredd.us/?p=64"> R Blog</a></strong>.</div>
</div></p>]]></content:encoded>
			<wfw:commentRss>http://www.r-bloggers.com/taking-a-ride-on-the-wild-function-%e2%80%93-introducing-the-dostats-package/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DNA methylation  (RRBS or target capture) analysis with R</title>
		<link>http://www.r-bloggers.com/dna-methylation-rrbs-or-target-capture-analysis-with-r/</link>
		<comments>http://www.r-bloggers.com/dna-methylation-rrbs-or-target-capture-analysis-with-r/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 01:46:00 +0000</pubDate>
		<dc:creator>l2n</dc:creator>
				<category><![CDATA[R bloggers]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.r-bloggers.com/?guid=56ac28c72272793a9f3bec38e5c1f0cf</guid>
		<description><![CDATA[Reduced Representation Bisulfite sequencing (RRBS) is a popular technique for measuring methylation levels across the genome. Although it does not have full genome coverage, it covers many important regions for methylation. Below, I shared a tutorial a...]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
(This article was first published on  <strong><a href="http://zvfak.blogspot.com/2012/02/methylation-data-rrbs-or-target-capture.html"> Recipes, scripts and genomics</a></strong>, and kindly contributed to <a href="http://www.r-bloggers.com/">R-bloggers)</a>      
</div></p>
Reduced Representation Bisulfite sequencing (RRBS) is a popular technique for measuring methylation levels across the genome. Although it does not have full genome coverage, it covers many important regions for methylation. Below, I share a tutorial and slides on how to do basic RRBS analysis using R. The same methods can also be used on the aligned reads obtained from Agilent SureSelect 
<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on his blog: <strong><a href="http://zvfak.blogspot.com/2012/02/methylation-data-rrbs-or-target-capture.html"> Recipes, scripts and genomics</a></strong>.</div>
</div></p>]]></content:encoded>
			<wfw:commentRss>http://www.r-bloggers.com/dna-methylation-rrbs-or-target-capture-analysis-with-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>weird [lack of] control…</title>
		<link>http://www.r-bloggers.com/weird-lack-of-control%e2%80%a6/</link>
		<comments>http://www.r-bloggers.com/weird-lack-of-control%e2%80%a6/#comments</comments>
		<pubDate>Mon, 20 Feb 2012 23:12:04 +0000</pubDate>
		<dc:creator>xi'an</dc:creator>
				<category><![CDATA[R bloggers]]></category>
		<category><![CDATA[dummy variable]]></category>
		<category><![CDATA[for loop]]></category>
		<category><![CDATA[loops]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[University life]]></category>

		<guid isPermaLink="false">http://xianblog.wordpress.com/?p=14794</guid>
		<description><![CDATA[When I ran I was expecting the same output as So this means that the dummy index in R &#8220;for&#8221; loops cannot be tweaked that easily. I seem to remember doing this kind of (dirty) tricks with earlier versions&#8230; Now, Alessandra and Robin think this is a good thing that the for loop is robust [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=xianblog.wordpress.com&#38;blog=5051449&#38;post=14794&#38;subd=xianblog&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
(This article was first published on  <strong><a href="http://xianblog.wordpress.com/2012/02/21/weird-lack-of-control/"> Xi'an's Og » R</a></strong>, and kindly contributed to <a href="http://www.r-bloggers.com/">R-bloggers)</a>      
</div></p>
<p style="text-align:justify;"><strong><img class="aligncenter size-full wp-image-14901" style="margin-top:5px;margin-bottom:5px;" title="(from my office: La Défense, Feb. 10, 2012)" src="http://xianblog.files.wordpress.com/2012/02/dscn1921-e1329726991744.jpg?w=450&#038;h=101" alt="" width="450" height="101" />W</strong>hen I ran</p>
<p><pre class="brush: r; gutter: false;">
&gt; test=NULL
&gt; for (i in 1:10){
+   if (i%%2!=0){
+     test=c(test,i)
+     i=i+2}}
&gt; test
[1] 1 3 5 7 9
</pre></p>
<p>I was expecting the same output as</p>
<p><pre class="brush: r; gutter: false;">
&gt; test=NULL
&gt; i=1
&gt; while (i&lt;=10){
+   if (i%%2!=0){
+     test=c(test,i)
+     i=i+2}
+   i=i+1}
&gt; test
[1] 1 5 9
</pre></p>
<p style="text-align:justify;">So this means that the dummy index in R &#8220;for&#8221; loops cannot be tweaked that easily. I seem to remember doing this kind of (dirty) trick with earlier versions&#8230; Now, Alessandra and Robin think it is a good thing that the <strong><em>for</em></strong> loop is robust against this kind of nonsense, so I may be in the minority in complaining about this lack of control [for me, if not for <strong><em>for</em></strong>].</p>
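<p>This behaviour matches R's documented semantics (see <code>?Control</code>). The sequence of a <code>for</code> loop is evaluated once, when the loop starts, and the loop variable is then assigned each element in turn. Assigning to the variable inside the body therefore does not affect the next iteration. The short sketch below contrasts this with two ways of actually skipping ahead. The variable names are illustrative.</p>

```r
# Reassigning i inside the body lasts only until the next iteration:
# R then takes the next element of 1:5, which was fixed at loop start.
visited <- integer(0)
for (i in 1:5) {
  visited <- c(visited, i)
  i <- i + 2   # does not change which element comes next
}
visited  # 1 2 3 4 5

# To skip ahead, either manage the counter yourself with while() ...
skipped <- integer(0)
i <- 1L
while (i <= 10L) {
  if (i %% 2L != 0L) {
    skipped <- c(skipped, i)
    i <- i + 2L
  }
  i <- i + 1L
}
skipped  # 1 5 9

# ... or build the sequence you actually want up front.
seq(1, 10, by = 4)  # 1 5 9
```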
<br />Filed under: <a href='http://xianblog.wordpress.com/category/statistics/r-statistics/' ref="nofollow" target="_blank">R</a>, <a href='http://xianblog.wordpress.com/category/university-life/' ref="nofollow" target="_blank">University life</a> Tagged: <a href='http://xianblog.wordpress.com/tag/dummy-variable/' ref="nofollow" target="_blank">dummy variable</a>, <a href='http://xianblog.wordpress.com/tag/for-loop/' ref="nofollow" target="_blank">for loop</a>, <a href='http://xianblog.wordpress.com/tag/loops/' ref="nofollow" target="_blank">loops</a>, <a href='http://xianblog.wordpress.com/tag/r/' ref="nofollow" target="_blank">R</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/xianblog.wordpress.com/14794/" ref="nofollow" target="_blank"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/xianblog.wordpress.com/14794/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/xianblog.wordpress.com/14794/" ref="nofollow" target="_blank"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/xianblog.wordpress.com/14794/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/xianblog.wordpress.com/14794/" ref="nofollow" target="_blank"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/xianblog.wordpress.com/14794/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/xianblog.wordpress.com/14794/" ref="nofollow" target="_blank"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/xianblog.wordpress.com/14794/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/xianblog.wordpress.com/14794/" ref="nofollow" target="_blank"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/xianblog.wordpress.com/14794/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/xianblog.wordpress.com/14794/" ref="nofollow" target="_blank"><img alt="" border="0" 
src="http://feeds.wordpress.com/1.0/digg/xianblog.wordpress.com/14794/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/xianblog.wordpress.com/14794/" ref="nofollow" target="_blank"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/xianblog.wordpress.com/14794/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=xianblog.wordpress.com&amp;blog=5051449&amp;post=14794&amp;subd=xianblog&amp;ref=&amp;feed=1" width="1" height="1" />
<p class="syndicated-attribution"><div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on his blog: <strong><a href="http://xianblog.wordpress.com/2012/02/21/weird-lack-of-control/"> Xi'an's Og » R</a></strong>.</div>
</div></p>]]></content:encoded>
			<wfw:commentRss>http://www.r-bloggers.com/weird-lack-of-control%e2%80%a6/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/ba847ef5873101769043f6260d57282a?s=96&amp;amp;d=http://s0.wp.com/i/mu.gif" length="" type="" />
<enclosure url="http://xianblog.files.wordpress.com/2012/02/dscn1921-e1329726991744.jpg" length="" type="" />
		</item>
	</channel>
</rss>

