The next meeting of the Knoxville R User’s Group will consist of four 20-minute talks followed by an open planning session. It will take place on Friday, November 1, from 2:00 p.m. to 4:00 p.m. at The University of Tennessee, Haslam Business Administration Building, room 403 (1000 Volunteer Blvd., Knoxville, TN). RSVP at http://www.meetup.com/Knoxville-R-Users-Group. The topics and biographical information regarding the speakers are listed below.
Automated Forecasting using R: A Stock Market Example (2:00-2:20)
R’s forecast package can generate automated ARIMA model forecasts in a manner comparable to SAS Forecast Server. This talk will demonstrate how to use the R ‘quantmod’ package to retrieve financial data from Yahoo Finance and then pass that data to the forecast package to automatically produce point forecasts and prediction intervals. Examples of how to use each package, including diagnostic plots and results, will be included.
Josh Price earned a BS and MS in statistics, both from the University of Tennessee. While working on his Master’s, he worked as a graduate assistant for Research Computing Support. After graduating, Josh worked for 7 years in industry as a consultant in both business and engineering. In January 2013, he returned to UT to work as a statistical consultant where he assists students, faculty, and staff with statistical aspects of their theses, dissertations and various research projects. Josh’s current interests include programming, forecasting methods, and quantitative finance.
BioGeoBEARS: An R package for inference and model testing in historical biogeography (2:20-2:40)
Phylogenetic biogeography is traditionally concerned with inferring ancestral geographic ranges on a phylogeny and the history of events that led to present-day distributions. The field has been dominated for decades by debates about whether vicariance or dispersal is the dominant process. This talk will demonstrate, using BioGeoBEARS, that assumptions about the processes can be subject to statistical inference from the data, and show that founder-event speciation is a crucial process that has been left out of the current biogeography programs DIVA, LAGRANGE, and BayArea.
Nicholas J. Matzke is a Postdoctoral Fellow in Mathematical Biology at the National Institute for Mathematical and Biological Synthesis (NIMBioS, www.nimbios.org) at UT Knoxville, and a member of Brian O’Meara’s lab in the Department of Ecology and Evolutionary Biology. He is also the author of the BioGeoBEARS package.
Elevating R to Supercomputers (2:40-3:00)
The biggest supercomputing platforms in the world are distributed memory machines, but the overwhelming majority of the development for parallel R infrastructure has been devoted to small shared memory machines. Additionally, most of this development focuses on task parallelism, rather than data parallelism. But as big data analytics becomes ever more attractive to both users and developers, it becomes increasingly necessary for R to add distributed computing infrastructure that supports analytics on large distributed resources. The Programming with Big Data in R (pbdR) project aims to provide such infrastructure, elevating the R language to these massive-scale computing platforms. This talk will cover some of the early successes of the pbdR project, benchmarks, challenges, and future plans.
Drew Schmidt is a researcher at the University of Tennessee’s National Institute for Computational Sciences, and is primarily interested in the intersection of mathematics, statistics, and high-performance computing. He is co-lead developer of the Programming with Big Data in R (pbdR) project, which elevates the statistics programming language R to large distributed computing platforms.
Analyzing Data by Group Using R’s plyr Package (3:10-3:30)
A common data analysis task is repeating the analysis for groups within your data set. In most analytics software, this is made trivial by the addition of a single statement, such as SAS’ “BY GROUP”. However, in R you must write a function and apply it by group. That function can be simple if you only want to print the results. However, if you wish to analyze those results further, you may need a series of functions to apply. We’ll go over an example of each case, showing why it goes so quickly from simple to complex. This talk will use various tools from the popular plyr package to apply the functions.
Bob Muenchen is the author of R for SAS and SPSS Users and, with Joseph M. Hilbe, R for Stata Users. He is also the creator of r4stats.com, a popular web site devoted to helping people learn R. Bob is an Accredited Professional Statistician™ with 32 years of experience and is currently the manager of OIT Research Support (formerly the Statistical Consulting Center) at the University of Tennessee. He has conducted research for a variety of public and private organizations and has assisted on more than 1,000 graduate theses and dissertations. He has written or coauthored over 60 articles published in scientific journals and conference proceedings.
Bob has served on the advisory boards of SAS Institute, SPSS Inc., the Statistical Graphics Corporation and PC Week Magazine. His suggested improvements have been incorporated into SAS, SPSS, JMP, STATGRAPHICS and several R packages. His research interests include statistical computing, data graphics and visualization, text analysis, data mining, psychometrics and resampling.
Quo Vadis KRUG? (3:30-4:00)
The Knoxville R User’s Group, or KRUG, started off with a series of workshops, but it’s well past time to discuss where KRUGgers would like to take it. How often should we meet? How long should the talks be? Is the Friday afternoon timeslot good? Is meeting at UT sufficient, or should we move the meeting around (anyone have space)? Everything is up for discussion, so we’ll devote this final session to mulling it over.