by James Paul Peruvankal, Senior Program Manager at Revolution Analytics
At Revolution Analytics, we are always interested in how people teach and learn R, and what makes R so popular, yet ‘quirky’ to learn. To get some insight from a real pro we interviewed Bob Muenchen. Bob is the author of R for SAS and SPSS Users and, with Joseph M. Hilbe, R for Stata Users. He is also the creator of r4stats.com, a popular web site devoted to analyzing trends in analytics software and helping people learn the R language. Bob is an Accredited Professional Statistician™ with 30 years of experience and is currently the manager of OIT Research Support (formerly the Statistical Consulting Center) at the University of Tennessee. He has conducted research for a variety of public and private organizations and has assisted on more than 1,000 graduate theses and dissertations. He has written or coauthored over 60 articles published in scientific journals and conference proceedings.
Bob has served on the advisory boards of SAS Institute, SPSS Inc., StatAce OOD, the Statistical Graphics Corporation and PC Week Magazine. His suggested improvements have been incorporated into SAS, SPSS, JMP, STATGRAPHICS and several R packages. His research interests include statistical computing, data graphics and visualization, text analysis, data mining, psychometrics and resampling.
James: How did you get started teaching people how to use statistical software?
Bob: When I came to UT in 1979, many people were switching from either FORTRAN or SPSS to SAS. There was quite a lot of demand for SAS training, and I enjoyed teaching the workshops. Back then SAS could save results, like residuals or predicted values, much more easily than SPSS, which drove the switch.
When the Windows version of SPSS came out, people started switching back. The SPSS user interface designer, Sheri Gilley, really understood what ease of use was all about, and the SAS folks didn’t get that until quite recently. I was just as happy teaching the SPSS workshops. However, many SPSS users at UT avoid programming, which I think is a big mistake. Pointing-and-clicking your way through an analysis can be a time-saving way to work, but I always keep the program so I have a record of what I did.
I started teaching R workshops in 2005 and attendance was quite sparse. Now it’s one of our Research Computing Support team’s most popular topics.
James: Is there anything special about teaching people how to use R, any particular difficulties?
Bob: In other analytics software, the focus is on variables. It sounds too simple to even bother saying: “Every procedure accepts variables.” There are very few ways to specify them, such as by simple name, A, B, C, or lists like A TO Z or A—Z.
Rather than just variables, R has a variety of objects such as vectors, factors and matrices. Some procedures (called functions in R) require particular kinds of objects and there are many more ways to specify which objects to use. From a new user's perspective that may seem like needless complexity. However it provides significant benefits. Once an R user has defined a categorical variable as a factor, analyses will then try to “do the right thing” with that variable. For instance, you could include it in a regression equation and R would create the indicator variables needed to handle a categorical variable automatically.
Another important benefit to R’s object orientation is that it allows a total merger of what would normally be a separate matrix language into the main language of R. This attracts developers, who are helping grow R’s capabilities very rapidly.
James: How do you handle such a broad range of backgrounds in your classes?
Bob: The workshop participants do come from a very wide range of fields, but they share a common set of knowledge: what a variable is, how to analyze data, and so on. So I save a great deal of time by not having to explain all that. Instead, I redirect it into pointing out where R is likely to surprise them. You can have variables that are not in a data set? That’s a bizarre concept to a SAS, SPSS or Stata user. You can have X in one data set and Y in another, but include both in the same regression model? That sounds very strange at first and, of course, it’s quite risky if you’re not careful. I introduce most topics with, “You’re expecting this, but here comes something very different…”. Different doesn’t necessarily mean better, of course. SAS, SPSS and Stata are all top-quality packages and they do some things with less effort. I love R, but I like to point out where I think the others do a better job.
James: How do you find teaching online compared to classroom courses?
Bob: I teach my workshops in-person at The University of Tennessee and I’ve taught at the American Statistical Association’s Joint Statistical Meeting as well as the UseR! Conference. Teaching “live” is great fun, and being able to see the participants’ expressions is helpful in adjusting the presentation pace and knowing when to stop and ask for questions.
However, live workshops have major drawbacks. Travel costs can easily exceed the fee for a workshop, but worse, minimizing those expenses means cramming too much material into a short timeframe. That’s why I teach my webinars in half-day stretches skipping a day in between. We break every hour and fifteen minutes so people can relax. On their days off they can catch up on their regular work, review the workshop material, work on the exercises and email me with questions. At the end of a live workshop people are happy but exhausted and they leave quickly. At the end of a webinar-based workshop, they often stay for a long time afterwards asking questions. I stay online as long as it takes to answer them all.
James: Some people like learning actively, with their hands on the keyboard. Others prefer to focus more on what’s being said and taking notes. How do you handle these styles?
Bob: This is an excellent question! When I take a workshop myself, I usually prefer hands-on but sometimes I don’t. Each of my workshop attendees receives setup instructions a week early so their computer has the software installed and the files in the right place by the time we start. They’re ready for whichever learning style they prefer.
For hands-on learners, I use a single R program that contains the course notes as programming comments interspersed with executable code. Since the “slides” are right in front of them, they never need to take their eyes off their screens. The examples are designed to be easy to covert to their own projects. They build in a step-by-step fashion, going from simple to more complex to make sure no one gets lost. Participants can run each example as I cover it, and see the results on their own computers.
For people focused more on listening and taking notes, everyone also has a complete set of slides. The slides have the notes that describe each concept, then the code for it, followed by the output. The notes follow a numbering scheme that is used in both the program and the handout. This way, both types of learners stay in sync.
This dual approach has another benefit. It’s very easy to switch from one style to the other at any time. If someone gets tired of typing, or his or her computer malfunctions, switching to the notes is seamless. Conversely, if someone is following the printed notes and want switch to run an example, it’s very easy to find.
James: What motivated you to start writing books?
Bob: I’ve always enjoyed writing newsletter and journal articles. My books on R started out just as a set of notes that I kept for myself. When I put them online, they started getting thousands of hits and Springer called to ask if I could make it a book. I really didn’t think I had enough information, but it kept growing. The second edition of R for SAS and SPSS Users is 686 pages and I have notes on a few topics that I wish I had added. If I ever find time for a third edition, they’ll be in there.
James: Thank you Bob for your time!