One of the more challenging things about beginning graduate school was learning what tools and software I needed in order to work efficiently. Unlike college where software requirements were laid out in front of me and everyone seemed to use the same tools, in graduate school the obscurity of the tools, as well as the number of options, seemed to multiply. Additionally since, like most Biostat students in my department, I came from a math background instead of a computing background, I had to drastically increase my computer literacy in a very short period of time. To help new students, a few of us more senior Biostat students have taken to presenting the tools we use in our departmental “Computing Club,” but a blog post really makes more sense. So without further ado, and in the style of one of my favorite blogs, I present…
The Setup (Part 1):
What hardware do you use?
I’m one of the last people in my department clinging to my PC. It’s absolutely true that Macs make it easier to get up and running with the common tools Biostatisticians need, but it’s also true that you can get all of those tools on Windows with a bit of initial work (and in my opinion you can even get more functionality and customization). I have a Lenovo ThinkPad X201 Tablet. This is my second tablet – I got the first because I wanted to take digital notes, which I did for all of grad school with some success. I ended up printing all of my notes out for studying, so I’m not sure it was more convenient than, say, scanning in my notes afterwards (a mammoth project I’m currently undertaking with all of my college coursework). However the tablet has been absolutely essential for teaching, and is very nice for annotating papers. I once bought a Mac that was not a tablet but then panicked and got this computer instead. It’s hard to take away functionality! I hook it up daily to an external monitor and use that huge Microsoft ergonomic keyboard and mouse. I’m actually not sure how so many people do not use external keyboards when they use computers so heavily. Perhaps I just have sensitive wrists.
And what software?
I do all of my research in R, which is really the academic norm for research statistics (and certainly the norm at Hopkins). I do all of my scripting in my beloved Notepad++, which becomes infinitely more awesome by using the little script NppToR. An amazing resource that Hopkins provides for it’s community is a high-powered, well backed-up computing cluster (so essentially all of my research is done on an extremely local cloud). To use the cluster you need to have an ssh client and an SCP protocol. Drastically oversimplified, what this means for me is 1) I have to have something I can open R with or run batch jobs on (the ssh client), and 2) I have to have somewhere you can drag my files. I use PuTTY for the ssh client and WinSCP for the SCP protocol. My workflow is that I open PuTTY to log onto the cluster and then log into R. Then I open WinSCP and open up my R scripts using Notepad++, and send the code to R by hitting F9 (yay NppToR!). If I’m running a batch job I open a second PuTTY window to submit those to. One last detail is that you must install and run Xming in order for graphs in R on the cluster to work. I think most students do more locally than I do, but I prefer doing as much on the cluster as possible. They always have the latest version of R and do a great job of backing things up. I find it’s less hassle once you get a good system down, and I like living on the cloud enough that that’s worth it to me. It also makes me less aware that I’m running Windows.
For writing I use LaTeX, again the academic norm. TeX is confusing the first time around, so here’s the crash course: TeX is a typesetting system, and LaTeX is the markup language. What this means is that you have to install TeX onto your computer, and then install a LaTeX editor. You “code” documents in a LaTeX editor and then compile them into PDFs, where they look pretty and professional and mathy. I use MiKTeX to install TeX, LEd as my LaTeX editor, and SumatraPDF as my PDF viewer. My workflow is that I open up LEd, then open up the PDF (using SumatraPDF) of the document I’m creating. Then whenever I compile my document, it automatically refreshes to the new version in Sumatra. You can also set it up so that if you double click a word in Sumatra, it automatically highlights it in LEd (tutorial). If you use Adobe this system won’t work because Adobe won’t refresh the document (instead you’ll get an error when you try to compile the code). Don’t use Adobe, basically.
edit: Might be converting to Texmaker as my LaTeX editor. It has spell-check and a built-in PDF Viewer, and took all of two minutes to set up. But I’d still recommend downloading Sumatra.
For presentations I still use PowerPoint (I hear you hatin’). That’s more the norm in genomics than in other biostatistics fields. We like it because we can easily share slides and put in pictures/graphs, and keep out too many equations.
I use Mendeley to organize my downloaded PDF papers. I looked for a PDF organizing solution for years before having this recommended to me, and it’s perfect. It syncs online, is cross-platform (with an iPad app), and most importantly it auto-generates bibtex files which are needed to create bibliographies within LaTeX documents. This means that creating bibtex files (complete with automatically generated citation key) is a drag-and-drop process, rather than a pasting-google-scholar-bibtex-results-into-text-editor process. I see this as a huge improvement.
On the extremely unlikely event that you use a tablet PC, I really like PDF Annotator, which you can get for free if you’re a Hopkins student. I use this software for teaching, grading and annotating papers. If I’m taking a bunch of notes the native Windows Journal software is quite nice and keeps file sizes much smaller than PDF Annotator.
What’s your dream setup? Or rather, what I wish I’d done differently…
There are a few places where I know I’m not efficient in my computing that I’d like to change, so I’ll list them here. You youngins can learn from my mistakes!
You might notice that I do my R coding on the cloud, but do my writing locally. This means that I can’t easily use Sweave, which is a really cool tool that allows you to LaTeX and run R code simultaneously (and is much applauded because it allows for easy reproducibility). Instead I have to import my data and graphs into my papers manually (the xtable package in R is essential to my life). This isn’t ideal and I’d like to change it, but the analyses I run usually take days to run, so Sweave loses a lot of it’s appeal (or at least the way I envisioned using Sweave). To ensure reproducibility I always publish my code on github. In fact I’m going to start migrating all of my research information (including project descriptions, links to papers, etc.) onto github.
I wish earlier in my career I had established 1) version control (using git), and 2) uniform project architecture (see for example ProjectTemplate). These are things that are discussed in the Hopkins computing coursework, but it’s unclear how many academics really use them. Github makes version control extremely accessible, but the disadvantage is that everything must be public unless you pay.
I wish I had discovered OneNote while taking courses because I’m sure I would have used and enjoyed it.
I also think standing desks are neat. One day…
Why “Part 1″?
Because Part 2 will be all of my non-academic software and hardware!
edit: I’ve compiled all of the tools I mention in this blog into a bitly bundle for your clicking and sharing convenience.