The thing that struck me about writing a book is just how much it is like starting a business: you can do your research and assemble a meticulous plan, but you can’t quite know what the result will be. The only constant is hard work, done behind the scenes and without fanfare.
I don’t want to cast a pall over the publishing process; it’s still a glorious (if usually not glamorous) endeavor. That said, as with most pursuits of this intensity, it’s essential to know your “why” before starting a book: otherwise, it’s too easy to lose sight of the bigger mission when buried under the strain and tedium of everyday writing.
The message of Advancing into Analytics has been with me for some time: peruse the blog, and you’ll see these themes tracing back for years. I hadn’t anticipated they would take the form of a book, let alone with the vaunted O’Reilly Media. It turns out that a book is the perfect medium to congeal and summarize these themes.
Here are the five motivations that carried me through the writing process:
1. To help “past me”
When asked what motivated me to get into analytics education, I can’t go far without reliving those first days as an analyst. I was hopeless and confused. I was floored by the breadth and depth of data given to me with little to no training. I knew I had received a good college education, but something didn’t compute: why was I so unaware and unprepared to do what I needed to with data? To me, these discrepancies screamed out for a market solution.
Advancing into Analytics would have been a game-changing resource for past me… but I know I’m not alone. I’ve spoken with and coached plenty of analysts who’ve confronted the same unease and dissonance. This book is to help those analysts wrest back control of their data: to show them how to clean it, explore it, and test relationships in it.
2. To provide a clear learning path
I remember those early days, when I shared the challenges I was having with my data. “You should learn R,” well-meaning bystanders said.
Oookay… but how? Back then especially, so many of the resources seemed to be written by specialists, for specialists. It was hard to find something that met me at my level, addressed my pain points head-on, and, yes, advanced me into analytics.
Like many, I came to R (and later Python) as a spreadsheet user. As I hammer home in the book, this is a great place to be if you want to learn how to code. With this book, I want to provide as straight a learning path as possible for spreadsheet (particularly Excel) users to begin analyzing data with R and Python.
Unfortunately, many “serious” programmers use that background as a soapbox for griping about the inferiority of spreadsheets. Griping about spreadsheets to spreadsheet users is not a great way to pump up your learners, nor is it a wise way to contextualize analytics tools.
3. To properly situate analytics tools
The last thing I want readers to take away from this book is that Excel is somehow inherently inferior to Python or R. This is really a limiting way to think about analytics tools. I see spreadsheets like Excel and programming languages like R and Python as complements, not substitutes — that is, they are different slices of the analytics stack, meant to do different things.
In the book, I offer examples of where these different tools triumph… and where they could “phone a friend” for an alternative.
4. To properly situate analytics techniques
Of course, as I also mention in the book, plenty of data analysts don’t include Python or R in their stack and do just fine. So why do I teach them?
I find that the framework of statistical thinking is indispensable for good data analysis. There is a difference, for example, in how we work with discrete and continuous variables. Analysts should be able to identify these variable types and know what to do next with them. This involves both exploring and confirming insights, and it takes some iteration. I find that a programming language like R or Python is the single best solution for that purpose.
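To make the discrete-versus-continuous distinction concrete, here is a rough Python sketch. It is not code from the book: the toy records, the summarize function, and the rule of thumb that floats are continuous are all assumptions made up for illustration.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical toy dataset: "region" is discrete, "sales" is continuous.
records = [
    {"region": "east", "sales": 120.5},
    {"region": "west", "sales": 98.0},
    {"region": "east", "sales": 143.25},
    {"region": "south", "sales": 110.0},
]

def summarize(rows, column):
    """Summarize a column differently depending on its variable type.

    Assumption for this sketch: a column holding only floats is treated
    as continuous; anything else is treated as discrete.
    """
    values = [row[column] for row in rows]
    if all(isinstance(v, float) for v in values):
        # Continuous: describe center and spread.
        return {"type": "continuous", "mean": mean(values), "stdev": stdev(values)}
    # Discrete: count how often each category occurs.
    return {"type": "discrete", "counts": dict(Counter(values))}

region_summary = summarize(records, "region")
sales_summary = summarize(records, "sales")
print(region_summary)
print(sales_summary)
```

The point is less the code than the habit: identify the variable type first, then pick the summary that suits it.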
BI tools are great, but they often don’t provide the granularity or range of motion to achieve these aims. They tend to feature low- and no-code approaches to working with data, which I find limiting for reasons mentioned in this blog post.
The book leaves off just at the edge of what might be called “data science,” applying a train-test split and validating a linear regression. But as I also mention, there is more that unites than divides statistics, data analytics, and data science.
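For readers who haven’t met the idea, a train-test split can be sketched in a few lines of plain Python. This is not the book’s code: the simulated data, the 80/20 split, the fit_line and r_squared helpers, and the random seed are all assumptions chosen for illustration, and in practice you would likely reach for a dedicated library.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Simulated data (hypothetical): y = 3 + 2x plus random noise.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

# Shuffle the row indices, then hold out the last 20% as a test set.
indices = list(range(len(xs)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]

train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

# Fit on the training rows only, then validate on rows the model never saw.
slope, intercept = fit_line(train_x, train_y)
test_r2 = r_squared(test_x, test_y, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test R^2={test_r2:.3f}")
```

Scoring the model on held-out rows is what separates validation from simply describing the data you already fit.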
5. To curate my best material
It’s wild to think I’ve been at this for years now. Between blog posts, YouTube videos, in-person workshops and more, I’ve accumulated a good deal of data education material. Still, it’s hard when someone asks me, “How do I learn data analysis?” There are simply too many circumstances, and providing suggested resources a la carte is not efficient.
This book is meant to answer the question “How do I learn data analysis?” for the typical person who asks. I’ve taken my best content and molded it into one book where, in the span of 250 pages, you’ll learn the fundamentals of statistics in Excel, how to code for data analysis in R, and how to do the same in Python. It’s quite a bang for your mental buck.
There’s a lot outside your control when writing a book, so it takes a good deal of intrinsic motivation to stay on course. I’ve been plugged into the analytics world for several years. The experiences, lessons, and gaps have congealed into this book. I hope it serves the community well.