Data Journalism & Interactive Visualization (Transcript)
Here is a link to the podcast.
Introducing Amber Thomas
Hugo: Hi there, Amber, and welcome to DataFramed.
Amber: Hey, hi. Thanks for having me.
Hugo: Such a pleasure to have you on the show. I’m really excited to hear about your work at The Pudding today, data journalism, data storytelling, interactive visualization, all of these things, and I’d love to jump in and hear a bit about The Pudding and what you do there.
Amber: Yeah, definitely. The Pudding is this amorphous internet thing. We refer to ourselves as a collection of data-driven visual essays. Other people have referred to us as an online magazine, or some people call us a blog, which I feel like we’re not quite a blog, and we’re not quite a newspaper. We’re this weird thing in the middle, but basically what we do is we go out into the world and ask all sorts of questions. Generally, questions about things that people are already talking about. We joke that a lot of our stories are things that people would argue about over beers. So, a friendly argument about something, and we try to add data to that discussion or that argument, and then tell the story in a fun, interactive, visual way on the web.
Hugo: That’s cool, and I love that you describe The Pudding as something amorphous and a weird thing in the middle, because essentially from my perspective, The Pudding is something that has arisen relatively recently as a result of many of the new technologies we have, right?
Amber: Yeah, absolutely. I mean, it wouldn’t be possible without a lot of the interactive tools that we use and the programming languages and all of the data that we have access to and the ability to analyze all of the data. None of the stories that we tell would be possible without all of that computing power and those languages. So, it is a new thing, and I think it started January of 2017. So, it’s about a year and a half old.
Hugo: And I do think this idea of interactive visualization, particularly with platforms and websites such as The Pudding, it’s very interesting because it changes the whole paradigm of storytelling and journalism in a sense, right? Instead of having thousands of words that people need to read and navigate in order to extract meaning, allowing people to interact with visuals and figures and that type of stuff allows people to move through that space of data themselves, and I think this is something we’ll speak more about later in this talk.
Amber: Yeah, absolutely. And we try to make our stories as visually driven as possible, so we actually will often write a story and then go back and cut out a bunch of the prose and make sure that the visuals are telling the story. And of course, there’s still prose to give you background information and some other insight, but we try to let the interactives and the visuals be the actual driving force of the story.
Hugo: That’s actually something I’ve admired in much of The Pudding’s work is that prose will come up interspersed through plots or when you hover over something to add more information, as opposed to directing the information flow.
Amber: Absolutely, yeah. It’s like bonus content rather than the content itself.
What do you do at The Pudding?
Hugo: Great. So, what do you do at The Pudding?
Amber: So, just as The Pudding is an amorphous thing, my role is like that as well. My title is journalist engineer, which is a weird collaboration or amalgamation of different roles that I do there, and this is the case for most of the people that work at The Pudding, but basically, I have the ability to do all of the things for a story. So, everything from coming up with the idea and collecting data, analyzing data, designing what the visuals should look like, writing the story itself, and programming the interactive visualization for the web. So, that’s why the title is a little weird. There’s the data analyst part and the journalist part and the front-end engineer part and the designer part and all of these pieces. So, that’s what I do there. All of the things.
Hugo: And actually, as you know, your colleague and founder of The Pudding, Matt Daniels, has a great Medium piece called ‘The Journalist Engineer,’ which elucidates a lot of these things, and we’ll link to that in the show notes.
Amber: Yeah, definitely. He actually wrote that before we landed on that title for our roles, but we just kept coming back to the journalist engineer. We kicked around a bunch of other titles and just decided that one worked best.
Hugo: It makes sense, but as you said, there are so many moving parts in this type of position. You mentioned web developer, data analyst, designer, journalist, and I’m just wondering what was your trajectory to pick up all these moving parts so that you could work in this role? How did you get into it?
Amber: So, I took a very winding path to get to where I am now. By training, I am not any of those things. By training, I’m actually a marine biologist. I went to college for marine biology and chemistry. I went to grad school, have a Master’s Degree in Marine Sciences. I worked as a research scientist for several years. So, my background is in academic science. In terms of picking up all of those skills, I actually picked up some of them in my work as a scientist. I started learning to do data analysis and statistics in the R programming language when I was in grad school, and I used it on and off throughout my work as a researcher. It really depended on who I was working with and what tools they were using. But data analysis and experimental design and all of those things I still do today, that all came up in my work as a research scientist or as an academic.
The communication of complex things also came up a lot. When you’re a scientist, it’s really important that you can communicate the results of your research, whether that’s in an academic setting or in a conference talk. When I was working as a research scientist, I was at an aquarium, and so I was studying the animals that lived there and I often had to communicate my research to children. I got, I think, pretty decent at explaining really complicated topics to anybody who was listening to me, which comes in really, really handy when that’s your job. The only difference now is that I’m writing it down instead of talking about it.
Hugo: Yeah and it looks as though D3 is part of the bread and butter of how you work at The Pudding. Is that safe to say?
Amber: Yes, definitely. That was the most steep learning curve when I started working with The Pudding ’cause I had only been using D3 for a month or two when I reached out to them about freelancing the first time, and they were really receptive to helping me out when I got stuck with D3, and I was really motivated to learn it, and now I use it all the time. And I still spend a lot of time on Stack Overflow.
Hugo: Yeah. So, I’m glad that you’ve said that. I’m glad that you’ve said it was a steep learning curve, because D3 is, I think, incredibly powerful, but it’s also well renowned for being pretty tough to start off with, right?
Amber: Oh, absolutely. Because I’m an R user, I was used to ggplot where you can make a bar graph or something in two lines of code, and in D3, that same bar graph would take 50 lines of code, or maybe not quite that many, but it was way more than two. And I think that was because I had this disconnect in my mind that D3 was a charting library in the same way that ggplot is a charting library. When D3 is really this … it can do a bunch of other stuff. It can make charts, but it doesn’t have to make charts. It gives you the ability to control every pixel on your screen, which is really cool, but that means that there’s a lot of information that has to go into telling it where every pixel belongs and how you want it to be presented.
Why is D3 code used most commonly in your line of work?
Hugo: So, when it takes so much work to write the D3 code, why is it then the state-of-the-art or the most powerful or why is it the most standard in your line of work?
Amber: I think because it’s so powerful, especially when it comes to interactive things. You can do so much with D3, and again, because it’s not really … it doesn’t have default charts. There’s no bar chart function that you can just feed stuff into. If you want to add bars, you have to program it to add rectangles that are bound to the data that you’ve fed into it and all of these things. Because of that, it’s incredibly customizable. We joke that all of our data vis things are bespoke. They’re all custom-written code. And sometimes that’s overkill. So, we do, on occasion for static graphics, make stuff in R, and then sometimes we’ll clean it up in Illustrator or other static design things, and then you’ll have just a JPG or a PNG, an image on your website, but sometimes it works well to make it in D3 and have everything just natively on the website instead of an image embedded there.
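To make the point about "rectangles bound to your data" concrete, here is a minimal sketch of the bookkeeping a low-level tool leaves to you. It is written in plain Python rather than D3, it is not The Pudding's code, and the function names, data values and chart dimensions are all invented for illustration.

```python
# Illustrative only: plain Python standing in for the work a low-level
# drawing library leaves to the author. All names and numbers are made up.

def bar_rects(values, chart_width=200, chart_height=100, gap=4):
    """Return one dict of SVG <rect> attributes per data value."""
    n = len(values)
    bar_width = (chart_width - gap * (n - 1)) / n
    max_value = max(values)
    rects = []
    for i, value in enumerate(values):
        # Linear scale from data units to pixels.
        height = chart_height * value / max_value
        rects.append({
            "x": i * (bar_width + gap),
            "y": chart_height - height,  # SVG's y axis points down
            "width": bar_width,
            "height": height,
        })
    return rects


def to_svg(rects, chart_width=200, chart_height=100):
    """Serialize the rectangles into a minimal SVG string."""
    body = "".join(
        '<rect x="{x:.1f}" y="{y:.1f}" width="{width:.1f}" height="{height:.1f}"/>'.format(**r)
        for r in rects
    )
    return f'<svg width="{chart_width}" height="{chart_height}">{body}</svg>'


rects = bar_rects([3, 7, 5])
svg = to_svg(rects)
```

Every position and size is computed by hand here, which is roughly the trade-off described above: more code than a one-line chart call, but full control over each shape.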
Hugo: Yep, exactly and it was developed actually at the New York Times by Mike Bostock?
Amber: Mm-hmm (affirmative), yeah and I think it was only in 2011 that it came out, so it’s still pretty recent, even though it’s grown like crazy in popularity.
What projects have you worked on at The Pudding?
Hugo: Fantastic. So, I’m so excited about getting in and hearing about some of your projects and stories that you’ve worked on at The Pudding, so maybe you can start off by telling me about one or two of them.
Amber: Sure. So, I’ve worked on quite a few stories now at The Pudding, and I did mention that my job entails doing all of the parts of a story, but for any given story, I’m not necessarily doing all of the pieces. A lot of our work is collaborative, where somebody does one piece of it and somebody else does another piece. This first example I worked on parts of it, but this was a huge collaborative effort, so the story is called ‘How Far is Too Far?’ and it basically focuses on how long it takes people to drive from where they are in the US to their nearest abortion clinic. This story was very sensitive, and it’s one of our stories that has a little bit more of a political tone to it, but it’s a story that we thought was really important to tell.
Hugo: Yeah. And particularly at a time when these things are changing and legislators are altering this type of landscape that may affect a lot of people on the ground, right?
Amber: Absolutely. That’s where the general idea of this came from. There’s these laws that specifically target abortion clinics, and in the state of Texas, these laws were actually causing a bunch of clinics to close. There was a group that brought this to the Circuit Court of Appeals in Texas, saying that when these clinics closed patients that need to access these clinics now have to drive over 150 miles to their nearest clinics, and one of the judges replied, “Do you know how long that takes in Texas at 75 miles an hour?” That really got us thinking about … well, I mean that’s not very good feedback to that problem, but that does bring up an interesting point that the thing that really affects people is how long it actually takes to drive from where they are to where these clinics are, and a lot of other research projects we had seen around this focused on the distance as the crow flies, when the driving distance is what really affects people.
So, we broke up the country into these hexagons and looked at how long it would take you to drive to the nearest clinic from the center of each of these hexagons and from the center of any city with a population above 50,000.
Hugo: I recall that in the first figure, there’s a slider which allows me or the user to have a look at the distribution of cities and places geographically where people can reach an abortion clinic in under a certain amount of time and change that and see how that changes with respect to the time that I’m wanting to know about.
Amber: Yep. Exactly. We wanted to let people really explore the data themselves. When we talk about something being a really long trip … I’m originally from Connecticut where anything over an hour feels so long, but I know people from other parts of the country where a five hour trip is what they consider long. So, we wanted to allow people to really experience this data with whatever frame of far distance they have. Yeah, that first chart really lets you restrict the data and change how long of a drive you want to see.
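The slider behaviour described here can be sketched in a few lines. This is a minimal Python illustration, not The Pudding's code: the hexagon and clinic names and the drive times are invented, and the sketch assumes that one-way drive times have already been obtained from some routing service and that the slider works on round-trip minutes.

```python
# Hypothetical one-way drive times in minutes, keyed by origin, then clinic.
drive_minutes = {
    "hex_a": {"clinic_1": 25, "clinic_2": 140},
    "hex_b": {"clinic_1": 190, "clinic_2": 95},
    "hex_c": {"clinic_1": 310, "clinic_2": 260},
}


def nearest_round_trip(times_by_clinic):
    """Round-trip minutes to whichever clinic is quickest to drive to."""
    return 2 * min(times_by_clinic.values())


def origins_within(drive_minutes, max_round_trip_minutes):
    """Origins whose nearest clinic is reachable within the slider value."""
    return sorted(
        origin
        for origin, times in drive_minutes.items()
        if nearest_round_trip(times) <= max_round_trip_minutes
    )


two_hours = origins_within(drive_minutes, 120)
four_hours = origins_within(drive_minutes, 240)
```

Moving the slider corresponds to calling `origins_within` with a different threshold, so each reader can apply their own idea of what counts as a long drive.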
Hugo: And how does the story evolve after that?
Amber: The next thing we wanted to really drive home is that we can say that you can reach a clinic from wherever you are in, say, a two hour round trip drive, but the thing is that not all clinics treat all patients. Depending on how far along they are in the pregnancy, they may or may not treat a specific patient. So, we made another graphic that is, again, looking at the whole country with all of these little hexagons, and we showed how the round trip time to the nearest clinic changes the further along in a pregnancy you get. The further along you get, the fewer and fewer clinics will actually be able to work with you.
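One way to picture why the round-trip time grows later in a pregnancy: if each clinic has a cutoff, the set of usable clinics shrinks as the weeks go up, so the nearest usable clinic can be much farther away. The sketch below is an assumption-laden Python illustration, not the story's method; the `max_weeks` field, the clinic names and every number are hypothetical.

```python
# Hypothetical clinics with an invented "max_weeks" cutoff.
clinics = {
    "clinic_1": {"max_weeks": 12},
    "clinic_2": {"max_weeks": 20},
}

# Hypothetical one-way drive times in minutes from a single hexagon centre.
minutes_from_hex = {"clinic_1": 25, "clinic_2": 140}


def round_trip_at(minutes_by_clinic, clinics, weeks):
    """Round-trip minutes to the nearest clinic that treats patients at `weeks`.

    Returns None when no clinic in the data is eligible.
    """
    eligible = [
        minutes
        for clinic, minutes in minutes_by_clinic.items()
        if clinics[clinic]["max_weeks"] >= weeks
    ]
    return 2 * min(eligible) if eligible else None


early = round_trip_at(minutes_from_hex, clinics, 8)
later = round_trip_at(minutes_from_hex, clinics, 16)
latest = round_trip_at(minutes_from_hex, clinics, 24)
```

With these made-up numbers, the same starting point goes from a 50-minute round trip to a 280-minute one, and then to no eligible clinic at all.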
Hugo: I won’t give out too many spoilers, but there are quite dramatic differences there.
Amber: Yeah. Absolutely. That was something that was really surprising for us and that’s why we wanted to include this. When we started exploring the data, we didn’t expect there to be quite as drastic differences as there were. For this graphic there’s, again, a slider that allows you to change how far along in a pregnancy a patient is. We have it auto playing, so if you just scroll through, you’ll see that map animate.
Hugo: What’s your impression of what this interactivity gives to the reader? Does it empower the reader? What’s your take on this?
Amber: I think so. I think it really allows the reader to make the story their own. Again, if they don’t really think an hour round-trip drive is far, they can decide how far they want to look at. Like, which distances they are most interested in looking at and how far into a pregnancy they are interested in getting this story from. They can change the lens of the story a little bit to be a little bit personalized. We do have a table in the story as well that it shows you the cities with the longest round-trip travel times, and it geolocates you. It’ll give you the cities with the longest travel times, but it’ll also show you your city for reference. Again, we’re really trying to make these stories personal to the reader so that it drives home the point a little bit more. Adding these interactions and some of these subtle elements really helps to make a story personal for whoever’s reading it.
Greetings from Mars
Hugo: That’s fantastic, and that actually provides a really nice segue into another piece of yours that I think is wonderful, which is the ‘Greetings from Mars,’ which actually locates me or whoever’s looking at it to compare Martian environments to where I am, right?
Amber: Mm-hmm (affirmative). Absolutely. This one is a story where I actually did do all the pieces. So, the data comes from the Curiosity Rover, which is of course on Mars, and it collects a bunch of data, but some of the data it collects is about the weather and its current environment, and I was really excited to find this data and I didn’t know what to do with it. Then, I started thinking about what if the Mars rover didn’t understand what postcards were for and thought that postcards were just when you go on vacation and you say, “The weather is great, and I wish you were here.” And the Curiosity Rover was telling this story through postcards because it just wanted to tell you all about the weather using postcards, ’cause of course, that’s what postcards are for. This story, when you scroll through, it is literally postcards that flip over as you scroll, and Curiosity tells you all about the weather on Mars and it uses the weather where you currently are as a point of comparison.
When I’m looking at it … right now, I’m in Seattle, and it says, “It looks like today in Seattle, the weather is partly cloudy throughout the day,” and it gives you a range of temperatures, so yeah, it walks you through it. This one updates every day, so the data is the most current weather on Mars and the most current weather in your area as well. This one is very personal for the reader.
Hugo: For the listener I’ll remind what I said to you earlier. I thought it was great, but it said, “It looks like Today you’re in Hoboken.” I’m very close to Hoboken, but it gives me the temperatures in Fahrenheit, and I said as an Australian I’d love them in Celsius, and your response was, “Of course if I was in Australia, I would receive them in Celsius, but because I’m in the US, I don’t.”
Amber: Right. It’s actually pretty funny. Shortly after we published the story, somebody gave us similar feedback that they wish the story was in Celsius. I asked where they were, and I believe they were in Sweden, so I was confused as to why it wasn’t working. It turned out that they had their location services disabled, so we had a default location for this story as New York. So, it was giving them information as if they were in New York, which was of course not what they were expecting, but it was functioning the way we wanted it to. We did our best to make it as personal for as many people as possible, but sometimes we can’t control for everything like that.
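The fallback behaviour in this anecdote is easy to sketch. The following is a hypothetical Python illustration, not the story's actual code: only the New York default comes from the conversation, while the function names, the country codes and the rule that only "US" gets Fahrenheit are assumptions made for the example.

```python
# Default named in the conversation; the dict shape is invented for this sketch.
DEFAULT_LOCATION = {"city": "New York", "country": "US"}

# Assumption for illustration: only these countries are shown Fahrenheit.
FAHRENHEIT_COUNTRIES = {"US"}


def resolve_location(detected):
    """Use the detected location, or the default when detection is unavailable."""
    return detected if detected is not None else DEFAULT_LOCATION


def format_temperature(celsius, country):
    """Format a Celsius reading in the unit assumed for the given country."""
    if country in FAHRENHEIT_COUNTRIES:
        return f"{celsius * 9 / 5 + 32:.0f}°F"
    return f"{celsius:.0f}°C"


def greeting(detected, celsius):
    location = resolve_location(detected)
    temperature = format_temperature(celsius, location["country"])
    return f"It looks like today in {location['city']} it is {temperature}."


with_location = greeting({"city": "Stockholm", "country": "SE"}, 20)
without_location = greeting(None, 20)
```

With location services disabled, `detected` is `None`, so a reader anywhere in the world gets the New York greeting in Fahrenheit, which matches the behaviour described above: working as designed, but not what that reader expected.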
Hugo: You said something that I found very interesting that you did all the things on this piece, as opposed to working in a collaborative environment. I’m wondering if there’s anything very different when you’re doing it all yourself about the process. I can imagine it’s more frustrating.
Amber: Yeah. I think it’s interesting when I’m working with stuff on my own. And when I say on my own, I still do rely on the team a lot, and I did have an editor for this piece to bounce some ideas off of and things like that. I was getting feedback throughout the process, but I find that I think through things out loud. When I’m working with somebody, sometimes that process is smoother or faster because I have somebody else to bounce ideas off, and we work together on figuring out the story.
When I work on projects by myself, sometimes I get really caught up in one part of it. I will be content to analyze data for weeks and never move forward with the rest of the story, or I get really excited about one piece of it and then it makes sense in my head, but as soon as I tell it to someone else, they’re like, “Oh, that doesn’t really make sense,” and I have to take a step back and start over. I’ve gotten better about it, of at least being aware that I do that and looking for more feedback throughout the process when I work on projects by myself now. But yeah, it’s a little bit different ’cause you just put your head down and move forward on the story and hope that it ends up in a good spot.
Hugo: For sure. But you’re absolutely right. I have this all the time where I think something I have done is a great idea, and as soon as I start to explain it to someone else, even before they tell me it isn’t, I’m like, “Oh, wait a second. That doesn’t sound like it did in my head.”
Amber: Yeah. Exactly. That’s actually what happened with the Mars story. My original thought once I found out there was all this weather data, I was like, “Oh, I’m gonna write ‘Welcome to Mars,’ and it’s gonna be a welcome packet for people who just moved there,” and I was super excited and I mocked up this whole thing, and as soon as I told it to the team, they were like, “But, Amber, if people were moving to Mars, don’t you think they would know what the weather was like before they left for Mars?” I was like, “Oh, yep. You’re 100% right.” It was just something I hadn’t thought of because I was so excited about the data and the idea. It needed a reframing.
Hugo: For sure. That’s true, unless it’s one of those mystery vacation packages, but I wouldn’t expect that to be … that would be a horrible mystery vacation.
Amber: It would take so long to get there.
Hugo: Seriously. I need to be back at work now. There’s one other piece that we discussed of yours that I think is so wonderful. It’s the makeup shades piece, so I thought maybe you could tell our listeners a little bit about this.
Amber: Yeah. This actually was a story that came to us from a freelancer. So, we do work with a lot of freelancers at The Pudding, and this was one that I had stumbled upon on Twitter. This illustrator named Jason Li was trying to come up with a better color palette for skin tones for the characters that he was illustrating, and he had written about using the skin tones from emojis and given feedback from people on that. He decided to look into the shades of makeup that are being offered because makeup brands … they have a stake in their colors actually matching people’s skin tones. He wrote this little blog post about making a new color palette, and I read this and was like, “This is amazing. I want more of this.”
I reached out and asked if he would be interested in expanding the story for The Pudding, and he was really excited, and he brought on another person to help us with the story. For this one, basically the idea came down to these days foundation shades and makeup shades, they are all coming out saying that they have these diverse shade ranges. Rihanna came out with this brand called Fenty Beauty last year, and it had 40 shades when it launched. It was this huge thing, and now a bunch of other brands are rushing to get up to 40 shades, and we wanted to see if all 40 shades are created equal across all of these brands. Just because you have 40 shades are you actually making the colors diverse enough for the people that need them?
It’s this exploration through the color shades that are offered through different brands here in the US and Japan, Nigeria, and India. Jason did most of the data collection and story writing, and I did a lot of the design of the graphics. Jason and I went back and forth a lot on how the graphics should be designed and shown to the reader to help illustrate our point as well as we could do it, and then I did all of the front-end programming for it.
Hugo: Great. I won’t say too much about it, and we’ll include a link to it in the show notes, but I do think it’s quite wonderful how you’ve demonstrated the relative distributions of shades, where the highest density of shades are in certain products in certain countries, and there are some quite … ones that make sense post knowing it, but some quite surprising results in there with respect to countries. For example, in India, the result makes sense after knowing the fact, but it is also surprising at first sight, I think.
Amber: Absolutely. That’s the feedback we’ve gotten the most from readers is that India was surprising, and it was super shocking for us, too. That’s always interesting when you’re still surprised by what you’re finding.
Hugo: Absolutely. In particular, the way you show the frequencies and distributions is very, very nice visually. I won’t say anything more ’cause I want our listeners to go and check these out straightaway.
Amber: Awesome. Well, thank you.
Hugo: Of course. I really wanna thank you before we move on for … this has been a difficult task to describe interactive data visualizations in words on a podcast, and I think you’ve done it incredibly well. Moving on, I’d like to know a bit about process because in data science and science in general, a common paradigm of the process is to start with data and let it lead you to the hypotheses and then questions. But in what you do, you start with a question, right?
Amber: Mm-hmm (affirmative).
Hugo: How does this different approach change things for you as a practitioner?
Amber: I think it changes a lot of things, but primarily it changes where we find the data. Before I was working with The Pudding when I was just doing personal projects on my own, I would go and just find a data set and analyze it and just look for interesting things, but now basically the stories that we end up writing that end up being the most interesting are things that started with a question. Again, we’re trying to come up with these questions that people are already discussing. Questions that people are maybe having a friendly argument about, and we try to answer that using data, which usually, not always … a lot of the times means we need to collect the data ourselves or scrape it from some source. It’s usually not a dataset that’s already existing and on the internet and cleaned and ready to go.
Hugo: Is there a challenge when you’ve got a correct or interesting question, but the data, once you find it, doesn’t really give you the story that you were hoping for?
Amber: Uh-huh. Yep. Yeah, that happens a lot. That actually happened with the very first story I proposed. I was really excited about this story and I thought it was gonna be great and I collected all this data that was … I had to scrape a website, and so it wasn’t a data set that already existed. Once I started analyzing it, it just wasn’t interesting, and sometimes we’re able to recover and we’ll end up answering a different question than the one we set out to answer, so this is where we fall back into line with what’s more typical. Once you have the data, you can start looking for other things because you might find surprising insights that you wanna talk about either in addition to or instead.
But yeah, that first project I tried working on, every angle I could think of was not coming across the way I wanted it to. Sometimes that means we end up killing a story and we just don’t write it, which happens sometimes. I think of it a little bit as publishing negative results in science. It just doesn’t happen very often because basically all you’re saying is, “What we thought was gonna happen didn’t,” and that’s the end. There’s of course huge debate about whether or not that should be the case and whether or not we should publish uninteresting things, but at least for The Pudding, we try not to publish things that we don’t think will be interesting.
Hugo: Yeah, absolutely. Of course, in basic science research, there’s an argument that there should be more negative results being published.
Amber: Of course. Exactly. That comes down to having people not repeat projects and experiments that other people already know aren’t gonna work and things like that. In science, it does make sense to do that sort of thing. We’ve talked about making blog posts about failed stories for that purpose so people are aware that, “Hey, we tried this, and it didn’t really work, but if somebody else wants to try it, go for it.” We haven’t gotten around to that, but maybe if there’s public interest we can check that out as well.
Hugo: Something that you mentioned at the start of our chat is that at The Pudding, you think about the types of questions and conversations you write about as the types of things you wanna talk about over beer, right?
Amber: Mm-hmm (affirmative).
Hugo: Yeah, I was gonna say I think starting with a question like that probably gives you a sense of direction and some sort of interest there.
Amber: Absolutely. It really gives you a sense of focus. Like I said, sometimes I have the bad habit of when I find a new data set, I’m like, “I wanna explore all the things,” and I could easily spend weeks and weeks just having fun and playing with the data set, but when you go in with some sort of vague purpose in mind, it helps to give you a sense of, “Okay, here’s where I should at least start,” because those big data sets can also be really intimidating if you don’t know where to start. I’ve done that, too, where I’ve downloaded huge data sets and been excited, and then been like, “I don’t know what to do with all of this,” and then it just sits on my computer and I don’t do anything with it. It definitely gives you a direction to go, and then you can branch off from there.
Amber: That’s such a great question, and I think it can go in any number of ways. People are experimenting with all sorts of stuff in this space right now. They’re trying out different technologies: so, virtual reality and augmented reality are being brought into the data vis space, and we’re experimenting with the ways we present data. I mentioned a couple minutes ago that we occasionally kill stories, and that sounds like a bad thing, but we actually are set up so that we have the flexibility to do that. That allows us to experiment with things, which when you experiment, sometimes things are gonna fail, and I think we’re not the only people that are doing that. Lots of people are experimenting and trying to find new and exciting ways to present information.
On the internet, people’s attention is the most valuable commodity, and everybody’s always looking for your attention if you’re a consumer of anything on the internet. It’s hard to break through the noise. I think that’s where these interactive graphics and some of this new experimentation and things like that are really being leveraged. I think it’s an exciting future. I have high hopes for it. I know that there’s some discussion within the community of where everything is headed, but from where I’m sitting, I’m looking forward to it. I think we’re in a good spot.
Hugo: As am I, and I do think this idea of turning what essentially have been consumers as passive readers, essentially, into active storytellers to themselves is incredibly interesting. Getting people doing something and interacting as soon as possible. That’s something we strive for at DataCamp as well, really, is getting people coding as soon as possible and giving them automated but interactive feedback. We do have a philosophy here of getting people doing stuff with their hands, either on a keyboard or on a mouse or whatever it may be as soon as possible.
Amber: Absolutely. It really brings them into the story, and it makes it their own. Like you said, that’s how DataCamp works as well. You feel like you’re actually gaining the experience and doing the thing.
Does everyone have their own specialization at The Pudding?
Hugo: I’ve got the strong sense that you and your colleagues are all generalists in terms of you do a lot of different things, but does everyone have their own specialization. How does this work?
Amber: A little bit. We are a pretty small team. We very recently got up to six people, so when you have a small team and you’re trying to put out these stories and things like that, it really helps us to have people that can be generalists and do all the parts. But yeah, we do have things that we are better at. When I first started at The Pudding, data analysis was definitely my forte and I was still learning D3 and I was making a bunch of mistakes and stuff failed all the time. But I was able to lean into my ability to do data analysis, and that’s the case for some of my other colleagues. Some of them are great with design, and we’re constantly tapping them for design input. Some are great with the programming side, so we’re constantly tapping them for help with fixing bugs and things like that. We do all have our strengths, and so we have people who are the go-to resource if you have an issue with X, Y, or Z, but we all have the ability to do all of the pieces if we need to.
Hugo: I’m sure we have a bunch of listeners out there who would love to get involved in this type of work. What skills are needed to be successful at data journalism and visual storytelling and so on?
Hugo: And they may change in the future, right, as well?
Amber: Totally. That’s actually what I was gonna say is one of the best things to have in this field is the ability to adapt and learn new things as you go, whether that’s a new language or a new statistical test because we’re not focused on the same topic every time, we’re constantly having to learn new methods of analysis. We’re just trying to stay right on the edge of things as much as possible. The ability to just learn new things and be ready to do that stuff as you go is really important, and I think on a similar note, having a beginner’s mindset is also really helpful.
Basically what I mean by that is just our stories are aimed at the general public, and so when you’re writing a story you become a mini expert on a topic, and it’s really easy to slip into giving a bunch of jargon or making things not quite accessible. If you try to approach things as a beginner would, it really helps you to communicate things in an easy way and in a way that other people who are actually beginners can appreciate and can understand. I think being able to think about things like a beginner would is really helpful as well. If you are a beginner, you’re in good company because you have an exact beginner’s mindset.
Hugo: I think that’s really cool, and it seems to me a lot of the topics you’ve worked on you may have been a beginner at the start in some sense. I don’t know how much you knew about the atmosphere of Mars beforehand, but I’m sure you learnt along the way, which allowed you, in that case, to have that beginner’s mindset from the start, essentially.
Amber: Exactly. Yeah. I didn’t really know anything about the atmosphere of Mars. I knew very little. I knew it was cold, but I didn’t know how cold. I think you have to be willing to be a beginner all the time. That’s hard sometimes, but the more you practice it, the easier it gets to just … you’re always gonna kind of feel like you don’t know anything, but that’s okay because you can learn it and then you’ll know the things. There’s two other things I think are really important: one is the ability to give and receive feedback. I mentioned that my team works together a lot. Whether we’re working collaboratively or I’m working on a story on my own, I still get a lot of feedback from my team, and we have designated sessions where we just give feedback to one another. That also happens a lot online. People post things that they’re working on on Twitter, and I’ve had pretty good experiences with people giving feedback on the internet, but of course, people on the internet aren’t really very nice about things.
Finding a way to give feedback to people that’s constructive and can help them become better, and then also being able to receive that feedback without feeling like they are criticizing you as a person is really important. That’s something that I am still working on all the time. I get feedback and I’m like, “Oh, no. I worked so hard on that,” and then I’m like, “Oh, wait, no. They’re not criticizing me, they’re trying to make me better.” That mind flip and that ability to work on that is really important.
Hugo: For sure.
Amber: The last thing, which is tied into all of this, is just that communication is so important, and I think that’s the case for so many fields, especially in the data world ’cause you’re often not working in a silo. You’re working in collaboration with other teams and sometimes with clients and sometimes with the general public, so being able to communicate what you have done and what you’re planning on doing is really, really important.
Hugo: I agree completely, and I think communication is something that we all know is incredibly important, but you don’t necessarily see it in the job listings. You see like, “Know how to use Hadoop,” or something like that, and communication isn’t something that’s necessarily stressed in terms of working data scientists having to communicate with managers or non-technical stakeholders and this type of stuff. It is really key to keep on reiterating this again and again.
Amber: I think it really should be on more job listings and stuff, but I think it’s also fair to put that on a resume if you’re applying for jobs because I think everybody realizes that it is important, but I don’t know why it doesn’t end up on job listings.
What is your favorite data science technique?
Hugo: We haven’t spoken too much about specific techniques and methodologies. We have a bit, but I’m just wondering what one of your favorite data sciencey techniques or methodologies is.
Amber: I don’t know that I have a favorite, and part of that just comes down to I’m really bad at picking favorites of anything. I’m very indecisive when it comes to stuff like that, but I think it’s also because we’re constantly using different methods of analysis. I don’t know that I have a favorite technique, but one thing that I do a lot that I absolutely love doing, so I’m gonna take a moment to preach it here because I think it makes sense; within the R environment, I use R Markdown a lot, which is very similar to Jupyter Notebooks or things if you’re a Python user, and basically, the way that I use them is I include my analysis, but in between chunks of code I also include notes about why I did something or where the information came from, or sometimes it’s like, “This was the Stack Overflow post that helped me figure this out.” The reason I love doing that is ’cause it helped me a lot when I was learning how to do things so that I could go back and figure out what I was thinking and what my thought process was.
But also, sometimes The Pudding has a client branch called Polygraph, and so we work with clients a lot. Those markdown documents are really great to send to clients of like, “Here’s an analysis of your data, and here’s how we worked through it,” and they can read your inner thoughts and see what you’ve found before you make it all pretty and on the internet. I’ve found that that’s been really fantastic, and so I always make those documents and I have never once regretted it. I think that’s my favorite data sciencey thing that I do.
Hugo: Fantastic. I think, once again, that speaks to a certain form of comprehensive communication as well, developing this document so that other people have access to it and you have access to it downstream as well.
Amber: Honestly, I put a lot of those … when I was still learning, I had made a portfolio online and I had put a lot of those things into this portfolio, and everybody was like, “Oh, that’s so great that you make this for other people.” And I was like, “Yeah, totally.” And I do make it for other people, but I also, like you said, totally make it for myself because sometimes I forget why I did something, and I’ve gone back to stuff a year out and been like, “Why did I do that?” And I’ve been really glad that I wrote it down.
Hugo: Yeah. I say this probably too often for my listeners on the podcast, but when I talk about commenting my code, it’s for other people but the most important other person is me in three weeks.
Amber: Yep. And you always think you’re gonna remember what you were doing, but I don’t know. In my experience, I never do.
Call to Action
Hugo: Future me hates me now. I can tell you that. Great. So Amber, I’m wondering, as a final question, do you have a call to action for our listeners out there?
Amber: Yeah. I think for anybody who’s looking to start in this field or if you’re trying to figure out if data journalism is something you might enjoy, my biggest advice is just to get started. Start with a small-ish question that you’re really excited about and start exploring it and don’t be discouraged if you have to go and collect some data on your own. Again, keep it to a reasonable size so that you don’t get completely overwhelmed, but I’ve found starting with something you’re excited about and something you’re personally invested in helps you really follow a project all the way through. You can learn so many fun things along the way. Feel free to go on the internet and look up how to do stuff and ask people there for advice and feedback because I think there are a lot of really strong data science and data vis community members out there who are really excited to help beginners. If you have a story idea that you think would be great for The Pudding, we are always looking for freelancers. If you’ve got something that’s a proof of concept with analysis and a little bit of a general story, send it our way. We’d love to hear it.
Hugo: I’ll just reiterate that all of Amber’s work we’re gonna put in the show notes, and check out everything on The Pudding. It’s all fantastic. Start with a small question and send through some ideas to them if you’re interested. Amber, it’s been an absolute pleasure having you on the show.
Amber: Yeah. Thanks so much. This was so great. I appreciate being here.