Getting Your First Data Science Job
Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Chris Albon, Data Scientist at Devoted Health.
Here is the podcast link.
Introducing Chris Albon
Hugo: Hi there, Chris, and welcome to DataFramed.
Chris: Hey, how’s it going?
Hugo: It’s great man. How are you?
Chris: I’m good. This is like one of the first podcasts I’ve done in a while. I stopped my podcast and then I went on a few other ones, but then I had a kid and I did a move and that kind of stuff. Now I’m back, back on the podcasting circuit. This is it.
Hugo: Fantastic. How long has it been?
Chris: It’s been like, so my kid is two months old. So I think I did my last podcast like two months before that. So it’s been four months without-
Hugo: Right on. Well, congratulations on the new member of your family.
Chris: Thank you. Thank you. She is doing great. All is well.
Hugo: Awesome. And congrats on the new job and the move. There’s been a lot of change, right?
Chris: There’s been a lot of change. There’s definitely been a lot of change. I feel like it’s one of those things where I have a very hard time saying no to DJ. So when he does think that you should come do something, you have to think really hard about whether or not you want to say no to him, which I’ve said no to him in the past, I could not say no to him this time, thus I’m with DJ on a crazy adventure.
Hugo: And so this is DJ Patil?
Chris: This is DJ Patil, former chief data scientist of the US, former head of data at LinkedIn, I don’t know his whole resume. But a well known, well known person in the data science world.
Hugo: Very well known, very well respected, doing a lot of interesting work at Devoted and otherwise as well. I mean, his recent series of articles with Mike Loukides and Hilary Mason around kind of forming a conversation around data science ethics is really interesting as well.
Chris: Yeah, I think he’s doing a lot of really important thinking in a space that we’re still figuring out. I mean, this was something that I think a lot of us have been talking about, sort of off and on for a long time of like, hey, we have lots of goals for what this field can do. What does it actually mean in practice? And the more we get to do that, the more that people sort of carve out space in their workday to say, hey, I’m going to think about this topic, I’m going to produce something that other people can read and agree with and disagree with and use as a point of discussion or build something off of, that is great, and I hope he does it more and I hope Hilary does it more. And I think more people should spend a little bit more time thinking about that.
Chris: And it’s part of it that was so nice to go work for him because it is sort of nice to go work for someone who’s thinking about that kind of stuff and really is big on ethics and is big on trying to use technology for social good. Like for people in this podcast, like my background is not in business. Like I have started a startup, but my main background is nonprofits, humanitarian nonprofits. I work on data on humanitarian nonprofits, did that for most of my career. And so to work for a team that’s led by someone who spends a lot of time thinking about how to build companies with a soul was very refreshing.
About Devoted Health
Hugo: Awesome. So maybe you can start off by telling us just about Devoted in general and the work you’re doing.
Chris: Sure. Devoted is a health insurance company that was started by Todd and Ed Park… Todd Park was the former CTO of the United States and Ed Park was the CEO of another health insurance company or another healthcare company before this. And what Devoted tries to do is, frankly, create a health insurance company that you would want your own family members to be at. It is a company, it is a startup. It operates exactly like a startup, is funded exactly like a startup. I think there are areas that would set it apart from a startup. But I think overall you would look at it and be like, yes, this is a startup.
Chris: However, Devoted is on a mission. We are trying to make health care that works, that works for people, particularly right now senior citizens. So we work on Medicare. Medicare is the government health insurance program for senior citizens. That is what Devoted is focused on, that’s sort of what we consider ourselves, a Medicare company.
Chris: But on a daily basis if you work inside Devoted, you can see that we are trying to build something that would be a company with a soul, a company that’s trying to do something real and trying to do something that matters in people’s lives and do right by people and is hyper compliant with the law and do more than just simply profit. That was what drew me to it. I think that’s what draws a lot of people to Devoted.
Hugo: That’s great. As we’ll discuss, you’re working a lot on hiring at the moment. And what we’re going to talk about today among other things is people getting their first data science jobs and advice for such people, how it works, what it looks like from your side of the conversation as well. I think particularly at this point where Junior data science paths aren’t necessarily fleshed out, we’re starting to see a bunch of specialization in the industry, these types of things will be incredibly important going forward.
Hugo: But before we dive into that, I just want to get a bit more background about you. And if you could tell me, like you said, your background isn’t necessarily in data science. I’m wondering how you got involved in data science in the first place.
Chris: My background’s in quantitative political science. So political science, studying of politics, I studied civil wars, but it’s political science completely from the perspective of statistics, of quantitative research, of experimentation rather than qualitative work, interviewing people, looking at historic documents, that kind of thing. And when I was getting my PhD, I kept on having drinks with these people in San Francisco where I was living at the time, and they were working at places like LinkedIn, so with DJ and a number of other people, and they were doing such cool applied stuff. Amazing projects, amazing uses of data in a very, very applied way.
Chris: And it was about then that I kind of decided that if I really wanted to have a real impact and really wanted to do something that would matter that I needed to not be in academia, I needed to go out and actually apply the skills in a way that was beyond research. And don’t get me wrong, I love research, every single person who applies to Devoted, who has any kind of PhD, I just want to talk to about their PhD all day. I think that’s a bias of mine.
Chris: But, there’s so many ways that you could apply it and then at the time, there was some chances for me to go work for some joint Kenyan US nonprofits, Kenyan nonprofits, that kind of stuff. And I spent a number of years working being sort of the first data hire over there at Ushahidi which is a Kenyan nonprofit that works on election monitoring and disaster relief. Went to go work for BRCK which is a Kenyan startup that works on providing free Wi-Fi to lower income people in Kenya. Frontline SMS, did a lot of work around election monitoring.
Chris: But taking data, real data about, say, an election in a country with an authoritarian leadership and actually seeing like is this data true, are people filing fake reports, like what is actually happening on the ground with real data with people who are in places where they can get arrested if things go wrong.
Chris: That was a big eye opener to be around issues of safety, around issues of ethics, around issues of like, well, like for example, if I go to a place that has election monitoring and we’re running election monitoring campaign with a local NGO, if the cops bust down the door for some reason because something happened, I would be arrested, then flown back to the US. They would be arrested and God knows what would happen. And that means that our threat models are very different and sort of understanding that your threat model isn’t their threat model I think has lots of applications for data science in the US and everywhere that I think it was, it was an important lesson for me.
Hugo: Absolutely. And I have a question around, it seems like in this type of work, you’ll learn a lot on the job both in terms of domain expertise, but also in terms of data scientific techniques. I’m just wondering, before you got your first data science job, did you know how to program in Python or did you have an idea of what the landscape looked like from your time in research?
Chris: I realized that there was much more to the world of software engineering and I started to move in that direction because when you want to build stuff with other engineers, it’s nice to use the tools that they use and to think about things the way they do and to write code that they can insert into their projects. And so I really started to push more along the lines of developing more software engineering skills, more languages that are pretty common in software engineering, such as Python. When I left my PhD, I was not the wizard programmer that I’m not today, but I could be today.
What do you do as a data scientist?
Hugo: I love that. So I usually ask a question around what your colleagues think you do as opposed to what you actually do. That might be slightly different in what you do at Devoted, people might have more of an idea. I think historically, though, in terms of the startups you’ve worked in doing analytics and data science, is there a mismatch between what people think you do and what you actually do?
Chris: I think people, so we’ll take BRCK. BRCK is a Kenyan startup that does free Wi-Fi for low income people. The whole team is Kenyan, they’re based out of Kenya and I was the first data hire at BRCK. I think a lot of them thought that I was doing wizard mathematics. There was a whiteboard in my office and I’m writing equations and solving riddles of mathematics to get them some kind of thing. Where in fact what I was doing was a lot of software engineering stuff, a lot of building functions, running things, cron jobs, like using a lot of Python, or a lot of pandas, some scikit-learn, that kind of stuff.
Chris: I was mostly using established tools, but it would provide them with actually something that was useful in their work. But I think from their perspective, I was like wizardry, right? Because I would do something like imputation where you take missing values in your data and you impute the value, you like fake what the value would probably be. And it was like wizardry, right? Like, whoa, I can’t believe that happened, like you were predicting this stuff.
Chris: So I think it feels very normal to the data scientist and to me, but I think it felt very, it was very exciting for a team that didn’t have that to start to have that kind of option for stuff. I think it worked well.
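As a rough illustration of the imputation Chris describes, here is a minimal sketch of mean imputation in plain Python. The session counts are made-up numbers and `impute_mean` is a hypothetical helper, not anything from BRCK's code; in practice, pandas (`fillna`) or scikit-learn (`SimpleImputer`) would typically do this job.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical daily Wi-Fi session counts; two days were not recorded.
sessions = [120, None, 100, 140, None]
imputed = impute_mean(sessions)
print(imputed)
```

Mean imputation is the simplest possible choice here; it keeps the average unchanged but understates the variability in the data, which is one reason more careful methods exist.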
Getting that First Data Science Job
Hugo: So as we’ve said, we’re here to talk about getting your first data science job, and you’re working on hiring data scientists at Devoted and thinking a lot about people getting their first data science job today. I want to kind of figure out what companies are generally looking for when hiring first time data scientists. But before that, I suppose, yeah, a preliminary question is are a lot of companies trying to hire first time data scientists? Because a lot of the job listings want 10 years experience of distributed computing and all of that jazz. I hesitate to use the term entry level, but for a first time job, are there a lot of jobs out there?
Chris: I think there are. Although from, so the perspective I think we could take this interview is that I’m sitting on the other side of the table where I’m doing a lot of the working with a team at Devoted to do a lot of hiring for our data science team. And I talk to a lot of my friends who are doing similar roles at other companies. So in the interview, they’re on the hiring side. So there definitely are a lot of junior roles, I wouldn’t put into someone’s head that like junior roles are dying or everyone needs experience and that kind of stuff.
Chris: There is something I definitely see, that as a team you can absorb an infinite number of senior hires. So say you have a team of six people, you can hire six new other senior people and be pretty okay that everything’s just going to work because they’re senior, like it’ll do fine. It is much more of a risk from the organization’s perspective if you have six people on your team and then you hire six junior people, because those junior people need a lot more support and if you’re unable to give that support, it is not good for the junior people or for you.
Chris: And definitely take this from the perspective of that junior person, like you do not want to be in a place where there is one senior data scientist and there’s six junior people who were all hired at the same time because you are not going to learn what you need to learn. Like you want to be the one junior person on a team of six senior people, like that would be the ideal situation and you can just sit down and take all the time of them as you want and you can work with them very closely and your learning would be massive.
Hugo: That would be incredible.
Chris: There’s definitely junior jobs out there. But there is something from an organization’s perspective where you might have a small startup and you only have three data scientists or two data scientists at this company because data science is a specialty job and you just can’t have a posting that says you want to hire five junior people. Like don’t apply to that job, I think that’d be very horrible. Instead you want somewhere, I think if I was looking, if I was junior, so one thing we should point out is that when I was junior in data science it was a very different field. So you should not take how I got into the field as an example of how you should get into the field because when I started there wasn’t really the concept of data science and it was sort of a bunch of people who were interested in the same topic and just sort of like got together.
Hugo: Yeah and doing it.
Chris: Doing it. It’s different, obviously. But I think I can see some things around, if you are, you know, if you’re looking for that first job, I would definitely look for places in mid to larger companies and not in smaller scrappier startups.
Chris: I think one of the things that I’ve found a few times during the hiring process is that people who got their first job at say Facebook and they’re a junior data scientist at Facebook, there’s so much more support that they got in that time for learning, for experimentation, for using things at scale, like learning how to work at Facebook scale but doing so as a very junior person, they have such great experience that then they move on to, say, apply for a job at Devoted or something like that, and they have that experience and we can use that experience. Like that’s great, you’re awesome. Like cool, you have this great experience. This is hard experience to get because it’s hard for a boot camp to say replicate 50 million messages a day that you have to process or something like that. That’s a weird boot camp project. But that is what a lot of companies end up doing.
Chris: So going to somewhere like Facebook or going to, I’m just like picking on Facebook, and this isn’t an ad for Facebook, but any kind of larger company that can absorb you as a junior hire and allow you to learn and allow you to learn from really senior engineers and then moving on to something else is I think a great perspective.
Hugo: For sure. And those companies I think have the infrastructure all set up as well. So you’re not like battling with your data lakes and Airflow and all of that, right?
Chris: Yeah. And there’s absolutely a trap that I think junior data scientists fall into, even mid level data scientists fall into, where they join a small scrappy startup, not as a founder, but just the startup is six people or something like that and they’re the first data hire, and there’s no data infrastructure in place at all. And there’s no safety nets, there’s no ability to get data from the database in a way that’s useful. And you have to sit down and build all that which in certain times can be okay, but I think in a really scrappy startup where they’re like struggling to make payroll or they’re like grinding away or something, it can be a very, very hard experience and probably not the best experience as opposed to something where you had, you go work at some really hard problems at a larger company, but you do so in a way that you can leave work at the end of the day knowing that everything’s fine.
Chris: Then you come back the next day and all that infrastructure is in place. And they’ll have talks and lectures and you can go over to some person who’s done data engineering for 10 years and say, hey, why does it work like this? And they’ll be like, oh, it’s because of x, y, and z, and that kind of stuff.
Chris: And so I think there’s just, there’s so much learning that can happen at larger companies. I’ve never worked at a larger company so this is me talking from the outside. But I do think that as someone who’s worked at a few smaller companies or smaller organizations, those organizations do better if you’re senior just because you can sort of be left alone and figure everything out by yourself because you’ve done this before as opposed to actually I don’t know what I’m doing, I really need someone to tell me how to do it right.
Chris: And so you learn things the right way. I think there’s a lot of things where, particularly where a data scientist is doing more software engineering stuff, where it’s really nice to sit down for someone to be like, hey, explain to me object oriented programming because I don’t understand what in the world’s happening or explain to me how testing works, like unit tests, integration tests, or patterns, like software engineering patterns, like factory of factories and that kind of stuff. Tell me about that.
Chris: Like all those are totally simple concepts that anyone listening to this podcast can know. Just need to like have someone tell you about them or know that you should read about them. Like understand that, oh, actually this comes up a lot, I should read about this, that kind of stuff, which is great in companies with more support and is terrible in really, really small scrappy places.
What are companies looking for when hiring first time data scientists?
Hugo: So in general, what are companies looking for when hiring first time data scientists, do you think? And if you don’t want to answer that you can tell me like what you in particular are looking for when hiring first time data scientists.
Chris: Sure. I’ll try a stab at the one but then we’ll definitely hit on the other one. It depends on the company. I should say organization because I worked for a lot of nonprofits, but we’ll just use company as fill in gap for that. Places that are very small and scrappy tend to look for senior people that they can run the whole, I think you would call them like a full stack data scientist. I’m not entirely enamored with that phrase, but whatever.
Hugo: Like a generalist.
Chris: A generalist. You could just have them join the software engineering team or the product team or something like that, and you give them sudo access on your server and they will just build everything they need to build. They’ll build the pipelining, they’ll build all the saving of backups of all the data, they’ll build the tables that they need to build, then they’ll run the analysis and they’ll run it in a way that runs every single day, but only at certain times, and they’ll install Airflow by themselves and they’ll do all this kind of stuff that they can just do. That’s awesome for that person. That’s not really a junior role.
Chris: More, I think if you go to the other end of the spectrum and you go for large companies, they do a mix. Well, they will hire people from more senior positions to do things like, say someone very specialized, say someone who just got their PhD in AI or something like that, like they might have them join the team that’s working on an algorithm and that junior person can work on a part of the algorithm or work on some testing part and grind to that kind of stuff. Or I think what’s very common is more generalist data scientists who are junior join larger companies and they do probably what I would call like advanced analyses or advanced analytics. So you have a new product and you want to understand how that product is actually doing at scale because it’s deployed globally on all Android devices or something like that, and you want to see how that product is working and how people are using it.
Chris: There is some very, very complicated analyses that need to be done for that to be true. And that is a great sort of thing for a junior person to kind of tear off and start working on and it doesn’t affect the production code, but it does affect the business.
Hugo: And in that respect, I suppose we’re talking about the data analyst breed of data scientist, right?
Chris: Yeah. Again, I don’t even know why I keep on talking about Facebook, but I know at Facebook, a lot of people who have the title of data scientist do more analysis, which is totally reasonable and it’s super hard and give them massive credit. And it is, that kind of stuff is, it’s a real role, it’s a real job. And I think people who come from academia, people who I know come from sort of political sciences, social sciences, they go into those kind of roles and have a great time because they are working on hard analyses, like hard analytical problems. You’re not making dashboards to show that someone’s clicking on something, you’re making really complicated analyses to figure out the churn model that applies Bayesian to, you know, all that kind of stuff, which is cool.
Chris: I think in the middle, so you have the teeny companies and you have the large companies, you have companies like Devoted that sort of sit in the middle where there is a mix between trying to hire senior people who can craft the foundational data science infrastructure. So Devoted is building a health insurance company’s tech stack from scratch. It was at one point an empty GitHub repo. Now there’s a lot of code in there. But yeah, at some point, we’re like, okay, we are a health insurance company, this is our code base, like let’s write the first line of code, that’s where we are. And in those kind of environments, you want a mix of people who are senior, who can build the architecture for how things work and how data is moved around and how say tasks are run every day or how there’s things around testing, and if companies do testing and all that kind of stuff.
Chris: And in addition, you want people with more junior experience who can come in, and you can provide some lift to them teaching wise, like you could do more teaching with them. And you can have them in a support role where you might tear off a piece of a project and say, hey, can you make this for me so I don’t need to make it. Because, so one of the people who’s on our team currently is Kit Rodolfa, who is the former chief data scientist of the Hillary campaign. And he’s done some amazing complicated work, and he’s also done some amazing work that was well below his level of expertise because there was just no one else there to do it. And so, I think typically mid level companies like ours tend to have both. Some are longer range.
What skills or qualities are you looking for when hiring?
Hugo: And so for you, for your size of company, for a junior data scientist, what skills would they, what would be good qualities, what would they need to do or demonstrate for you to be like, hey, this person would be a good fit here?
Chris: Yeah, I think for us at Devoted, we are a health insurance company, but if you looked at the internal workings of how the tech team works, we operate far more like a Silicon Valley tech company and concepts of move fast and break things or ask for forgiveness rather than permission are real things that we are doing. I think a lot of the people at Devoted are building things that are the largest and hardest things that they’ve built ever in their career. And that’s what we want you to be. We want you to be the person who’s building the… Doing the best work of your career, doing the hardest thing, pushing yourself right to the limit of what you think you can do.
Chris: We have a very strong culture that says, hey, we want you to run right up until the point that you break the thing, break the thing and then come and be able to say hey, I totally broke this, can someone fix this, dear God. And then we’ll go back and we’ll work with them to fix it. I say this as them but like I have in fact broken many things at Devoted. And to have a culture that says hey, you can break these things, it’s okay, don’t worry about it, you should definitely admit that it’s broken because we should know that, but run right up until you break it, then we’ll talk about why it was broken and we’ll then run again. Like don’t stop running, keep on running.
Chris: But that fast paced, fast paced doesn’t translate into a huge amount of work hours because Devoted, like I have a family, no one at Devoted I think is grinding the midnight oil left, right and center. There’s definitely, people are working nine to five-ish. But, just, you know, what you’re doing, I think there’s a lot of pushing the boundaries of what we can do skill wise right to the end and a lot of learning and trying to build the things that are the limit of our capability is pretty common.
Chris: I think that would be a skill regardless of your level of experience, whether you have one year of experience or no years of experience or 10 years of experience, we really do look for people who are very happy to take on a task that they’ve never done before but they’re pretty, you know, they kind of understand how they might go about it and go for it. Like go and run it. And we have safety guards in place that nothing bad is going to happen. That they can push themselves, and then once they have that, once they’ve built that thing, they go, okay, cool, I understand it now, it’s great, I’ve got it, like, let’s do something else. Like let’s go harder. Let’s build the next thing.
Chris: I think that’s really, it’s hard to instill because it is not experience per se. It’s more, you know, aptitude and attitude is probably a closer one where I don’t necessarily, we don’t necessarily say we’re only looking for people with X years. We tend to have a willingness to say, hey, you’re actually, like we were kind of looking for someone more senior, but you are more junior but you really are kicking butt. And we can see that when we talk to you that you have a lot of aptitude for your level of, like for your levels of experience and we’d love your attitude for running with it. Like, let’s go, like come on board.
Chris: We’ve definitely, we’ve had a few hires like that and I think it has worked out. It is something, it’s not uniquely Devoted, but it is something that is probably pretty uncommon for our industry.
Data Analytic Skills
Hugo: Yeah, and that speaks to certain skills that you’ve mentioned in there being like an active problem solver, communication, having a passion and a drive. How about in terms of hard skills, whether it be domain expertise or background in programming, whether it’s Python or R or knowledge of how the math behind machine learning models actually works to like data analytic skills?
Chris: Yeah, I’ve noticed something with hiring, where typically when we have someone in the hiring process, I will say to them up front that there are places, like if you just got your master’s degree in machine learning and you believe that how you are going to advance your career is to do machine learning, become an expert in machine learning, that’s what you’re going to do, that is a completely valid job, like career track. You should absolutely do that. Also, that isn’t a place like Devoted. Devoted tends to be more generalist builders who are trying to, you know, we’ll use machine learning here because it’s useful and then we’ll use fuzzy matching here because it’s useful and then we’ll use some simple things somewhere else because it gets the job done.
Chris: We tend to focus on the ability of people to build and solve the needs of our internal users more than say someone’s really nuanced view of machine learning algorithms. Just because we are a small team in a new company, if you wanted to just do machine learning, there are definitely companies that are big enough that have the infrastructure to have you just do that. Someone else will worry about some of the analytics or some of the data processing. You can just sit down all day and read machine learning journal articles and then implement them. That’s totally okay. I never worked at Google Brain but I imagine Google Brain has a strong feeling around that. Which is cool!
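To make the fuzzy matching Chris mentions a little more concrete, here is a minimal sketch using Python's standard `difflib` module. The provider names, the `best_match` helper, and the 0.8 threshold are all assumptions made up for illustration; nothing here reflects how Devoted actually matches records.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-to-1 similarity score between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_match(name, candidates, threshold=0.8):
    """Return the candidate most similar to `name`, or None if nothing is close enough."""
    if not candidates:
        return None
    score, candidate = max((similarity(name, c), c) for c in candidates)
    return candidate if score >= threshold else None


# Hypothetical provider names; the first query contains a typo.
providers = ["Mercy General Hospital", "St. Mary Medical Center", "Lakeside Clinic"]
match = best_match("mercy genral hospital", providers)
no_match = best_match("Unknown Pharmacy", providers)
print(match, no_match)
```

The threshold is the main design choice: set it too low and unrelated records get merged, too high and obvious typos are missed, so it usually needs tuning against real data.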
Advice to First Time Job Seekers
Hugo: Would that be your advice to first time job seekers, learn a bunch of general things in order to build stuff as opposed to go deep into machine learning models and all of that?
Chris: If I was sitting in front of someone who is taking or looking at a junior job or something like that, I would say that you probably want to take one of two tracks. One, if you have the educational experience on paper and the credentials to go for say deep AI, then just go deep on AI. Like you have the master’s degree in it or you have a PhD in it or something like that, go for that track. Like that’s an amazing, it’s incredibly high paying, it’s super in demand. And there are places where you won’t have to learn anything else but ML. That can be a career and I think there’s probably going to be a full career in that and there’s absolutely no problem with that, because those are very, very hard.
Hugo: By AI do you mean…
Chris: I would almost exclusively say neural networks at this point. Yeah. Basically, if you’re, if you’re working … there’s a lot of companies that are doing things that are self driving cars or things like that, where you just, you need a lot of people with that kind of knowledge to work on those problems because those are incredibly hard problems. But it is definitely along the lines of neural networks, i.e., deep learning, to do those.
Hugo: And for this track, there is a certain mathematical overhead, which perhaps the other track, we haven’t gotten to that yet, might not have, but for example, knowing enough about multivariate calculus to understand backprop and the vanishing gradient problem and all this stuff.
Chris: Oh, no, absolutely. I would expect that the interviews for those would be very math heavy and very theory heavy, such that it would almost feel like a dissertation defense in that area.
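For readers who have not met the vanishing gradient problem Hugo refers to, here is a toy sketch in plain Python. It assumes a deliberately simplified network: a chain of single sigmoid units, each with the same hypothetical weight and pre-activation value, so the backpropagated gradient is just a product of identical chain-rule factors. Real networks are not this uniform, but the shrinking effect is the same in spirit.

```python
import math


def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25, at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_through_layers(n_layers, pre_activation=0.0, weight=1.0):
    """Multiply the chain-rule factor (weight * sigmoid') across n_layers layers."""
    grad = 1.0
    for _ in range(n_layers):
        grad *= weight * sigmoid_grad(pre_activation)
    return grad


# The gradient reaching the earliest layer shrinks geometrically with depth.
for depth in (1, 5, 10, 20):
    print(depth, gradient_through_layers(depth))
```

Because each factor is at most 0.25 when the weight is 1, the product collapses toward zero as layers are added, which is why early layers in deep sigmoid networks learn slowly.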
Hugo: That track as you said, you expect some sort of graduate work to have been done in math or something related.
Chris: Yeah, I would expect like 95% of people to have some kind of, like very obvious that they’re going down that track. I’m sure there’s some people who have like gone around that and those people are awesome and amazing but I think typically you would expect someone to be from that perspective.
Chris: The other one, if you don’t have that like very obvious sort of machine learning, deep learning focus, which I don’t, so this is more my perspective, there’s another field that is way more sort of doing data science more generally at a company. So instead of just being like all I do is machine learning all day, there’s so many other problems that need to be solved using data science, whether you do like Bayesian analyses or you do some kind of random forest or even if you use some deep learning stuff but you’re not doing cutting edge deep learning stuff all day, there’s far more jobs.
Chris: Although the hype and the focus are on these sorts of AI jobs, there are far more jobs, like 50 times as many, for people who are semi-generalist data scientists at companies, solving those companies’ needs to understand what’s happening in their data, whether that’s predicting when their drone should fly over and what are the crops, or predicting when a customer might return, or for us, like, predicting when someone might have a particular illness so we could do an intervention and prevent that illness from happening.
Chris: Those kinds of analyses fit better for people who sort of have a more general experience because there isn’t a very specific type for that. There’s no Bayesian analysis PhD where if you have it you get to do Bayesian analyses and if you don’t, you don’t. It’s much more general, so the people who apply to us for those kinds of roles can come from any perspective: they can be a music major, or someone with a PhD in data science from one of the new programs, or they can come from a boot camp, from a PhD in some other crazy field, or not have a PhD at all. It’s more about whether that kind of person fits what we want and what we need. But I think that’s what most people do.
Chris: I say this only because I think there’s a lot of people who think, if I know enough machine learning and I go deep enough on machine learning, that’s the guaranteed job. Which isn’t true, because you could self teach a lot of machine learning and then apply at Google Brain and they could be like, hey, you’re just not even close because you didn’t get a PhD in this and haven’t spent a huge amount of time doing it. But then that same person could go to another company and say, hey, I know a lot of machine learning. And they go, awesome, you’re not going to just be doing machine learning, but we have need for people who know machine learning because our products use natural language processing and that kind of stuff. Come on, join the team, you’ll be great.
Chris: And so, I think that second job has a lot more generalist things, where you end up needing a lot more software engineering skills, and a lot more skills working with internal customers. So say someone on the sales team wants some kind of particular analysis, and that analysis is very difficult to do. You do it, but then you have to go and present it to them and work with them to tweak the analysis in the way that they want. You need to be able to work with them such that they’re happy with what you’re delivering to them.
Chris: That kind of stuff I think is more common. I think that’s what most people do. It’s also, it’s different than just saying, hey, I’m just going to like know everything about machine learning and then I won’t need to care about any other topic.
Experience for a first job?
Hugo: And so for this second job, which I think is kind of the lion’s share of what we’re discussing today, there’s a chicken and egg problem in the sense that for a first time data scientist applying for their first job, how can they demonstrate that they have kind of this general array of skills? Would it be project based, like writing a blog themselves to demonstrate it. Because it seems like you need the experience in order to get the first job essentially.
Chris: Yeah, for us when someone applies, some of the best things that they can apply with are projects that they’ve done or something at like, say, a boot camp or maybe their dissertation research or something like that, where we can take a look and say, oh, cool, like you’ve done some interesting stuff, you’ve worked with some data, some interesting ways. And then for us, we have designed a take home that is very, without giving out any kind of secret to the take home, is very open. So there’s many, many ways. I mean, there’s an infinite amount of ways essentially that someone could solve it and how they solve it really says a lot about them.
Chris: But those kinds of skills of being able to demonstrate, hey, like, I can write software, like I can write unit tests, I can sit down and do bayesian and I’m totally comfortable with it. Here’s my blog posts around how this certain type of bayesian analysis works in the setting or there’s this really sweet data visualization that I did. That kind of stuff is like a nice demonstration. I don’t think it’s required, but I think if you come from say a field that isn’t known for making a bunch of quantitative people, and I come from political science, I think people don’t think about political science and think we’re all math nerds.
Chris: The more things that you can do around that where you can sort of demonstrate that you have that kind of experience is better because otherwise you don’t hit people’s biases around what a social scientist is. Because someone could say, oh, you do political science, you must really love Kant and Rousseau. And that’s all you talk about all day. And if that’s not true, like you need to demonstrate that. It’s probably not fair that you need to demonstrate that but that is it and you need to demonstrate that.
Chris: So things like boot camps, things like projects that you can run on your own, and blog posts are great because it’s really easy to access them. Someone has a link in their resume and you click the link and you say, oh cool, this is really nice, I love the way that they’re thinking about this particular problem of feature importance in random forests or something like that. It’s a really nice way to do it and I think it is helpful.
Chris: We don’t require someone has side projects, but we are trying to do filtering of thousands of candidates. And so, it’s nice to find people who have, who can show you right up front that they have those kinds of skills and can do so in a way that is more than just a resume line because as someone who’s looked at a lot of resumes, everyone says that they do every skill under the sun.
Chris: Like everyone lists every skill. Everyone says Python and SQL and R and machine learning and random forests. Everyone says every skill. So it’s not a very strong signal. But if you also have, say, a blog post about some nuanced point about random forests or some nuanced point of Bayesian analyses, that is a real thing, that’s a costly signal to me that you actually do know that, rather than just typing the word into your resume. And it helps. I think it helps for me to get a handle on who that person is. It helps to guide the interview process later on, including interviews that other people do and I don’t. I can sort of say, hey, here’s this article from this person’s blog, and that will probably be a subject of an interview, which is totally fine and probably helps the candidate a little bit because they can sort of stay in the area that they know.
Chris: Demonstrations of that are very helpful. Do I think everyone needs to do side projects left and right otherwise they’re a crappy data scientist? No, of course not. Like this isn’t, you don’t need to live and breathe data science to get a job in data science. But it is helpful to someone who’s hiring to sort of see the things that you’ve worked on and see the things that you’ve done, more than just adding that keyword to your resume.
Hugo: And I think the other thing that writing blogs demonstrates is the ability to take a project through to the end and actually do a write up. And on top of that, it demonstrates communication skills, which are of course incredibly important in this line of work.
Chris: We had one candidate who I really liked. They ended up taking another job somewhere else. But I really liked, and they sat down and they actually had a GitHub project that they were working on. I don’t think it was like a full Python package yet but they basically built an open source library, very close to one. I don’t think they had released it, but they had like a little open source library for some project and they had testing in there, and they had documentation and they had that object oriented Python in there.
Chris: And they were doing, like, relative imports of modules so they could do the tests and all that kind of stuff. And it was cool. Like you could look at that and be like, okay, this person has this level. This person has this amount of knowledge because they’ve clearly written it in their own GitHub account and I can see it. It’s a nice signal for me. You could do that any way. There’s no particular way that I absolutely want. But I think the resumes that send the least signal to me or other people who are hiring at Devoted are the ones that have just, here’s the five skills that I have, and then don’t do any kind of explaining, because there’s a lot of, like, cheap signals.
Chris: I could say that I do deep learning and then you’re like, oh, really, let’s talk about that, and it turns out that I did not know enough about deep learning.
Hugo: Right. And the other thing you mentioned in there, there are a couple of other points, which… GitHub repository I’m sure is incredibly helpful. And on top of that, you mentioned testing and a through line through this has been the importance of at least basic software engineering skills in terms of data science. I find that that’s something that’s missing with a lot of kind of early career data analysts and data scientists, whether it be on using debuggers or unit testing or versioning, these types of things, people need to work on a bit more.
Chris: I think that’s key. If there’s anything that goes through our data science team at Devoted, when we are looking to hire someone or when we are working on our own projects, it is that the things we work on are products. We are building full products for people. Whether they are only a few lines of code, say something simple like moving some kind of data from a Google spreadsheet to Redshift, or something more complicated, we are building full products for people and thus they need to have as much of that as we can have. We want testing in there. We want things to be linted. We want things to follow some kind of object oriented notion or, say, functional Python, with the docstrings in there that describe what everything is doing, and, say, static typing, if we could do that. We’re seemingly on the verge of doing that, but not doing that yet.
Chris: But that kind of stuff, like we think of that as a software product. We make software products for people. Whether it’s actually like a reusable tool or analyses, that’s just what we do. I wish I thought about that more from the start and I think as data science matures, I think there’s definitely going to be this movement in the direction of data scientists sitting on software engineering teams and being a member of a software engineering team just with a certain specialty set of skills. And part of that means that you need to be able to work with them and write code that they can use and write code that they are fine to incorporate in their stuff. And that just means more software engineering skills.
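To make the practices Chris lists concrete (tests, docstrings, static type hints), here is a minimal sketch in Python. The function name `return_rate` and its inputs are hypothetical, invented for illustration, and are not taken from Devoted's code; the sketch only shows what a small, documented, typed and tested unit of analysis code might look like.

```python
"""A tiny, hypothetical analysis helper written like a piece of a software product."""
from __future__ import annotations

import unittest


def return_rate(returned: int, total: int) -> float:
    """Return the share of customers who came back.

    Args:
        returned: Number of customers who returned.
        total: Number of customers observed.

    Raises:
        ValueError: If the counts are negative or inconsistent.
    """
    if total < 0 or returned < 0 or returned > total:
        raise ValueError("counts must satisfy 0 <= returned <= total")
    if total == 0:
        # An empty cohort has no returns; avoid dividing by zero.
        return 0.0
    return returned / total


class ReturnRateTest(unittest.TestCase):
    """Unit tests covering the normal case and the edge cases."""

    def test_typical_case(self) -> None:
        self.assertAlmostEqual(return_rate(25, 100), 0.25)

    def test_empty_cohort(self) -> None:
        self.assertEqual(return_rate(0, 0), 0.0)

    def test_inconsistent_counts(self) -> None:
        with self.assertRaises(ValueError):
            return_rate(5, 3)
```

The tests can be run with Python's built-in `unittest` runner, and the type hints can be checked by a static type checker if the team uses one.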
Hugo: For sure. And so, something that we’ve thrown around a bit is this idea of graduate school and dissertations. And a question I get a lot is from people who are thinking of going to grad school and they’re actually wondering whether to go to grad, if they want to work in data science, whether to go to grad school or whether to get like a data analyst job that they can get at that point and then try to progress into data science demonstrating that they have developed a bunch of analytical tools. Do you have any thoughts on that from your side of the hiring table?
Chris: Sure. I would never recommend someone get a PhD, I mean, masters might be different, I would never recommend someone get a PhD because they wanted to get a better job. Like a PhD is a long and difficult process.
Hugo: And you got to really want to do a PhD in order to do a PhD.
Chris: In my PhD, like, a huge amount of people quit, maybe like half or more than half of people quit along the way and got nothing. A PhD can be six years long. If you quit after year three, you don’t get anything. I think maybe they might give you a master’s as sort of a consolation prize or something like that. But it is a very, very tough activity and it’s very, very hard and most people don’t complete it, and people have a lot of stress around it and it is difficult.
Hugo: And the final year in particular, I just remember my PhD, the final year was absolutely brutal. I don’t want to say I hated the work I was doing but there was certainly at the end a bunch of negative sentiments that involved the stress and the suffering, and also the sleeping under my desk for the final six months actually.
Chris: Oh, yeah. I hated my dissertation by the end. The phrase that I kept on saying over my mind is that the only good dissertation is a done dissertation. Just grinding through. But it is worth it because I got to spend five years studying a topic that I really, really cared about. And I got to go as, I mean, imagine being able to go, I mean, you don’t need to imagine.
Chris: But like, if you’re doing a PhD, imagine being given five years to go as deep into a topic as you could possibly go. Like there is no level of deepness that is okay. Keep on going deeper over and over and over again. And that is super interesting and super cool. It totally changes your thinking and it totally changes you as a person, and it’s not a good way to get, like as a stepping stone to getting another job. It really just isn’t. You could have much better spent that time doing other things that are directly related to getting an interview than going off on some crazy quest to study some amphibian in the whatever in the south Polynesian islands or something like that. Like there’s definitely better ways of just getting a pay raise and then more advanced stuff.
Hugo: So is it a viable option to enter as a data analyst and try to progress to data science?
Chris: I think there’s probably two tracks. One, if you can find the right place, you can absolutely go from data analyst into data science. I think it’s more about trying to find the place where there isn’t a firm division between the two, right? So like, if I joined a larger company as a junior data analyst, I would try to find chances where I could do more software engineering stuff. I would start to work on more complicated projects. I would try to see, okay, cool, sure, I can make this quick analysis, but can I do it better using Bayesian or something like that? But you sort of have to be self motivated to gain more of those skills.
Chris: Another option is like a master’s degree, which could be one or two years and can expose you to a lot of that kind of stuff in a relatively quick amount of time. And then you get out and, you know, I think you can do like a step up. Like we don’t care about degrees at Devoted, there’s no requirement for some kind of degree. But there are lots of skills that people would learn, say, in a data science master’s program or any kind of quantitative master’s program, a master’s in mathematics or something, that they could really take advantage of, and it can be a big step up for people to do that. I think it’s probably relatively cheap, I don’t really know exactly.
Chris: You can either do the self learning path which is, you have to be scrappy, you have to kind of find opportunities where they are and move up through that and probably have to switch jobs a few times because the people hired you as a basic data analyst and all of a sudden you’re doing Bayesian left, right and center and then you want to be paid like you’re doing Bayesian work all the time and then you go find another job that focuses on Bayesian and then you move up from there.
Chris: And then the other one is you know getting that master’s degree and then going back and trying to find another job after that, saying that you have, you’ve got this experience of that you know more about the formal training around some of this kind of stuff, and it doesn’t help you with a lot of business cases, but it does help you that you can say you understand the problems around Bayesian analyses or the problems around some kind of machine learning model then you can apply in a real way.
Hugo: And if someone comes to you and knows these types of things but doesn’t necessarily know how it works in the health space, I presume a good strategy there is to say, hey, I know these techniques, I don’t know a lot about health, but I would love to learn about this stuff, to demonstrate like a passion for the domain expertise essentially.
Chris: This is the first health insurance company I’ve ever worked for. And we do a lot of learning around that area where, so I remember I think my second day of Devoted, they were like, hey, you should come to this meeting and I came to the meeting. And it’s just four data scientists sitting in a room with a doctor explaining how medical coding works. So like, you know, like what is the code for someone who breaks their hip and how does that relate to the other code and how people are billed on that thing. It’s just doctors sitting around telling us how all these things work. And it was so educational and it was so new that it’s a big part of it.
Hugo: Absolutely. I really liked the way that this conversation has gone in terms of providing a variety of different paths from the machine learning to the first time data scientists demonstrating what they’ve done through project or through quantitative research through to the self learning approach. And for anyone out there who wants to take the self learning approach, I’ve heard that datacamp.com is an incredible place to learn. It was actually Chris who told me that before we started recording.
Chris: That’s it. I was just like DataCamp, you’ve got to use DataCamp. No one’s paying for this. I’m ambivalent. This isn’t sponsored.
Hugo: And this is not sponsored by Facebook either.
Chris: Or Bayesian analyses.
Hugo: Exactly. Yeah, Gelman. Gelman isn’t paying us. Hey, that’s a good idea actually. So, I’m wondering if there are any, like any advice, things that you love people doing when they come into interviews or that you think of the worst things that people could do. Just any general tidbits of advice for first time interviewees from your side of the table?
Chris: That’s a good question. I think for us, I’m trying to say us because I don’t want to say it’s just me that’s biased. Like that’ll be biased if I was just like, we hire as a team so it’s not just me obviously that’s hiring, but I’m the person who’s interviewing. I will say what I, what I tend to think rather than like us as a team because I don’t know what they are biased towards.
Chris: For me, I’m a big fan of people who are excited about learning new things and excited about data because I genuinely enjoy what I do. I enjoy learning a new technique, I enjoy sitting around with a new book around some kind of analysis or an old book around some analysis that I don’t know very well. I really, really enjoy that. And I think when you’re coming in as a junior person and you can sort of say, hey, I don’t know how a lot of this stuff works but I am really, really interested in what it is and I know where I want to be in five years.
Chris: And that involves a huge amount of learning. That is exactly what we want. Like we are not hiring you because you’re junior, we’re hiring you because we think that you could be senior with some training, with some mentorship, with some projects and that kind of stuff. Like we do not want you to be junior forever. We want you to be senior, and we acknowledge that the more we train you the more you might leave. And that’s totally okay with us. Like come join Devoted, be a junior person for a while, enjoy yourself, learn a huge amount of stuff, be really excited about what you do and then go find a better job somewhere else. That is a completely reasonable thing, and we don’t have expectations that you won’t do that.
Chris: But it is, I want someone to be excited about it. That excitement can come out in various ways. So people who have lots of side projects, obviously, like that’s a signal that people used to be like I love it in my spare time. But other people don’t have a lot of spare time. That can be one signal but it’s not the only signal we care about. If you just come in and are really, really excited about it and I can tell that you have spent a lot of time thinking about it and you have a lot of knowledge around it. A lot of knowledge around it for what I would expect someone at your level to have, like, that’s a nice sign that you really care about this one thing.
Chris: You might not have knowledge around everything. But you might have this one thing where it’s like, I think random forests were super cool. I spent a lot of time reading about them. Like I don’t know deep learning, I don’t know software engineering, I don’t know anything except for like, I’m just like super interested in this one thing. Like that’s kind of cool. I think that that excitement bleeds off on to me, possibly onto other people who are interviewing.
Chris: But definitely, definitely for me, because if you can’t get the job based on your experience, which when you are more experienced, you could just get the job because you’re experienced, you kind of need another strategy. One of the strategies is just saying that you have the right attitude and you’re excited to do it and you can learn a lot and you could be a good member of the team that people want to work with and that’s a great way of doing it if you don’t have the 10 years under your belt or something.
Hugo: Sure. You need something that differentiates you, a differentiating factor.
Chris: I understand that one of the hard parts for junior data scientists is that their resumes often look a lot alike because one person went to a boot camp, someone else went to a different boot camp, someone else was a math major, someone else like did this mathematics project, someone else wrote one research paper. There’s a lot of like signals that are pretty much the same. And so, the way that people can distinguish themselves I think at least in my mind is having some enjoyment for what you do because we want you to enjoy it and therefore learn more at it, you know, learn more about doing it. You don’t need to burn the midnight oil to do it, you could totally work nine to five with the rest of us. But we want you to be interested in it.
Chris: And if we, every hire, now that I’m sitting on this side of the table, every hire is a bet. And for the junior person, the best bet that we could make is that we are hiring you and you don’t know everything we wish you knew. But, that in two years, you could know like a big portion of what we wish you knew as a senior person, right? Like, I mean, you wouldn’t be senior two years but you get my point, right? Like you could be so much more capable and so much more of a resource for the team. So we come in and hire you and you’re junior and then all of a sudden you’re mid level and you’re kicking butt and then we try to retain you because we want you to stay there because you’re so awesome.
Chris: That part is like a big thing.
Hugo: Yeah. And I suppose one thing I’d like to gauge your opinion on is the difference between what you learn self learning or even in research in terms of using Bayesian inference on machine learning, whatever it may be, the difference between the types of tools and techniques you use there, which may be importing CSVs and then doing machine learning in production in a company. And you may have all these things that you haven’t actually been exposed to before by doing Kaggle competitions, for example.
Chris: I’ll give you another example, like one of the things that is very hard to learn is data engineering. So data scientists are different than data engineers of course. But there’s a reason that there isn’t a bunch of data engineering boot camps, because to do data engineering, you basically need a production system to learn that skill. Like you need to have millions of data points flowing around and all this kind of stuff for you to actually do stuff with that.
Chris: And so if you don’t learn it on the job, like there’s relatively few areas that you could learn that on your own. And I think there’s some parallels to data science, where there’s just some things that are hard to do as a self directed project. I would recommend that people do side projects or do learning projects that take advantage of those. So like, instead of say, instead of importing a CSV with the data, like load it into a database and then pull it down, and then pull it down once an hour or something like that, just to like get more experience with that.
Chris: These tools are either free or cheap. So if you have like a small amount of data, so for example, like there’s Amazon Athena, which is like a way of kind of search, like it’s basically like a way to do queries on big data. You can totally upload just a little bit of data to that and then you pay by query. So you could pay like 30 cents to do a query. And so you do like 30 queries and you kind of understand what’s happening and then you kind of use it for a project and then you drop that project and you never pay for it again. But you have that experience under your belt. And then so when someone comes to you and says, hey, I saw you have this AWS experience. I did this kind of project, we used Athena. I had to work out the security roles and then how I put it onto a server and how I, etc, etc, etc, and all that kind of stuff you can do because with things like AWS, you just pay for usage, it’s just you, you know, just have a small amount of data.
Chris: But you do have that experience, that is really useful. That’s a good signal that like, hey, you are really interested in the kind of stuff. If I release you on to our system that costs 10s of thousands of dollars a month, like you could sit down and do some really interesting stuff and you’d be interested in that. I think that’s great.
Chris: I would recommend that if you really wanted to stand out, you should try to identify the skills that are harder for people to get, because it’s very easy to download a CSV and then do some group by statements in Pandas or something like that. And that’s why there’s 50,000 tutorials that do that. And that’s why my homepage has a bunch of tutorials on doing that. And that is useful stuff. But to stand out, if you had some kind of experience in something that was harder to set up and something that was harder to do, I think that’s a nice way of sort of saying, like, no, no, no, I’m super serious about this. Like I tried to do this thing that was more difficult. And it’s more in line with what real businesses would do, just at a much, much smaller scale, but I’m still using the same technologies. I mean, that’s awesome. That’s great.
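As a rough illustration of the kind of side project Chris describes (loading a CSV into a database and querying it, instead of reading the file directly), here is a minimal sketch using Python's built-in `sqlite3` module. SQLite stands in for a hosted database here, which is a simplification and not something Chris mentions; the `orders` table, its columns and the sample rows are all made up for the example.

```python
from __future__ import annotations

import csv
import io
import sqlite3

# Hypothetical sample data standing in for a downloaded CSV file.
CSV_TEXT = """customer_id,region,amount
1,north,20.0
2,south,35.5
3,north,14.5
4,south,10.0
"""


def load_csv_into_db(conn: sqlite3.Connection, csv_text: str) -> int:
    """Create the (hypothetical) orders table and load the CSV rows into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "customer_id INTEGER, region TEXT, amount REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["customer_id"]), r["region"], float(r["amount"])) for r in reader]
    # Parameterised inserts keep the data separate from the SQL text.
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def total_by_region(conn: sqlite3.Connection) -> dict[str, float]:
    """Pull an aggregate back out with SQL, as a scheduled job might do."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())


conn = sqlite3.connect(":memory:")  # in-memory database; pass a file path to persist
loaded = load_csv_into_db(conn, CSV_TEXT)
totals = total_by_region(conn)
print(loaded, totals)
```

Running the query on a schedule (for example once an hour, as Chris suggests) or swapping SQLite for a hosted service would be the next step toward the harder-to-get experience he is talking about.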
Favorite Data Science Technique
Hugo: So Chris, I want to ask you what one of your favorite data science techniques or methodologies is. But before that, you’ve said random forest enough. My first question is random forests or support vector machines?
Chris: Always random forest. Who uses support vector machines? I don’t think like I’ve ever, I cannot think of a single case of someone who has used a support vector machine in production. Support vector machines are mathematically cool. They’re just super cool. Like when you explain how they work, it’s just super interesting. It’s a very clever technique but random forests are like if you need to get something done, a random forest like just seemingly works out of the box in so many ways and so many cases very, very easily that it is definitely one of the best starting points. I think a random forest is definitely one of the best starting points for a lot of modeling. And then I would kind of go from there and decide if you wanted to make the random forest more complex or add more feature engineering or change your model up or something like that. But I would definitely… I’m a big random forest fan.
Hugo: For sure. I actually, I was riffing off, it was a couple of weeks ago, someone tweeted out, does anybody use support vector machine for anything. You just replied no.
Chris: Yeah, I can’t think of like, I don’t know who uses that. I cannot think of a single time that someone would choose a support vector machine over anything. I don’t see that that one moment where it’s like the killer model to use. Someone’s going to listen to this and send me some kind of article, they cured some kind of cancer using some kind of support vector machine. I cannot think of an example off the top of my head.
Chris: I always considered a support vector machine more as sort of a teaching function for like, hey, here’s one of the techniques that people try. It’s mathematically genius. I think it explains a lot of the concepts of, like a decision line between groups and that kind of stuff really well and what happens when people are on the wrong side of the… Like when a data point is on the wrong side of the decision line, all that kind of stuff, I think that’s totally totally useful. Also, I’ve never seen someone use it in production.
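For readers who have not seen the idea Chris refers to, here is a small teaching sketch of a decision line and of what happens to a point on the wrong side of it. It uses the standard hinge loss from soft-margin support vector machines; the weights, bias and sample points are made up for illustration, and this is not a full SVM training routine.

```python
from __future__ import annotations


def decision_value(weights: list[float], bias: float, point: list[float]) -> float:
    """Signed score w . x + b; its sign says which side of the line the point is on."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def hinge_loss(weights: list[float], bias: float, point: list[float], label: int) -> float:
    """Hinge loss max(0, 1 - y * score) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - label * decision_value(weights, bias, point))


# Hypothetical decision line: x1 + x2 - 3 = 0.
W, B = [1.0, 1.0], -3.0

# All three points carry the label +1.
correct_side = hinge_loss(W, B, [3.0, 2.0], +1)   # score 2.0: past the margin, no penalty
inside_margin = hinge_loss(W, B, [2.0, 1.5], +1)  # score 0.5: right side, but inside the margin
wrong_side = hinge_loss(W, B, [1.0, 1.0], +1)     # score -1.0: wrong side, largest penalty

print(correct_side, inside_margin, wrong_side)
```

The penalty grows linearly the further a point sits on the wrong side of the line, which is the behaviour that makes the model a useful teaching example.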
Call to Action
Hugo: So, do you have a final call to action for our listeners out there?
Chris: Sure. I mean, one, you should completely apply for the jobs at Devoted. This is not actually an attempt for me to get, this is not an attempt for me to get people to apply for jobs at Devoted but hey, we are an awesome company, come work with DJ Patil and myself and a number of great people. It’ll be fun, it’ll be great and I hope you apply.
Hugo: Incredible. We’ll include a link in the show notes as well too.
Chris: Awesome. Look at this, Finally, God, expense, I mean, I’m not really paying any money for it but I feel like I should expense, I don’t know, like a lunch because of this or something.
Hugo: Absolutely. Let’s do that.
Chris: This would be my other more general call to action I think for someone who’s interested in data science. Think very carefully around the skills that you’re trying to develop and the skills that other people who are applying are trying to develop. Sit down and think about ways that you can apply those skills in a way that both demonstrates that you have those skills and/or that you’re learning those skills or something like that, but has some kind of real impact in society around us.
Chris: There’s so much free data out there, there’s so many really, really, really interesting projects that you can do by yourself, on your own with basically no money that you can sit down and say, hey like, I’ve used random forest in this really strong way but I’ve also used it in a way that I detected some crazy cool thing, detected parking tickets in New York or detected like some kind of, I don’t know like some kind of thing around like police shootings or some kind of thing around crime or something, anything like that. Like, I think there’s so many really interesting things that you can do with data science. This is an applied field.
Chris: And so, learn things in data science, that’s a big part of it, but then also apply it in your own life and the projects that interest you. That is the ultimate thing that I want to talk about in every single interview is just the cool things that they’ve done and use data for. And I want to see that spark of excitement in the things that they’ve done because then I want to work with them. Because if they’re so excited about it, I’m excited about it and it makes you want to go to work every day.
Hugo: This has been a great conversation, Chris. So thank you so much for coming on the show.
Chris: Thank you for having me.
Hugo: Such a pleasure.