How to recruit data analysts for the public sector by @ellis2013nz

[This article was first published on free range statistics - R, and kindly contributed to R-bloggers].

A management challenge

Between 2011 and 2017 I selected somewhere between 20 and 30 staff and contractors for New Zealand public sector roles with titles like Analyst, Senior Analyst and Principal Analyst. Alternative names for these roles could have been (and sometimes were) “researcher”, “data analyst”, “statistician”, “R developer”, “data scientist” and in one case “Shiny developer” – with or without “Senior” or “Principal” in front of the name. I must have been part of over 50 job interviews for such roles, and read some hundreds of technical exercises and perhaps more than 1,000 job applications. Perhaps it was only 500 distinct applicants, given that some people applied more than once; but it was certainly lots. This was when my own positions had titles such as Manager Tourism Research and Evaluation, Manager Sector Trends, and General Manager – Evidence and Insights. This blog post is aimed at people with similar job titles doing the recruiting, but might also be of interest to people applying for data science roles.

There’s quite a literature out there on the web about recruiting data scientists, but most of it seems to be based in quite a different world from mine. I don’t have the time or the inclination to do a comprehensive study of all the advice, but it seems to fit into several broad categories:

  1. Alleged technical Data Scientist interview questions such as “explain the use of combinatorics in data science”, “what is an eigenvalue and eigenvector”, “how do you choose the k in k-means cluster analysis” and “write a function that takes in two sorted lists and outputs a sorted list that is their union” (these are all suggested questions from a real site I won’t link to). I mean, WTF? Would anyone really want to choose your team based on ability to answer that sort of question in an interview setting?
  2. A reaction to the above, pages that emphasise non-technical qualities like enthusiasm for learning, ability to communicate, team orientation, client orientation etc. These are generally much more sensible, but not particularly specific to data science and hence very incomplete. Maybe advice like this is useful for a data scientist thrust from their technical corner into a management role they’re not ready for, but much more likely if you’ve reached a management level in the public sector, you would have been thinking about these things for your own and others’ teams since you were an egg. More importantly, your public sector organisation will have all sorts of processes designed (successfully or not) to recruit people with these qualities, so it should be the least of your problems when designing the recruitment.
  3. Pages for employers which are more about how to attract data scientists, with ideas ranging from “higher salaries” and “give them time and opportunities to invent cool stuff” (both useful) through to “put your lead candidate up in the coolest expensive hotel they’ve ever been, with gorgeous models walking through the lobby” (yuck for all sorts of reasons). Ideas like the first two might be useful for strategising, but such resources are basically silent on the tactical specifics of picking the wheat from the chaff.
  4. A surprising number of pages that have basically no content other than the vacuous (“recruiters, work closely with hiring managers to build out accurate job descriptions”) and seem to exist as click-bait to sell advertising.

Most of the advice out there does not seem based in the world that I lived in as a recruiting manager within a public sector organisation. My main challenge was typically to get an effective assessment of technical skills – that is, the ability to apply technical skills to our sort of problems in a way that will work in our particular organisation – in the recruitment process. I suspect this is also the challenge for other managers; even more so those that have less personal hands-on experience with the latest tools.

The important checks on “team orientation” and “ability to concentrate” are usually hard-coded into the recruitment process and are a core part of the standard interview or tests provided by my organisation. These are not something I had to think about particularly differently for a data scientist than when selecting (for example) policy advisors or business analysts. On the other hand, a question about explaining eigenvalues would be far too technical for an interview, and also has the material disadvantage of being the wrong “technique” for any analytical job I’ve recruited for (because the real technique is not knowing how to define an eigenvalue but applying a related analytical tool to data).

There are two big challenges in any recruitment problem, of which “recruiting a data scientist” is only a special case:

“The colonel doesn’t need to be a crack shot with a rifle, but they need to know what a rifle is and take seriously the issue of when it is used and for what.”

I think this quote is from me, but let me know if I’ve nicked it from someone else.

A complication – recruitment under constraints

In a larger organisation with a more formal approach to human resources – and in any public sector organisation, where aspects of recruitment may be governed by specific law eg requiring decisions to be based on merit rather than politics – there are additional complications. Individual recruiting managers have only limited control over key elements of recruiting staff such as:

“Tell us about a time you solved a difficult technical problem and delivered improved outcomes, by defining on a whiteboard an eigenvalue and an eigenvector.”

[Don’t use this interview question for real (unless you cut out everything after “outcomes”)]

So, some thoughts…

By 2017 I wasn’t doing this recruitment the same way I was in 2011. In 2011, people told me “you can’t make public servants code” (turned out to be wrong) and “you can’t require public servants to know R” (also wrong, although I never made R a requirement for a role, so long as people demonstrated they could learn it fast and had equivalent skills in another language like Stata, SAS or Python to prove it). So, here’s where I ended up in my recruitment processes.

Job description

Advert

In 2015, thanks to a supportive HR team, I had a very successful recruitment round that used the theme “Data Ninjas wanted – no seriously, we need 3 of them”. We led with:

The “ninja” language really cut through but I wouldn’t use it again; since then I’ve seen in other contexts enough complaints about that word to convince me it is associated with masculine culture (and others have convinced me it is a cliche). Seek alternatives. You don’t want to be this guy, although he did lots more wrong than just describe the developer required as a “super ninja”.

I would use the other language in that advert again though. For example, the recruitment specialist said to me “what might be different about the sort of people you want for this job”, which is where the “get paid to play with data” idea came from. It’s something I’ve heard many people say, and frankly I think it myself; I think it really contributed to the great response I got for that advert.

Practical exercise

This is the important bit. Interviews alone just don’t cut it for these sorts of jobs. I used a written exercise, as similar as possible to a realistic work task, as my main short-listing tool. Medium-listing is done from the CV and covering letter (where ideally you have asked them to address specific selection criteria). Then from this medium list – maybe 10 people per vacancy – you can really identify the people with the right skills by giving them a task as similar as possible to what they would need to do in the job. These people are all given a few days to complete a written exercise, in their own time and with their own tools. The exercise is used as the short-listing instrument, and is also referred to extensively in the interview and final decision between those on the short list.

Obviously, exactly what sort of technical exercise will depend on the particular role you are filling, but I can’t emphasise its importance enough. It’s much more important than the interview in terms of picking the best person for the job.

It’s important that the technical exercise:

Taking all that into account, the typical selection exercise I use for a data analyst role will be something like this:

An influential (but hypothetical, for this process) industry stakeholder has come to the Minister for Tourism arguing that visitors to New Zealand who go to Queenstown in New Zealand’s South Island do more activities overall, spend more money per day, and stay longer in New Zealand than visitors who do not visit Queenstown. Further, they argue that while visitors in general are more likely to visit Queenstown now than 10 years ago, this isn’t the case for visitors from Europe; and that something should be done about it. The Minister has asked the tourism policy team to advise if all this is true and they have asked you for analytical assistance.

The data in this downloadable zip file is a copy of genuine survey data from a Ministry relational database. It comes from the International Visitor Survey, which has sampled tourists departing from New Zealand continuously since 1997 and includes questions on spend, activities (eg cycling, ski-ing, visiting museums) and locations visited. It is weighted each quarter year to the total population of departing tourists aged 15 or more. Your task is to provide a 2-3 page document for the policy team explaining whether the factual claims made to the Minister are correct, what caveats should be held around the conclusions, and anything else they need to know, in language accessible to people with a good understanding of tourism but no specialist skills or interest in data and statistics. You should provide two documents: a written document (PDF or Word, include charts and tables if helpful) for the policy team to read and consider before they respond to the Minister; and a working document or documents (eg R code, knitr/rmarkdown report, Excel workbook, SAS project) and accompanying documentation aimed at one of your fellow analysts so they can easily understand, reproduce and peer review your work.

In your analysis, you will need to combine several tables of data eg to answer the questions on activities as there is one table with a row per visitor, and another table with a row per visitor-activity combination. You are free to use any analytical tools you have access to, but note that in the actual work situation this task would involve a combination of SQL and R. To keep the exercise within reasonable limits, do not address the “should something be done about it” part of the original query, and do not use any data sources other than that linked to above. Note that this exercise is testing your writing and communication skills as well as your analytical skills; so please carefully consider the two different audiences for the two documents you need to produce.

Additional guidance would be given on the survey design, weights, and some of the columns in various tables that are particularly relevant (for example, the vw_IVSSurveyMainHeader.csv file is the table with one row per respondent; there are many different ways of defining “spend” with this data, but the WeightedSpend column is the correct one to use for this exercise; “Weighted” in WeightedSpend refers to outlier treatment, not to survey weighting; the PopulationWeight column is the survey weight).
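
For recruiting managers who want a feel for what the analytical core of a candidate's working document might involve, here is a minimal sketch of the table-combining and weighting step. It is written in plain Python for illustration only (the exercise notes that the real work would use SQL and R), and it is not a model answer. Only `WeightedSpend` and `PopulationWeight` are column names from the guidance above; `SurveyResponseID`, `VisitedQueenstown`, `Activity`, the function names and all the numbers are invented assumptions. It also compares total spend rather than spend per day, to keep the sketch short.

```python
from collections import defaultdict

# Hypothetical miniature versions of the two tables. Only WeightedSpend and
# PopulationWeight are real column names from the exercise guidance; every
# other name and every value below is invented for illustration.
visitors = [  # one row per respondent
    {"SurveyResponseID": 1, "VisitedQueenstown": True,
     "WeightedSpend": 4000.0, "PopulationWeight": 100.0},
    {"SurveyResponseID": 2, "VisitedQueenstown": True,
     "WeightedSpend": 2000.0, "PopulationWeight": 300.0},
    {"SurveyResponseID": 3, "VisitedQueenstown": False,
     "WeightedSpend": 1000.0, "PopulationWeight": 200.0},
    {"SurveyResponseID": 4, "VisitedQueenstown": False,
     "WeightedSpend": 3000.0, "PopulationWeight": 200.0},
]
activities = [  # one row per visitor-activity combination
    {"SurveyResponseID": 1, "Activity": "Skiing"},
    {"SurveyResponseID": 1, "Activity": "Museum"},
    {"SurveyResponseID": 1, "Activity": "Cycling"},
    {"SurveyResponseID": 2, "Activity": "Museum"},
    {"SurveyResponseID": 3, "Activity": "Cycling"},
    {"SurveyResponseID": 3, "Activity": "Museum"},
]


def count_activities(activity_rows):
    """Collapse the visitor-activity table to one count per visitor."""
    counts = defaultdict(int)
    for row in activity_rows:
        counts[row["SurveyResponseID"]] += 1
    return counts


def weighted_mean(rows, value_key, weight_key="PopulationWeight"):
    """Survey-weighted mean of value_key, using the survey weight column."""
    total_weight = sum(r[weight_key] for r in rows)
    return sum(r[value_key] * r[weight_key] for r in rows) / total_weight


def summarise(visitor_rows, activity_rows):
    """Compare Queenstown visitors with everyone else, using survey weights."""
    counts = count_activities(activity_rows)
    # Left join: visitors with no rows in the activity table get a count of 0
    # rather than silently dropping out of the comparison.
    joined = [
        dict(v, NumActivities=counts.get(v["SurveyResponseID"], 0))
        for v in visitor_rows
    ]
    summary = {}
    for flag in (True, False):
        group = [r for r in joined if r["VisitedQueenstown"] == flag]
        summary[flag] = {
            "mean_spend": weighted_mean(group, "WeightedSpend"),
            "mean_activities": weighted_mean(group, "NumActivities"),
        }
    return summary


if __name__ == "__main__":
    for flag, stats in summarise(visitors, activities).items():
        label = "Visited Queenstown" if flag else "Did not visit"
        print(label, stats)
```

Two things in a sketch like this are worth looking for in real submissions: whether the candidate keeps visitors who reported no activities (the left join), and whether they apply the survey weight rather than taking simple averages of the sample.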

Interview

Job interviews are expensive in terms of time and mental effort (intellectual and emotional) for both candidates and panels; and they are also not as effective in predicting performance as one would hope. So I like the interview to be the final stage, and to involve as small a number of candidates as possible. As much weight as possible should be given to the experience and demonstrated skills of the candidates in doing tasks that resemble the job, which (in the case of data scientists) rarely resembles a job interview.

Just like the technical exercise, interview questions should be ones that any candidate can answer but excellent candidates can answer excellently. In organisations I’ve worked for since 2000 there have been policies to use structured behavioural questions in interviews, and typically HR will provide samples for the manager. It’s important to choose and tweak these carefully to make sure you have a clear relationship between interview questions and selection criteria. My data scientist or statistician interviews all follow the same basic structure. After the introductions and explaining how the interview will work, I start with two non-behavioural questions:

Then I have about four to six behavioural questions, all of which follow the template “This is a question about X. Tell us about a time you had to …, and hence showed your ability to do X.” Typically, one of these questions will be technical and along the lines of

The other questions will be more generic skills asking for examples of behaviours such as problem solving, working in teams, persisting through difficulties, managing time and resources. Nearly always, one of my questions is a variant on

The pitch of the questions will depend on the seniority of the role of course (eg entry-level roles seeking examples that could be met by university study, volunteer work or holiday jobs if there is limited work experience to draw on). I expect additional technical details to emerge as collateral in the answers to these non-technical questions, but if necessary, toward the end of the interview, I might ask additional questions like

These interviews always end with two things:

It’s important to note that these are two separate opportunities! Sometimes people focus just on the “any questions for us” and miss the part about “anything you want to add”. When asked “anything you want to add”, even if all their skills and behaviours have been well showcased through the interview the candidate should still make a final pitch drawing it all together – basically, this is their chance to ask for the job. The “anything you want to ask us” is much less important. Candidates should keep questions short and limited to things they really do need to know; the panel are busy, and probably apart from the recruiting manager they will not be particularly interested in the conversation at this point.

Interview panels should include some kind of customer representative, at least one person with sufficient technical skills to judge the applicants, and the recruiting manager. They should have two or three people on them and a mix of genders and (if possible) ethnicities.

I can safely “give away the questions” for interviews I’m involved with. The aim of the interview is not to surprise people with questions and test whether they come up with the “right” answer, but to understand their skills and experience. The more they know in advance what will be asked and hence how to prepare for it, the better chance the panel has of making a good decision.

Psychometric tests

I don’t do these for recruitment unless they are organisationally required. In fact I have some fairly strong views about them; probably something for a blog post of its own.

Summary

So there we have it. My key points for recruiting data scientists for the public sector (which may well apply more broadly):

This worked well for me, and I would argue mine were some of the strongest analytical teams anywhere in New Zealand, not just the public sector. I hope this might be helpful for others recruiting analysts too.
