Today we caught up with Andrew Little and Daniel de Bortoli, who will be teaching the ‘Web Scraping and Text Mining Lyrics in R’ workshop at EARL on the 9th of September. We spoke about their careers and lives at Mango and what their dream workshop would look like…
Hi both! Thank you for talking to me today. I’d love to know some more details about your lives at Mango and how your careers have led you to this point.
Daniel – I’ve been a Data Scientist at Mango for around a year. Prior to Mango I was a researcher in the mechanical engineering sector, until I decided that I wanted to move away from academia; data science always seemed like an exciting and interesting area. I was interested in solving more practical problems while still having intellectually challenging and interesting work. Since joining Mango, I’ve done lots of consulting and training projects, and more recently I’ve been working on a revenue optimisation project.
Andrew – I’m a Junior Data Scientist at Mango, and I’ve been here for two years now. I came straight from university, so this is the first step in my career; I believe I was part of the first graduate intake that Mango did. I’ve done lots of training in my time here and a decent number of consultancy projects too. Currently, I’m doing lots of work helping teams move from another programming language to R, so I’m showing them the best way to work in R.
You’re hosting a workshop on web scraping. Why do you think this is a useful skill for Data Scientists?
Andrew – There are many situations where data is given to you and it’s just available, but when you try to do something more exciting or nuanced, often there is no data and you need to get it yourself. Web scraping is a way to use the ridiculous vastness of the internet to get freely available data that you can then use.
Daniel – No matter what you’re interested in, if it is on the internet you can collect your own data. That’s the power of it.
What can people expect to leave the workshop knowing?
Daniel – A full end-to-end view of a modern workflow for working with text data in R: from collecting it through web scraping, to processing and cleaning it, to looking at many of the common modelling approaches and tasks for text data, all the way to generating some interesting outputs. We chose lyrics to work with, as they’re not too niche and most people are interested in music!
What would your dream workshop be?
Andrew – Personally, there are two main areas I’m interested in. One is the workshop we’re actually teaching, so that’s good!
Daniel – So you’re saying we created your dream workshop?
Andrew – You’re right actually, yes! So either this, or computer vision, which I’m quite interested in as well. That’s something you don’t see too much about, as it’s quite a new area. I’m interested in anything that is seen as cutting edge, not just standard statistical analysis: for example, using data to build AI that actually feels like AI, doing something a human would do, like reading text documents or processing images.
Daniel – Preparing this workshop, I realised I’d never actually worked with audio data. We’ve been referring to data that Spotify has on things like how upbeat a song is, which comes from audio analysis. That would be a really interesting and uncommon area to look into.
Thank you both!
If you’d like to find out more about the EARL Conference and the other workshops we have available, please visit the EARL Conference website.