- 17 freshers from various engineering backgrounds (both graduates and postgraduates)
- 12 freshers from various quantitative backgrounds (mathematics, statistics, MBA, econometrics, etc.)
- 18 experienced professionals from different industry backgrounds (data management, programming, consulting, etc.)
- All members of the sample are from two major cities in India.
- As mentioned earlier, almost all participants, except a few, reached the same inference: the number of visits to the branch has a positive relationship with the age of the customer. In other words, as age increases, customers prefer to visit the branch more. Interestingly, most participants were comfortable with R programming apart from a few typos; kudos to all the developers who have made it more user-friendly.
- Astonishingly, only 21% of the sample did any data understanding after reading in the data. This means looking at descriptive statistics, either through summary functions or plots, before moving on to modeling. Not a single member of this 21% is from an engineering background. This is not a generalization or a criticism of engineering backgrounds, only an observation about this sample. Noticeably, another 15% came back to data understanding after fitting one or two models.
- Another surprise was the range of techniques participants employed, which extended as far as deep learning methods. The average number of models fitted per participant was close to 3, although a few participants did not fit even a single technique/model.
- Only 15% of the sample clearly stated that the result might be spurious, or declined to comment on the relationship because of noise in the data. However, only half of them offered an explanation for this.
- A notable finding from our exercise is that many participants directly applied the techniques they were already familiar with. A few went straight to neural networks and then came back to machine learning classification techniques, since they needed to comment on the relationship. More than half of the sample first tried a variant of the Generalized Linear Model. When they found that the model's explanatory power was low, they moved on to other techniques and kept trying one data mining technique after another until the time limit ran out.
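The observations above mention two steps that most participants skipped. The first is looking at descriptive statistics before modeling. The second is asking whether an apparent age/visits relationship could simply be noise. Both can be sketched with the Python standard library. This is a minimal, hypothetical illustration. The function names `describe`, `pearson_r` and `permutation_p_value` are our own. The data in the usage section is synthetic and is not the exercise dataset. The permutation test is a swapped-in technique chosen for clarity, not what the participants or the author used. In R, the rough analogues of the first two steps are `summary()` and `cor()`.

```python
import random
import statistics


def describe(values):
    """Min/quartiles/mean/max summary, similar in spirit to R's summary()."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test for correlation.

    Shuffling y breaks any real link between x and y, so the shuffled |r|
    values show how large a correlation noise alone tends to produce.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(pearson_r(x, y_shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical synthetic data: visits are drawn independently of age,
    # so any correlation found here is noise by construction.
    rng = random.Random(42)
    ages = [rng.randint(20, 70) for _ in range(200)]
    visits = [rng.randint(0, 10) for _ in range(200)]

    # Step 1: understand the data before fitting anything.
    print("age:", describe(ages))
    print("visits:", describe(visits))

    # Step 2: check whether the observed correlation stands out from noise.
    r = pearson_r(ages, visits)
    p = permutation_p_value(ages, visits)
    print(f"r = {r:.3f}, permutation p = {p:.3f}")
```

The permutation test was chosen because it makes no normality assumption about visit counts. A small p-value only says that the correlation is unlikely under random shuffling. On its own, it does not establish that age drives branch visits.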
The author thanks the management of the start-up for allowing the exercise highlights to be published. He has undertaken several programs on analytical talent development, and the views expressed here come from his industry experience. He can be reached at [email protected] for more details.