Do you love Data Science? I mean, the Data part of it

[This article was first published on r-bloggers on Programming with R, and kindly contributed to R-bloggers.]

Last week, we talked all about Artificial Intelligence (and also Artificial Stupidity), which led me to think about the foundation of Data Science: the Data itself. I think Data is the least appreciated entity in the Data Science value chain. You might agree with me if you do Data Science outside competitive platforms like Kaggle, where the Data handed to you is what most Data Scientists can only dream of in their jobs.

Data Foundation

The “AI Godfathers” have a big fan following, but many of us also know Fei-Fei Li, whose contribution (with her team) of building ImageNet is invaluable to AI.

“One thing ImageNet changed in the field of AI is suddenly people realized the thankless work of making a dataset was at the core of AI research. People really recognize the importance the dataset is front and center in the research as much as algorithms.” – Fei-Fei Li

Data Startup

Meanwhile, venture capitalists aren't shying away from putting their money where Data is created and curated: the Silicon Valley startup Scale AI recently hit unicorn status. Scale AI's About Us page reads:

The Data Platform for AI

Scale AI has also open-sourced datasets, and that's sweet.

Build your own Data

Zalando, which open-sourced Fashion-MNIST, published a nice paper listing the steps they took to publish the dataset. There are also free tools like labelImg and makesense.ai to help you annotate images for a typical image dataset. For NLP annotation, BRAT is a nice free, open-source tool. And if you are planning a pet project and don't have the required dataset, this tutorial by Mat Kelcey on counting bees on a Raspberry Pi with a conv net would be a tremendous help.

In R, check out this resource to learn how to generate meaningful fake data for learning, experimentation and teaching using {fakir}.

That said, if you appreciate Data Science as much as you'd appreciate the beauty of a Ferrari or a Lamborghini, let me remind you that a car is only useful if it has oil in it, and that oil is super-clean, labelled Data that's ready for Data Science and Machine Learning.

If you liked this, please subscribe to my Data Science newsletter and share it with your friends!
