Data science is hot right now. If you want to learn more about it, where should you go? Online, of course! Check out our favorite data science sites. Whether you’re a beginner or a pro, these are sites you should know.
Not so long ago, if you wanted information on a topic like data science, you had to look for it – either at your local library or at a university. Information was golden, and like gold it was guarded.
Now, though, we almost have too much information. The skill isn’t in finding material; it’s in separating the valuable and useful from the unimportant and useless. This is especially true with a hot topic like data science. There are plenty of “sources” that aren’t worth the time it takes to read them.
With that in mind, we’ve compiled a list of trusted online repositories of data science knowledge. They can be divided into the following groups:
- Machine Learning Competitions
- Aggregate Sites
- Q&A Sites
- R & Python
Let’s jump right in!
Machine Learning Competitions
You might think that data science competitions are just for experts, but these sites can also offer a lot of good information for beginners.
Kaggle calls itself “your home of data science and machine learning”. It’s best known for its competitions, but the site also has a lot of other information, including a job board, Kaggle Kernels, publicly available datasets from past competitions, and a discussion forum. Newcomers have plenty of ways to hone their craft; the most interesting, in my opinion, are the completed competitions that have been turned into learning opportunities.
Kaggle competitions are based on the principle of crowdsourcing. Once a company announces a problem, Kaggle members can start trying to solve it, either alone or in groups. The competition rewards are extremely high, so there is usually an enormous response. Kaggle has more than a million registered members, and several thousand teams are usually involved in each competition! It is one of the largest data communities on the Internet.
As we said earlier, some closed competitions have been turned into learning opportunities for beginners. There’s no prize (except knowledge), but these are a great way to learn data science. Check out these three:
- Titanic: Machine Learning from Disaster
- House Prices: Advanced Regression Techniques
- Digit Recognizer
CrowdANALYTIX may not be as famous as Kaggle – to which it is very similar – but it’s near and dear to my heart. Some of my first data visualization and predictive modeling projects won rewards on this site, so this community will always be very special to me.
This site features community-held competitions related to data modeling, research, and visualization. However, its approach is slightly different from Kaggle’s. Kaggle will usually provide you with a well-prepared dataset on which you try different Machine Learning (ML) algorithms and then optimize. The emphasis is on model development and ML algorithms. On CrowdANALYTIX, you cover the entire process of model development: you find the data, do the web scraping and data cleaning, explain your business approach, and (finally) apply the predictive algorithm.
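To make the "entire process" idea concrete, here is a minimal sketch in Python of the two middle steps, data cleaning followed by a simple predictive model. Everything in it is an assumption made for illustration: the tiny dataset, the function names `clean`, `fit_line`, and `predict`, and the choice of ordinary least squares are hypothetical and do not come from CrowdANALYTIX or Kaggle.

```python
from statistics import mean

# Hypothetical raw records of (advertising spend, sales).
# None marks a missing value that has to be handled before modeling.
raw_rows = [(1.0, 2.1), (2.0, None), (3.0, 6.2), (None, 5.0), (4.0, 7.9)]


def clean(rows):
    """Data cleaning step: drop every record with a missing field."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit_line(rows):
    """Modeling step: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept


cleaned = clean(raw_rows)
model = fit_line(cleaned)
print(f"kept {len(cleaned)} of {len(raw_rows)} rows")
print(f"predicted sales at spend 5.0: {predict(model, 5.0):.2f}")
```

On a Kaggle-style problem you would typically start from the modeling step, because the dataset arrives already prepared; in a CrowdANALYTIX-style contest the cleaning step (and the data gathering before it) is part of the work.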
Currently, CrowdANALYTIX has three contests geared to non-experts:
- Business Analytics for Beginners Using R–Part I
- Business Analytics for Beginners Using R–Part II
- Business Analytics for Beginners Using R–Part III
Analytics Vidhya is an Indian platform that counts more than 60,000 data scientists from 200+ countries as members. Among many other things, this site offers the following channels:
- Learn – Great for those getting started with data science. There are some nice suggested learning paths, links to training materials, and blog articles related to data science and Big Data. I especially like the Infographics section, which presents various topics using interesting diagrams and other visual aids.
- Engage – The main part of this section (for me, anyway) is the Q&A discussion forum. Anyone can join in with a question or an answer. There are also some quality blog articles regularly published on this site.
- Compete – Hackathons and interactive workshops are the main reason for this platform’s popularity. Analytics Vidhya focuses on predictive modeling competitions. These are based on real-life problems, which makes them a great way to learn the skills we use the most.
Aggregate Sites
These sites host a huge amount of information for the aspiring and professional data scientist.
Data Science Central
Data Science Central is a platform designed for data scientists and Big Data practitioners. It also hosts a lot of information in the form of blog posts. These are classed according to the following channels:
- Big Data
- AnalyticBridge (for data analysts and Business Intelligence experts)
- Deep Learning
- Data Visualization
Data Science Central also offers recommendations about books, courses, and other learning methods. The industry’s latest trends are covered in a very understandable and interesting way. Here is just a sample of the content you can find on this site:
- Some Thoughts on Mid-Career Switching Into Data Science
- Statistics is Dead – Long Live Data Science
- Time Series Classification with Tensorflow
KDnuggets is similar to Data Science Central – it’s a place where you can find a lot of information about data science. However, KDnuggets is organized a bit differently, and it focuses on industry news, opinions and interviews, publicly available datasets, and data science software. There are also pages and pages dedicated to education on this site, including tutorials and courses.
Check out the content on KDnuggets – it’s a well-known blog aggregator. Below are some recent top posts:
- 30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets
- Introduction to Blockchains & What It Means to Big Data
- Understanding Machine Learning Algorithms
Simply Statistics was founded by three biostatistics professors – Jeff Leek, Roger Peng and Rafa Irizarry. These three are quite famous because of their free Coursera online courses dealing with statistics, data science, and machine learning. They created the Simply Statistics platform to share their ideas and advice on data science. They focus on interesting topics that they use in their daily work.
This blog is not updated as frequently as some others, but it contains worthwhile posts, articles, and tips. There are also interviews with data scientists that discuss what it’s like to be a data scientist and what their daily work routine looks like.
No Free Hunch
We’ve already mentioned Kaggle; now meet No Free Hunch, the official Kaggle blog. Most of the articles on this platform are related to Kaggle competitions, but they also cover the following areas:
- Data Science News
- Kaggle News
- Winner’s Interviews
The winner’s interviews section is great. Through these posts, you can get to know experienced Kagglers – their background, their experience, and how they go about winning competitions.
Q&A Sites
CrossValidated and Stack Overflow
CrossValidated and Stack Overflow are very similar; in fact, CrossValidated is the sister site to Stack Overflow. Both are question-and-answer sites; Stack Overflow is visited by more than 50 million developers each month.
Why two Q&A sites? CrossValidated centers around statistics, machine learning, data analysis, data mining, and data visualization. I like to call it Stack Overflow for data scientists. Questions related to R and Python programming will be placed on Stack Overflow pages; questions related to statistical analysis, Machine Learning or probability theory will likely be found on CrossValidated.
I cannot imagine my daily work without these resources. They’re pulled up on my web browser most of the time.
Quora is another question-and-answer site where questions are asked, answered, edited, and organized by a community of users. Quora answers questions of all kinds, from cooking to career advice. You can choose specific channels, like technology, or you can search for topics.
Besides questions related to programming issues or Machine Learning problems, you can find interesting “general data science” questions on Quora. Here are a few:
- How Can I Become a Data Scientist?
- Why Is Python a Language of Choice for Data Scientists?
- Machine Learning: Is Machine Learning a Field Best Suited for Geniuses? Should I Bother Trying to Pursue It?
- What Programming Language Is Best for Machine Learning and Statistical Analysis? Is it R or Python?
These are good topics for someone who is a beginner in this field. And each person who submits an answer must also list their credentials, so if you look, you can find some really good info here.
R & Python
Data scientists are generally divided between two languages – some prefer R, others prefer Python.
Python.org is for Python developers. This is the official home of the Python programming language. On this site, you can find nearly anything to do with programming in Python – tutorials, documentation, jobs, information about workshops and conferences, the latest news, and upcoming events for Python developers. This is the most important website for data scientists who use Python as their primary programming language.
Python.org is divided into several parts:
- About – Learn more about Python, including how to get started programming.
- Downloads – Download the latest Python release and install it on your computer; this section covers all Python releases.
- Documentation – A detailed and clear introduction to the language, syntax, and semantics of Python, plus documentation related to the standard library.
- Community – Information for the Python user community.
- Events – Announces upcoming conferences and other events.
- Success Stories – 41 stories about Python implementations and Python software.
- News – Also includes interviews with those in the Python community.
R-bloggers offers insightful posts, daily news, and tutorials all about the R programming language. It has more than 750 contributors and over 50,000 followers; it’s quite famous among R developers.
It is handy to have all articles, advice, and best practices related to R in one place – especially when you need help and you’re in the middle of the development process.
Although R-bloggers is known primarily for its posts, the site also provides a nice learning path for R beginners. This path is divided into several sections (R Basics, Data Manipulation, Data Visualization, Machine Learning, etc.). Each section points you to relevant resources that range from documentation and online courses to books and other methods. It’s a great way to begin learning and stay engaged.
The Comprehensive R Archive Network (CRAN)
CRAN is a collection of sites which carry identical material (so-called web mirrors) consisting of distribution(s), extensions, and documentation for the R programming language. It’s where you can download the latest official release of R, daily snapshots of R (copies of the current source trees), and a wealth of additional contributed code. Without CRAN’s documentation, it would be nearly impossible to program in R.
It is worth mentioning that the R Development Core team has published some very useful manuals. Beginners and pros alike can benefit from reviewing these:
- An Introduction to R – An introduction to the language and to using R for statistical analysis and graphics.
- R Data Import/Export – Describes the import and export facilities available either in R itself or via packages available on CRAN.
- R Installation and Administration – How to install R.
- The R Language Definition – Details of the expression evaluation process, which is useful to know when you’re programming R functions.
Why Use These Data Science Resources?
There are many webpages, resources, and communities that are devoted to data science. This post lists the most popular sites with the highest-quality content, the ones every data scientist should know. They are an excellent starting point for those just beginning their data science journey. You can pick up sound advice, find pointers to the best courses, and learn about what materials and books will further your development and growth. Who knows? Maybe you’ll contribute to the development of these communities and inspire someone else to learn about data science.