During my time on LinkedIn, I have been exposed to many people from many different lines of work. I have also managed to carve out a space for myself there where I can post about Data Science topics and share my blogs along the way. There have always been posts and polls comparing R and Python, as well as the subsequent debates among users of the languages over which one is superior for doing Data Science. While these sorts of arguments will never end, and I am far from innocent of engaging in them, I chose to try to understand why Data Science practitioners prefer one language over another by “controlling” for exposure to the other language.
In this blog I am going to share the results of my LinkedIn polls comparing respondents' preferences. The polls asked for respondents' preferences for:
dplyr in R vs pandas in Python for data wrangling,
ggplot2 in R vs seaborn in Python for data visualization, and
Using Jupyter notebooks vs RMarkdown for writing reports.
This is by no means a formal study; it's more just me sharing my findings in blog form. Social media platforms come and go, but having a blog (albeit a less popular one) gives me a place where I can post my curated content. Likely due to LinkedIn’s algorithms, my first and second questions got more traction, with over 132,000 views combined and over 1600 and 1300 votes respectively, while my last question only got a little more than 4000 views and 106 votes at the time of writing.
To quote a comment on one of my polls:
With this in mind, let's share the results of these polls.
(Visuals were made with ggplot2 and the ggtech package for the theme)
1. dplyr vs pandas
As expected, most users who were pro-pandas had never used dplyr before. However, when controlling for prior experience, it was pretty much a 50-50 split among respondents between using pandas in Python and dplyr in R. There were some comments recommending that I check out the dtplyr packages in R; while I don’t have much exposure to those packages at present, I hope to check them out in the future.
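To make the idea of “controlling” for prior experience concrete, here is a minimal Python sketch. The vote counts and the option labels (`used_both`, `used_one`) are made up for illustration and are not the actual poll data. The point is only to show how the overall split can differ from the split among respondents who have used both tools.

```python
from collections import Counter

# Hypothetical vote counts, NOT the real poll results.
# Keys are (preferred tool, experience level) pairs.
votes = Counter({
    ("pandas", "used_both"): 40,
    ("dplyr", "used_both"): 40,
    ("pandas", "used_one"): 100,
    ("dplyr", "used_one"): 20,
})


def preference_share(votes, only_experienced):
    """Return each tool's share of the votes.

    If only_experienced is True, count only respondents who
    reported having used both tools.
    """
    totals = Counter()
    for (tool, experience), count in votes.items():
        if only_experienced and experience != "used_both":
            continue
        totals[tool] += count
    n = sum(totals.values())
    return {tool: count / n for tool, count in totals.items()}


# Share of all votes, regardless of experience.
raw = preference_share(votes, only_experienced=False)
# Share among respondents with experience of both tools.
controlled = preference_share(votes, only_experienced=True)

print("All respondents:", raw)
print("Used both tools:", controlled)
```

With these made-up numbers, pandas leads clearly across all respondents, but the split is even once only respondents who have used both tools are counted.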
For my closest experience to dplyr in Python, check out my review on the
2. ggplot2 vs matplotlib and seaborn
In the case of comparing ggplot2 and seaborn among users who had experience with both packages, ggplot2 is preferred by 56% of users. Most users of seaborn don’t have experience with ggplot2 and vice versa.
I was told to check out the plotly library, which is available in both R and Python, and it really looks like a great library to have for building interactive dashboards and applications. While I don’t have much experience with it now, I do hope to check it out when time allows.
3. Using Jupyter notebooks vs RMarkdown for writing reports
The results from this poll are questionable, as it only received 106 replies. With this in mind, these are the results:
Among users with experience using both RMarkdown and Jupyter notebooks for writing their reports, 63% prefer RMarkdown over Jupyter notebooks. However, more users have experience with Jupyter notebooks than with RMarkdown.
With all that being said, using dplyr in R or pandas in Python for data wrangling seems like a toss-up among users with experience with both languages. For data visualization, ggplot2 seems to be preferred over seaborn and, if you trust the sample size, RMarkdown is preferred over Jupyter notebooks among users with experience with both.
In general, it is apparent that R is still the underdog as a language used for Data Science and programming, but by no means do I intend to stop using it any time soon.
When I get the time, I look forward to giving plotly a spin!
Thank you for reading!