The first official release of
poorman (v 0.1.9) is now on CRAN! You can now install
poorman directly from CRAN with the following code:
In this blog post I want to address some common questions that I have received since I started writing the package.
poorman is a package that unapologetically attempts to recreate the
dplyr API in a dependency free way using only
poorman is still under development and doesn’t have all of
dplyr’s functionality but what I would consider the “core” functionality is included. The idea behind
poorman is that a user should be able to take their
dplyr based script and run it using
poorman without any hiccups.
So what does
In this first official release,
poorman includes copies of the key
select(), rename(), pull(), relocate(), mutate(), transmute(), arrange() filter(), slice() summarise() / summarize() group_by(), ungroup()
poorman also includes the join functionality.
inner_join(), left_join(), right_join(), full_join() anti_join(), semi_join()
poorman also includes its own version of the pipe so you do not need to load or install
More functionality is being developed and will be added in time.
This is probably the most common question; why bother developing
dplyr already exists. Well there are actually several reasons why I decided to develop it. The most important reason to me though is quite simply because I can.
poorman started out as a personal challenge and a bit of fun. Also as a freelance R developer, it is good to build up my portfolio of open source code that I can show to potential clients.
Another reason for developing
poorman is I wanted to challenge a common misconception that
base R is not as powerful, or as good, or as useful as
dplyr. Too often I see and hear comments belittling
base R and as a user of the language for over 10 years now – well before the inception of
dplyr – I find this idea very worrying.
poorman’s package start up message is quite poignant in this regard.
I’d seen my father. He was a poor man, and I watched him do astonishing things. – Sidney Poitier
Finally, I have a natural joy of teaching. Writing
poorman gives me a platform to hopefully show useRs two key aspects of R programming in
base; common data manipulation tasks; and non-standard evaulation.
But why not just use
Let’s be honest, the
tidyverse is a fantastic set of packages which have transformed the face of data analysis in R, and
dplyr is arguably one of the most important packages within the
tidyverse. The API is in my opinion very easy to learn and use.
Being a part of the
tidyverse, however, means that
dplyr comes with a large number of dependencies that users must also install which is often seen as a disadvantage to using the
tidyverse. Disadvantages of dependencies have been written about before and so I won’t go into detail here. However what I will say is that the user may not have a need for additional parts of the
tidyverse and so may not wish to have to install multiple packages to use one or two functions.
Some of these dependencies are very useful of course, allowing expansion into other areas such as accessing Spark instances and databases using the same API the user already knows. This is great and if you are using these additional tools then I absolutely recommend that you choose
poorman. However if you don’t need the extra dependencies and functionality that comes with the wider
tidyverse then maybe consider giving the lightweight
poorman a go.
Finally a point on installation times,
poorman takes roughly six seconds to install. If you’ve ever had to install or upgrade
dplyr or the
tidyverse, you’ll recognise that this is very fast.
Why the name
As I have already mentioned, I have seen comments in the past pertaining to R’s worthlessness without the
tidyverse and so the name
poorman is a subtle play on the the idea that you must be a “poor man” if you use
base. The irony of course is that I have managed to recreate – quite easily – the key parts of the
dplyr API using only
Why not use
data.table for the backend?
Because I wanted to build something that was completely dependency free and adding
data.table as an
Import adds a dependency.
poorman have dependencies?
To answer this, we need to define what we mean by “dependency free”.
poorman does have some dependencies but they are for development purposes only and are therefore listed in the
Suggests part of the
DESCRIPTION file. Thus when a user installs the package, these dependencies are only ever installed if they are explicitly requested. However,
poorman doesn’t have any dependencies that users of the package need to install in order to use its functionality. I use these dependency packages to help me develop more easily. Therefore
poorman isn’t a truly “dependency free” like
data.table is, but it is dependency free for its intended users.
So if you find yourself needing a dependency free data manipulation package that follows the
dplyr API with short installation times then give
poorman a try. Equally if you find any issues, please submit an issue to