As a data scientist in a startup, I can program in several languages, but R is often the natural choice.
Recently I wanted my company to build a product based on R. It simply seemed like a perfect fit.
But this turned out to be a slippery slope into the open-source code licensing field, which I wasn’t really aware of before.
Bottom line: legal advice was not to use R!
Was it a single lawyer? No. The company was willing to “play along” with me, and we had a consultation with 4 different software lawyers, one after the other.
What is the issue? R is licensed under GPL 2, and most R packages are also GPL (whether 2 or 3).
GPL is not a permissive license. It is categorized as “strongly protective”.
In layman’s terms, if you build your work on a GPL program, it may force you to release your product under the GPL, too. In other words – it restrains you from keeping your code proprietary.
Now you say – “This must be wrong”, and “You just don’t understand the license and its meaning”, right? You may also mention that Microsoft and other big companies are using R, and provide R services.
Well, maybe. I do believe there are ways to keep your code proprietary, legally. But when your software lawyers advise you to “make an effort to avoid using this program”, you do not brush them off.
Now, for some details.
As a private company, our code needs to be proprietary. Our core is not services, but the software itself. We need to avoid handing our source code to a customer.
The program itself will be installed on a customer’s server. Most of our customers have sensitive data, so a SaaS model (or any connection to the internet) is out of the question. Can we use R?
The R Core Team addressed the question “Can I use R for commercial purposes?”. But, as lawyers told us, the way it is addressed does not solve much. Any GPL program can be used for commercial purposes. You can offer your services installing the software, or sell a visualization you’ve prepared with ggplot2. But, it does not answer the question – can I write a program in R, and have it licensed with a non-GPL license (or simply – a commercial license)?
The key question we were asked was whether our work is a “derivative work” of R. Now, R is an interpreted programming language. You can write your code in Notepad and it will run perfectly. Logic says that if you do not modify the original software (R) and you do not copy any of its source code, you have not made a derivative work.
As a matter of fact, when you read the FAQ of the GPL license it almost seems that indeed there is no problem. Here is a paragraph from the Free Software Foundation https://www.gnu.org/licenses/gpl-faq.html#IfInterpreterIsGPL:
If a programming language interpreter is released under the GPL, does that mean programs written to be interpreted by it must be under GPL-compatible licenses?
When the interpreter just interprets a language, the answer is no. The interpreted program, to the interpreter, is just data; a free software license like the GPL, based on copyright law, cannot limit what data you use the interpreter on. You can run it on any data (interpreted program), any way you like, and there are no requirements about licensing that data to anyone.
Problem solved? Not quite. The next paragraph shuffles the cards:
However, when the interpreter is extended to provide “bindings” to other facilities (often, but not necessarily, libraries), the interpreted program is effectively linked to the facilities it uses through these bindings. So if these facilities are released under the GPL, the interpreted program that uses them must be released in a GPL-compatible way. The JNI or Java Native Interface is an example of such a binding mechanism; libraries that are accessed in this way are linked dynamically with the Java programs that call them. These libraries are also linked with the interpreter. If the interpreter is linked statically with these libraries, or if it is designed to link dynamically with these specific libraries, then it too needs to be released in a GPL-compatible way.
Another similar and very common case is to provide libraries with the interpreter which are themselves interpreted. For instance, Perl comes with many Perl modules, and a Java implementation comes with many Java classes. These libraries and the programs that call them are always dynamically linked together.
A consequence is that if you choose to use GPLed Perl modules or Java classes in your program, you must release the program in a GPL-compatible way, regardless of the license used in the Perl or Java interpreter that the combined Perl or Java program will run on.
This is commonly interpreted as “You can use R, as long as you don’t call any library”.
Now, can you think of using R without, say, the Tidyverse package? Tidyverse is a GPL library. And if you want to create a Shiny web app, you still use the Shiny library (also GPL). Even if you purchase a Shiny Server Pro commercial license, that does not change the fact that the Shiny library itself is licensed under the GPL.
Furthermore, we often use quite a lot of R libraries, and almost all are GPL. The same goes for a Shiny app, in which you are likely to use many GPL packages to make your product look and behave as you want it to.
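To get a feel for how much of a dependency list is affected, one can bucket each package's license string by type. Below is a minimal sketch in Python, assuming license strings in the style of the License field of an R package's DESCRIPTION file (for example "GPL-3" or "MIT + file LICENSE"). The function names `classify_license` and `flag_for_review`, the bucket labels, and the package names in the sample are all hypothetical, and the matching is deliberately crude. It is a first-pass filter for a conversation with a lawyer, not a legal determination.

```python
def classify_license(license_field):
    """Roughly bucket a license string by how protective it is.

    The buckets are an illustrative assumption, not a legal classification.
    """
    text = license_field.strip().upper()
    # Check the weaker copyleft variant first, because "LGPL" contains "GPL".
    if "LGPL" in text or "LESSER" in text:
        return "weak copyleft"
    if "GPL" in text or "GENERAL PUBLIC" in text:
        return "strong copyleft"
    # Common permissive families; matched by prefix to avoid false hits.
    if text.startswith(("MIT", "BSD", "APACHE")):
        return "permissive"
    return "unknown"


def flag_for_review(packages):
    """Return, sorted, the names of packages whose license is not clearly permissive."""
    return sorted(
        name
        for name, license_field in packages.items()
        if classify_license(license_field) != "permissive"
    )


# Hypothetical dependency list: package name -> license string.
sample = {
    "pkg_a": "GPL-3",
    "pkg_b": "MIT + file LICENSE",
    "pkg_c": "LGPL (>= 2.1)",
    "pkg_d": "GPL (>= 2)",
}

for name in flag_for_review(sample):
    print(name, "->", classify_license(sample[name]))
```

In an R session, the real license strings can be read from the License column of `installed.packages()`. Anything that lands in "unknown" needs a manual look at the package's license file.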
Is it legal to use R after all?
I think it is. The term “library” may be the cause of the confusion.
Since Perl is mentioned specifically in the GPL FAQ quoted above, it is worth noting that Perl addressed the issue of proprietary scripts running on a GPL-licensed interpreter head on (https://dev.perl.org/licenses/ ): “my interpretation of the GNU General Public License is that no Perl script falls under the terms of the GPL unless you explicitly put said script under the terms of the GPL yourself.
Furthermore, any object code linked with perl does not automatically fall under the terms of the GPL, provided such object code only adds definitions of subroutines and variables, and does not otherwise impair the resulting interpreter from executing any standard Perl script”
There may also be a hidden explanation by which most libraries are fine to use. As said above, it is possible the confusion is caused by the use of the term “library” in different ways.
Linking/binding is a technical term for what occurs when compiling software together. This is not what happens with most R packages, as may be understood from the following question and answer: “Does an Rcpp-dependent package require a GPL license?”
The question explains why (due to GPL) one should NOT use the Rcpp R library. Can we infer from it that it IS ok to use most other libraries?
“This is not legal advice”
As we’ve seen, what is and is not legal to do with R, given its GPL license, is far from clear.
Everything written on the topic is also marked as “not legal advice”. While this may not be surprising, one has a hard time convincing a lawyer to be permissive when the software owners themselves are not clear about it. For example, the FAQ “Can I use R for commercial purposes?” mentioned above begins with “R is released under the GNU General Public License (GPL), version 2. If you have any questions regarding the legality of using R in any particular situation you should bring it up with your legal counsel”. And ends with “None of the discussion in this section constitutes legal advice. The R Core Team does not provide legal advice under any circumstances.”
The information in between is not very decisive, either. So at the end of the day, the actual legal situation remains unclear.
Another thing one of the software lawyers told us is that investors do not like GPL. In other words, even if it turns out that it is legal to use R with its libraries, a venture capital investor may be reluctant. If true, this may cause delays and may also require additional work convincing the potential investor that what you are doing is legally sound. Hence, the lawyers told us, it is best to find an alternative that is not GPL at all.
What makes Python better?
Most of the “R vs. Python” articles are pure junk, IMHO. They express nonsense commonly written in the spirit of “Python is a general-purpose language with a readable syntax. R, however, is built by statisticians and encompasses their specific language.” That is far from reality as I see it.
But Python has a permissive license. You can distribute it, you can modify it, and you do not have to worry that your code will become open-source, too. This truly is a great advantage.
Is there anything in between a permissive license and a GPL?
Yes, there is.
For example, there is the Lesser GPL (LGPL). As described in Wikipedia: “The license allows developers and companies to use and integrate a software component released under the LGPL into their own (even proprietary) software without being required by the terms of a strong copyleft license to release the source code of their own components. However, any developer who modifies an LGPL-covered component is required to make their modified version available under the same LGPL license.” Isn’t this exactly what the R choice of a license was aiming at?
R is not LGPL. R has no written exceptions.
The fact that R and most of its libraries use a GPL license is a problem. At the very least it is not clear if it is really legal to use R to write proprietary code.
Even if it is legal, Python still has the advantage of a permissive license, which means “no questions asked” by potential customers and investors.
It would be good if the R core team, as well as people releasing packages, were clearer about the proper use of the software, as they see it. They could take Perl as an example.
It would be even better if the license changed, at least by adding an exception, or by moving to the LGPL or (best of all) a permissive license.