R and Salesforce

February 25, 2012

(This article was first published on Cloud.. Analytics and CRM, and kindly contributed to R-bloggers)


R is widely used among scientists and statisticians to perform statistical analysis while Salesforce.com is one of the leading CRM software packages used for Marketing and Salesforce automation. Salesforce.com contains vital information regarding Leads, Customers, Contacts, Opportunities and Cases. Currently this data is mainly used for operational purposes by Sales and Marketing professionals. How about using this data for predictive analysis or forecasting?

Unfortunately Salesforce.com built on a multi-tenant architecture, limits system usage heavily. Only way to achieve this is to read Salesforce data using Java Web Services API provided by Salesforce and feeding that data to R.

One way is to export salesforce data to csv file and importing the data into R.
But this will be quite tedious and time consuming. It would be nice if one can query Salesforce.com data directly from R and create a data frame or matrix for performing analyses.


Fortunately Java code can be directly invoked from R using rJava package. Values returned by Java methods are available in R session memory and can be mapped to equivalent R data structures. So, theoretically it must be possible to invoke a java method from R which connects to Salesforce.com and returns the required data.

Let us consider a trivial use case where a java method accepts String argument containing SOQL (Salesforce.com equivalent of SQL) and execute that statement returning data as a list of lists such as vector of vectors.

The first step is to install rJava as follows:

Make sure that Java 1.6 or higher is installed and all the environment variables are pointing to correct directories or folders. (Such as CLASSPATH and PATH)

Start R


Once the installation is successful, try


This command will initialize Java.

myRObj=.jnew(“myJavaobj”) will crate a java object from your java class.

For detailed referece, please refer: http://cran.r-project.org/web/packages/rJava/rJava.pdf

Now, I am going to develop a java class say, SalesforceHelper.java having a method queryObject.

GetObject method will accept 1 String argument:SOQL statement. (Salesforce.com Equivalent of SQL) such
as Select Id, Name From Lead.

Force.com Web Service Connector (WSC)

Salesforce.com provides extensive java api to perform database query or updates. In order to develop SalesforceHelper java class mentioned above, I am going to use WSC-20.jar along with enterprise.wsdl.

To download enterprise.wsdl, login to your salesforce.com org and click on Setup.

Click on Develop->API.

Click on enterprise.wsdl.

You may download wsc-20.jar from the project url:


Once you download wsc-20.jar, you need to create enterprise.jar as follows:

java -classpath wsc-20.jar com.sforce.ws.tools.wsdlc enterprise.wsdl enterprise.jar

Add wsc-20.jar and enterprise.jar files to your java CLASSPATH environment variable.

Let us write a java class to connect to Salesforce.com, perform SOQL and return a Vector containing Vectors of rows which in turn contain the selected field values.
Thus if our query is SELECT Name, Salary__c from Lead LIMIT 20, then our java method will return Vector containing 20 Vectors, each containing 2 objects of type String and Double.

import com.sforce.soap.enterprise.Connector;
import com.sforce.soap.enterprise.DeleteResult;
import com.sforce.soap.enterprise.EnterpriseConnection;
import com.sforce.soap.enterprise.Error;
import com.sforce.soap.enterprise.QueryResult;
import com.sforce.soap.enterprise.SaveResult;
import com.sforce.soap.enterprise.sobject.SObject;
import com.sforce.soap.enterprise.sobject.AggregateResult;
import com.sforce.ws.ConnectionException;
import com.sforce.ws.ConnectorConfig;
import java.util.*;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class SalesforceHelper {

public class Main {

  static final String USERNAME = "USER";
  static final String PASSWORD = "PASSWD+SEC TOKEN";
  static final String ENDPOINT = "https://login.salesforce.com/services/Soap/c/23.0";
  static EnterpriseConnection connection;

  public static void main(String[] args) {

  public static Vector queryObject(String qry) throws Exception {
    ConnectorConfig config = new ConnectorConfig();

    Vector rowContainer = new Vector();
    String delims = "[ ,]+"; //Parse query tokens with these delimiters
    String[] tokens = qry.split(delims);
    String tableName=null;
    ArrayList colNames = new ArrayList();
    ListIterator listItr = colNames.listIterator();
    for (int i=1;i<tokens.length;i++){ //Start with 1 because 0 will be Select
        if (tokens[i].equalsIgnoreCase("from")){
   Object[] cols = colNames.toArray();
   System.out.println("columns:"+cols+" Object:"+tableName);
   connection = Connector.newConnection(config);

   QueryResult queryResults = connection.query(qry);

   if (queryResults.getSize() > 0) {
      for (int i=0;i<queryResults.getRecords().length;i++) {
           Vector vrow = new Vector();
           String obj="com.sforce.soap.enterprise.sobject."+tableName;
           SObject  sfObject = (SObject)queryResults.getRecords()[i];
           Class<?> cl = Class.forName(obj);
          for (int j=0;j<cols.length;j++){
               String methodName="get"+(String)cols[j]; //Method Name will be get<colname> such as getName()
               Method m = cl.getMethod(methodName);
               vrow.add(m.invoke(sfObject)); //Invoke method to get the field value
               System.out.println("Row#"+i+": "+vrow);
         // Add row to row container
         rowContainer.add(vrow); //Add Row to the container vector.

return rowContainer;


Now we have to write a function in R which will accept one argument and return a data frame as follows:

for(i in 1:rows){
  for (j in 1:cols){
    if (arr[[i]]$toArray()[[j]] %instanceof% "java/lang/String"){
     }else if (arr[[i]]$toArray()[[j]] %instanceof% "java/lang/Double"){

Currently this function is handling only String and Double object types. But it can be very easily extended to handle all the object types.
Also, this function is using java reflection to execute methods such as toArray() and toString().
Using reflection in this manner is very inefficient. Better way is to use Java Native Interface (JNI).
Please refer rJava reference as mentioned above.

Now let us get our Lead data as follows:


vec=sfObj$queryObject("SELECT Name, Salary__c From Lead LIMIT 20")


We now have Lead data frame in R which can be saved for further analysis.

Next Step

Performing some meaningful analysis on Lead data such as Regression Analysis to find out if any lead attributes are directly related to conversion.

To leave a comment for the author, please follow the link and comment on their blog: Cloud.. Analytics and CRM.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , , , , , ,

Comments are closed.

Recent popular posts


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Dommino data lab

Quantide: statistical consulting and training




CRC R books series

Six Sigma Online Training

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)