Mind Reading… What are our customers thinking?

[This article was first published on Oracle R Enterprise, and kindly contributed to R-bloggers.]

Overhauling analytics processes is becoming a recurring
theme among customers. A major telecommunications provider recently
embarked on overhauling their analytics process for customer surveys. They had three
broad technical goals:

  • Provide an agile
    environment that empowers business analysts to test hypotheses based on
    survey results
  • Allow dynamic customer segmentation
    based on survey responses and even specific survey questions to drive
    hypothesis testing
  • Make results of new
    surveys readily available for research

The ultimate goal is to derive greater value from survey
research, so that it drives measurable improvements in service delivery and, as
a result, in overall customer satisfaction.

This provider chose Oracle Advanced Analytics (OAA) to power
their survey research. Survey results and analytics are maintained in Oracle
Database and delivered via a parameterized BI dashboard. Both the database and
BI infrastructure are standard components in their architecture.

The parameterized BI dashboard enables analysts to create
samples for hypothesis testing by filtering respondents to a survey question
based on a variety of filtering criteria. This provider required the ability to
deploy a range of statistical techniques depending on the survey variables, the
level of measurement of each variable, and the needs of survey research.

Oracle Advanced Analytics offers a range of in-database
statistical techniques complemented by a unique architecture supporting
deployment of open source R packages in-database to optimize data transport to
and from database-side R engines. Additionally, depending on the nature of
functionality in such R packages, it is possible to leverage data-parallelism
constructs available as part of in-database R integration. Finally, all OAA
functionality is exposed through SQL, the ubiquitous language of the IT
environment. This enables OAA-based solutions to be readily integrated with BI
and other IT technologies.

The survey application noted above has been in production
for 3 months. It supports a team of 20 business analysts and has already begun
to demonstrate measurable improvements in customer satisfaction.

In the rest of this blog, we explore the range of
statistical techniques deployed as part of this application.

At the heart of survey research is hypothesis testing. A completed customer satisfaction survey
contains data used to draw conclusions about the state of the world. In the survey
domain, hypothesis testing means comparing the answers to specific
survey questions across two distinct groups of customers and asking whether the
difference between them is statistically significant. Such groups are
identified based on knowledge of the business and technically specified through
filtering predicates.
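As a rough illustration of how filtering predicates carve a survey into two comparison groups, here is a minimal Python sketch. The records, the field names (`plan`, `region`, `q1_score`) and the helper `select_scores` are all made up for this example; in the application described here, the filtering happens in the database behind the BI dashboard.

```python
from statistics import mean

# Hypothetical survey responses; fields and values are invented for illustration.
responses = [
    {"customer_id": 1, "plan": "prepaid", "region": "north", "q1_score": 4},
    {"customer_id": 2, "plan": "postpaid", "region": "north", "q1_score": 2},
    {"customer_id": 3, "plan": "prepaid", "region": "south", "q1_score": 5},
    {"customer_id": 4, "plan": "postpaid", "region": "south", "q1_score": 3},
    {"customer_id": 5, "plan": "prepaid", "region": "north", "q1_score": 3},
    {"customer_id": 6, "plan": "postpaid", "region": "north", "q1_score": 2},
]


def select_scores(rows, predicate, question):
    """Return the answers to `question` from respondents matching `predicate`."""
    return [row[question] for row in rows if predicate(row)]


# Two mutually exclusive groups, each specified by a filtering predicate.
def is_prepaid(row):
    return row["plan"] == "prepaid"


def is_postpaid(row):
    return row["plan"] == "postpaid"


group_a = select_scores(responses, is_prepaid, "q1_score")
group_b = select_scores(responses, is_postpaid, "q1_score")

print("prepaid:", group_a, "mean", mean(group_a))
print("postpaid:", group_b, "mean", mean(group_b))
```

Any boolean function of a respondent record can serve as a predicate, so an analyst can combine conditions (for example plan and region) without changing the rest of the analysis.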

Hypothesis testing sets up the world as consisting of 2
mutually exclusive hypotheses:

a) Null hypothesis: states that there is no difference in satisfaction levels
between the 2 groups of customers

b) Alternative hypothesis: states that there is a significant difference in
satisfaction levels between the 2 groups of customers

Only one of these can be true, and the choice between them is
determined by how strong the evidence against the null hypothesis is: the null
hypothesis is retained unless the data make it implausible. Simplistically, the degree of difference
between, e.g., the average score from a specific survey question across two
customer groups could provide the necessary evidence in helping decide which
hypothesis to accept.

In practice the process of providing evidence to make a
decision involves having access to a range of test statistics. A test statistic is a number
calculated from the sample data of the groups that helps determine the choice of null or alternative
hypothesis. A great deal of theory, experience, and business knowledge goes
into selecting the right statistic based on the problem at hand.

The t-statistic (available in-database) is a fundamental
function used in hypothesis testing that helps understand the difference in
means across two groups. When the t-value computed from 2 groups of customers for a
specific survey question is extreme, the null hypothesis is unlikely to hold and the alternative hypothesis is
favored. It is common to set a critical value that the absolute value of the observed t-value
should exceed to conclude that the satisfaction survey results across the two
groups are significantly different. Other similar statistics available
in-database include F-test, cross tabulation (frequencies of various response
combinations captured as a table), related hypothesis testing functions such as
chi-square functions, Fisher’s exact
test, Kendall’s coefficients, correlation coefficients and a range of lambda

If an analyst wants to compare across more than 2 groups,
analysis of variance (ANOVA) is a commonly used collection of techniques.
This is an area where the R package ecosystem is rich with several proven
implementations. The R stats package
has implementations of several test statistics, and its glm function allows analysis of count data
common in survey results, including building Poisson and log-linear models. R's MASS package implements a popular
survey analysis technique called iterative proportional fitting. R's survey
package has a rich collection of features.
The provider was specifically interested in one function in
the survey package: raking (also known as sample balancing), a process that assigns
a weight to each customer that responded to a survey such that the weighted
distribution of the sample is in very close agreement with the known distribution of other customer attributes,
such as the type of cellular plan, demographics, or average bill amount. Raking
is an iterative process that uses the sample design weight as the starting
weight and terminates when convergence is achieved.
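The iterative nature of raking can be sketched in a few lines of Python. This is a simplified illustration, not the survey package's implementation: the respondents, the target totals, and the function names `rake` and `weighted_totals` are all assumptions made for this example. Each pass rescales the weights so that one attribute's weighted totals match its targets, then moves on to the next attribute, and stops once no adjustment changes a weight by more than a small tolerance.

```python
def weighted_totals(rows, weights, var):
    """Sum the weights of respondents within each category of `var`."""
    totals = {}
    for row, w in zip(rows, weights):
        totals[row[var]] = totals.get(row[var], 0.0) + w
    return totals


def rake(rows, start_weights, targets, tol=1e-9, max_iter=1000):
    """Adjust weights until weighted totals match `targets` on every variable.

    targets maps each variable to {category: desired weighted total}.
    """
    weights = list(start_weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in targets.items():
            totals = weighted_totals(rows, weights, var)
            for i, row in enumerate(rows):
                factor = target[row[var]] / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # convergence: no weight changed materially
            break
    return weights


# Invented respondents described by two attributes.
respondents = [
    {"plan": "prepaid", "age": "young"},
    {"plan": "prepaid", "age": "young"},
    {"plan": "prepaid", "age": "old"},
    {"plan": "postpaid", "age": "young"},
    {"plan": "postpaid", "age": "old"},
    {"plan": "postpaid", "age": "old"},
]
design_weights = [1.0] * len(respondents)  # starting (design) weights

# Assumed population totals for each attribute; both sets sum to 100.
population_targets = {
    "plan": {"prepaid": 40.0, "postpaid": 60.0},
    "age": {"young": 50.0, "old": 50.0},
}

raked_weights = rake(respondents, design_weights, population_targets)
print(weighted_totals(respondents, raked_weights, "plan"))
print(weighted_totals(respondents, raked_weights, "age"))
```

The sketch assumes that every category in the targets appears in the sample and that the targets for different attributes sum to the same total; without those conditions the procedure may fail or not converge.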

For this survey application, R scripts that expose a wide
variety of statistical techniques (some in-database, accessible through the
transparency layer in Oracle R Enterprise, and some in CRAN packages) were
built and stored in the Oracle R Enterprise in-database R script repository.
These parameterized scripts accept various arguments that identify samples of
customers to work with as well as specific constraints for the various
hypothesis test functions. The net result is greater agility, since the business
analyst determines both the set of samples to analyze and the
technique to apply to each sample based on the hypothesis
being pursued.
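The pattern of named, parameterized scripts can be pictured with a small Python analogy. Everything here is hypothetical: the dictionary `SCRIPT_REPOSITORY`, the `register` and `run_script` helpers, and the `mean_difference` script are stand-ins invented for this sketch and do not reflect the Oracle R Enterprise API.

```python
from statistics import mean

# Hypothetical in-memory stand-in for a script repository: name -> function.
SCRIPT_REPOSITORY = {}


def register(name):
    """Store a function in the repository under `name`."""
    def decorator(func):
        SCRIPT_REPOSITORY[name] = func
        return func
    return decorator


def matches(row, field_filter):
    """True when the row equals the filter on every listed field."""
    return all(row[field] == value for field, value in field_filter.items())


@register("mean_difference")
def mean_difference(rows, question, filter_a, filter_b):
    """Difference in mean score between two groups selected by field filters."""
    group_a = [r[question] for r in rows if matches(r, filter_a)]
    group_b = [r[question] for r in rows if matches(r, filter_b)]
    return mean(group_a) - mean(group_b)


def run_script(name, **params):
    """Look up a stored script by name and call it with the caller's parameters."""
    return SCRIPT_REPOSITORY[name](**params)


# Invented survey rows; the analyst chooses the script, question and filters.
survey_rows = [
    {"plan": "prepaid", "q1_score": 4},
    {"plan": "prepaid", "q1_score": 5},
    {"plan": "postpaid", "q1_score": 2},
    {"plan": "postpaid", "q1_score": 3},
]

result = run_script(
    "mean_difference",
    rows=survey_rows,
    question="q1_score",
    filter_a={"plan": "prepaid"},
    filter_b={"plan": "postpaid"},
)
print(result)
```

The point of the design is the separation of concerns: the script author decides how a technique is computed, while the analyst supplies only the script name, the samples, and any test-specific constraints.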

For more information, see Oracle's R Technologies software: Oracle R Distribution, Oracle R Enterprise, ROracle, and Oracle R Connector for Hadoop.
