Let’s Do Some Hierarchical Bayes Choice Modeling in R!
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.
Choice

Pricing Level

Proportion Selecting Option


Set

Brand A

Brand B

Brand A

Brand B

Neither

1

Price 1

Price 1

0.5

0.3

0.2

2

Price 1

Price 2

0.5

0.2

0.3

3

Price 1

Price 3

0.5

0.1

0.4

4

Price 2

Price 1

0.4

0.3

0.3

5

Price 2

Price 2

0.4

0.2

0.4

6

Price 2

Price 3

0.4

0.1

0.5

7

Price 3

Price 1

0.3

0.3

0.4

8

Price 3

Price 2

0.3

0.2

0.5

9

Price 3

Price 3

0.3

0.1

0.6

# set seed in order to replicate set.seed(362013) #create structure for choice data choice<NULL #simulates 100 respondents and 9 choice sets for (i in 1:100) { cell1<NULL cell2<NULL cell3<NULL cell4<NULL cell5<NULL cell6<NULL cell7<NULL cell8<NULL cell9<NULL cell1<rbind(cell1,which.max(rmultinom(1, size = 1, prob = c(0.5,0.3,0.2)))) cell2<rbind(cell2,which.max(rmultinom(1, size = 1, prob = c(0.5,0.2,0.3)))) cell3<rbind(cell3,which.max(rmultinom(1, size = 1, prob = c(0.5,0.1,0.4)))) cell4<rbind(cell4,which.max(rmultinom(1, size = 1, prob = c(0.4,0.3,0.3)))) cell5<rbind(cell5,which.max(rmultinom(1, size = 1, prob = c(0.4,0.2,0.4)))) cell6<rbind(cell6,which.max(rmultinom(1, size = 1, prob = c(0.4,0.1,0.5)))) cell7<rbind(cell7,which.max(rmultinom(1, size = 1, prob = c(0.3,0.3,0.4)))) cell8<rbind(cell8,which.max(rmultinom(1, size = 1, prob = c(0.3,0.2,0.5)))) cell9<rbind(cell9,which.max(rmultinom(1, size = 1, prob = c(0.3,0.1,0.6)))) row<cbind(cell1,cell2,cell3,cell4,cell5,cell6,cell7,cell8,cell9) choice<rbind(choice,row) } choice apply(choice,2,table) choice_df<data.frame(cbind(1:100,choice)) names(choice_df)<c("id","set1","set2","set3","set4","set5","set6","set7","set8","set9") choice_df
id

set1

set2

set3

set4

set5

set6

set7

set8

set9

1

2

2

1

3

1

3

1

3

2

2

1

3

1

1

3

3

1

2

3

3

1

3

3

3

3

1

1

3

1

4

3

2

1

1

3

1

1

3

1

5

1

1

3

1

1

1

1

3

3

Option

set1

set2

set3

set4

set5

set6

set7

set8

set9

1

58

51

58

47

39

41

37

35

46

2

26

21

8

15

18

9

29

21

5

3

16

28

34

38

43

50

34

44

49

design1<matrix(c( 1,0,1,0,0,0, 0,1,0,0,1,0, 0,0,0,0,0,0), nrow=3,ncol=6,byrow=TRUE) design1 design2<matrix(c( 1,0,1,0,0,0, 0,1,0,0,0,1, 0,0,0,0,0,0), nrow=3,ncol=6,byrow=TRUE) design2 design3<matrix(c( 1,0,1,0,0,0, 0,1,0,0,0,0, 0,0,0,0,0,0), nrow=3,ncol=6,byrow=TRUE) design3 design4<matrix(c( 1,0,0,1,0,0, 0,1,0,0,1,0, 0,0,0,0,0,0), nrow=3,ncol=6,byrow=TRUE) design4 design5<matrix(c( 1,0,0,1,0,0, 0,1,0,0,0,1, 0,0,0,0,0,0), nrow=3,ncol=6,byrow=TRUE) design5 design6<matrix(c( 1,0,0,1,0,0, 0,1,0,0,0,0, 0,0,0,0,0,0), nrow=3,ncol=6,byrow=TRUE) design6 design7<matrix(c( 1,0,0,0,0,0, 0,1,0,0,1,0, 0,0,0,0,0,0), nrow=3,ncol=6,byrow=TRUE) design7 design8<matrix(c( 1,0,0,0,0,0, 0,1,0,0,0,1, 0,0,0,0,0,0), nrow=3,ncol=6,byrow=TRUE) design8 design9<matrix(c( 1,0,0,0,0,0, 0,1,0,0,0,0, 0,0,0,0,0,0), nrow=3,ncol=6,byrow=TRUE) design9
Creating a List Structure for bayesm
We will be using the function rhierMnlRwMixture from the R package bayesm to estimate the parameters of our choice model, and bayesm uses a list structure to store the y (choices), the X (design matrices), and the z (covariates). We will not be discussing covariates in this post, but you should be aware that bayesm offers much greater capability than we will be covering. For instance, the function's name, rhierMnlRwMixture, suggests a hierarchical multinomial random walk with mixtures of normal distributions. We, however, will be keeping it simple.
The following code will create the list structures that bayesm needs. It is somewhat more general than we need for this example given that every individual sees the same nine choice sets. Briefly, id is used to select only those rows with the same id value. This allows us to have multiple rows for each respondent, as we might if the data were in long rather than wide format. The list will be stored in lgtdata. Then, we loop through each respondent to create a list of 100 with y and X.
id=levels(as.factor(choice_df[,1])) lgtdata=NULL for (i in 1:100) { respdata=choice_df[choice_df[,1]==id[i],] ty<NULL tdesign<NULL ty=c(ty,respdata$set1) tdesign=rbind(tdesign,design1) ty=c(ty,respdata$set2) tdesign=rbind(tdesign,design2) ty=c(ty,respdata$set3) tdesign=rbind(tdesign,design3) ty=c(ty,respdata$set4) tdesign=rbind(tdesign,design4) ty=c(ty,respdata$set5) tdesign=rbind(tdesign,design5) ty=c(ty,respdata$set6) tdesign=rbind(tdesign,design6) ty=c(ty,respdata$set7) tdesign=rbind(tdesign,design7) ty=c(ty,respdata$set8) tdesign=rbind(tdesign,design8) ty=c(ty,respdata$set9) tdesign=rbind(tdesign,design9) lgtdata[[i]]=list(y=ty,X=as.matrix(tdesign)) }
To verify that the code functioned as intended, you need to view the elements of lgtdata. You index the list as lgtdata[[1]] for the first respondent and lgtdata[[100]] for the last respondent. Here are the data for the first respondent. The design matrices have been stacked in $X, and the choices have been placed in the vector $y. Once you tell rhierMnlRwMixture that there are three alternatives per choice set, $X will be subdivided and matched with $y.
$y [1] 2 2 1 3 1 3 1 3 2 $X [,1] [,2] [,3] [,4] [,5] [,6] [1,] 1 0 1 0 0 0 [2,] 0 1 0 0 1 0 [3,] 0 0 0 0 0 0 [4,] 1 0 1 0 0 0 [5,] 0 1 0 0 0 1 [6,] 0 0 0 0 0 0 [7,] 1 0 1 0 0 0 [8,] 0 1 0 0 0 0 [9,] 0 0 0 0 0 0 [10,] 1 0 0 1 0 0 [11,] 0 1 0 0 1 0 [12,] 0 0 0 0 0 0 [13,] 1 0 0 1 0 0 [14,] 0 1 0 0 0 1 [15,] 0 0 0 0 0 0 [16,] 1 0 0 1 0 0 [17,] 0 1 0 0 0 0 [18,] 0 0 0 0 0 0 [19,] 1 0 0 0 0 0 [20,] 0 1 0 0 1 0 [21,] 0 0 0 0 0 0 [22,] 1 0 0 0 0 0 [23,] 0 1 0 0 0 1 [24,] 0 0 0 0 0 0 [25,] 1 0 0 0 0 0 [26,] 0 1 0 0 0 0 [27,] 0 0 0 0 0 0
Finally We Can Run the Analysis
We need to set a couple of values and pass those values along with the data to the function. R=20000 is the number of MCMC draws, and keep=10 is the thinning parameter telling the function to keep every 10th draw.
mcmc=list(R=20000,keep=10) library(bayesm) out=rhierMnlRwMixture(Data=list(p=3,lgtdata=lgtdata),Prior=list(ncomp=1),Mcmc=mcmc) plot(out$loglike,type="l") trace<t(apply(out$betadraw,c(2,3),mean)) matplot(trace, type="l") beta.501_2000<apply(out$betadraw[,,501:2000], 2, mean) beta.1001_2000<apply(out$betadraw[,,1001:2000], 2, mean) beta.1501_2000<apply(out$betadraw[,,1501:2000], 2, mean) cbind(beta.501_2000,beta.1001_2000,beta.1501_2000)
The rhierMnlRwMixture function took about 10 minutes on my PC to complete the 20,000 draws. The coefficients for the parameters in our design matrix are stored in out$betadraw, which is a 100 x 6 x 2000 array. The first two dimensions, the 100 x 6, seem reasonable. If hierarchical Bayes yields individuallevel beta weights, then we should have six coefficients for each of the 100 respondents. But we do not have one estimate of each coefficient for each respondent. There were 20,000 draws per respondent from which we kept every 10th for 2000 in total.
I have included a number of lines after rhierMnlRwMixture that require some time to explain because they involve issues about whether the MCMC draws have entered a highprobability region (often called convergence). Appendix A from the book Bayesian Statistics and Marketing reviews these commands when discussing another example. This chapter by Rossi and Allenby also describes this same study. Both readings are challenging, but neither is overly technical.
However, I do want to finish with the last four lines of codes. I want to examine the aggregate level estimates for my six parameters after averaging over respondents. If I discard the first 500 draws, I will average draw 501 to 2000. Thus, the three apply operations discard a different number of initial draws (500, 1000, and 1500). One of the conditions for entering a highprobability region is that the estimates should not fluctuate widely. Let's take a look.
beta.501_2000

beta.1001_2000

beta.1501_2000


Brand A

0.08

0.08

0.04

Brand B

1.99

2.02

2.01

Brand A, Price 1

0.76

0.74

0.69

Brand A, Price 2

0.11

0.09

0.05

Brand B, Price 1

1.39

1.43

1.43

Brand B, Price 2

1.16

1.19

1.18

We do not see much fluctuation, but do these coefficients reflect what we knew when we generated the data? Given the dummy coding, the coefficients appear reasonable. The brands are ordered as expected, given that "Neither" was the base alternative and has a coefficient of zero (Neither = Brand A > Brand B). The highest price is Price 3, which is also coded 0, therefore price sensitivity seems to be in the correct direction. But these coefficients do not represent linear effects, and we should be reluctant to reach general conclusions. We need a simulator to access the impact of changing prices for the two brands.
As you might recall, I began this post with the hope that it would help the reader to run their first hierarchical Bayes choice model using R. With that in mind, I tried to stay focused on running the analysis. For those wanting to understand the details, Kenneth Train provides an updated online textbook and some very blurry RealPlayer videos of a 2001 course. The learning curve is still very steep. Perhaps with the ability to run the analysis in R, you will be motivated to start the climb.
Rbloggers.com offers daily email updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/datascience job.
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.