
## Why use simulated data to check your choice model?

You can generate choice model designs using numerous methods and specifications. Likewise, there are a variety of ways to check and compare them.

The most common metrics are D-error and design balance. However, these do not directly measure how a design would perform in real life, where it is used in a survey and the results are analysed with a choice model. Running even a small pilot survey to check a design is expensive and time-consuming. However, it is possible to generate simulated data by making reasonable assumptions about respondents' preferences for the attribute levels. You can use this simulated data to check and compare designs before collecting the real survey data.

I’ll show you how, using the FitChoiceModel function from the R package flipChoice, which is available on GitHub.

## Inputs: design

The choice model design needs to be supplied to FitChoiceModel in any of the following formats:

• a choice model design object generated from flipChoice::ChoiceModelDesign (supplied to argument design)
• a Sawtooth CHO file (supplied as a file path to argument cho.file)
• a Sawtooth dual format design file (supplied as a file path to argument design.file)
• a JMP design file (supplied as a file path to argument design.file)

The last three formats may also require a file containing attribute levels, supplied as a file path to argument attribute.levels.file, if the level names are not already contained in the design. For the example in this article, I will be using a choice model experiment design with eggs attributes (Weight, Quality, Price) generated using the Efficient algorithm.
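To make the design step concrete, here is a hedged sketch of how such a design could be generated with flipChoice::ChoiceModelDesign. The level names, number of questions, and alternatives per question are illustrative assumptions on my part, not details taken from this article; check the function's documentation for the full argument list.

```r
# flipChoice is on GitHub: remotes::install_github("Displayr/flipChoice")
library(flipChoice)

# Illustrative attribute levels for the eggs example; the first level of
# each attribute acts as the reference level in the coded design.
attribute.levels <- list(
    Weight  = c("55g", "60g", "65g", "70g"),
    Quality = c("Caged", "Barn Raised", "Free Range"),
    Price   = c("2 dollars", "3 dollars", "4 dollars",
                "5 dollars", "6 dollars")
)

design <- ChoiceModelDesign(design.algorithm = "Efficient",
                            attribute.levels = attribute.levels,
                            n.questions = 10,              # assumed
                            alternatives.per.question = 3) # assumed
```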

## Inputs: priors

Once the design has been specified, the next step is to supply the priors. This is done by passing a two-column matrix to the argument simulated.priors, where the first and second columns need to contain the means and standard deviations respectively of the simulated multivariate normal distribution of the parameters. The rows in the matrix correspond to the parameters in the model.

I have shown an example of this below for the eggs attributes. The first three rows correspond to Weight, the next two to Quality, and the last four to Price. Note that the first level of each attribute is omitted due to the coding of the design, which treats it as the reference level.

> matrix(c(1, 2, 3, 0.5, 1, -1, -2, -3, -4, 0.5, 1, 1.5, 0.25, 0.5, 0.5, 1, 1.5, 2), ncol = 2)
      [,1] [,2]
 [1,]  1.0 0.50
 [2,]  2.0 1.00
 [3,]  3.0 1.50
 [4,]  0.5 0.25
 [5,]  1.0 0.50
 [6,] -1.0 0.50
 [7,] -2.0 1.00
 [8,] -3.0 1.50
 [9,] -4.0 2.00
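For readability, the same priors matrix can also be built with labelled rows. This is plain base R; the row labels are only for human inspection, and their text is my own labelling of the parameters, matching the row order described above:

```r
# Column 1: prior means; column 2: prior standard deviations.
# Rows follow the coded parameters: Weight (3), Quality (2), Price (4).
priors <- matrix(c(1, 2, 3, 0.5, 1, -1, -2, -3, -4,          # means
                   0.5, 1, 1.5, 0.25, 0.5, 0.5, 1, 1.5, 2),  # standard deviations
                 ncol = 2,
                 dimnames = list(c("Weight: 60g", "Weight: 65g", "Weight: 70g",
                                   "Quality: Barn Raised", "Quality: Free Range",
                                   "Price: 3 dollars", "Price: 4 dollars",
                                   "Price: 5 dollars", "Price: 6 dollars"),
                                 c("Mean", "SD")))
priors
```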

Alternatively, if a choice model design object was supplied, the priors used to create the design can be reused by setting simulated.priors.from.design = TRUE. If this argument is set to TRUE but no priors are present in the design object, the model will use prior means and standard deviations of zero, which causes the simulated choices to be random and independent of the attributes.

For the purpose of checking the design, I would recommend specifying non-zero priors for all attribute levels in the design, so that we can see how the design performs for each level.

## Inputs: simulated sample size

Next, you will need to choose the simulated sample size, which is the number of respondents to simulate. This is set through the argument simulated.sample.size. The default is 300, but this should generally be larger if there are many parameters in the model, especially if you find that the estimated parameters do not match the priors.

## Inputs: others

The remaining arguments to FitChoiceModel are mostly model-related settings and not specific to simulated data. For the purpose of checking the design, I would generally first run a single-class latent class analysis without any questions left out, which is the default, but you may choose different settings or even run Hierarchical Bayes instead (see this article).
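Putting these inputs together, a minimal call looks like the sketch below. Here `design` is a ChoiceModelDesign object and `priors` is a two-column matrix of prior means and standard deviations like the one printed earlier; the argument names are the ones described in this article.

```r
library(flipChoice)

# Single-class latent class analysis on simulated choices, with the
# default simulated sample size of 300 and no questions left out.
model <- FitChoiceModel(design = design,
                        simulated.priors = priors,
                        simulated.sample.size = 300)

# Alternatively, reuse the priors stored in the design object itself.
model2 <- FitChoiceModel(design = design,
                         simulated.priors.from.design = TRUE)
```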

## Model output

The table below shows the output from running a single-class latent class analysis with the priors shown previously. The model has only roughly recovered the prior means. However, this is a limitation of the model rather than of the design: because only one set of coefficients is estimated, there is no variation between respondents. The prediction accuracy is not high, but it is around the usual level for a latent class analysis. You can also use simulated data to determine what level of accuracy to expect from real data, assuming that respondents have certain preferences.

The next table shows the output from a Hierarchical Bayes analysis on the same data. The means are much closer to the priors than with latent class analysis. Yet the standard deviations are smaller than the priors, and the model has skewed the respondent distributions. This is a model-related issue rather than an issue with the design. The prediction accuracies are higher, due to the flexibility of the Hierarchical Bayes model. Overall, there do not seem to be any issues with the design. I would be concerned if the model failed to converge, if parameter estimates were vastly different from the priors, or if prediction accuracies were low or too high (i.e., close to or equal to 100%). When out-of-sample prediction accuracy is too high, the model is overfitting to the data. This can be addressed by increasing the number of respondents or questions per respondent, or by decreasing the number of attributes or attribute levels in the design.

## Standard errors

The last table displays parameter statistics from the latent class analysis above. This was created by passing the output of FitChoiceModel into flipChoice::ExtractParameterStats. The parameters for which a prior was specified all have small standard errors relative to the coefficients and hence a high level of significance. It would be worth investigating the design if any of these parameters were not significant. This could indicate a potential issue with the design failing to adequately cover some levels. You can also compare different designs with the same specifications and settings against each other. Lower standard errors indicate a better design.

                     Coefficient Standard Error t-Statistic       p-Value
Alternative: 2         0.1476026     0.05916977    2.494560  1.266492e-02
Alternative: 3         0.1299049     0.07141263    1.819074  6.900015e-02
Weight: 60g            0.8972597     0.08495843   10.561162  1.267965e-25
Weight: 65g            1.6969143     0.08298150   20.449309  4.253649e-87
Weight: 70g            2.3512835     0.09265273   25.377380 7.833598e-129
Quality: Barn Raised   0.4161940     0.06248258    6.660960  3.225634e-11
Quality: Free Range    0.7119104     0.05913308   12.039123  1.243114e-32
Price: 3 dollars      -0.6612453     0.08050931   -8.213278  3.166417e-16
Price: 4 dollars      -1.4852020     0.09225711  -16.098510  5.526794e-56
Price: 5 dollars      -1.9404908     0.08572401  -22.636491 7.635587e-105
Price: 6 dollars      -2.3535715     0.10071473  -23.368690 4.537753e-111
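The statistics above were extracted along these lines, where `model` is the object returned by FitChoiceModel; the "p-Value" column name used in the filter is taken from the printed output above:

```r
library(flipChoice)

# Coefficients, standard errors, t-statistics and p-values per parameter.
stats <- ExtractParameterStats(model)
stats

# Parameters that are not significant at the 5% level may point to
# levels that the design does not cover adequately.
stats[stats[, "p-Value"] > 0.05, , drop = FALSE]
```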

## Summary

I have described the inputs required to create a choice model output with simulated data, and the outputs to look for when checking and comparing designs in R. It is a good idea to run a design with simulated data first, to check for any issues before spending valuable time and money giving it to respondents.

If you are interested in seeing how simulated data can be used to compare design-generating algorithms, check out this comparison. Want to find out more about choice models? Head on over to the “Market Research” section of our blog.