
Valentine’s Day is around the corner and love is in the air… but, shock horror, nearly every second marriage ends in a divorce! Unfortunately, I can tell you first hand that this is an experience you’d rather not have. In this post, we see how data science, in the form of the OneR package and an interesting new data set, might potentially help you to avoid that tragedy… so read on!

In a scientific study conducted last year in Turkey, nearly 200 participants (married as well as divorced) were asked to rate how important they find the following statements (some of them seem to have got a little lost in translation from the original Turkish version):

1. When one of our apologies apologizes when our discussions go in a bad direction, the issue does not extend.
2. I know we can ignore our differences, even if things get hard sometimes.
3. When we need it, we can take our discussions with my wife from the beginning and correct it.
4. When I argue with my wife, it will eventually work for me to contact him.
5. The time I spent with my wife is special for us.
6. We don’t have time at home as partners.
7. We are like two strangers who share the same environment at home rather than family.
8. I enjoy our holidays with my wife.
9. I enjoy traveling with my wife.
10. My wife and most of our goals are common.
11. I think that one day in the future, when I look back, I see that my wife and I are in harmony with each other.
12. My wife and I have similar values in terms of personal freedom.
13. My husband and I have similar entertainment.
14. Most of our goals for people (children, friends, etc.) are the same.
15. Our dreams of living with my wife are similar and harmonious
16. We’re compatible with my wife about what love should be
17. We share the same views with my wife about being happy in your life
18. My wife and I have similar ideas about how marriage should be
19. My wife and I have similar ideas about how roles should be in marriage
20. My wife and I have similar values in trust
21. I know exactly what my wife likes.
22. I know how my wife wants to be taken care of when she’s sick.
23. I know my wife’s favorite food.
24. I can tell you what kind of stress my wife is facing in her life.
25. I have knowledge of my wife’s inner world.
26. I know my wife’s basic concerns.
27. I know what my wife’s current sources of stress are.
28. I know my wife’s hopes and wishes.
29. I know my wife very well.
30. I know my wife’s friends and their social relationships.
31. I feel aggressive when I argue with my wife.
32. When discussing with my wife, I usually use expressions such as “you always” or “you never”.
33. I can use negative statements about my wife’s personality during our discussions.
34. I can use offensive expressions during our discussions.
35. I can insult our discussions.
36. I can be humiliating when we argue.
37. My argument with my wife is not calm.
38. I hate my wife’s way of bringing it up.
39. Fights often occur suddenly.
40. We’re just starting a fight before I know what’s going on.
41. When I talk to my wife about something, my calm suddenly breaks.
42. When I argue with my wife, it only snaps in and I don’t say a word.
43. I’m mostly thirsty to calm the environment a little bit.
44. Sometimes I think it’s good for me to leave home for a while.
45. I’d rather stay silent than argue with my wife.
46. Even if I’m right in the argument, I’m thirsty not to upset the other side.
47. When I argue with my wife, I remain silent because I am afraid of not being able to control my anger.
48. I feel right in our discussions.
49. I have nothing to do with what I’ve been accused of.
50. I’m not actually the one who’s guilty about what I’m accused of.
51. I’m not the one who’s wrong about problems at home.
53. When I discuss it, I remind her of my wife’s inadequate issues.
54. I’m not afraid to tell her about my wife’s incompetence.

Now, the question is whether one can predict, on the basis of these ratings alone, whether a person will actually get divorced. Let us see if data science can help us in this love-related matter!

The data and a link to the corresponding article can be found here: Divorce Predictors data set. I unpacked the data for your convenience; you can download it here: divorce.csv. Let us now use the OneR package (on CRAN) to analyse it:

library(OneR)

divorce <- read.csv("data/divorce.csv", sep = ";")
divorce$Class <- factor(ifelse(divorce$Class == 0, "married", "divorced"))
data <- optbin(divorce)
model <- OneR(data, verbose = TRUE) # 18. My wife and I have similar ideas about how marriage should be
##
##     Attribute Accuracy
## 1 * Atr18     98.24%
## 2   Atr11     97.65%
## 2   Atr17     97.65%
## 2   Atr19     97.65%
## 5   Atr9      97.06%
## 5   Atr16     97.06%
## 5   Atr20     97.06%
## 5   Atr40     97.06%
## 9   Atr26     96.47%
## 10  Atr12     95.88%
## 10  Atr14     95.88%
## 10  Atr15     95.88%
## 10  Atr25     95.88%
## 10  Atr30     95.88%
## 15  Atr29     95.29%
## 15  Atr36     95.29%
## 15  Atr39     95.29%
## 18  Atr4      94.71%
## 18  Atr8      94.71%
## 18  Atr21     94.71%
## 18  Atr27     94.71%
## 22  Atr5      94.12%
## 22  Atr37     94.12%
## 22  Atr38     94.12%
## 25  Atr41     93.53%
## 25  Atr44     93.53%
## 27  Atr1      92.94%
## 27  Atr2      92.94%
## 27  Atr10     92.94%
## 27  Atr24     92.94%
## 31  Atr22     92.35%
## 31  Atr28     92.35%
## 31  Atr31     92.35%
## 31  Atr33     92.35%
## 35  Atr13     91.76%
## 35  Atr32     91.76%
## 35  Atr35     91.76%
## 38  Atr23     91.18%
## 38  Atr34     91.18%
## 40  Atr54     90.59%
## 41  Atr50     89.41%
## 42  Atr3      88.82%
## 43  Atr42     87.65%
## 44  Atr51     87.06%
## 45  Atr49     84.71%
## 45  Atr53     84.71%
## 47  Atr7      82.35%
## 48  Atr47     81.76%
## 49  Atr48     80.59%
## 50  Atr52     80%
## 51  Atr43     78.82%
## 52  Atr45     77.06%
## 53  Atr6      74.12%
## 54  Atr46     68.24%
## ---
## Chosen attribute due to accuracy
## and ties method (if applicable): '*'

summary(model)
##
## Call:
## OneR.data.frame(x = data, verbose = TRUE)
##
## Rules:
## If Atr18 = (-0.004,1.19] then Class = married
## If Atr18 = (1.19,4]      then Class = divorced
##
## Accuracy:
## 167 of 170 instances classified correctly (98.24%)
##
## Contingency table:
##           Atr18
## Class      (-0.004,1.19] (1.19,4] Sum
##   divorced             3     * 81  84
##   married           * 86        0  86
##   Sum                 89       81 170
## ---
## Maximum in each column: '*'
##
## Pearson's Chi-squared test:
## X-squared = 154.56, df = 1, p-value < 2.2e-16

plot(model)


prediction <- predict(model, data)
eval_model(prediction, data)
##
## Confusion matrix (absolute):
##           Actual
## Prediction divorced married Sum
##   divorced       81       0  81
##   married         3      86  89
##   Sum            84      86 170
##
## Confusion matrix (relative):
##           Actual
## Prediction divorced married  Sum
##   divorced     0.48    0.00 0.48
##   married      0.02    0.51 0.52
##   Sum          0.49    0.51 1.00
##
## Accuracy:
## 0.9824 (167/170)
##
## Error rate:
## 0.0176 (3/170)
##
## Error rate reduction (vs. base rate):
## 0.9643 (p-value < 2.2e-16)


So, the best predictor is the rating on statement 18. The question you should ask your partner before marrying him or her is, therefore, the following:

What is a good marriage for you?

A simple question but one that might reveal some major differences between your conceptions of what a good marriage is. In that case, the outlook is not good. The accuracy of the prediction is a whopping 98.24%! By the way, this is even slightly better than the 98.23% given in the paper (which is achieved by an artificial neural network).
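
One caveat: the 98.24% above is measured on the same 170 rows the model was built on. If you want a rough idea of how the rule fares on data it has not seen, a hold-out split is one option. The following is only a sketch: `holdout_accuracy` is a hypothetical helper of mine (not part of OneR), the 80/20 split and the seed are arbitrary choices, and it assumes the class is the last column of the data frame, as in divorce.csv above.

```r
# Hypothetical helper (not part of the OneR package): fit a OneR model on a
# random training subset and return the accuracy on the remaining rows.
# Assumes the target class is the last column of df.
holdout_accuracy <- function(df, train_frac = 0.8, seed = 1) {
  set.seed(seed)
  train_idx <- sample(nrow(df), floor(train_frac * nrow(df)))

  train <- OneR::optbin(df[train_idx, ])  # bins are learned from training rows only
  test  <- df[-train_idx, ]               # left unbinned; predict() applies the model's bins

  model      <- OneR::OneR(train)
  prediction <- predict(model, test)

  mean(as.character(prediction) == as.character(test[[ncol(test)]]))
}

# Usage with the data from this post (needs data/divorce.csv):
# divorce <- read.csv("data/divorce.csv", sep = ";")
# divorce$Class <- factor(ifelse(divorce$Class == 0, "married", "divorced"))
# holdout_accuracy(divorce)
```

With only 34 test rows the result will jump around depending on the seed, so it is worth trying a few different ones before reading too much into any single number.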

Had I only known this 20 years ago…

Happy Valentine’s Day, and stay tuned: we will take a little break and hopefully see you back on March 17th!