# qdap 1.3.1 Release: Demoing Dispersion Plots, Sentiment Analysis, Easy Hash Lookups, Boolean Searches and More…

March 14, 2014

(This article was first published on TRinker's R Blog » R, and kindly contributed to R-bloggers)

We’re very pleased to announce the release of qdap 1.3.1.

This is the latest installment of the qdap package, available on CRAN. Several important updates have occurred since the 1.1.0 release, most notably the addition of two vignettes and some generic view methods.

The new vignettes are an introduction to qdap and a guide to qdap-tm compatibility.

The first is a detailed HTML-based guide that gives an overview of the intended use of qdap functions. The second explains how to move between qdap and tm package forms, as qdap moves to be more compatible with this seminal R text mining package.

To install use:

install.packages("qdap")

Some of the changes in versions 1.2.0-1.3.1 include:

Generic Methods

• scores generic method added to view scores from select qdap objects.
• counts generic method added to view counts from select qdap objects.
• proportions generic method added to view proportions from select qdap objects.
• preprocessed generic method added to view preprocessed data from select qdap objects.

These methods allow the user to grab particular parts of qdap objects in a consistent fashion. The majority of these methods also pick up a corresponding plot method. This adds to the qdap philosophy that data results should be easy to grab and easy to visualize. For instance:

(x <- question_type(DATA.SPLIT$state, DATA.SPLIT$person))

## methods
scores(x)
plot(scores(x))
counts(x)
plot(counts(x))
proportions(x)
plot(proportions(x))
truncdf(preprocessed(x), 15)
plot(preprocessed(x))
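
For readers curious how accessor generics of this kind can work, here is a minimal base R sketch of the S3 pattern. It is not qdap's actual implementation: the class names `toy_result` and `toy_scores`, the list fields, and the generic name `get_scores` are all hypothetical (the generic is deliberately not called `scores`, to avoid masking qdap's function).

```r
## Hypothetical toy object: a list holding several views of one analysis
toy <- structure(
    list(
        scores = data.frame(person = c("greg", "sally"), tot.quest = c(1, 2)),
        counts = data.frame(person = c("greg", "sally"), what = c(0, 1))
    ),
    class = "toy_result"
)

## A generic plus a method: dispatch picks the method from the object's class
get_scores <- function(x, ...) UseMethod("get_scores")

get_scores.toy_result <- function(x, ...) {
    out <- x[["scores"]]
    ## Tag the extracted piece with its own class so it can carry a plot method
    class(out) <- c("toy_scores", class(out))
    out
}

## A plot method for the extracted piece (assumed design, for illustration)
plot.toy_scores <- function(x, ...) {
    barplot(x[["tot.quest"]], names.arg = x[["person"]], ...)
}

s <- get_scores(toy)
```

With this arrangement, `plot(get_scores(toy))` would dispatch to `plot.toy_scores`, which mirrors the "grab a piece, then plot it" usage shown above.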

## Demoing Some of the New Features

We’d like to take the time to highlight some of the development that has happened in qdap in the past several months:

### Dispersion Plots

wrds <- freq_terms(pres_debates2012$dialogue, stopwords = Top200Words)

## Add leading/trailing spaces if desired
wrds2 <- spaste(wrds)

## Use ~~ to maintain spaces
wrds2 <- c(" governor~~romney ", wrds2[-c(3, 12)])

## Plot
with(pres_debates2012 , dispersion_plot(dialogue, wrds2, rm.vars = time,
    color="black", bg.color="white"))

with(rajSPLIT, dispersion_plot(dialogue, c("love", "night"),
    bg.color = "black", grouping.var = list(fam.aff, sex),
    color = "yellow", total.color = "white", horiz.color="grey20"))

### Word Correlation

library(tm)
data("crude")
oil_cor1 <- apply_as_df(crude, word_cor, word = "oil", r=.7)
plot(oil_cor1)

oil_cor2 <- apply_as_df(crude, word_cor, word = qcv(texas, oil, money), r=.7)
plot(oil_cor2, ncol=2)

### Easy Hash Table

#### A Small Example

lookup(1:5, data.frame(1:4, 11:14))

## [1] 11 12 13 14 NA

## Leave alone elements w/o a match
lookup(1:5, data.frame(1:4, 11:14), missing = NULL)

## [1] 11 12 13 14 5

#### Scaled Up 3 Million Records

key <- data.frame(x=1:2, y=c("A", "B"))

##   x y
## 1 1 A
## 2 2 B

big.vec <- sample(1:2, 3000000, T)
out <- lookup(big.vec, key)
out[1:20]

## On my system 3 million records in:
## Time difference of 24.5534 secs

#### Binary Operator Version

codes <- list(A=c(1, 2, 4), B = c(3, 5), C = 7, D = c(6, 8:10))

1:12 %l% codes

## [1] "A" "A" "B" "A" "B" "D" "C" "D" "D" "D" NA NA

1:12 %l+% codes

## [1] "A" "A" "B" "A" "B" "D" "C" "D" "D" "D" "11" "12"

### Simple-Quick Boolean Searches

We’ll be demoing this capability on the qdap data set DATA:

##    person     state
## 1  sam        Computer is fun. Not too fun.
## 2  greg       No it's not, it's dumb.
## 3  teacher    What should we do?
## 4  sam        You liar, it stinks!
## 5  greg       I am telling the truth!
## 6  sally      How can we be certain?
## 7  greg       There is no way.
## 8  sam        I distrust you.
## 9  sally      What are you talking about?
## 10 researcher Shall we move on? Good then.
## 11 greg       I'm hungry. Let's eat. You already?

First a brief explanation from the documentation:

terms – A character string(s) to search for. The terms are arranged in a single string with AND (use AND or && to connect terms together) and OR (use OR or || to allow for searches of either set of terms). Spaces may be used to control what is searched for. For example using " I " on c("I'm", "I want", "in") will result in FALSE TRUE FALSE whereas "I" will match all three (if case is ignored).

Let’s see how it works. We’ll start with " I ORliar&&stinks". This will find sentences that contain " I ", or that contain both "liar" and "stinks".

boolean_search(DATA$state, " I ORliar&&stinks")

## The following elements meet the criteria:
## [1] 4 5 8
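
The AND/OR semantics described above can be sketched in base R. This is only an illustration of the logic, not how `boolean_search` is implemented. The helper name `toy_bool` is hypothetical, it handles only the `OR` and `&&` spellings, and padding each string with a space on both sides (so that " I " can match at the start of a sentence) is an assumption of this sketch.

```r
toy_bool <- function(text, terms, ignore.case = TRUE) {
    ## Pad so that a term like " I " can match at the start or end of a string
    padded <- paste0(" ", text, " ")
    if (ignore.case) padded <- tolower(padded)

    ## Split the query into OR groups, then each group into AND terms
    or_groups <- strsplit(terms, "OR", fixed = TRUE)[[1]]
    hits <- lapply(or_groups, function(grp) {
        and_terms <- strsplit(grp, "&&", fixed = TRUE)[[1]]
        if (ignore.case) and_terms <- tolower(and_terms)
        ## Every AND term must be found (literal match, not a regex)
        found <- lapply(and_terms, function(term) {
            grepl(term, padded, fixed = TRUE)
        })
        Reduce(`&`, found)
    })

    ## A string qualifies if any OR group matched
    which(Reduce(`|`, hits))
}

## The example from the documentation excerpt
toy_bool(c("I'm", "I want", "in"), " I ")   # 2
toy_bool(c("I'm", "I want", "in"), "I")     # 1 2 3

## A few sentences in the style of DATA$state (typed in by hand here)
state <- c("You liar, it stinks!", "I am telling the truth!",
    "There is no way.", "I distrust you.")
toy_bool(state, " I ORliar&&stinks")        # 1 2 4
```

Using `fixed = TRUE` keeps characters such as "." literal, which matters for the searches on "." that follow.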

boolean_search(DATA$state, " I &&.", values=TRUE)

## The following elements meet the criteria:
## [1] "I distrust you."

boolean_search(DATA$state, " I OR.", values=TRUE)

## The following elements meet the criteria:
## [1] "Computer is fun. Not too fun."
## [2] "No it's not, it's dumb."
## [3] "I am telling the truth!"
## [4] "There is no way."
## [5] "I distrust you."
## [6] "Shall we move on?  Good then."
## [7] "I'm hungry.  Let's eat.  You already?"

boolean_search(DATA$state, " I &&.")

## The following elements meet the criteria:
## [1] 8

#### Exclusion as Well

boolean_search(DATA$state, " I ||.", values=TRUE)

## The following elements meet the criteria:
## [1] "Computer is fun. Not too fun."
## [2] "No it's not, it's dumb."
## [3] "I am telling the truth!"
## [4] "There is no way."
## [5] "I distrust you."
## [6] "Shall we move on?  Good then."
## [7] "I'm hungry.  Let's eat.  You already?"

boolean_search(DATA$state, " I ||.", exclude = c("way", "truth"), values=TRUE)

## The following elements meet the criteria:
## [1] "Computer is fun. Not too fun."
## [2] "No it's not, it's dumb."
## [3] "I distrust you."
## [4] "Shall we move on?  Good then."
## [5] "I'm hungry.  Let's eat.  You already?"

#### Binary Operator Version

dat <- data.frame(x = c("Doggy", "Hello", "Hi Dog", "Zebra"), y = 1:4)

##        x y
## 1  Doggy 1
## 2  Hello 2
## 3 Hi Dog 3
## 4  Zebra 4

z <- data.frame(z =c("Hello", "Dog"))

##       z
## 1 Hello
## 2   Dog

dat[dat$x %bs% paste(z$z, collapse = "OR"), ]

### Polarity (Sentiment)

The polarity function is an extension of the work originally done by Jeffrey Breen, with some accompanying plotting methods. For more information see the Introduction to qdap Vignette.

poldat2 <- with(mraja1spl, polarity(dialogue, list(sex, fam.aff, died)))
colsplit2df(scores(poldat2))[, 1:7]

    sex fam.aff  died total.sentences total.words ave.polarity sd.polarity
1     f     cap FALSE             158        1810  0.076422846   0.2620359
2     f     cap  TRUE              24         221  0.042477906   0.2087159
3     f    mont  TRUE               4          29  0.079056942   0.3979112
4     m     cap FALSE              73         717  0.026496626   0.2558656
5     m     cap  TRUE              17         185 -0.159815603   0.3133931
6     m   escal FALSE               9         195 -0.152764808   0.3131176
7     m   escal  TRUE              27         646 -0.069421082   0.2556493
8     m    mont FALSE              70         952 -0.043809741   0.3837170
9     m    mont  TRUE             114        1273 -0.003653114   0.4090405
10    m    none FALSE               7          78  0.062243180   0.1067989
11 none    none FALSE               5          18 -0.281649658   0.4387579

#### The Accompanying Plotting Methods

plot(poldat2)

plot(scores(poldat2))

### Question Type

dat <- c("Kate's got no appetite doesn't she?",
    "Wanna tell Daddy what you did today?",
    "You helped getting out a book?", "umm hum?",
    "Do you know what it is?", "What do you want?",
    "Who's there?", "Whose?", "Why do you want it?",
    "Want some?", "Where did it go?", "Was it fun?")

left_just(preprocessed(question_type(dat))[, c(2, 6)])

   raw.text                             q.type
1  Kate's got no appetite doesn't she?  doesnt
2  Wanna tell Daddy what you did today? what
3  You helped getting out a book?       implied_do/does/did
4  Umm hum?                             unknown
5  Do you know what it is?              do
6  What do you want?                    what
7  Who's there?                         who
8  Whose?                               whose
9  Why do you want it?                  why
10 Want some?                           unknown
11 Where did it go?                     where
12 Was it fun?                          was

x <- question_type(DATA.SPLIT$state, DATA.SPLIT$person)

scores(x)
      person tot.quest    what    how   shall implied_do/does/did
1       greg         1       0      0       0             1(100%)
2 researcher         1       0      0 1(100%)                   0
3      sally         2  1(50%) 1(50%)       0                   0
4    teacher         1 1(100%)      0       0                   0
5        sam         0       0      0       0                   0
plot(scores(x), high="orange")

These are a few of the more recent developments in qdap. We would encourage readers to dig into the new vignettes and start using qdap for various Natural Language Processing tasks. If you have suggestions or find a bug you are welcome to:

• submit suggestions and bug-reports at: https://github.com/trinker/qdap/issues
• send a pull request on: https://github.com/trinker/qdap

• For a complete list of changes see qdap’s NEWS.md