Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I recently was reading the book “Functional Art” and came across the work of Stefanie Posavec. Her Sentence Drawings (click here to see and click here to learn) caught my attention. Here is a ggplot2 rendition: From what I understand about this visualization technique it’s meant to show the aesthetic and organic beauty of language (click here for interview with artist). I was captivated and thus I began the journey of using ggplot2 to recreate a Sentence Drawing.

# Getting Started

I decided to use data sets from the qdap package. This requires downloading the development version which in turn requires downloading a few development version dependencies. Be patient, this may take a minute or two to install.

### Installing Packages from GitHub

# install.packages("devtools")
library(devtools)
install_github(c("slidify", "slidifyLibraries"), "ramnathv", ref = "dev")
install_github("knitcitations", "cboettig")
install_github(c("reports", "qdapDictionaries", "qdap"), "trinker")
install_github("ggthemes", "jrnold")
install.packages("scales")

invisible(lapply(c("qdap", "ggplot2", "ggthemes", "scales", "grid"),
require, character.only = TRUE))


# Right Turn Function

Stefanie Posavec describes the process for creating the Sentence Drawing by making a right turn at the end of each sentence. I went straight to work creating an inefficient solution to making right hand turns. Realizing the inefficiency, I asked for help and utilized this response from flodel. Here is the solution as a function that you’ll need to run.

turn_it <- function(dataframe, len.col, turn = -pi/2) {

dat <- dataframe
dat[, "turn"] <- rep(turn, nrow(dataframe))
dat <- within(dat, {
facing <- pi/2 + cumsum(turn)
move <- dat[, len.col] * exp(1i * facing)
position <- cumsum(move)
x2 <- Re(position)
y2 <- Im(position)
})

dat[, c("x1", "y1", "x2", "y2")] <-
lapply(dat[, c("x1", "y1", "x2", "y2")], round, digits=0)
data.frame(dataframe, dat[, c("x1", "y1", "x2", "y2")])
}


### Plot It

Here are the turns represented visually.

n <- 15
set.seed(11)
(dat <- data.frame(id = paste("X", 1:n, sep="."),
lens=sample(1:25, n, replace=TRUE)))

##      id lens
## 1   X.1    7
## 2   X.2    1
## 3   X.3   13
## 4   X.4    1
## 5   X.5    2
## 6   X.6   24
## 7   X.7    3
## 8   X.8    8
## 9   X.9   23
## 10 X.10    4
## 11 X.11    5
## 12 X.12   12
## 13 X.13   23
## 14 X.14   22
## 15 X.15   19

ggplot(turn_it(dat, "lens"), aes(x = x1, y = y1, xend = x2, yend = y2)) +
geom_segment(aes(color=id), size=3,lineend = "round") +
ylim(c(-40, 10)) + xlim(c(-20, 40)) # Apply to Romeo and Juliet

Now that I had this accomplished I set to work with Romeo and Juliet.

### Setting Up a Data Set

dat2b <- rajSPLIT
dat2b$wc <- wc(rajSPLIT$dialogue)
dat2b <- dat2b[!is.na(dat2b[, "wc"]), ]

## Reassign names to family affiliation
dat2b[, "fam.aff"] <- factor(lookup(as.character(dat2b[, "fam.aff"]),
levels(dat2b[, "fam.aff"])[1:3], qcv(Escalus, Capulet, Montague),
missing = NULL))

## Make dataframe with the beginning coordinates of each act
beg_act <- do.call(rbind, lapply(with(turn_it(dat2b, "wc"),
split(turn_it(dat2b, "wc"), act)), function(x) {
x[1, qcv(act, x1, y1, x2, y2)]
}))


### Romeo and Juliet Plotted

ggplot(turn_it(dat2b, "wc"), aes(x = x1, y = y1, xend = x2, yend = y2)) +
geom_segment(aes(color=fam.aff), lineend = "butt", size=1) +
#geom_point(x=0, y=0, size=5, shape="S") +
#geom_point(data=dat4b, aes(x=-106, y=-273), size=5, shape="E") +
geom_point(data=beg_act, aes(x = x1, y=y1), size=2.3,
colour = "grey25") +
geom_text(data=beg_act, aes(x = x1, y=y1, label = paste("Act", act)),
colour = "grey25", hjust = -.1, size=5, fontface = "bold") +
guides(colour = guide_legend(override.aes = list(alpha = 1))) +
theme_few() +
scale_colour_few(name="Family\nAffiliation") +
theme(axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title= element_blank(),
legend.position = c(.1, .85),
legend.title.align = .5) +
ggtitle("Romeo and Juliet Family\nAffiliation: Sentence Drawing") After this I wanted to try to fill by sentence level polarity using a newer polarity (sentiment) algorithm from qdap.

poldat <- polarity(dat2b[, "dialogue"])

ggplot(turn_it(poldat[["all"]], "wc"), aes(colour=polarity)) +
geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2),
lineend = "round", size=1) +
theme_few() +
theme(panel.background = element_rect(fill = "grey20"),
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title= element_blank(),
legend.direction = "horizontal",
legend.title = element_text(colour="white"),
legend.background = element_blank(),
legend.text = element_text(colour="white"),
legend.position = c(.80, .07))  +
mid = "white", high = muted("red"))  +
guides(colour = guide_colorbar(barwidth = 11, barheight = .75)) +
ggtitle("Sentence Polarity: Sentence Drawing") # Thoughts…

While I like the aesthetics and organic feel of Stefanie Posavec’s Sentence Drawings I can’t help but to ask what this is showing me; what does such a visual afford the audience? I concluded that it captures that language isn’t linear but recursive and intricately linked. Posavec describes the tight spirals as choppy and the extended ones as flowing and smooth. However, I believe there are better ways to capture this sentiment while still balancing the notion of organic recursivity with identifying structure.

Visual representations, like this turn of talk plot below, capture meaningful patterns in the data and allow for comparisons but present the data as linear, when it really is not.

out <- tot_plot(dat2b, "dialogue", grouping.var = "fam.aff",
facet.vars = "act", tot=FALSE, plot = FALSE)

out + theme(legend.position = "bottom") +
labs(fill="Family\nAffiliation") Again, there must be a balance between capturing the essence of language and understanding the structure. Perhaps using pre-attentive attributes in a meaningful way would be a start to allowing Posavec’s representation to be more useful in finding the narrative in the data. The right hand turn she uses is arbitrary. I ask, what if the turn were meaningful, towards a particular demographic variable. I also could see the benefit of the use of Yihui’s animation package to show the fluid nature of the conversation. I may return to this blog post but I invite others to attempt the challenge of showing something meaningful in the data, while capturing the controlled chaos of language.

*Blog post created using the reports package  