stone flakes IV

June 29, 2014
By

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In this post I want to try something new, a causal graphical model. The aim here is just as much to get myself a feel what these things do as to understand how the stone flakes data fit together.

Data

Data are stone flakes data which I analyzed previously. The first post was clustering, second linking to hominid type, third regression. Together these made for the bulk of a standard analysis. In this new analysis the same starting data is used.
r12 <- merge(r1,r2)

Packages

The main package used is pcalg (Methods for graphical models and causal inference). Even though it lives on cran, it requires RBGL (An interface to the BOOST graph library) which lives on Bioconductor. Plots are made via Rgraphvis (Provides plotting capabilities for R graph objects), Bioconductor again, which itself has the hard work done by graphviz, which, on my linux machine, is a few clicks to install.
library(‘pcalg’)
library(‘Rgraphviz’)

First analysis

I am just following the vignette here, to get some working code.
rx <- subset(r12,,names(r2)[-1])
rx <- rx[complete.cases(rx) & !(r12\$ID %in% c(‘ms’,’c’,’roe’,’sz’,’va’,’arn’)),]
suffStat <- list(C = cor(rx), n = nrow(rx))
pc.gmG <- pc(suffStat, indepTest = gaussCItest,
p = ncol(rx), alpha = 0.01)
png(‘graph1.png’)
plot(pc.gmG, main = “”)

personally I dislike this plot since you have to know which variable is which number. I don’t think that is acceptable for things one wants to share. Since I could not find documentation how to modify this via the plot statement, I took the ugly road of directly modifying an S4 object; pc.Gmc.
[email protected]@nodes <- names(rx)
names([email protected]@edgeL) <- names(rx)
png(‘graph2.png’)
plot(pc.gmG, main = “”)
dev.off()

This makes some sense looking at the variable names.
RTI (Relative-thickness index of the striking platform) is connected to WDI (Width-depth index of the striking platform). PSF (platform primery (yes/no, relative frequency)) is related to FSF (Platform facetted (yes/no, relative frequency)). PSF is also related to PROZD (Proportion of worked dorsal surface (continuous)) which then goes to ZDF1 (Dorsal surface totally worked (yes/no, relative frequency)). ZDF1 is also influenced by FLA (Flaking angle (the angle between the striking platform and the splitting surface)).

Second analysis

Much as like this analysis, it does not lead to a connection between flakes on one hand and age or group on the other hand. Since the algorithm assumes normal distributed variables, group is out of the question. Log(-age) seems to be closest to normal distributed.

rx <- subset(r12,,names(r2)[-1])
rx\$lmage <- log(-r12\$age)
rx <- rx[complete.cases(rx) & !(r12\$ID %in% c(‘ms’,’c’,’roe’,’sz’,’va’,’arn’)),]
suffStat <- list(C = cor(rx), n = nrow(rx))
pc.gmG <- pc(suffStat, indepTest = gaussCItest,
p = ncol(rx), alpha = 0.01)
[email protected]@nodes <- names(rx)
names([email protected]@edgeL) <- names(rx)
plot(pc.gmG, main = “”)

Adding age links the two parts, while keeping most of the previous graph unchanged. The causal link however, seems reversed, does age cause change in flakes or do changes in flakes cause age? Nevertheless, it does show a different picture than before. In linear regression FSF and LBI contributed, but there I had not removed the outliers. In this approach FSF features, but is in its turn driven by PSF. The other direct influence is ZDF1, which is now also driven by WDI.

Third analysis

It is probably pushing the limits of what is normal distributed, but there are two binairy variables, stone material (1=flint, 2=other) and site (1=gravel pit, 0=other).
rx <- subset(r12,,c(‘site’,’mat’,names(r2)[-1]))
rx\$lmage <- log(-r12\$age)
rx <- rx[complete.cases(rx) & !(r12\$ID %in% c(‘ms’,’c’,’roe’,’sz’,’va’,’arn’)),]
suffStat <- list(C = cor(rx), n = nrow(rx))
pc.gmG <- pc(suffStat, indepTest = gaussCItest,
p = ncol(rx), alpha = 0.01)
[email protected]@nodes <- names(rx)
names([email protected]@edgeL) <- names(rx)
plot(pc.gmG, main = “”)

This makes a link from material to RTI (relative thickness) and a connection site to FLA (flaking angle), see below.
# site (1=gravel pit, 0=other)
boxplot(FLA ~ c(‘other’,’gravel pit’)[site+1],
data=r12,
ylab=’FLA’,
xlab=’site’)

#stone material (1=flint, 2=other)
boxplot(RTI ~ c(‘flint’,’other’)[mat],
data=r12,
ylab=’RTI’,
xlab=’mat’)

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...