[This article was first published on Struggling Through Problems, and kindly contributed to R-bloggers].
On StackOverflow, do posters with more experience ask their questions in fewer words?
No. There’s no visible difference:
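Chars of non-code: [plot description.png, non-code characters against reputation, both axes log-scaled]
Chars of code: [plot code.png, code characters against reputation, both axes log-scaled]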
The data comes from the super-handy StackOverflow API. It was retrieved using wget and then parsed using the rjson and XML packages.
First read in and parse the JSON:
so.R
    library(rjson)
    library(XML)
    library(ggplot2)
    library(plyr)

    # Read one saved API page and return its list of questions
    read.qs = function(path) {
      fromJSON(file = path)$questions
    }

    # Concatenate the per-page lists into one flat list of questions
    questions = do.call(c,
      lapply(c('page-1.json', 'page-2.json', 'page-3.json'),
        read.qs
      )
    )
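To see what the do.call(c, lapply(...)) step does, here is a minimal sketch that parses JSON from strings instead of files. The two page strings and the helper read.qs.text are made up for illustration; only the top-level questions field and the body and owner$reputation fields are taken from the code in this post, and the real API responses carry more than this.

```r
library(rjson)

# Two made-up "pages", standing in for page-1.json and page-2.json
pages = c(
  '{"questions": [{"body": "<p>a</p>", "owner": {"reputation": 10}},
                  {"body": "<p>b</p>", "owner": {"reputation": 250}}]}',
  '{"questions": [{"body": "<p>c</p>", "owner": {"reputation": 3}}]}'
)

# Hypothetical variant of read.qs that parses a string rather than a file
read.qs.text = function(json) {
  fromJSON(json_str = json)$questions
}

# lapply gives one list of questions per page;
# do.call(c, ...) joins them end to end into a single flat list
questions = do.call(c, lapply(pages, read.qs.text))

print(length(questions))
print(sapply(questions, function(q) q[['body']]))
```

Without the do.call(c, ...) the result would be a list of pages, each holding its own list of questions, and the later per-question loop would have to be nested.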
Then for each one parse the HTML and look for <pre> and <p> tags:
so.R (cont)
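    # Build one row per question: asker reputation plus character counts
    Table = ldply(questions, function(q) {
      # Wrap the fragment so the parser sees a single document body
      body.text = sprintf('<body>%s</body>', q$body)
      # asText = TRUE makes it explicit that this is markup, not a file name
      body = htmlParse(body.text, asText = TRUE)

      # Characters inside <p> count as description, inside <pre> as code
      description = tot.length.of(body, '//p//text()')
      code = tot.length.of(body, '//pre//text()')

      rep = q$owner$reputation

      data.frame(
        rep, description, code
      )
    })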
(where tot.length.of, which has to be defined before the ldply call above is run, is:
so.R (cont)
    # Total number of characters in all nodes matched by an XPath query
    tot.length.of = function(doc, query) {
      parts = xpathApply(doc, query, xmlValue)
      text = paste(parts, collapse='')
      nchar(text)
    }
)
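As a quick check of how the two XPath queries split a question, here is a small sketch that runs the same function on a made-up question body. The HTML snippet is invented for illustration; the function is repeated so the example runs on its own.

```r
library(XML)

# Same function as in so.R, repeated so this example is self-contained
tot.length.of = function(doc, query) {
  parts = xpathApply(doc, query, xmlValue)
  text = paste(parts, collapse='')
  nchar(text)
}

# Made-up question body: one paragraph of prose, one block of code
html = '<body><p>How do I <b>sum</b> a vector?</p><pre><code>sum(x)</code></pre></body>'
doc = htmlParse(html, asText = TRUE)

# '//p//text()' collects every text node under a <p>, including nested tags
description = tot.length.of(doc, '//p//text()')
# '//pre//text()' does the same for <pre>, so the <code> wrapper is ignored
code = tot.length.of(doc, '//pre//text()')
# A query that matches nothing yields an empty string, hence a count of 0
none = tot.length.of(doc, '//table//text()')

print(c(description = description, code = code, none = none))
```

Note that text outside <p> and <pre>, such as list items or block quotes, is counted by neither query.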
Then make the plots:
so.R (cont)
    png('description.png')
    print(ggplot(data=Table)
      + geom_point(aes(rep, description))
      + scale_x_log10()
      + scale_y_log10()
      + xlab('Rep')
      + ylab('Verbosity')
    )
    dev.off()

    png('code.png')
    print(ggplot(data=Table)
      + geom_point(aes(rep, code))
      + scale_x_log10()
      + scale_y_log10()
      + xlab('Rep')
      + ylab('Verbosity')
    )
    dev.off()
$ Rscript so.R >/dev/null 2>&1
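The plots answer the question by eye. One possible way to put a number on "no visible difference" is a rank correlation between reputation and each length, which is not something the original script does. The sketch below assumes a data frame with the same three columns as Table (rep, description, code) and fills it with random toy values, so the printed numbers say nothing about the real StackOverflow data.

```r
set.seed(1)

# Toy stand-in for Table: random values, NOT the real API data
Table = data.frame(
  rep = round(10 ^ runif(200, 0, 4)),
  description = round(10 ^ rnorm(200, 2.7, 0.3)),
  code = round(10 ^ rnorm(200, 2.5, 0.5))
)

# Spearman's rho depends only on ranks, so it is unaffected by the
# log scaling used on both axes of the plots
rho.description = cor(Table$rep, Table$description, method = 'spearman')
rho.code = cor(Table$rep, Table$code, method = 'spearman')

print(c(description = rho.description, code = rho.code))
```

A value near 0 would agree with the impression from the scatter plots, and a clearly negative one would suggest that higher-reputation posters do write less.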
