Outer Product of Character Vectors in R
What follows is like a kata to strengthen your R fundamentals.
The lovely stats in the wild recently posted some hott data analysis of Olympians’ ages and sexes. Because I’m annoyingly picky about graphics, I asked for his code so I could tweak the graphics according to my own perfidious norms. Stats in the wild posted his scraper of sports-reference.com — I’m sure you can find some more interesting uses for it — and asked for (polite) suggestions for improvement.
One potential place for improvement in stats in the wild’s code also answers two questions for R learners more generally, so I’m sharing the code block.
alphabet <- c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z")
for (i.one in 1:26) {
  for (i.two in 1:26) {
    # one two-letter pair per iteration; assigning to `letters` masks R's built-in constant
    letters <- paste(alphabet[i.one], alphabet[i.two], sep = "")
  }
}
The desired goal is to get pairs of letters like
[1] aa ba ca da ea fa ga ha ia ja ka la ma na oa
[16] pa qa ra sa ta ua va wa xa ya za ab bb cb db
[31] eb fb gb hb ib jb kb lb mb nb ob pb qb rb sb
[46] tb ub vb wb xb yb zb ac bc cc dc ec fc gc hc
[61] ic jc kc lc mc nc oc pc qc rc sc tc uc vc wc
[76] xc yc zc ad bd cd dd ed fd gd hd id jd kd ld
[91] md nd od pd qd rd sd td ud vd wd xd yd zd ae
[106] be ce de ee fe ge he ie je ke le me ne oe pe
[121] qe re se te ue ve we xe ye ze af bf cf df ef
[136] ff gf hf if jf kf lf mf nf of pf qf rf sf tf
[151] uf vf wf xf yf zf ag bg cg dg eg fg gg hg ig
[166] jg kg lg mg ng og pg qg rg sg tg ug vg wg xg
[181] yg zg ah bh ch dh eh fh gh hh ih jh kh lh mh
[196] nh oh ph qh rh sh th uh vh wh xh yh zh ai bi
[211] ci di ei fi gi hi ii ji ki li mi ni oi pi qi
[226] ri si ti ui vi wi xi yi zi aj bj cj dj ej fj
[241] gj hj ij jj kj lj mj nj oj pj qj rj sj tj uj
[256] vj wj xj yj zj ak bk ck dk ek fk gk hk ik jk
[271] kk lk mk nk ok pk qk rk sk tk uk vk wk xk yk
[286] zk al bl cl dl el fl gl hl il jl kl ll ml nl
[301] ol pl ql rl sl tl ul vl wl xl yl zl am bm cm
[316] dm em fm gm hm im jm km lm mm nm om pm qm rm
[331] sm tm um vm wm xm ym zm an bn cn dn en fn gn
[346] hn in jn kn ln mn nn on pn qn rn sn tn un vn
[361] wn xn yn zn ao bo co do eo fo go ho io jo ko
[376] lo mo no oo po qo ro so to uo vo wo xo yo zo
[391] ap bp cp dp ep fp gp hp ip jp kp lp mp np op
[406] pp qp rp sp tp up vp wp xp yp zp aq bq cq dq
[421] eq fq gq hq iq jq kq lq mq nq oq pq qq rq sq
[436] tq uq vq wq xq yq zq ar br cr dr er fr gr hr
[451] ir jr kr lr mr nr or pr qr rr sr tr ur vr wr
[466] xr yr zr as bs cs ds es fs gs hs is js ks ls
[481] ms ns os ps qs rs ss ts us vs ws xs ys zs at
[496] bt ct dt et ft gt ht it jt kt lt mt nt ot pt
[511] qt rt st tt ut vt wt xt yt zt au bu cu du eu
[526] fu gu hu iu ju ku lu mu nu ou pu qu ru su tu
[541] uu vu wu xu yu zu av bv cv dv ev fv gv hv iv
[556] jv kv lv mv nv ov pv qv rv sv tv uv vv wv xv
[571] yv zv aw bw cw dw ew fw gw hw iw jw kw lw mw
[586] nw ow pw qw rw sw tw uw vw ww xw yw zw ax bx
[601] cx dx ex fx gx hx ix jx kx lx mx nx ox px qx
[616] rx sx tx ux vx wx xx yx zx ay by cy dy ey fy
[631] gy hy iy jy ky ly my ny oy py qy ry sy ty uy
[646] vy wy xy yy zy az bz cz dz ez fz gz hz iz jz
[661] kz lz mz nz oz pz qz rz sz tz uz vz wz xz yz
which seems like a simple request. But how to do this idiomatically in R?
Quite often you want to do for (i in 1:222) { for (j in 1:333) { for (k in 1:444) { stuff }}}.
Also nice to know that R has already provided access to “the 13th letter in the alphabet” with letters[13], so it’s unnecessary to redefine alphabet every time. (yay!)
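A quick way to convince yourself of this at the prompt. This is only a sketch: it spells the constant as base::letters so that it still works if something in your session (such as the loop above) has assigned to the name letters, and the variable alphabet is rebuilt here with strsplit purely for the comparison.

```r
# letters and LETTERS are constants that ship with base R
base::letters[13]        # "m"
length(base::letters)    # 26

# A hand-built alphabet is identical to the built-in one
alphabet <- strsplit("abcdefghijklmnopqrstuvwxyz", "")[[1]]
identical(alphabet, base::letters)
```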
As used in maths, the inner product of two [tensors | matrices | vectors] shrinks the output, and the outer product enlarges it. In this case, the “outer product” stands in for for (i in 1:26) { for (j in 1:26) { fill up the matrix with each entry [i,j] } } and does so idiomatically, that is, with vectorised operations rather than explicit loops. (Which is the goal in R, J, and other vectorised languages.)
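To make the loop-versus-outer correspondence concrete, here is a small numeric sketch. The names x, y, by_loop and by_outer are made up for illustration; the values are compared with == rather than identical() because the loop stores integers while the default outer() may return doubles.

```r
x <- 2:7
y <- 3:5

# Nested-loop version: fill entry [i, j] with x[i] * y[j]
by_loop <- matrix(NA_real_, nrow = length(x), ncol = length(y))
for (i in seq_along(x)) {
  for (j in seq_along(y)) {
    by_loop[i, j] <- x[i] * y[j]
  }
}

# Vectorised version: outer() builds the same matrix in one call
by_outer <- outer(x, y)

all(by_loop == by_outer)   # same entries
dim(by_outer)              # 6 3: two vectors in, a matrix out

# The inner product goes the other way: two vectors in, one number out
sum(1:3 * 4:6)             # 32
```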
Here’s my answer, and I’d like to hear your comments or better/also-good solutions.
c( outer( letters, letters, FUN=paste, sep="" ) )
Broken down:
letters[1:26] = iterate through the alphabet. letters on its own also gives the whole alphabet.

outer = outer product of two arrays. Try outer( 2:7, 3:5 ) at the R prompt and then try outer( 1:26, 1:26, FUN=paste ). (In maths, outer contrasts with the element-by-element product, which is 2:7 * 3:8 in R, and with the inner product, which is the dot product: a special case of matrix multiplication, closely related to projection, and essentially the sum of those element-by-element terms, 2:7 %*% 3:8 in R.)

FUN=paste, sep="" = The grand theory behind this is much more complicated than what it does. paste concatenates strings, with a default separator of a space, sep=" ".

The gnarly theory reason: FUN is an argument to outer, which defaults to multiplication (you see this in outer( 1:26, 1:26 )) but can be set to concatenation since we’re working with characters rather than numbers. Then how to pass sep="" to paste? Writing FUN=paste( sep="" ) doesn’t work, because that calls paste on the spot instead of handing outer a function. You could do an ugly workaround with FUN=function(x, y) paste(x, y, sep="") … but the makers of R foresaw that you would often want to do things like this, so the argument list of outer ends with ... after FUN, and anything extra you put there, separated by a comma, is passed along to FUN. So you can write sep="" within outer, without having to make a function(x, y) specifically to pass to FUN.

Wow, that was not fun.

c = the natural output is 2-dimensional and c flattens that into one single vector.
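The two spellings discussed above can be compared directly. This is a sketch that assumes a session where letters is still the built-in constant; via_dots, via_wrapper and all_pairs are names invented for the example. In the documented signature outer(X, Y, FUN = "*", ...), the trailing ... is what carries sep="" through to paste.

```r
# Extra named arguments after FUN are forwarded to FUN
via_dots <- outer(letters, letters, FUN = paste, sep = "")

# The longer spelling: wrap paste in a two-argument anonymous function
via_wrapper <- outer(letters, letters,
                     FUN = function(x, y) paste(x, y, sep = ""))

identical(via_dots, via_wrapper)

dim(via_dots)              # 26 26: the result is still a matrix
all_pairs <- c(via_dots)   # c() drops the dimensions, reading column by column
length(all_pairs)          # 676
head(all_pairs, 3)         # "aa" "ba" "ca"
```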
Another way to do it is:
sapply( letters, FUN=function(x) paste(x, letters, sep="") )
which I think is uglier … perhaps because it uses letters twice or perhaps because I think outer-producting is what I’m really doing.
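For anyone weighing the two, a small sketch of how the results differ in shape (by_sapply and by_outer are illustrative names, and letters is assumed to be the built-in constant). Both give a 26 x 26 character matrix, but sapply labels its columns and lays the pairs out transposed relative to outer.

```r
by_sapply <- sapply(letters, FUN = function(x) paste(x, letters, sep = ""))
by_outer  <- outer(letters, letters, FUN = paste, sep = "")

dim(by_sapply)              # 26 26
colnames(by_sapply)[1:3]    # "a" "b" "c": sapply names columns after its input

# Column "a" of the sapply result holds "aa", "ab", "ac", ...
# so, names aside, it is the transpose of the outer() matrix
identical(unname(by_sapply), t(by_outer))

# Same 676 pairs either way, just visited in a different order
setequal(c(by_sapply), c(by_outer))
```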
Thoughts? Can it be done even more idiomatically or naturally?
UPDATE: gappy3000 says expand.grid() scales better than outer().
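For reference, a minimal sketch of what an expand.grid() version might look like. The column names first and second and the variables pair_grid, by_grid and triples are made up for illustration, letters is assumed to be the built-in constant, and the scaling claim is gappy3000's, not something measured here. expand.grid() builds a data frame with one row per combination, varying the first column fastest, so it also covers the three-deep loop case.

```r
# One row per (first, second) combination; keep the columns as plain strings
pair_grid <- expand.grid(first = letters, second = letters,
                         stringsAsFactors = FALSE)
nrow(pair_grid)   # 676

# paste is vectorised, so it glues the two columns together row by row
by_grid <- paste(pair_grid$first, pair_grid$second, sep = "")

# Same pairs, in the same order, as the outer() one-liner
identical(by_grid, c(outer(letters, letters, FUN = paste, sep = "")))

# Three nested loops collapse the same way: one row per (i, j, k)
triples <- expand.grid(i = 1:2, j = 1:3, k = 1:4)
nrow(triples)     # 24
```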
