A few months ago I used R to investigate black hole word numbers in the English language. A friend suggested there are probably black hole word numbers in other languages. There are only three other languages that I have a nodding acquaintance with (spoken languages, not computer languages), and all three do have such black holes. Here is the result of my research. My R code for all four languages is at the end.

First, a review with English words. Every English word gets you to the same black hole number as you count the number of letters in the word and then successively count the number of letters in the resulting word number. That black hole is at four. Once you get to four, you are stuck and can’t get out. Here is an example.

The word hippopotomonstrosesquippedaliophobia (fear of long words) has 36 letters.
The word thirtysix has nine letters.
The word nine has four letters.
The word four has four letters.

Here are some more English words, with their word number length counting sequence. I found a long list of English words, so this list is truly a random sample. (For the other languages, I could not find a nice long list, so the words are not random but rather a convenience sample.)

miscognizable thirteen eight five four
harvestry nine four
geopolitist eleven six three five four
jessed six three five four
pardonee eight five four
whitfield nine four
ghazal six three five four
morphophonemically eighteen eight five four
calonectria eleven six three five four
conceptiveness fourteen eight five four

Every German word also gets you to the same black hole number: vier.

handschuh neun vier
flugzeug acht vier
staubsauger elf drei vier
waschmaschine dreizehn acht vier
haustürschlüssel sechzehn acht vier
lächeln sieben sechs fünf vier
geutscher neun vier
danke fünf vier
morgen sechs fünf vier
tee drei vier
torschlusspanik fünfzehn acht vier

In Hebrew, where there is the complication that letters are written from right to left, there are two black hole numbers: ארבע and שלש . Below, the rightmost word is the word whose letters are first counted, and the subsequent counting is from right to left.

פירת ארבע
אורתודוקסית אחדעשר שש שתים ארבע
קומוניסטית עשר שלש
ומועמדויות עשר שלש
עיתונות שבע שלש
ארוך ארבע
שלה שלש
כך שתים ארבע
לראות חמש שלש
להסתכל שש שתים ארבע

In Spanish there is a black hole at cinco. However, unlike the previous languages that had a black hole where you are stuck and can’t get out, Spanish also has some words where you oscillate back and forth between two numbers but never really fall into a hole. These two Spanish numbers are seis and cuatro.

montaña siete cinco
Iglesia siete cinco
computadora once cuatro seis cuatro seis cuatro seis cuatro seis cuatro
oficina siete cinco
preguntar nueve cinco
entender ocho cuatro seis cuatro seis cuatro seis cuatro seis cuatro
hermosa siete cinco
asombroso nueve cinco
perezoso ocho cuatro seis cuatro seis cuatro seis cuatro seis cuatro
somnoliento once cuatro seis cuatro seis cuatro seis cuatro seis cuatro
saludable nueve cinco

This is reminiscent of some numerical algorithms that oscillate and never converge. For example, f(x) = x^3 - 2*x + 2 has a single real root at approximately -1.769, but starting from x0 = 1 the Newton-Raphson approximations oscillate between x = 1 (where f(x) = 1) and x = 0 (where f(x) = 2) and never find the root. You can see from the first graph that the oscillation occurs at the wrong section of the curve.
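The two-step oscillation can be checked by hand in a few lines. This is only a minimal sketch: `newton_step` is a helper name made up for this illustration, and the call to base R's `uniroot()` is added purely for comparison, with a bracket of [-3, 0] that I chose because f changes sign there.

```r
# f and its derivative, as in the example above
f      <- function(x) x^3 - 2 * x + 2
fprime <- function(x) 3 * x^2 - 2

# One Newton-Raphson update (hypothetical helper name)
newton_step <- function(x) x - f(x) / fprime(x)

x1 <- newton_step(1)    # 1 - f(1)/f'(1) = 1 - 1/1    = 0
x2 <- newton_step(x1)   # 0 - f(0)/f'(0) = 0 - 2/(-2) = 1, back where we started
print(c(x1, x2))

# A bracketing method does not get trapped; the interval [-3, 0] is an assumption
root <- uniroot(f, lower = -3, upper = 0)$root
print(root)
```

Because the second step lands exactly back on the starting value, every later iteration repeats the same two points.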

If you think about it, the trick to why these black holes exist is not too difficult, and the same trick works in these four languages. I’m sure there are other languages that have no such black hole.
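One way to hunt for the trick is to search a table of number names directly. The sketch below is not the code used for the results above: `find_attractors`, `english_names`, and `spanish_names` are hypothetical names introduced here, and the tables only go up to ten. It flags names whose letter count equals their own value (black holes) and pairs of names that point at each other (oscillations).

```r
# Hypothetical helper: number_names[n] must be the spelled-out name of n,
# with any hyphens or spaces already removed.
find_attractors <- function(number_names) {
  len <- nchar(number_names)            # len[n] = number of letters in the name of n
  # Guard: every letter count must itself be an entry in the table
  stopifnot(all(len >= 1), all(len <= length(number_names)))
  n <- seq_along(number_names)
  list(
    fixed  = number_names[len == n],                  # n -> n
    cycles = number_names[len != n & len[len] == n]   # n -> m -> n with m != n
  )
}

english_names <- c("one", "two", "three", "four", "five",
                   "six", "seven", "eight", "nine", "ten")
spanish_names <- c("uno", "dos", "tres", "cuatro", "cinco",
                   "seis", "siete", "ocho", "nueve", "diez")

print(find_attractors(english_names))   # fixed: "four"; no two-cycles
print(find_attractors(spanish_names))   # fixed: "cinco"; two-cycle: "cuatro", "seis"
```

Since number names are short, every word quickly lands on a small number, and a map from a small finite set into itself has to end in either a fixed point or a cycle. Longer cycles are possible in principle; this sketch only looks for cycles of length two.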

Here is the R code I used:

####################################################
# Try hippopotomonstrosesquippedaliophobia (fear of long words) which has 36 letters.

library(english)
x <- "hippopotomonstrosesquippedaliophobia"
y <- ""     # Initialize y
while (y != "four") {
  y <- nchar(x)
  y <- as.character(english(y))     # Spell out an integer as a word
  y <- gsub("[- ]", "", y)     # delete any hyphen or space so only letters are counted
  print(c(x, y))
  x <- y
}

####################################################
# Try ten random English words

library(english)
library(wordcloud)
set.seed(123)
# `words` is assumed to be a data frame, read in beforehand, whose column V1 holds the long word list
original <- sample(words$V1, 10, replace = FALSE)
# original <- c(
#   "miscognizable", "harvestry", "geopolitist", "jessed", "pardonee", "whitfield", "ghazal", "morphophonemically",
#   "calonectria", "conceptiveness")
wordcloud(words = original, random.order = TRUE, colors = c("red", "blue", "darkgreen", "brown", "black", "red",
  "blue", "darkgreen", "navy", "black"), ordered.colors = TRUE, scale = c(3, 7))
rm(words)     # free up memory
for (i in 1:10) {
  x <- original[i]
  y <- character()
  for (j in 1:100) {
    n <- nchar(x)
    n <- as.character(english(n))     # Spell out an integer as a word
    y[j] <- gsub("[- ]", "", n)     # delete any hyphen or space
    x <- y[j]
    if (y[j] == "four") {
      break
    }
  }
  cat(c(original[i], "\t", y), "\n")
}

####################################################
# Try 10 Hebrew words

original <- c("פירת", "אורתודוקסית", "קומוניסטית", "ומועמדויות", "עיתונות", "ארוך", "שלה", "כך", "לראות", "להסתכל" )

numbs <- c(
  "אחת", "שתים", "שלש", "ארבע", "חמש", "שש", "שבע", "שמונה", "תשע", "עשר",
  "אחד עשר", "שתיים עשרה", "שלוש עשרה", "ארבעה עשר", "חמש עשרה",
  "שש עשרה", "שבע עשרה", "שמונה עשרה", "תשע עשרה", "עשרים")
for (i in 1:10) {
  x <- original[i]
  y <- character()
  for (j in 1:10) {
    n <- nchar(x)
    y[j] <- gsub(" ", "", numbs[n], fixed = TRUE)     # delete space
    x <- y[j]
    if (y[j] == "ארבע" || y[j] == "שלש") {
      break
    }
  }
  cat(c(original[i], "\t", y), "\n")
}

####################################################
# Try 11 Spanish words; however, infinite oscillation without convergence at cuatro and seis

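# The 11 Spanish words shown in the results above
x <- c("montaña", "Iglesia", "computadora", "oficina", "preguntar", "entender",
  "hermosa", "asombroso", "perezoso", "somnoliento", "saludable")
numbs <- c(
  "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho",
  "nueve", "diez", "once", "doce", "trece", "catorce", "quince",
  "dieciséis", "diecisiete", "dieciocho", "diecinueve", "veinte")
original <- x
for (i in 1:11) {
  y <- character()
  for (j in 1:10) {     # capped at 10 steps because cuatro and seis never settle
    n <- nchar(x[i])
    y[j] <- numbs[n]
    x[i] <- y[j]
    if (y[j] == "cinco") {
      break
    }
  }
  cat(c(original[i], "\t", y), "\n")
}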

####################################################
# Try 11 German words

x <- c("handschuh","flugzeug","staubsauger","waschmaschine","haustürschlüssel","lächeln","geutscher", "danke", "morgen","tee","torschlusspanik")
numbs <- c(
  "eins", "zwei", "drei", "vier", "fünf", "sechs", "sieben", "acht", "neun", "zehn",
  "elf", "zwölf", "dreizehn", "vierzehn", "fünfzehn", "sechzehn", "siebzehn",
  "achtzehn", "neunzehn", "zwanzig")
original <- x
for (i in 1:11) {
  y <- character()
  for (j in 1:10) {
    n <- nchar(x[i])
    y[j] <- numbs[n]
    x[i] <- y[j]
    if (y[j] == "vier") {
      break
    }
  }
  cat(c(original[i], "\t", y), "\n")
}

####################################################
# Newton-Raphson: x(n+1) = x(n) - f(x(n)) / f'(x(n))

# f(x) = x^3 - 2*x + 2
# f'(x) = 3*x^2 - 2

par(mfrow = c(1, 2))

# quick plot to choose initial value
x<- seq(from=-5, to=5, .001)
y <- x^3 - 2*x + 2
plot(x, y, main = "f(x) = x^3 -2*x + 2", xlab = "x", ylab = "y", col = "red", ylim = c(-2, 4), cex.main = 3)
axis(side = 1, font = 2, cex.axis = 2)
axis(side = 2, font = 2, cex.axis = 2)
abline(h=0)

# Newton-Raphson
x <- vector()
f <- vector()
x_new <- 1     # initial guess
for (n in 1:10) {
  x[n] <- x_new
  f[n] <- (x[n])^3 - 2*x[n] + 2
  fprime <- 3 * (x[n])^2 - 2     # manual derivative calculation
  x_new <- x[n] - f[n]/fprime
  if (abs(x[n] - x_new) < .00005 * abs(x[n])) {break}     # stop when the relative change is tiny
}

df <- data.frame(cbind(x,f))
df

plot(df$x, df$f, pch = 16, cex = 2, main = "Sequence of N-R points", xlab = "x", ylab = "y", cex.main = 3)
for (i in seq_len(nrow(df) - 1)) {     # one arrow per consecutive pair of points
  arrows(x0 = x[i], y0 = f[i], x1 = x[i+1], y1 = f[i+1], col = "blue")
}
axis(side = 1, font = 2, cex.axis = 2)
axis(side = 2, font = 2, cex.axis = 2)
abline(h=0)

dev.off()     # reset par