Tweaking Movie Subtitles with R

April 10, 2013

(This article was first published on theBioBucket*, and kindly contributed to R-bloggers)

I use R to fix subtitles that are out of sync with my movies. In the example below the subs were showing too early, so I added some time to the start and end of each sequence in the srt file. For simplicity I shift everything by exactly 1 second.
To get the example file I use my function dl_from_dropbox(), which I described in a previous post.
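For readers who have not looked inside an srt file: it is plain text made of numbered blocks, each holding a timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, the subtitle text, and a blank line. The sketch below writes a tiny stand-in file that can be used to try the code if the Dropbox download is not available. The file name example.srt and the dialogue lines are made up for illustration.

```r
# Hypothetical two-block subtitle file, only for trying out the code
srt_example <- c(
  "1",
  "00:00:01,000 --> 00:00:02,500",
  "First line of dialogue",
  "",
  "2",
  "00:00:03,250 --> 00:00:04,750",
  "Second line of dialogue",
  ""
)
example_file <- file.path(tempdir(), "example.srt")
writeLines(srt_example, example_file)

# The timing lines are the ones matching the pattern HH:MM:SS,mmm
grep("\\d{2}:\\d{2}:\\d{2},\\d{3}", readLines(example_file), value = TRUE)
```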





setwd(tempdir())
options(digits = 12)
options(digits.secs = 3)

### get subtitle example file:
dl_from_dropbox <- function(x, key) {
require(RCurl)
bin <- getBinaryURL(paste0("https://dl.dropboxusercontent.com/s/", key, "/", x),
ssl.verifypeer = FALSE)
con <- file(x, open = "wb")
writeBin(bin, con)
close(con)
message(noquote(paste(x, "read into", getwd())))
}

dl_from_dropbox("Game_of_Thrones_S3_E1_engl.srt", "wojo9k8v8cezs9g")
shell.exec("Game_of_Thrones_S3_E1_engl.srt") #Windows only: opens the file in the associated program (I use the MS text-editor to view srt files)

# https://www.dropbox.com/s/wojo9k8v8cezs9g/Game_of_Thrones_S3_E1_engl.srt
###

### tweak the file by changing the time - i.e., I add 1 sec to all sequences here:
t <- readLines("Game_of_Thrones_S3_E1_engl.srt")
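x <- grep("\\d{2}:\\d{2}:\\d{2},\\d{3} --> \\d{2}:\\d{2}:\\d{2},\\d{3}", t) #ids of time data in t

tt <- unlist(strsplit(t[x], " --> ")) #split time start/end, timing lines only, so blank lines in the file cannot shift the ids
y <- seq_along(tt) #ids of time data in tt (start and end alternate)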

ttt <- gsub(",",".", tt[y]) #replace decimal comma

(a <- strptime(ttt, format="%H:%M:%OS", tz="GMT")) #convert to date/time
(b <- as.numeric(a)) #convert to number

c <- 1 #add 1 sec

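(d <- as.POSIXct(as.numeric(b+c+1e-6), origin="1970-01-01", tz="GMT")) #convert back; the 1e-6 guards against the milliseconds being truncated (e.g. ,999 instead of ,000) when formatting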
(e <- format(d, "%H:%M:%OS")) #re-format
(f <- gsub("\\.", ",", e)) #replace decimal point

id_t1 <- seq(1, length(y), 2) #odd ids = start times
id_t2 <- seq(2, length(y), 2) #even ids = end times

(g <- paste0(f[id_t1], " --> ", f[id_t2])) #bring into original form

t_new <- t
t_new[x] <- g #insert new sequences into original data
print(t_new)

### save to new file:
write(t_new, "Game_of_Thrones_S3_E1_engl_new.srt")
shell.exec("Game_of_Thrones_S3_E1_engl_new.srt") #Windows only: opens the new file in the associated program
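
The same steps can be wrapped in a function so that any offset can be applied, including a negative one for subs that show too late. What follows is a sketch, not the code used above: shift_srt() and its two helpers are names made up here, and instead of going through date/time classes the arithmetic is done on whole milliseconds, which avoids any rounding of fractional seconds. It assumes timing lines that start with HH:MM:SS,mmm --> HH:MM:SS,mmm, and it clamps times that would fall below zero to zero.

```r
# Convert "HH:MM:SS,mmm" strings to whole milliseconds
srt_to_ms <- function(s) {
  p <- matrix(as.numeric(unlist(strsplit(s, "[:,]"))), ncol = 4, byrow = TRUE)
  as.vector(p %*% c(3600000, 60000, 1000, 1))
}

# Convert whole milliseconds back to "HH:MM:SS,mmm"
ms_to_srt <- function(ms) {
  ms <- pmax(ms, 0)  # assumption: times before 0 are clamped to 0
  sprintf("%02d:%02d:%02d,%03d",
          as.integer(ms %/% 3600000), as.integer(ms %/% 60000 %% 60),
          as.integer(ms %/% 1000 %% 60), as.integer(ms %% 1000))
}

# Shift all timing lines in a character vector of srt lines by `secs` seconds
shift_srt <- function(lines, secs) {
  pat <- "^(\\d{2}:\\d{2}:\\d{2},\\d{3}) --> (\\d{2}:\\d{2}:\\d{2},\\d{3})(.*)$"
  id <- grep(pat, lines)
  offset <- round(secs * 1000)
  start <- srt_to_ms(sub(pat, "\\1", lines[id])) + offset
  end   <- srt_to_ms(sub(pat, "\\2", lines[id])) + offset
  rest  <- sub(pat, "\\3", lines[id])  # anything after the end time is kept
  lines[id] <- paste0(ms_to_srt(start), " --> ", ms_to_srt(end), rest)
  lines
}

# the second element becomes "01:00:00,500 --> 01:00:02,250"
shift_srt(c("1", "00:59:59,500 --> 01:00:01,250", "Some text", ""), 1)
```

Applied to a file, this would read the lines with readLines(), pass them through shift_srt() with the wanted offset, and save the result with writeLines().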
