WebVTT caption transcription app

[This article was first published on R on Pablo Bernabeu, and kindly contributed to R-bloggers.]

This open-source, R-based web application converts video captions (subtitles) from the Web Video Text Tracks (WebVTT) format into plain text. Users upload a WebVTT file with the extension .vtt or .txt (examples available here and here). Metadata such as timestamps are removed automatically, and the text is formatted into a paragraph. The result is displayed on the website and can be downloaded as .docx and .txt documents. Overall, this application serves to improve the accessibility of video captions.

The web application can be launched here or here.

The data is only available to the user, and is deleted when the website is closed.

Questions and suggestions can be submitted as issues or sent by email. The app can be extended via pull requests.

Developer: Pablo Bernabeu (Dept. Psychology, Lancaster University). Licence: Creative Commons Attribution 4.0 International.

Code details

The core of the application is the index.Rmd script, which uses regular expressions to process the VTT file. That script draws on a second script to enable the download of .docx documents, and the second script in turn uses a Word template.
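
To give a rough idea of the kind of regular-expression processing involved, here is a minimal sketch in base R. It is not the code from index.Rmd: the function name vtt_to_paragraph and the sample caption lines are hypothetical, and the patterns cover only the most common WebVTT elements (the header, numeric cue identifiers, cue timings and inline tags).

```r
# Hypothetical helper for illustration; not taken from the app's index.Rmd.
vtt_to_paragraph <- function(lines) {
  # Drop the "WEBVTT" header line
  lines <- lines[!grepl("^WEBVTT", lines)]
  # Drop cue timing lines such as "00:00:01.000 --> 00:00:04.000"
  lines <- lines[!grepl("-->", lines, fixed = TRUE)]
  # Drop lines that contain only a numeric cue identifier
  lines <- lines[!grepl("^\\s*[0-9]+\\s*$", lines)]
  # Strip inline tags such as <v Speaker> or <i>
  lines <- gsub("<[^>]+>", "", lines)
  # Trim whitespace and discard empty lines
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  # Join the remaining caption text into a single paragraph
  paste(lines, collapse = " ")
}

# Made-up sample input; in practice the lines could come from readLines()
example <- c(
  "WEBVTT",
  "",
  "1",
  "00:00:01.000 --> 00:00:04.000",
  "<v Speaker>Welcome to the talk.",
  "",
  "2",
  "00:00:04.500 --> 00:00:07.000",
  "Today we discuss captions."
)

vtt_to_paragraph(example)
```

Under these assumptions, the call returns the caption text as one paragraph with the metadata removed. A caption line consisting solely of digits would also be dropped by this sketch, and NOTE or STYLE blocks are not handled, so a fuller implementation would need additional patterns.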

