Ctextclassics, my First Package

May 16, 2018

(This article was first published on Posts on Anything Data, and kindly contributed to R-bloggers)

My latest update is a milestone! I have authored my first ever R package, ctextclassics, which is an API caller for ctext.org. Ctext hosts numerous pre-modern Chinese texts, and my package makes them available to you. The scope is broad: think philosophical works in Confucianism, Daoism, and Legalism, military doctrines, history compilations, works in medicine, and many more.

The three main functions of ctextclassics are get_chapter("book", "chapter"), get_chapters("book", chapters), and get_books("book"). The package also includes an internal data frame, book_list, which shows the available texts. So perhaps try something like:

library(ctextclassics)
head(unique(book_list$book), n = 5)
## [1] "analects"        "art-of-war"      "bai-hu-tong"     "baopuzi"        
## [5] "book-of-changes"
knitr::kable(head(get_books("analects")))
|book     |chapter |word |chapter_cn |
|:--------|:-------|:----|:----------|
|analects |xue-er  |子曰:「學而時習之,不亦說乎?有朋自遠方來,不亦樂乎?人不知而不慍,不亦君子乎?」 |學而 |
|analects |xue-er  |有子曰:「其為人也孝弟,而好犯上者,鮮矣;不好犯上,而好作亂者,未之有也。君子務本,本立而道生。孝弟也者,其為仁之本與!」 |學而 |
|analects |xue-er  |子曰:「巧言令色,鮮矣仁!」 |學而 |
|analects |xue-er  |曾子曰:「吾日三省吾身:為人謀而不忠乎?與朋友交而不信乎?傳不習乎?」 |學而 |
|analects |xue-er  |子曰:「道千乘之國:敬事而信,節用而愛人,使民以時。」 |學而 |
|analects |xue-er  |子曰:「弟子入則孝,出則弟,謹而信,汎愛眾,而親仁。行有餘力,則以學文。」 |學而 |

Just be careful: the API limit is around 60 calls. That means you can get about 3 books on average before my download functions start spitting out NA values.
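A minimal sketch of how you might guard against running past that limit. Nothing here is part of ctextclassics: with_call_budget() and has_missing_text() are hypothetical helpers, the budget of 60 is only the rough figure mentioned above, and I am assuming that one wrapped call corresponds to one API call. A stand-in download function is used so the sketch runs offline.

```r
# Hypothetical helper (not part of ctextclassics): wrap a download function
# so it refuses to run once an assumed call budget has been used up.
with_call_budget <- function(fetch, budget = 60) {
  calls <- 0
  function(...) {
    if (calls >= budget) {
      stop("Call budget of ", budget, " reached; wait before requesting more.")
    }
    calls <<- calls + 1  # update the counter kept in the enclosing closure
    fetch(...)
  }
}

# Hypothetical check: TRUE if any row of the text column came back as NA.
# The column name `word` is taken from the get_books() output shown earlier.
has_missing_text <- function(df) anyNA(df$word)

# Stand-in for a real download function, so this runs without network access.
fake_get_chapter <- function(book, chapter) {
  data.frame(book = book, chapter = chapter, word = "...",
             stringsAsFactors = FALSE)
}

safe_get_chapter <- with_call_budget(fake_get_chapter, budget = 2)
first  <- safe_get_chapter("analects", "xue-er")
second <- safe_get_chapter("analects", "xue-er")

# The third call exceeds the budget of 2, so we capture the error message.
third <- tryCatch(
  safe_get_chapter("analects", "xue-er"),
  error = function(e) conditionMessage(e)
)

has_missing_text(first)
```

In real use you would wrap an actual download function instead of the stand-in, and check the result with has_missing_text() before relying on it.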

The API indexes its book and chapter names differently. Some are in English (e.g., “analects”) whereas others are written in Pinyin (e.g., “mengzi”). Some chapter titles use the word “the” whereas others don’t. So eventually I’ll consider the need to make the function calls more robust and help users avoid those inconsistencies. There’s a lot that can be improved here, be it adding authentication, a way to keep track of API call count, or anything else – but I’m looking forward to it!
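Here is a rough sketch of what more forgiving name handling could look like. It is not how the package works today: normalize_name() and find_book() are hypothetical helpers, and known_books is just the five names printed from book_list earlier, hard-coded so the example runs on its own.

```r
# Hypothetical helper: turn a free-form title into a lowercase,
# hyphen-separated name like the ones seen in book_list.
normalize_name <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[^a-z0-9]+", "-", x)  # runs of spaces/punctuation become one hyphen
  gsub("^-+|-+$", "", x)           # drop stray hyphens at either end
}

# Hypothetical lookup: match a query against known names, ignoring a
# leading "the" on either side. Returns NA when nothing matches.
find_book <- function(query, known) {
  strip_the <- function(s) sub("^the-", "", s)
  key <- strip_the(normalize_name(query))
  hit <- known[strip_the(known) == key]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Stand-in for unique(book_list$book); only the five names shown above.
known_books <- c("analects", "art-of-war", "bai-hu-tong",
                 "baopuzi", "book-of-changes")

find_book("The Art of War", known_books)   # "art-of-war"
find_book("Book of Changes", known_books)  # "book-of-changes"
find_book("mengzi", known_books)           # NA, not in this stand-in list
```

This only handles spelling and the stray "the"; mapping between English and Pinyin titles would need an explicit lookup table.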

So, what’s my ultimate goal? It is to use ctextclassics for text analysis on classical Chinese texts, similar to how we see the amazing tidytext and gutenbergr packages used! Quite ambitious, I know. At any rate, I’m enjoying reading this classical Chinese.
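As a small taste of that kind of analysis, here is a sketch that counts character frequencies in base R. It assumes character-level counting is a sensible first step for classical Chinese, and it uses two Analects passages from the output above, hard-coded so that no API call is needed. The variable names are made up for illustration.

```r
# Two passages copied from the get_books("analects") output shown earlier.
sample_lines <- c(
  "子曰:「巧言令色,鮮矣仁!」",
  "子曰:「道千乘之國:敬事而信,節用而愛人,使民以時。」"
)

# Split every passage into single characters.
chars <- unlist(strsplit(sample_lines, ""))

# Keep only Han characters, which drops punctuation and quotation brackets.
han <- chars[grepl("\\p{Han}", chars, perl = TRUE)]

# Tabulate and sort so the most frequent characters come first.
char_freq <- sort(table(han), decreasing = TRUE)
head(char_freq, 5)
```

With a full book from get_books(), the same steps could be applied to the word column instead of sample_lines.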

To cap off this post, you can install my package by typing devtools::install_github("Jjohn987/ctextclassics"). Remember to check out the documentation of the functions for a better explanation.

If you want to contribute, please do so! You can comment here, fork my GitHub repository, or post an issue.
