Blog Archives

topic models for synchronic & diachronic corpus exploration

February 25, 2018

Contents: Synchronic application; Diachronic application; Topic clusters; Quick summary; References

This post outlines a fairly simple workflow from annotated corpus to topic model, with a focus on the exploratory utility of topic models. We first consider some text structures relevant to topic modeling in R, and then demonstrate some approaches to visualizing model results, including variation in topic prevalence over time for a diachronic corpus....


locating linguistic diversity in the usa

February 9, 2018

Contents: Language data and the census; Languages in the US; Linguistic diversity as entropy; Locating linguistic diversity; FIN

This post investigates linguistic diversity in the United States using data made available by the US Census. We consider census language classifications, and introduce a simple methodology for quantifying linguistic diversity with entropy scores. The post is largely exploratory, and a bit of an excuse to play with...
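The entropy idea can be illustrated with a small sketch. It is written in Python rather than the R used in the post, and it assumes the standard Shannon formula H = -Σ p_i log2 p_i over each language's share of speakers in a region; the post's exact formulation (log base, census language groupings) may differ. The function name and the speaker counts are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts.

    Higher values mean speakers are spread more evenly across more languages.
    """
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        if n > 0:  # zero counts contribute nothing (0 * log 0 is taken as 0)
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical speaker counts for two made-up regions
monolingual = {"English": 1000}
mixed = {"English": 500, "Spanish": 250, "Vietnamese": 250}

print(shannon_entropy(monolingual))  # 0.0
print(shannon_entropy(mixed))        # 1.5
```

A region where everyone speaks one language scores 0; adding languages, or balancing speakers more evenly across them, raises the score.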


a simple framework for corpus-based keyphrase extraction

January 29, 2018

Contents: Defining potential keyphrases; Corpus search for potential keyphrases; Selecting descriptive keyphrases with the tf-idf statistic; Postscript: State of the Union Addresses

This post outlines a simple framework for identifying and extracting keyphrases from the component texts of a corpus. We first consider some functional characteristics of descriptive keyphrases, as well as some more formal (i.e., regex-based) definitions. We then demonstrate the use of...
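A minimal Python sketch of the tf-idf selection step, assuming candidate keyphrases have already been extracted for each text (the post does this with regex-based patterns in R). Here tf is a phrase's share of a text's candidate phrases and idf is ln(N / df), one common formulation that may not match the post exactly. The function `tfidf_keyphrases` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_keyphrases(docs, top_n=3):
    """Rank candidate keyphrases in each document by tf-idf.

    docs: dict mapping a document id to a list of candidate phrases
          (assumed to be already extracted, e.g. by a regex over POS tags).
    Returns a dict mapping each document id to its top_n phrases.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each phrase occur?
    df = Counter()
    for phrases in docs.values():
        df.update(set(phrases))

    result = {}
    for doc_id, phrases in docs.items():
        counts = Counter(phrases)
        total = sum(counts.values())
        scores = {
            phrase: (n / total) * math.log(n_docs / df[phrase])
            for phrase, n in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically for stable output
        ranked = sorted(scores, key=lambda p: (-scores[p], p))
        result[doc_id] = ranked[:top_n]
    return result


# Hypothetical candidate phrases from two short speeches
docs = {
    "speech_a": ["tax cut", "tax cut", "american people", "border security"],
    "speech_b": ["health care", "american people", "health care"],
}
print(tfidf_keyphrases(docs, top_n=2))
# {'speech_a': ['tax cut', 'border security'], 'speech_b': ['health care', 'american people']}
```

Phrases like "american people" that occur in every text get an idf of zero, so they sink to the bottom of each ranking however frequent they are.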


corpus query and grammatical constructions

January 9, 2018

Contents: Search syntax; Corpus search; Search summary; KWIC & BOW; Summary and shiny

This post demonstrates the use of a simple collection of functions from my R package corpuslingr. The functions streamline two sets of corpus linguistics tasks: annotated corpus search of grammatical constructions and complex lexical patterns in context, and detailed summary and aggregation of corpus search results. While still in development, the package should be useful to linguists...
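For readers new to KWIC (keyword-in-context) output, here is a rough Python sketch of a KWIC search over plain tokens. It is not the corpuslingr API: the function `kwic` and its parameters are hypothetical, and it matches lexical regex patterns only, whereas the package searches annotated (e.g. part-of-speech tagged) text for grammatical constructions.

```python
import re


def kwic(tokens, pattern, window=3):
    """Return keyword-in-context rows for tokens that fully match a regex.

    Each row is a (left context, node, right context) tuple, with up to
    `window` tokens of context on either side.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    rows = []
    for i, tok in enumerate(tokens):
        if regex.fullmatch(tok):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows


# Hypothetical example text, tokenized naively on whitespace
text = "She has gone home but they have not gone yet"
for left, node, right in kwic(text.split(), r"gone|went", window=2):
    print(f"{left:>15} | {node} | {right}")
```

Aligning the node word in a center column is what makes KWIC output easy to scan; a bag-of-words (BOW) summary would instead count the tokens inside each context window.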


a census-based approach to spanish language maintenance

December 29, 2017

Contents: Census nuts/bolts; New Mexico & the US; Some macro-exploration; A simple model; Some final notes; References

In this post we investigate Spanish language maintenance within Hispanic communities in the US using data from the US Census. Spanish language maintenance refers to the rate at which Hispanics within a given community speak Spanish. Here, we consider a census-based methodology presented in Bills (1989) and Bills, Chávez, and...

