Blog Archives

Curate language data (1/2): organizing meta-data

November 30, 2017

When working with raw data, whether it comes from a corpus repository, web download, or a web scrape, it is important to recognize that the attributes that we want to organize can be stored or represented in various formats. The three I will cover here have to do with meta-data that is: (1) contained in the file name of...

Acquiring data for language research (3/3): web scraping

November 1, 2017

Web scraping: There are many resources available through direct downloads from repositories and individual sites, and through R package interfaces to web resources with APIs, but these resources are limited relative to the amount of public-facing textual data recorded on the web. In the case that you want to acquire data from webpages, R can be used to access the web...

Acquiring data for language research (2/3): package interfaces

October 22, 2017

Package interfaces: A convenient alternative method for acquiring data in R is through package interfaces to web services. These interfaces are built using R code to make connections with resources on the web through Application Programming Interfaces (APIs). Websites such as Project Gutenberg, Twitter, Facebook, and many others provide APIs to allow access to their data under certain conditions, some...

Acquiring data for language research (1/3): direct downloads

October 19, 2017

There are three main ways to acquire corpus data using R that I will introduce you to: direct download, package interfaces, and web scraping. In this post we will start by directly downloading a corpus, as it is the most straightforward process for the novice R programmer and involves the fewest steps. Along the way I will...

Data for language research - types and sources

October 3, 2017

In this Recipe you will learn about the types of data available for language research and where to find data. The goal, then, is to introduce you to the landscape of language data available and provide a general overview of the characteristics of l...

Introduction to statistical thinking

September 14, 2017

Before we begin working on the specifics of our data project, it is important to have a clear understanding of some of the basic concepts that need to be in place to guide our work. In this post I will cover some of these topics including the impor...

Project management for scalable data analysis

August 30, 2017

Project management: This post can be seen as an extension of the last post, Getting started with R and RStudio, in that we will be getting to know some more advanced but indispensable features of RStudio. These features, in combination with some organizational and programming strategies, will enable us to conduct efficient data analysis and set the stage for...

Getting started with R and RStudio

August 13, 2017

Why R? The R programming language is free software developed with an eye towards statistical computing and data visualization. It has taken off in popularity over the last decade and now finds itself among the most used programming languages in general, and it is often the go-to language for data science. So what’s all the fuss about? Among the things...

Introducing the Recipe series

August 2, 2017

The Recipe series: an overview. My goal in this series is to explore the ‘why’ and the ‘how’ of doing quantitative language research. The content of this series will, in large part, overlap with resources available on doing Data Science, ge...

Testing features in `blogdown`

July 23, 2017

This is a post to test the functionality of blogdown. We are going to look at how various types of outputs are rendered and do some tweaking to get things to work right. The hope is to get an idea of what the best practices for creating posts and pages are when working with this software. Tables: Here’s some more code...
