May 3, 2017 | Julia Silge

A couple of weeks ago, I saw on Dirk Eddelbuettel's blog that R 3.4.0 was going to include a function for obtaining information about packages currently on CRAN, including basically everything in DESCRIPTION files. When R 3.4.0 was released, this was...

Gender Roles with Text Mining and N-grams

April 14, 2017 | Julia Silge

Today is the one-year anniversary of the janeaustenr package's appearance on CRAN, its cranniversary, if you will. I think it's time for more Jane Austen here on my blog. I saw this paper by Matthew Jockers and Gabi Kirilloff a number of months ago and ...

How Do You Discover R Packages?

March 19, 2017 | Julia Silge

Like I mentioned in my last blog post, I am contributing to a session at useR 2017 this coming July that will focus on discovering and learning about R packages. This is an increasingly important issue for R users as we all decide which of the 10,000+...

Scraping CRAN with rvest

March 5, 2017 | Julia Silge

I am one of the organizers for a session at useR 2017 this coming July that will focus on discovering and learning about R packages. How do R users find packages that meet their needs? Can we make this process easier? As somebody who is relatively new...

Women in the 2016 Stack Overflow Survey

January 22, 2017 | Julia Silge

Note: Cross-posted with the Stack Overflow blog. The 2017 Stack Overflow Developer Survey opened last week, and we on the Data Team are looking forward to analyzing the survey results to better understand our developer community. I am particularly inte...

Text Mining in R: A Tidy Approach

January 13, 2017 | Julia Silge

I spoke on approaching text mining tasks using tidy data principles at rstudio::conf yesterday. I was so happy to have the opportunity to speak and the conference has been a great experience. If you want to catch up on what has been going on at rstudio::conf, Karl Broman ...

Reddit Responds to the Election

December 5, 2016 | Julia Silge

It's been about a month since the U.S. presidential election, with Donald Trump's victory over Hillary Clinton coming as a surprise to most. Reddit user Jason Baumgartner collected and published every submission and comment posted to Reddit on the ...

Measuring Gobbledygook

November 24, 2016 | Julia Silge

In learning more about text mining over the past several months, one aspect of text that I've been interested in is readability. A text's readability measures how hard or easy it is for a reader to read and understand what a text is saying; it depends on how ...

Mapping Election Results in Utah

November 10, 2016 | Julia Silge

My adopted home state of Utah has been a weird place this election cycle. For the unfamiliar, Utah is extremely conservative when it comes to politics; it is one of the reddest of the red states and has backed the Republican candidate for president for...

Tidy Text Mining with R

October 27, 2016 | Julia Silge

I am so pleased to announce that tidytext 0.1.2 is now available on CRAN. This release of tidytext, a package for text mining using tidy data principles by Dave Robinson and me, includes some bug fixes and performance improvements, as well as some new ...

Singing the Bayesian Beginner Blues

September 27, 2016 | Julia Silge

Earlier this week, I published a post about song lyrics and how different U.S. states are mentioned at different rates, and at different rates relative to their populations. That was a very fun post to work on, but you can tell from that paragraph near the end that I ...

Song Lyrics Across the United States

September 25, 2016 | Julia Silge

The inspiration for this post is a joint venture by both me and my husband, and its genesis lies more than 15 years in our past. One of the recurring conversations we have in our relationship (all long-term relationships have these, right?!) is about song lyrics and place names. I think ...

We Are Not Very Evenly Distributed

August 18, 2016 | Julia Silge

I saw this tweet making the rounds this past week: "Half of all Americans live in the red counties, half live in the orange counties" (Conrad Hackett, @conradhackett, August 8, 2016). Interesting! I saw people using this map to make the argument that the Electoral College was super ...

Something Strange in the Neighborhood

August 4, 2016 | Julia Silge

Today I was so pleased to see a new data package hit CRAN, and how wonderful to see such accomplished women writing R packages. What a great new data package on CRAN! And always great to see more women authors in #rstats

Return of the NEISS Data

July 21, 2016 | Julia Silge

Almost six months ago (!) I wrote a blog post about the NEISS data set, a sample of accidents reported to emergency rooms in the U.S. that are related to consumer products. Ever since I did that exploration, I have been wanting to ask a bit of a different question ...

Fatal Police Shootings Across the U.S.

July 6, 2016 | Julia Silge

I have been full of grief and sadness and some anger in the wake of yet more videos going viral in the past couple days showing black men being killed by police officers. I am not an expert on what it means to be a person of color in the ...

A Beginner’s Guide to Travis-CI for R

May 19, 2016 | Julia Silge

Have you seen all those attractive green badges on other people's R packages and thought, "I want a lovely green badge!"? "Always a nice feeling when Travis manages to actually build. #runconf16" (Julia Silge, @juliasilge, April 1, 2016) OF COURSE YOU DO. Well, let's give ...

The Life-Changing Magic of Tidying Text

April 28, 2016 | Julia Silge

When I went to the rOpenSci unconference about a month ago, I started work with Dave Robinson on a package for text mining using tidy data principles. What is this tidy data you keep hearing so much about? As described by Hadley Wickham, tidy data has a specific structure: each ...
