Creating a Gilmore Girls character network with R

[This article was first published on Shirin's playgRound, and kindly contributed to R-bloggers.]

With the impending (and by many – including me – much awaited) Gilmore Girls Revival, I wanted to take a somewhat different look at our beloved characters from Stars Hollow.

I had recently read a few cool examples of how to create co-occurrence networks and wanted to combine this with an analysis similar to what David Robinson did for Love Actually.

Fortunately, there are people out there who have invested their time (and a lot of it, I imagine) to write up transcripts for every Gilmore Girls episode. I chose www.crazy-internet-people.com’s list of transcripts.

Based on these transcripts, I calculated the main characters’ number of lines per episode and, from there, the co-occurrence matrix with other characters. This told me which other major characters they appeared with in episodes, and how often. The network nicely illustrates that Lorelai has the most lines of all characters (node size reflects the total number of lines a character had across all episodes), followed by Rory. No surprise there, but interestingly, third place goes to Luke and fourth to Emily. Most interaction happened between Lorelai and Rory, of course (edge width reflects the number of co-occurrences in episodes), but Lorelai and Luke, Rory and Luke, Emily and Rory, and Lorelai and Sookie follow suit. The network also shows that Lorelai has major connections with most other characters, more so than Rory.
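
As a minimal sketch of the idea (not the code used for the plots; the toy counts and object names are made up for illustration), a co-occurrence matrix can be derived from a table of lines per episode by turning the counts into presence/absence and multiplying the matrix with its own transpose:

```r
# Toy data: number of lines per character (columns) and episode (rows).
# All values are invented for illustration.
lines_per_episode <- matrix(
  c(120, 95, 30,  0,
    110, 80,  0, 12,
    130, 90, 25, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1_1", "1_2", "1_3"),
                  c("LORELAI", "RORY", "LUKE", "EMILY"))
)

# A character "appears" in an episode if they have at least one line.
presence <- (lines_per_episode > 0) * 1

# crossprod(x) computes t(x) %*% x: entry [a, b] counts the episodes
# in which characters a and b both appear.
cooccurrence <- crossprod(presence)
diag(cooccurrence) <- 0   # a character does not co-occur with themselves

# Total number of lines per character (what node size reflects in the network).
total_lines <- colSums(lines_per_episode)

cooccurrence
total_lines
```

The off-diagonal counts would then serve as edge widths and the totals as node sizes in a network plot.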

A cluster dendrogram shows the character co-occurrence in a slightly different way: the further down in the dendrogram tree two nodes split, the more episodes these characters had in common. Again, it is no surprise that Lorelai and Rory have the most closely connected nodes, closely followed by Luke. We can also see quite nicely that the couples Sookie and Jackson, Lane and Zack, and Emily and Richard share a lot of episodes.
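
One way such a dendrogram could be produced, sketched here on invented counts: convert the co-occurrence counts into distances (many shared episodes means a small distance) and pass them to hierarchical clustering with hclust(). The similarity-to-distance conversion and the average linkage are assumptions made for this illustration, not necessarily what was used for the plot.

```r
# Toy co-occurrence counts (number of shared episodes); values are invented.
chars <- c("LORELAI", "RORY", "LUKE", "SOOKIE")
cooc <- matrix(0, nrow = 4, ncol = 4, dimnames = list(chars, chars))
cooc["LORELAI", "RORY"]   <- 150
cooc["LORELAI", "LUKE"]   <- 140
cooc["RORY", "LUKE"]      <- 135
cooc["LORELAI", "SOOKIE"] <- 100
cooc["RORY", "SOOKIE"]    <- 90
cooc["LUKE", "SOOKIE"]    <- 85
cooc <- cooc + t(cooc)    # fill the lower triangle to make the matrix symmetric

# Assumed conversion: the pair with the most shared episodes gets distance 0.
d <- as.dist(max(cooc) - cooc)

# Agglomerative clustering; pairs that merge at a low height share many episodes.
hc <- hclust(d, method = "average")

hc$merge    # order in which characters and clusters are joined
hc$height   # height of each merge
```

Calling plot(hc) draws the tree. In this toy example Lorelai and Rory are joined first, Luke next, and Sookie last.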

Of course, these numbers also reflect the total number of episodes that characters were in, so there is an inherent bias: characters with brief appearances in many episodes end up more strongly connected to, e.g., Lorelai than characters with fewer but more important plots. It would have been interesting to calculate the co-occurrence per scene instead of per episode, but unfortunately this information was not given in the transcripts (if someone knows of transcripts that denote scene numbers, please contact me).

I also wanted to see in which episodes these 20 characters appeared. Of course, Lorelai and Rory appeared in every episode but for other characters, there are clear gaps.

And finally, I wanted to know how many lines per episode each of them spoke: This boxplot shows the median number of lines per episode for each character (middle line of the boxes), as well as the lower and upper quartiles (outer edges of the boxes) and the outlier episodes (dots).
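
To make the parts of such a boxplot concrete, here is a small sketch with invented line counts for a single character. It uses boxplot.stats(), which returns the same summary values that boxplot() draws; the box edges are Tukey's hinges, which are very close to the quartiles.

```r
# Invented line counts for one character across nine episodes.
lines_luke <- c(20, 25, 30, 35, 40, 45, 50, 55, 150)

stats <- boxplot.stats(lines_luke)

# Lower whisker, lower hinge, median, upper hinge, upper whisker.
stats$stats

# Episodes lying more than 1.5 times the box height away from the box
# are drawn as individual dots (outliers).
stats$out
```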

There will be a part 2 next week, where I will explore the Gilmore Girls a bit more. No spoilers, but among other things, I’ll be looking at their coffee consumption through the data lens…

For a detailed description plus the R code for the plots, see further below or find the R Markdown on GitHub.

If you don’t care about the show and have not (unlike me) watched every episode at least twice, maybe you’ll be interested in using my R code to recreate a similar character network for other TV shows, movies or books (and if you do, please share them with me)!


Obtaining all episode transcripts

The transcript URLs from www.crazy-internet-people.com follow this scheme: “http://www.crazy-internet-people.com/site/gilmoregirls/pages/s#/s#s/§.html” (#: number of the season, of which there are seven; §: running number of the episode, from 1 to 153). Following this scheme, I looped over all seasons and episodes and read the lines of each HTML page directly into R via its URL.
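
As a compact illustration of this scheme (a sketch only; the object names are made up, and nothing is downloaded here), all 153 URLs can be built in a vectorised way from the number of episodes per season:

```r
# Episodes per season: the first season has 21, all others have 22.
episodes_per_season <- c(21, 22, 22, 22, 22, 22, 22)

base_url <- "http://www.crazy-internet-people.com/site/gilmoregirls/pages/"

# Season number for each episode and the running episode number (1 to 153).
season  <- rep(seq_along(episodes_per_season), times = episodes_per_season)
running <- seq_len(sum(episodes_per_season))

urls <- paste0(base_url, "s", season, "/s", season, "s/", running, ".html")

length(urls)
urls[c(1, 22, 153)]   # first episode, first episode of season 2, last episode
```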

Because the raw HTML looked a bit messy, I had to do some tidying of the text:

  • First, I grabbed only lines with a character name at the beginning, which indicates the character who is speaking (these were all in caps).
  • Then, I had to remove the remaining HTML tags/descriptors.
  • After this, I had the transcript text remaining, which I could transform into a data frame
  • and add season, episode number and running episode number to each line of text.
  • And finally, I combined all transcripts into one object.

for (i in 1:7) {                       # there are 7 seasons

  if (i == 1) {                        # season 1 has 21 episodes, all others have 22

    for (j in 1:21) {

      # print the season and episode number to follow the progress
      cat("\nSeason", i, ", Episode", j, "\n")

      thepage <- readLines(paste0("http://www.crazy-internet-people.com/site/gilmoregirls/pages/s", i, "/s", i, "s/", j, ".html"))

      thepage <- thepage[grep("^[[:upper:]]+:", thepage)]  # grabbing character lines only
      thepage <- gsub("\t", "", thepage)                   # removing tabs
      thepage <- gsub("<.*>", "", thepage)                 # removing HTML tags

      thepage <- as.data.frame(thepage)
      thepage$season <- i                                  # add season number
      thepage$episode <- paste(i, j, sep = "_")            # add episode number
      thepage$episode_running_nr <- j                      # add running episode number

      if (i == 1 && j == 1) {                              # combine all transcripts into one object
        transcripts <- thepage
      } else {
        transcripts <- rbind(transcripts, thepage)
      }
    }

  } else {                             # repeat for seasons 2 to 7

    for (j in 1:22) {

      cat("\nSeason", i, ", Episode", j, "\n")

      # running episode number: add the episodes of all previous seasons
      # (21 in season 1, 22 in every season after that)
      n <- j + 21 + 22 * (i - 2)

      thepage <- readLines(paste0("http://www.crazy-internet-people.com/site/gilmoregirls/pages/s", i, "/s", i, "s/", n, ".html"))

      thepage <- thepage[grep("^[[:upper:]]{2,}:", thepage)]
      thepage <- gsub("\t", "", thepage)
      thepage <- gsub("<.*>", "", thepage)

      thepage <- as.data.frame(thepage)
      thepage$season <- i
      thepage$episode <- paste(i, j, sep = "_")
      thepage$episode_running_nr <- n

      transcripts <- rbind(transcripts, thepage)
    }
  }
}
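
The tidying steps from the loop above can be tried out offline on a toy page. The following is a minimal sketch under stated assumptions: the raw lines are invented stand-ins for what readLines() returns, tidy_page() is a hypothetical helper, and the tag pattern "<[^>]*>" is a swapped-in variant that removes each tag separately instead of everything between the first "<" and the last ">" of a line.

```r
# Invented stand-ins for the raw HTML lines of two transcript pages.
raw_page_1 <- c(
  "<html><body>",
  "LORELAI: Please, Luke.<br>",
  "<i>Luke pours coffee.</i>",
  "LUKE: You have a problem.\t<br>",
  "</body></html>"
)
raw_page_2 <- c(
  "RORY: Hi, Mom.<br>",
  "<p>CUT TO DINER</p>"
)

# Hypothetical helper bundling the steps from the bullet list.
tidy_page <- function(page, season, episode, running_nr) {
  page <- page[grepl("^[[:upper:]]+:", page)]  # keep lines that start with a name in caps
  page <- gsub("\t", "", page)                 # drop tab characters
  page <- gsub("<[^>]*>", "", page)            # drop HTML tags, one tag at a time
  data.frame(
    thepage = page,
    season = season,
    episode = paste(season, episode, sep = "_"),
    episode_running_nr = running_nr,
    stringsAsFactors = FALSE
  )
}

# Tidy each page, then combine everything into one data frame.
pages <- list(
  tidy_page(raw_page_1, season = 1, episode = 1, running_nr = 1),
  tidy_page(raw_page_2, season = 1, episode = 2, running_nr = 2)
)
transcripts_toy <- do.call(rbind, pages)

transcripts_toy
```

Collecting the per-page data frames in a list and binding them once with do.call(rbind, ...) avoids growing the result inside the loop, which is a design choice of this sketch rather than of the original code.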

Some of the lines were empty, so I removed those.

transcripts$thepage <- as.character(transcripts$thepage)  # convert to character vector
transcripts <- transcripts[transcripts$thepage != "", ]   # remove empty lines

This is what the data frame looked like at this point:

  • each row contains one line of dialogue with the character name in caps coming before their lines.

head(transcripts)

##                                          thepage season episode
## 1 LORELAI: Please, Luke. Please, please, please.      1     1_1
## 2 LUKE: How many cups have you had this morning?      1     1_1
## 3                                 LORELAI: None.      1     1_1
## 4                                  LUKE: Plus...      1     1_1
## 5            LORELAI: Five, but yours is better.      1     1_1
## 6                      LUKE: You have a problem.      1     1_1
##   episode_running_nr
## 1                  1
## 2                  1
## 3                  1
## 4                  1
## 5                  1
## 6                  1

To be able to count the characters, I separated the character names from their lines. This was done by splitting the first column at the first colon, using the tidyr package.

I also removed all leading and trailing whitespace from the character names, changed all letters in the character column to all caps and changed “ands” and apostrophes to the proper encoding. In addition, I had to manually correct quite a few misspelled character names.
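
As a self-contained illustration of these steps on made-up strings, here is a base R sketch (the post itself uses tidyr for the split). The lookup table of corrections is hypothetical and, unlike a substring replacement with gsub(), it only fixes names that match an entry exactly.

```r
# Made-up transcript lines; note the second colon in the first line.
toy <- c("LORLEAI: Hi: it's me.", " luke : What?", "ORY: Mom!")

# Split at the first colon only: everything before it is the speaker,
# everything after it (including further colons) is the dialogue.
speaker  <- sub(":.*$", "", toy)
dialogue <- sub("^[^:]*:", "", toy)

# Remove leading and trailing whitespace and convert names to upper case.
speaker  <- toupper(trimws(speaker))
dialogue <- trimws(dialogue)

# Hypothetical lookup table: misspelled name -> corrected name.
corrections <- c(LORLEAI = "LORELAI", ORY = "RORY", LUK = "LUKE")

# Replace only the names that appear in the lookup table.
hit <- speaker %in% names(corrections)
speaker[hit] <- unname(corrections[speaker[hit]])

data.frame(character = speaker, dialogue = dialogue, stringsAsFactors = FALSE)
```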

# separate first column after first colon
library(tidyr)
transcripts_2 <- separate(transcripts, "thepage", into = c("character", "dialogue"), sep = ":", extra = "merge", fill = "right")

# remove leading and trailing whitespace
transcripts_2$character <- gsub("^\\s+|\\s+$", "", transcripts_2$character)

# convert all character names to all upper case
transcripts_2$character <- toupper(transcripts_2$character)

# fix misspelled character names
transcripts_2$character <- gsub("ZACK", "ZACH", transcripts_2$character)
transcripts_2$character <- gsub("LORLEAI", "LORELAI", transcripts_2$character)
transcripts_2$character <- gsub("LOREALI", "LORELAI", transcripts_2$character)
transcripts_2$character <- gsub("LORELI", "LORELAI", transcripts_2$character)
transcripts_2$character <- gsub("LORLAI", "LORELAI", transcripts_2$character)
transcripts_2$character <- gsub("LORELA$", "LORELAI", transcripts_2$character)
transcripts_2$character <- gsub("LORLELAI", "LORELAI", transcripts_2$character)
transcripts_2$character <- gsub("^ORELAI", "LORELAI", transcripts_2$character)
transcripts_2$character <- gsub("LOREAI", "LORELAI", transcripts_2$character)
transcripts_2$character <- gsub("^ORY", "RORY", transcripts_2$character)
transcripts_2$character <- gsub("LUK$", "LUKE", transcripts_2$character)
transcripts_2$character <- gsub("BABETE", "BABETTE", transcripts_2$character)
transcripts_2$character <- gsub("BABETTER", "BABETTE", transcripts_2$character)
transcripts_2$character <- gsub("BARBETTE", "BABETTE", transcripts_2$character)
transcripts_2$character <- gsub("BABETTE/MISS PATTY", "BABETTE AND MISS PATTY", transcripts_2$character)
transcripts_2$character <- gsub("JACKSON/SOOKIE", "JACKSON AND SOOKIE", transcripts_2$character)
transcripts_2$character <- gsub("LORELAI/SOOKIE", "LORELAI AND SOOKIE", transcripts_2$character)
transcripts_2$character <- gsub("LORELAI/RORY", "LORELAI AND RORY", transcripts_2$character)
transcripts_2$character <- gsub("TAYOR", "TAYLOR", transcripts_2$character)<span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"TRISTIN"</span><span class="p">,</span><span class="w"> </span><span class="s2">"TRISTAN"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"MICHE$"</span><span class="p">,</span><span class="w"> </span><span class="s2">"MICHEL"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"MICHELL"</span><span class="p">,</span><span class="w"> </span><span class="s2">"MICHEL"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"SOOKI$"</span><span class="p">,</span><span class="w"> </span><span class="s2">"SOOKIE"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"SOOKEI"</span><span class="p">,</span><span class="w"> </span><span class="s2">"SOOKIE"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"SOOKIES"</span><span class="p">,</span><span class="w"> </span><span class="s2">"SOOKIE"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"Mrs.KIM"</span><span class="p">,</span><span class="w"> </span><span class="s2">"MRS KIM"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"MRS.KIM"</span><span class="p">,</span><span class="w"> </span><span class="s2">"MRS KIM"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"MRS KIM"</span><span class="p">,</span><span class="w"> </span><span class="s2">"MRS KIM"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"RICHRAD"</span><span class="p">,</span><span class="w"> </span><span class="s2">"RICHARD"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"RMILY"</span><span class="p">,</span><span class="w"> </span><span class="s2">"EMILY"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"CHRISTOHPER"</span><span class="p">,</span><span class="w"> </span><span class="s2">"CHRISTOPHER"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"CHRISTOPER"</span><span class="p">,</span><span class="w"> </span><span class="s2">"CHRISTOPHER"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"CHRSTOPHER"</span><span class="p">,</span><span class="w"> </span><span class="s2">"CHRISTOPHER"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"CHRIS$"</span><span class="p">,</span><span class="w"> </span><span class="s2">"CHRISTOPHER"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"CHERRY"</span><span class="p">,</span><span class="w"> </span><span class="s2">"SHERRY"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"LINDAY"</span><span class="p">,</span><span class="w"> </span><span class="s2">"LINDSAY"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">

</span><span class="c1"># substitute ’ with apostrophe
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"’"</span><span class="p">,</span><span class="w"> </span><span class="s2">"'"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">

</span><span class="c1"># some ANDs are written as &AMP; so they will be changed as well
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"&AMP;"</span><span class="p">,</span><span class="w"> </span><span class="s2">"AND"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">

</span><span class="c1"># and finally I want ANDs to be written as semicolons
</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">" AND "</span><span class="p">,</span><span class="w"> </span><span class="s2">";"</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">)</span><span class="w">

</span><span class="c1">#  and remove disclaimer lines
</span><span class="n">transcripts_2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">transcripts_2</span><span class="p">[</span><span class="o">!</span><span class="p">(</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="s2">"DISCLAIMER"</span><span class="p">),</span><span class="w"> </span><span class="p">]</span><span class="w"> 
</span><span class="n">head</span><span class="p">(</span><span class="n">transcripts_2</span><span class="p">)</span><span class="w">
</span>
##   character                                  dialogue season episode
## 1   LORELAI     Please, Luke. Please, please, please.      1     1_1
## 2      LUKE  How many cups have you had this morning?      1     1_1
## 3   LORELAI                                     None.      1     1_1
## 4      LUKE                                   Plus...      1     1_1
## 5   LORELAI                Five, but yours is better.      1     1_1
## 6      LUKE                       You have a problem.      1     1_1
##   episode_running_nr
## 1                  1
## 2                  1
## 3                  1
## 4                  1
## 5                  1
## 6                  1

This is what the data frame looked like after tidying.
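The long run of `gsub()` calls used for tidying could also be driven by a lookup table. The following is only a minimal sketch on made-up data, not the code used for this post: the `speakers` vector, the `corrections` table and the `fix_names()` helper are hypothetical names introduced for illustration. Note that it uses `fixed = TRUE`, so regex anchors such as the `$` in `"MICHE$"` would need a separate call.

```r
# Hypothetical toy data; the real values live in transcripts_2$character
speakers <- c("BABETE", "TAYOR", "SOOKEI", "LORELAI/RORY", "LUKE")

# Named vector: names are the misspellings, values are the corrections
corrections <- c(
  "BABETE"       = "BABETTE",
  "TAYOR"        = "TAYLOR",
  "SOOKEI"       = "SOOKIE",
  "LORELAI/RORY" = "LORELAI AND RORY"
)

# Apply every substitution in order; fixed = TRUE matches patterns literally
fix_names <- function(x, map) {
  for (wrong in names(map)) {
    x <- gsub(wrong, map[[wrong]], x, fixed = TRUE)
  }
  x
}

cleaned <- fix_names(speakers, corrections)
cleaned
```

Keeping the corrections in one table makes it easier to spot duplicates or entries that undo each other.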

<span class="n">nrow</span><span class="p">(</span><span class="n">transcripts_2</span><span class="p">)</span><span class="w">
</span>
## [1] 116954

In total there are now 116,954 lines.

How many characters are there and how many lines do they have?

To find out how many characters there were in Gilmore Girls during 153 episodes, I couldn’t simply count them because there are combined characters (e.g. Lorelai and Rory speaking together) and voice overs among them.

First, I duplicated all lines with two speakers so that they count for each character. Because I only wanted to count lines spoken by a single character, I then removed all character fields with multiple, generic or unspecific characters. I didn't want to keep voice overs either.
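To make the idea concrete, here is a small base R sketch of both steps on made-up data. It is an illustration under assumptions rather than the code from this post (which uses `cSplit()` and a chain of `subset()` calls): the `toy` data frame and the shortened `generic` pattern are hypothetical.

```r
# Hypothetical toy data mimicking the tidied transcripts
toy <- data.frame(
  character = c("LORELAI;RORY", "LUKE", "ALL", "RORY'S VOICE"),
  dialogue  = c("Hi!", "Coffee?", "Surprise!", "Hello?"),
  stringsAsFactors = FALSE
)

# Split each character field on ";" and repeat the row once per speaker
parts <- strsplit(toy$character, ";", fixed = TRUE)
long <- toy[rep(seq_len(nrow(toy)), lengths(parts)), ]
long$character <- trimws(unlist(parts))
rownames(long) <- NULL

# Drop generic speakers and voice overs with one combined pattern
generic <- "^ALL|VOICE|CROWD|EVERYONE"
long <- long[!grepl(generic, long$character), ]
long
```

The combined line is duplicated for Lorelai and Rory, while the generic and voice over rows are dropped.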

Most of the remaining characters, however, were still not recurring, so I filtered out all those that occurred in only one episode.
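One way to express this filter, sketched on made-up data, is to count the distinct episodes per character and keep only those with more than one. The `toy` data frame and the variable names are hypothetical, and this is not necessarily how the filtering was done for this post.

```r
# Hypothetical toy data: one row per spoken line
toy <- data.frame(
  character = c("LORELAI", "LORELAI", "RORY", "RORY", "WAITER"),
  episode   = c("1_1", "1_2", "1_1", "1_2", "1_2"),
  stringsAsFactors = FALSE
)

# Count the number of distinct episodes each character appears in
episodes_per_character <- tapply(toy$episode, toy$character,
                                 function(e) length(unique(e)))

# Keep only characters that appear in more than one episode
recurring <- names(episodes_per_character)[episodes_per_character > 1]
toy_recurring <- toy[toy$character %in% recurring, ]
toy_recurring
```

Here the waiter, who speaks in a single episode, is dropped, while Lorelai and Rory are kept.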

<span class="c1"># separating all rows where multiple characters spoke into one line per character with duplicate line text
</span><span class="n">library</span><span class="p">(</span><span class="n">splitstackshape</span><span class="p">)</span><span class="w">
</span><span class="n">transcripts_2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cSplit</span><span class="p">(</span><span class="n">transcripts_2</span><span class="p">,</span><span class="w"> </span><span class="n">splitCols</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"character"</span><span class="p">,</span><span class="w"> </span><span class="n">sep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">";"</span><span class="p">,</span><span class="w"> </span><span class="n">direction</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"long"</span><span class="p">)</span><span class="w">

</span><span class="c1"># manually removing characters I don't want to keep
</span><span class="n">characters</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">as.data.frame</span><span class="p">(</span><span class="n">table</span><span class="p">(</span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">character</span><span class="p">,</span><span class="w"> </span><span class="n">transcripts_2</span><span class="o">$</span><span class="n">episode</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">" VOICE"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"^ALL"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"AS A GROUP"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"CROWD"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"RADIO"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"BIKERS"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"ANNOUNCER"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"BOTH"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"WOMAN"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"VOICE"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"BARTENDER"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"OFFICER"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"GIRL"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"GIRLS"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"BOYS"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"EVERYONE"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"SUPERVISOR"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"PHOTOGRAPHER"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"RECEPTIONIST"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"CUSTOMER"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"TV"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"VET"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"KID"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"MOM"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"DOCTOR"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">subset</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"BOUNCER"</span><span class="p">,</span><span class="w"> </span><span class="n">Var1</span><span class="p">))</span><span class="w"> </span><s...

To leave a comment for the author, please follow the link and comment on their blog: Shirin's playgRound.
