Extracting data from Twitter for @hrbrmstr’s #nom foodie images
Bob Rudis (@hrbrmstr) is a famed data security expert, author, and developer, and the Chief Security Data Scientist at Rapid7. Bob also posts deliciously vivid photos of his meals under the #nom hashtag. I'm going to use a method similar to the one from my previous projects (Hipster Veggies & Machine Learning Flashcards) to wrangle all those images into a nice collection, mostly so I can look to them for inspiration when planning recipes.
Yum! Have you ever thought about collecting all these recipes & images into a cookbook?!
— Jasmine Dumas (@jasdumas) January 15, 2018
Source Repository: jasdumas/bobs-noms
Analysis
library(rtweet) # devtools::install_github("mkearney/rtweet")
library(tidyverse) # attaches dplyr, stringr, and purrr
library(magick)
library(knitr)
library(kableExtra)

# get all of bob's recent tweets (3200 is the most the timeline API returns)
bobs_tweets <- get_timeline(user = "hrbrmstr", n = 3200)

# filter noms with images only; hashtags and media_url are list columns,
# so match "nom" exactly within each tweet's hashtags
bobs_noms <- bobs_tweets %>%
  dplyr::filter(purrr::map_lgl(hashtags, ~ "nom" %in% tolower(.x)),
                !is.na(media_url))

# clean the tweet text so it can be used as a file name
bobs_noms$clean_text <- bobs_noms$text
bobs_noms$clean_text <- str_replace_all(bobs_noms$clean_text, "#\\w+", "") # remove the hashtags
bobs_noms$clean_text <- str_replace_all(bobs_noms$clean_text, "\\s?https?://\\S+", "") # remove the url links
bobs_noms$clean_text <- str_replace_all(bobs_noms$clean_text, "[[:punct:]]", "") # remove punctuation
bobs_noms$clean_text <- str_squish(bobs_noms$clean_text) # trim and collapse whitespace

# let's look at these images in a smaller data set
bobs_noms_small <- bobs_noms %>% select(created_at, clean_text, media_url)
# markdown image tag for the first image in each tweet
bobs_noms_small$img_md <- paste0("![](", purrr::map_chr(bobs_noms_small$media_url, 1), ")")

# note: kable_styling() only styles HTML/LaTeX tables; with format = "markdown"
# it returns the table unchanged (with a warning)
data.frame(images = bobs_noms_small$img_md) %>%
  kable(format = "markdown") %>%
  kable_styling(full_width = FALSE, position = "center")
(Output: a one-column table of Bob's #nom photos, one image per row.)
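Because hashtags and media_url are list columns in rtweet's output, and str_replace() only touches the first match in each string, the filtering and cleaning steps are easy to get subtly wrong (a substring test for "nom" also matches #economics, for instance). Here is a minimal sketch on a hypothetical two-tweet tibble; toy_tweets and clean_nom_text() are made up for illustration and are not part of the original project.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# Hypothetical stand-in for get_timeline() output: hashtags and media_url
# are list columns, as in rtweet's parsed tweets.
toy_tweets <- tibble(
  text = c("Smoked brisket, day two! #nom https://t.co/abc123",
           "Reading about #economics today https://t.co/xyz789"),
  hashtags = list(c("nom"), c("economics")),
  media_url = list("http://pbs.twimg.com/media/example1.jpg", NA)
)

# Hypothetical helper: strip every hashtag, link, and punctuation mark,
# then trim and collapse leftover whitespace.
clean_nom_text <- function(x) {
  x <- str_replace_all(x, "#\\w+", "")          # all hashtags, not just the first
  x <- str_replace_all(x, "https?://\\S+", "")  # all links
  x <- str_replace_all(x, "[[:punct:]]", "")    # all punctuation
  str_squish(x)
}

toy_noms <- toy_tweets %>%
  # exact, case-insensitive match inside each tweet's hashtag vector,
  # so "economics" (which contains "nom") is not picked up
  filter(map_lgl(hashtags, ~ "nom" %in% tolower(.x)),
         !is.na(media_url)) %>%
  mutate(clean_text = clean_nom_text(text))

toy_noms$clean_text
```

Only the brisket tweet survives the filter, and its cleaned text is "Smoked brisket day two".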
# create a function to save these images!
save_image <- function(df) {
  for (i in seq_len(nrow(df))) {
    # read the image; on failure, print the error and skip this tweet
    image <- try(image_read(df$media_url[[i]]), silent = FALSE)
    if (!inherits(image, "try-error")) {
      image %>%
        image_scale("1200x700") %>% # fit within 1200x700, keeping the aspect ratio
        image_write(paste0("../post_data/data/", df$clean_text[i], ".jpg"), format = "jpeg")
    }
  }
  cat("saved images...\n")
}
save_image(bobs_noms)
## saved images...
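Using clean_text directly as a file name means two tweets with the same (or empty) text will overwrite each other's image. A minimal sketch of one way around that, assuming a hypothetical nom_filenames() helper (not from the original repository) that turns the text into a lowercase slug and de-duplicates it with base R's make.unique():

```r
library(stringr)

# Hypothetical helper: turn cleaned tweet text into unique .jpg paths so that
# tweets with identical or empty text don't overwrite each other's image.
nom_filenames <- function(clean_text, dir = "../post_data/data") {
  slug <- str_to_lower(clean_text)
  slug <- str_replace_all(slug, "[^a-z0-9]+", "-")  # runs of unsafe characters become one hyphen
  slug <- str_replace_all(slug, "^-+|-+$", "")       # trim leading/trailing hyphens
  slug[is.na(slug) | slug == ""] <- "nom"            # fallback when no text is left
  slug <- make.unique(slug, sep = "-")               # "nom", "nom-1", "nom-2", ...
  file.path(dir, paste0(slug, ".jpg"))
}

paths <- nom_filenames(c("Smoked brisket day two", "", "Smoked brisket day two", ""))
paths
```

The resulting paths could be passed to image_write() in place of the paste0() call inside save_image().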