Combining GitHub Traffic Plots Across Repositories

This article was first published on Jonathan Sidi's R Blog and kindly contributed to R-bloggers.

This post shows how to use the RSelenium package to scrape your own GitHub account, retrieve all that fun traffic data on clones and visits, and combine it into a single traffic plot for your account.

The complete script is available as a single file in this gist.

Packages

library(RSelenium)
library(XML)
library(ggplot2)
library(reshape2)
library(plyr)
library(dplyr)

Fill in the relevant information for your account. The team is usually your username, but it can differ (for example, an organization). repos can be a character vector of repository names, and since we are going in the front door of the site, private repositories are accessible too!

Setup

gh_user <- '<your github login name>'
gh_pass <- '<your github login password>'

gh_team <- '<team associated with account>'
repos <- '<repositories in team>'
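Hard-coding a password in a script makes it easy to leak, for instance when the script is shared as a gist. One alternative is to keep the credentials in environment variables and read them with Sys.getenv, failing early if one is missing. The sketch below is an assumption and not part of the original post: the helper get_credential and the variable names GITHUB_USER and GITHUB_PASS are hypothetical.

```r
# Hypothetical helper (not from the original post): read a credential from an
# environment variable and stop with a clear message if it is not set.
get_credential <- function(var) {
  value <- Sys.getenv(var, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop(sprintf("Environment variable '%s' is not set", var), call. = FALSE)
  }
  value
}

# Assumed variable names; define them in ~/.Renviron or the shell beforehand,
# then uncomment:
# gh_user <- get_credential("GITHUB_USER")
# gh_pass <- get_credential("GITHUB_PASS")
```

The rest of the script can stay unchanged, since gh_user and gh_pass end up as ordinary character strings either way.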

The function

github_traffic <- function(gh_user, gh_pass, gh_team, repos){

  # open the connection
  rD <- RSelenium::rsDriver(verbose = FALSE)
  remDr <- rD[["client"]]

  # go to the first repo to invoke the login
  remDr$navigate(sprintf('https://github.com/%s/%s/graphs/traffic', gh_team, repos[1]))

  # enter the login information in the form and click the button
  webElem <- remDr$findElement(using = 'id', value = "login_field")
  webElem$setElementAttribute(attributeName = 'value', value = gh_user)
  webElem <- remDr$findElement(using = 'id', value = "password")
  webElem$setElementAttribute(attributeName = 'value', value = gh_pass)
  webElem <- remDr$findElement(using = 'xpath', value = '//*[@id="login"]/form/div[4]/input[3]')
  webElem$clickElement()
  Sys.sleep(1)

  # retrieve the plots as parsed html
  out <- plyr::llply(repos, function(repo){
    remDr$navigate(sprintf('https://github.com/%s/%s/graphs/traffic', gh_team, repo))
    Sys.sleep(1)
    # getPageSource() returns a list; the page source is its first element
    page <- XML::htmlParse(remDr$getPageSource()[[1]], asText = TRUE)
    sapply(c('clones', 'visitors'), function(type){
      XML::getNodeSet(page, sprintf('//*[@id="js-%s-graph"]/div/div[1]/svg/g/g', type))
    }, simplify = FALSE, USE.NAMES = TRUE)
  }, .progress = 'text')

  # set the names (llply doesn't)
  names(out) <- repos

  # that's it, we don't need the connection anymore
  remDr$close()
  rD[["server"]]$stop()

  # scrape the data from html into a data.frame
  plot_data <- plyr::ldply(out, function(repo){
    plyr::mdply(names(repo), function(type){

      dat <- repo[[type]]

      if(is.null(dat)) return(NULL)

      # tick values we need for rescaling
      yticks_total <- as.numeric(sapply(XML::getNodeSet(dat[[2]], 'g'), XML::xmlValue))
      yticks_unique <- as.numeric(sapply(XML::getNodeSet(dat[[5]], 'g'), XML::xmlValue))

      x <- data.frame(type = type,
                      date = as.Date(sapply(XML::getNodeSet(dat[[1]], 'g'), XML::xmlValue), format = '%m/%d'),
                      total = as.numeric(sapply(XML::getNodeSet(dat[[3]], 'circle'), XML::xmlGetAttr, name = 'cy')),
                      unique = as.numeric(sapply(XML::getNodeSet(dat[[4]], 'circle'), XML::xmlGetAttr, name = 'cy')))

      # Because this is a d3.js object there are some technical details that
      # I'm skipping here, but in short the y values need to be rescaled
      # to show the actual values that you need.
      x$total <- scales::rescale(x$total, rev(range(yticks_total)))
      x$unique <- scales::rescale(x$unique, rev(range(yticks_unique)))

      # reshape the data.frame from wide to long
      x %>% reshape2::melt(., c('type', 'date'), variable.name = c('metric'))
    })
  }, .id = 'repo')

  # create the plot
  ggplot(plot_data, aes(x = date, y = value, colour = repo)) +
    geom_point() + geom_line() +
    facet_grid(type ~ metric, scales = 'free_y') +
    scale_x_date(date_breaks = "1 day", date_labels = "%m/%d") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90), legend.position = 'top') +
    labs(title = sprintf('Github Team: %s', gh_team))
}
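The rescaling step inside the function is easier to follow with numbers. The cy attribute of each circle is a pixel offset measured from the top of the SVG, so a smaller cy means a larger value. That is why the target range is reversed with rev(range(...)). The base R sketch below uses made-up pixel offsets and tick labels, and a hypothetical helper rescale_linear that mirrors the linear mapping I understand scales::rescale to apply. It also makes one assumption visible: the mapping is exact only when the plotted points reach both the lowest and the highest tick.

```r
# Hypothetical helper: linearly map x from the interval `from` onto `to`.
rescale_linear <- function(x, to, from = range(x)) {
  (x - from[1]) / diff(from) * diff(to) + to[1]
}

# Made-up example: pixel offsets of four points and the y-axis tick labels.
cy     <- c(180, 90, 0, 135)     # 0 = top of the plot, 180 = bottom
yticks <- c(0, 10, 20, 30, 40)   # values printed on the y axis

# Reverse the tick range so the top pixel maps to the largest tick.
values <- rescale_linear(cy, to = rev(range(yticks)))
values
```

With these inputs the bottom pixel (180) maps to 0 and the top pixel (0) maps to 40. If the real data never touch the extreme ticks, the recovered values will be stretched accordingly.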

Run the function

traffic_plot <- github_traffic(gh_user = gh_user,
                               gh_pass = gh_pass,
                               gh_team = gh_team,
                               repos = repos)

traffic_plot

If the function fails for some reason, the following will release the port that RSelenium is holding for ransom.

rD <- RSelenium::rsDriver(verbose = FALSE, port = 4444L)
remDr <- rD$client
remDr$close()
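A way to avoid the stuck port in the first place is to register the cleanup with on.exit, so it runs even when the function stops with an error. The sketch below shows the pattern with hypothetical stand-ins (open_driver, close_driver and scrape_safely are not part of the original post), so it runs without a browser. In github_traffic the equivalent would presumably be an on.exit call wrapping remDr$close() and rD[["server"]]$stop(), placed right after the driver is created.

```r
# Hypothetical stand-ins for the Selenium driver: a small environment whose
# open/closed state can be inspected after the call.
state <- new.env()
assign("open", FALSE, envir = state)

open_driver  <- function() assign("open", TRUE, envir = state)
close_driver <- function() assign("open", FALSE, envir = state)

scrape_safely <- function(fail = FALSE) {
  open_driver()
  # Registered immediately after opening, so it runs on success and on error.
  on.exit(close_driver(), add = TRUE)
  if (fail) stop("scrape failed part-way through")
  "scraped"
}

ok  <- scrape_safely()
err <- tryCatch(scrape_safely(fail = TRUE),
                error = function(e) conditionMessage(e))

# In both cases the "driver" has been closed again.
get("open", envir = state)
```

The design choice here is to tie cleanup to the function's exit rather than to its last line, which is what leaves the port occupied when an earlier step throws.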
