Getting data on your government

[This article was first published on Recology - R, and kindly contributed to R-bloggers.]


I created an R package a while back, govdat, to interact with some APIs that serve up data on what our elected representatives are up to, including the New York Times Congress API and the Sunlight Labs API.

What kinds of things can you do with govdat? Here are a few examples.


How do the two major parties differ in the use of certain words (searches the congressional record using the Sunlight Labs Capitol Words API)?

# install_github('govdat', 'schamberlain')
library(govdat)
library(reshape2)
library(ggplot2)

# Yearly counts of the word "science" in the Congressional Record, one query per party
dems <- sll_cw_dates(phrase = "science", start_date = "1996-01-20", end_date = "2012-09-01", 
    granularity = "year", party = "D", printdf = TRUE)
repubs <- sll_cw_dates(phrase = "science", start_date = "1996-01-20", end_date = "2012-09-01", 
    granularity = "year", party = "R", printdf = TRUE)
# Stack the two results, tagging each row with its party
df <- melt(rbind(data.frame(party = rep("D", nrow(dems)), dems), data.frame(party = rep("R", 
    nrow(repubs)), repubs)))
df$count <- as.numeric(df$count)

# theme() and element_*() replace the older opts(), theme_text() and theme_blank()
ggplot(df, aes(yearmonth, count, colour = party, group = party)) + geom_line() + 
    labs(y = "use of the word 'Science'") + theme_bw(base_size = 18) + theme(axis.text.x = element_text(size = 10), 
    panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.position = c(0.2, 
        0.8))

[Figure: use of the word "science" over time, one line per party]
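
The nested `rbind(data.frame(...))` call above packs several steps into one line. Here is a minimal sketch of the same stacking step, written out in stages. The values are invented toy data standing in for the API results, and the `count` and `yearmonth` column names are an assumption based on the plotting code:

```r
# Toy stand-ins for the data frames returned by sll_cw_dates().
# The numbers are invented for illustration; counts are stored as text
# because the post converts them with as.numeric() afterwards.
dems <- data.frame(count = c("10", "12"), yearmonth = c("1996", "1997"),
                   stringsAsFactors = FALSE)
repubs <- data.frame(count = c("7", "15"), yearmonth = c("1996", "1997"),
                     stringsAsFactors = FALSE)

# Tag each result with its party, then stack the two data frames.
combined <- rbind(
  data.frame(party = "D", dems, stringsAsFactors = FALSE),
  data.frame(party = "R", repubs, stringsAsFactors = FALSE)
)

# Convert the counts to numbers so they can be plotted on a continuous axis.
combined$count <- as.numeric(combined$count)
str(combined)
```

Because `data.frame()` recycles a single value, `party = "D"` does the same job as `rep("D", nrow(dems))`.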


Let’s get some data on donations to individual elected representatives.

library(plyr)

# Let's get Nancy Pelosi's entity ID
sll_ts_aggregatesearch('Nancy Pelosi')[[1]]
$name
[1] "Nancy Pelosi (D)"

$count_given
[1] 0

$firm_income
[1] 0

$count_lobbied
[1] 0

$seat
[1] "federal:senate"

$total_received
[1] 17197286

$state
[1] "WY"

$lobbying_firm
NULL

$count_received
[1] 11742

$party
[1] "R"

$total_given
[1] 0

$type
[1] "politician"

$id
[1] "85ab2e74589a414495d18cc7a9233981"

$non_firm_spending
[1] 0

$is_superpac
NULL
# Her entity ID
sll_ts_aggregatesearch('Nancy Pelosi')[[1]]$id
[1] "85ab2e74589a414495d18cc7a9233981"
# And search for her top donors by sector
nancy <- ldply(sll_ts_aggregatetopsectors(sll_ts_aggregatesearch('Nancy Pelosi')[[1]]$id))
nancy # but just abbreviations for sectors
   sector count     amount
1       P  1386 3263050.00
2       F  2148 3192072.00
3       H  1253 2086900.00
4       Q  1300 1529571.00
5       K  1411 1502517.00
6       N   926 1343187.00
7       B   712 1211544.00
8       W   759  817550.00
9       Y   822  666926.00
10      E   253  363539.00
data(sll_ts_sectors) # load sector abbreviations data
nancy2 <- merge(nancy, sll_ts_sectors, by="sector") # attach full sector names
nancy2_melt <- melt(nancy2[,-1], id.vars=3)
nancy2_melt$value <- as.numeric(nancy2_melt$value)
ggplot(nancy2_melt, aes(sector_name, value)) + # and let's plot some results
    geom_bar(stat="identity") + # values are precomputed, so plot them as-is rather than counting rows
    coord_flip() +
    facet_wrap(~ variable, scales="free", ncol=1)

[Figure: donations by sector, with one facet for count and one for amount]

It looks like the largest number of individual donations (the count facet) comes from finance/insurance/real estate, but by amount, the most (by a slim margin) comes from labor organizations.
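
That reading can be checked against the sector table printed above without calling the API again. This is a rough sketch that copies those numbers into a data frame by hand; the `avg_donation` column is something I derive here for illustration, not a field returned by the API. Going by the reading above, `F` is finance/insurance/real estate and `P` is labor.

```r
# Sector table copied by hand from the `nancy` output printed earlier in the post.
nancy <- data.frame(
  sector = c("P", "F", "H", "Q", "K", "N", "B", "W", "Y", "E"),
  count  = c(1386, 2148, 1253, 1300, 1411, 926, 712, 759, 822, 253),
  amount = c(3263050, 3192072, 2086900, 1529571, 1502517,
             1343187, 1211544, 817550, 666926, 363539),
  stringsAsFactors = FALSE
)

# Which sector leads on each measure?
top_by_count  <- nancy$sector[which.max(nancy$count)]
top_by_amount <- nancy$sector[which.max(nancy$amount)]
top_by_count
top_by_amount

# Average donation size per sector (derived here, not part of the API output).
nancy$avg_donation <- nancy$amount / nancy$count
nancy[order(-nancy$avg_donation), ]
```

Sorting by the average makes it easier to see which sectors give fewer but larger donations.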

Or we may want to get a bio of a congressperson. Here we get Todd Akin of MO. And some Twitter searching too? Indeed.

out <- nyt_cg_memberbioroles("A000358")  # cool, lots of info, output cut off for brevity
out[[3]][[1]][1:2]
$member_id
[1] "A000358"

$first_name
[1] "Todd"
# we can get his twitter id from this bio, and search twitter using
# the twitteR package
akintwitter <- out[[3]][[1]]$twitter_id

# install.packages('twitteR')
library(twitteR)
tweets <- userTimeline(akintwitter, n = 100)
tweets[1:5]  # there's some gems in there no doubt
[[1]]
[1] "RepToddAkin: Do you receive my Akin Alert e-newsletter?  Pick the issues you’d like to get updates on and sign up here!\nhttp://t.co/nZfiRjTF"

[[2]]
[1] "RepToddAkin: If the 2001 & 2003 tax policies expire, taxes will increase over $4 trillion in the next 10 years. America can't afford it. #stopthetaxhike"

[[3]]
[1] "RepToddAkin: A govt agency's order shouldn't defy constitutional rights. I'm still working for #religiousfreedom and repealing the HHS mandate. #prolife"

[[4]]
[1] "RepToddAkin: I am a cosponsor of the bill being considered today to limit abortions in DC. RT if you agree! #prolife http://t.co/Mesrjl0w"

[[5]]
[1] "RepToddAkin: We need to #StopTheTaxHike. Raising taxes like the President wants would destroy more than 700,000 jobs. #4jobs http://t.co/KUTd0M7U"
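
The positional indexing `out[[3]][[1]]` used above is compact but easy to get wrong, and not every member will list a Twitter handle. Below is a small sketch of pulling the field out more defensively. The list is a hypothetical stand-in for the API result: only the `member_id`, `first_name` and `twitter_id` fields come from this post, the handle is inferred from the tweets shown above, and the first two elements are placeholders.

```r
# Hypothetical stand-in for the list returned by nyt_cg_memberbioroles().
# Only the shape of the third element is taken from the output in this post;
# the first two elements are placeholders.
out <- list(
  "placeholder_1",
  "placeholder_2",
  list(list(member_id = "A000358", first_name = "Todd",
            twitter_id = "RepToddAkin"))
)

member <- out[[3]][[1]]      # the first member record
handle <- member$twitter_id  # NULL when the field is missing

# Only go on to search Twitter when a non-empty handle is present.
if (is.null(handle) || !nzchar(handle)) {
  message("No Twitter handle listed for ", member$member_id)
} else {
  message("Twitter handle: ", handle)
}
```

Checking for `NULL` first avoids passing an empty value on to `userTimeline()`.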

Get the .Rmd file used to create this post at my GitHub account, or the .md file.


Written in Markdown, with help from knitr, and nice knitr highlighting/etc. in RStudio.
