143 search results for "web scraping"

Mapping the Iowa GOP 2012 Caucus Results

January 4, 2012

Introduction: On Tuesday, January 3rd, 2012, the Iowa Republican Party held its presidential caucuses, with Mitt Romney beating Rick Santorum by 8 votes as of noon on Jan 4th. This was an exciting race, with multiple lead changes and entrance polling showing many late undecideds and large gaps in candidate support by age and income.

Read more »

Outliers in the European Parliament

December 20, 2011

Earlier this year I had a lot of fun learning how to use the BeautifulSoup and mechanize modules in Python to scrape websites. My goal was to scrape the European Parliament website for information on the activity levels of the different MEPs. I struggl...

Read more »

Subscriptions Feature Added

December 7, 2011

You can now subscribe to almost any content on the ProgrammingR website, including the job listings. To be notified of job listings as soon as they are posted, click the “R Jobs” link above and follow the instructions on that page to add the jobs feed to your feed reader. Because of this change, I will...

Read more »

Google Scholar (still) sucks

November 13, 2011

(This is a follow-up to my previous post on the topic.) I was encouraged by the appearance of two R-based Scholar-scrapers, within a week of each other. One, by Kay Cichini, converts the page URLs into text mode and scrapes from there (There's a slightl...

Read more »

Power Tools for Aspiring Data Journalists: R

October 31, 2011

Picking up on Paul Bradshaw’s post “A quick exercise for aspiring data journalists”, which hints at how you can use Google Spreadsheets to grab – and explore – a mortality dataset highlighted by Ben Goldacre in “DIY statistical analysis: experience the thrill of touching real data”, I thought I’d describe a quick way of analysing

Read more »

Forecasting recessions

August 9, 2011

John Hussman has a Recession Warning Composite that I am attempting to replicate/improve. The underlying data seems to be easy enough to get from FRED using the quantmod package in R. I don't quite understand the index Hussman is using for commercial...

Read more »

CHCN: Canadian Historical Climate Network

August 4, 2011

A reader asked a question about data from Environment Canada. He wanted to know if that data could somehow be integrated into the RGhcnV3 package. That turned out to be a bit more challenging than I expected. In short order I’d found a couple of other people who had done something similar. DrJ of course was

Read more »

hacking .gov shortened links

July 30, 2011

This past Friday, the web portal to the US Federal government, USA.gov, organized hackathons across the US for programmers and data scientists to work with and analyze the data from their link-shortening service. It turns out that if you shorten a web link with bit.ly, the shortened link looks like 1.usa.gov/V6NpL (that one goes to

Read more »

roll calls, ideal points, 112th Congress

June 29, 2011

Now that classes are over, I took a little time to update the scripts that refresh the analysis of Congressional roll calls in close to real time. Links appear at the top of the blog. As of about 15 minutes ago, we’re up to 77 non-unanimous roll calls in the 112th Senate.

Read more »

Automating R Scripts on Amazon EC2

June 9, 2011

Overview: how to set up R on an EC2 instance of Ubuntu 11.04 (Natty Narwhal); how to set up the Apache Tomcat 6.0 web server and configure it with basic authentication so that we can view our output from R on a password …

Read more »