Posts Tagged ‘XML’

Gathering RealClearPolitics Polling Trends with XML

November 19, 2012

Now that the election is over, you may want to use polling data in a model of the campaign. Simon Jackman has thoughtfully made his daily state-by-state predictions available for download, but a commonly used dataset is the RealClearPolitics polling a...


Setting up FastRWeb on Mac OS X

February 23, 2012

FastRWeb is an infrastructure that allows any webserver to use R scripts for generating dynamic content, such as web pages and graphics. In this post, you’ll learn how to install and set up FastRWeb on a Mac. This tutorial is extendable to any Unix-like operating system. It is an adaptation of Jay Emerson’s post, Setting


Credit rating by country

January 17, 2012

The financial crisis has put a lot of pressure on countries' long-term foreign currency credit ratings, with France recently being downgraded by S&P. Wikipedia provides a list of countries by credit ratings as reported by US rating agencies S&P, Fitch, ...


Update on Scary Derivatives

November 16, 2011

After reading Bloomberg’s article, JPMorgan Chase & Co. and Goldman Sachs Group Inc., among the world’s biggest traders of credit derivatives, disclosed to shareholders that they have sold protection on more than $5 trillion of debt globally. ...


GScholarXScraper: Hacking the GScholarScraper function with XPath

November 13, 2011

Kay Cichini recently wrote a word-cloud R function called GScholarScraper on his blog which, when given a search string, scrapes the associated search results returned by Google Scholar across pages and then produces a word-cloud visualisation. This was of interest to me because around the same time I posted an independent Google Scholar scraper function, get_google_scholar_df()


Web Scraping Google Scholar: Part 2 (Complete Success)

November 8, 2011

This is a follow-up to a post I uploaded earlier today about scraping data from Google Scholar. In that post I was frustrated because I’m not smart enough to use xpathSApply to get the kind of results I wanted. However, fast-forward to the evening: whilst having dinner with a friend, as a passing remark,


Web Scraping Google Scholar (Partial Success)

November 8, 2011

I wanted to scrape the information returned by a Google Scholar web search into an R data frame as a quick XPath exercise. The following will successfully extract the ‘title’, ‘url’, ‘publication’ and ‘description’. If any of these fields are not available, as in the case of a citation, the corresponding cell in the data


Web Scraping Google URLs

November 7, 2011

Google slightly changed the HTML code it uses for hyperlinks on search pages last Thursday, which caused one of my scripts to stop working. Thankfully, this is easily solved in R with the XML package and the power and simplicity of XPath expressions: Lovely jubbly! P.S. I know that there is an API of


R related books: Traditional vs online publishing

October 12, 2011

How many R-related books have been published so far? Who is the most popular publisher? How many other manuals, tutorials and books have been published online? Let's find out. A few years ago I used the publication list on r-project.org as an argument ...


Measuring Price Dispersion of Marijuana

April 12, 2011

The intersection of mapping APIs, fast database operations and user engagement offers a lot of very cool crowdsourcing applications ranging from the benign and powerful (Google’s Person Finder) to the minor and questionable (A DUI checkpoints app). Most intriguing in …
