literacy rates using semantics and R

June 19, 2013

(This article was first published on - R, and kindly contributed to R-bloggers)

Literacy Rates

Somehow I stumbled into the world of linked open data while trying to pull information off a Wikipedia page without having to write a custom scraper. Enter DBpedia, semantic technologies, and some wonderful R packages that take care of the back-end coding.

The Research Group Data and Web Science at the University of Mannheim has exposed a SPARQL endpoint for the CIA Factbook.

Using this endpoint and the following query, I was able to quickly pull the gender-specific literacy rates:

PREFIX db: <http://wifo5-04.informatik.uni-mannheim.de/factbook/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX d2r: <http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/config.rdf#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX map: <file:/var/www/wbsg.de/factbook/factbook.n3#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX factbook: <http://wifo5-04.informatik.uni-mannheim.de/factbook/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT 
  DISTINCT ?label 
    ?litMale 
    ?litFemale 
    ((?litMale - ?litFemale) AS ?litDiff)
  WHERE { 
    ?resource factbook:literacy_female ?litFemale;
      factbook:literacy_male ?litMale; 
      rdfs:label ?label .
  }
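For readers who want to run the query from R, here is a minimal sketch. It assumes the `SPARQL` package is installed and that the endpoint lives at the `/sparql` path of the same server named in the prefixes; that URL is a guess based on the prefixes, not something stated in this post, and the server may no longer be online. The helper name `fetch_literacy` is hypothetical, and only the two prefixes the query actually uses are kept.

```r
# Assumed endpoint URL (not given in the post); adjust if the service has moved.
endpoint <- "http://wifo5-04.informatik.uni-mannheim.de/factbook/sparql"

# The query from above, trimmed to the prefixes it actually references.
query <- "
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX factbook: <http://wifo5-04.informatik.uni-mannheim.de/factbook/ns#>

SELECT DISTINCT ?label ?litMale ?litFemale
       ((?litMale - ?litFemale) AS ?litDiff)
WHERE {
  ?resource factbook:literacy_female ?litFemale ;
            factbook:literacy_male   ?litMale ;
            rdfs:label               ?label .
}"

# Hypothetical helper: sends the query and returns the result table.
# SPARQL::SPARQL() returns a list; its `results` element is a data frame.
fetch_literacy <- function(url = endpoint, q = query) {
  SPARQL::SPARQL(url = url, query = q)$results
}

# lit <- fetch_literacy()   # requires network access and a live endpoint
```

If the call succeeds, `lit` should hold one row per country with the columns `label`, `litMale`, `litFemale` and `litDiff`.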

What’s the next logical step after getting data back in tabular form?

Visualization using ggplot2!

Female literacy rates are on the x-axis and male literacy rates are on the y-axis. The size of each country name represents the gap between the two rates, and the color of the name is based on the relative “strength” of the gender difference.
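A plot along these lines could be sketched roughly as follows. This is not the code from the repo: the data frame holds invented placeholder rows rather than Factbook values, the function name `plot_literacy` is hypothetical, and mapping color directly to the male-minus-female difference is only one guess at what “strength” means here.

```r
# Placeholder rows standing in for the query result.
# These numbers are invented for illustration; they are NOT Factbook values.
lit <- data.frame(
  label     = c("Country A", "Country B", "Country C"),
  litMale   = c(99, 80, 60),
  litFemale = c(99, 70, 35)
)
lit$litDiff <- lit$litMale - lit$litFemale

# Hypothetical helper: draws country names at (female, male) literacy,
# sized by the absolute gap and colored by the signed difference.
plot_literacy <- function(df) {
  ggplot2::ggplot(
    df,
    ggplot2::aes(x = litFemale, y = litMale, label = label)
  ) +
    ggplot2::geom_text(
      ggplot2::aes(size = abs(litDiff), colour = litDiff)
    ) +
    ggplot2::scale_colour_gradient(low = "grey40", high = "red") +
    ggplot2::labs(
      x = "Female literacy (%)",
      y = "Male literacy (%)",
      size = "Gap",
      colour = "Male - female"
    )
}

# print(plot_literacy(lit))   # requires the ggplot2 package
```

Countries with equal rates sit on the diagonal; the further a name lies above it, the larger the gap in favor of male literacy.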

Full code is available in a GitHub repo: dataparadigms - SemanticR.
