(If you don’t know what XML is, you should probably read a primer before reading this post.)
When working with data, one inevitably comes across things encoded in XML. I’m in the “anti-XML” camp, but I deal with my fair share of XML in “cyber” and help out enough people who have to work with XML that I’ve become pretty proficient at slicing & dicing it.
R has two main packages to deal with XML: the original XML package and the more lightweight and modern xml2 package. If you really need all the power of libxml2 (the C library that powers both packages) or are creating XML from R, then you probably know your way around the XML package and are pretty self-sufficient.
Most folks can get by with the xml2 package if their goal is to work with XML data. By “work with” I mean reading in files or API data that come in XML format and finding the nuggets of gold in between all those <> tags. Doing so requires finding what you need, and that means using a query language called XPath to pinpoint the node(s) you are after. Working with XPath can be pretty daunting for those who went to school to ultimately cure diseases, build high-performing stock portfolios, target advertising to everyone or perform a host of other real work. Becoming an expert in XPath was probably not on the bucket list, but to work with XML you will need to be familiar with it.
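For readers who have not seen XPath next to xml2 before, here is a minimal sketch of what "pinpointing nodes" looks like. The XML document and its element names (catalog, book, title, price) are invented for illustration only; the functions used (read_xml(), xml_find_all(), xml_text(), xml_attr()) are part of xml2.

```r
library(xml2)

# Hypothetical XML, made up for this example (not from any real feed)
doc <- read_xml(paste0(
  '<catalog>',
  '<book id="b1"><title>First Book</title><price>44.95</price></book>',
  '<book id="b2"><title>Second Book</title><price>19.99</price></book>',
  '</catalog>'
))

# XPath: every <title> that is a direct child of a <book>, anywhere in the doc
titles <- xml_text(xml_find_all(doc, "//book/title"))

# XPath with a predicate: only books whose <price> is below 20,
# then pull the id attribute from the matching nodes
cheap_ids <- xml_attr(xml_find_all(doc, "//book[price < 20]"), "id")

print(titles)
print(cheap_ids)
```

The two expressions differ only in the predicate inside the square brackets, which is usually the part that takes some trial and error to get right.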
The xmlview package provides a way to visually inspect XML and interactively test out XPath expressions. It’s as simple to use as:
devtools::install_github("ramnathv/htmlwidgets") # we use some bleeding edge features
devtools::install_github("hrbrmstr/xmlview")

library(xml2)
library(xmlview)

# plain text XML
xml_view("
(There’s also an experimental xml_tree_view() in there by @timelyportfolio that we’ll be adding features to at a pretty rapid pace.)
Here’s a screenshot of it in action:
There are options to change the CSS styling for the formatted code. Yep, it will format and highlight XML for you so it’s easier to work with. There’s an animated GIF of a screencast over on GitHub as well.
Once you perfect your XPath expression, hit the “R” button and it will generate the code you can copy back into RStudio. It understands namespaces, but try not to stuff a huge XML document in there, as browsers don’t work well with large data elements (the viewer is an htmlwidget and is, hence, browser-based).
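Namespaces are a common reason an XPath expression that looks right returns nothing. The sketch below shows how xml2 itself handles a default namespace; it is not the code xmlview generates, and the tiny Atom-style document is invented for the example. It assumes xml2's documented behavior of exposing default namespaces under the aliases d1, d2, and so on.

```r
library(xml2)

# Hypothetical document with a default namespace on the root element
doc <- read_xml(paste0(
  '<feed xmlns="http://www.w3.org/2005/Atom">',
  '<entry><title>Hello</title></entry>',
  '</feed>'
))

# xml_ns() lists the namespaces; the default one gets the alias "d1"
ns <- xml_ns(doc)

# A bare element name matches nothing, because <title> lives in a namespace
n_bare <- length(xml_find_all(doc, "//title"))

# Prefixing each step with the alias finds the node
namespaced <- xml_text(xml_find_all(doc, "//d1:entry/d1:title", ns))

print(n_bare)
print(namespaced)
```

If the prefixes get in the way, xml2 also offers xml_ns_strip() to remove default namespaces from a document before querying it.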
It works with plain character XML/HTML and many xml2 data types. I have no current plans for XML package object support, but toss up an issue on GitHub if you really need it (or, better yet, a PR). If there are other desired features (especially from educators), please post a request in a GitHub issue as well.
Watch for more features in the coming weeks and a CRAN release once the bleeding edge htmlwidgets package makes it to CRAN.