xml2 1.0.0
We are pleased to announce that xml2 1.0.0 is now available on CRAN. xml2 is a wrapper around the comprehensive libxml2 C library that makes it easy to work with XML and HTML files in R. Install the latest version with:
install.packages("xml2")
There are three major improvements in 1.0.0:
- You can now modify and create XML documents.
- xml_find_first() replaces xml_find_one(), and provides better semantics for missing nodes.
- Improved namespace handling when working with XPath.
There are many other small improvements and bug fixes: please see the release notes for a complete list.
Modification and creation
xml2 now supports modification and creation of XML nodes. This includes new functions xml_new_document(), xml_add_child(), xml_add_sibling(), xml_set_namespace(), xml_remove(), xml_replace(), xml_root(), and replacement methods for xml_name(), xml_attr(), xml_attrs() and xml_text().
The basic process of creating an XML document by hand looks something like this:
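library(xml2)
library(magrittr)  # provides the %>% pipe used throughout

# Create an empty document and add its root element
root <- xml_new_document() %>% xml_add_child("root")

# Each call returns the newly added node, so chained calls nest: a1 > b > c
root %>%
  xml_add_child("a1", x = "1", y = "2") %>%
  xml_add_child("b") %>%
  xml_add_child("c") %>%
  invisible()

# Add a2 under root, then a3 as a sibling of a2
root %>%
  xml_add_child("a2") %>%
  xml_add_sibling("a3") %>%
  invisible()

# Print the serialized result
cat(as.character(root))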
For a complete description of creation and mutation, please see vignette("modification", package = "xml2").
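The example above only creates nodes, so here is a minimal sketch of the modification side, using the replacement methods and xml_remove(). The input document and its element names (books, book, title, draft) are invented for illustration.

```r
library(xml2)

# Hypothetical input document, made up for this sketch
doc <- read_xml('<books><book id="1"><title>Old title</title><draft/></book></books>')

book  <- xml_find_first(doc, ".//book")
title <- xml_find_first(book, "./title")

# Replacement methods modify the document in place
xml_text(title) <- "New title"
xml_attr(book, "id") <- "42"
xml_name(book) <- "item"

# Remove the <draft> node from the tree
xml_remove(xml_find_first(doc, ".//draft"))

cat(as.character(doc))
```

Because nodes are references into the underlying document, the changes are visible through doc without reassigning it.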
xml_find_first()
xml_find_one() has been deprecated in favor of xml_find_first(). xml_find_first() now always returns a single node: if there are multiple matches, it returns the first (without a warning), and if there are no matches, it returns a new xml_missing object.
This makes it much easier to work with ragged/inconsistent hierarchies:
x1 <- read_xml("<a>
  <b></b>
  <b><c>See</c></b>
  <b><c>Sea</c><c /></b>
</a>")

c <- x1 %>%
  xml_find_all(".//b") %>%
  xml_find_first(".//c")
c
#> {xml_nodeset (3)}
#> [1] <NA>
#> [2] <c>See</c>
#> [3] <c>Sea</c>
Missing nodes are replaced by missing values in functions that return vectors:
xml_name(c)
#> [1] NA  "c" "c"
xml_text(c)
#> [1] NA    "See" "Sea"
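One place this pays off is when turning ragged records into a table. The sketch below assumes a made-up set of person records in which one record lacks an email element. Because xml_find_first() returns exactly one result per input node, the extracted columns stay aligned.

```r
library(xml2)

# Hypothetical records, invented for this sketch: the second has no <email>
doc <- read_xml('
<people>
  <person><name>Ann</name><email>ann@example.com</email></person>
  <person><name>Bob</name></person>
  <person><name>Cy</name><email>cy@example.com</email></person>
</people>')

people <- xml_find_all(doc, ".//person")

# One result per <person>, so both columns have the same length;
# the missing <email> becomes NA
df <- data.frame(
  name  = xml_text(xml_find_first(people, "./name")),
  email = xml_text(xml_find_first(people, "./email")),
  stringsAsFactors = FALSE
)
df
```

Calling xml_find_all(doc, ".//email") instead would return only two nodes, with no record of which person each belongs to.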
XPath and namespaces
XPath is challenging to use if your document contains any namespaces:
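x <- read_xml('
  <root>
    <doc1 xmlns = "http://foo.com"><baz /></doc1>
    <doc2 xmlns = "http://bar.com"><baz /></doc2>
  </root>
')
x %>% xml_find_all(".//baz")
#> {xml_nodeset (0)}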
To make life slightly easier, the default xml_ns() object is automatically passed to xml_find_*():
x %>% xml_ns()
#> d1 <-> http://foo.com
#> d2 <-> http://bar.com
x %>% xml_find_all(".//d1:baz")
#> {xml_nodeset (1)}
#> [1] <baz/>
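The generated d1 and d2 prefixes are not very descriptive. As a sketch, you can rename them with xml_ns_rename() and pass the result explicitly as the ns argument. The document below (a root with two children that each declare a default namespace) and the prefixes foo and bar are assumptions made for illustration.

```r
library(xml2)

# Hypothetical document with two default namespaces, made up for this sketch
x <- read_xml('
<root>
  <doc1 xmlns="http://foo.com"><baz/></doc1>
  <doc2 xmlns="http://bar.com"><baz/></doc2>
</root>')

# Map the generated prefixes to names of our choosing
ns <- xml_ns_rename(xml_ns(x), d1 = "foo", d2 = "bar")

# Pass the renamed namespaces explicitly to the XPath query
foo_baz <- xml_find_all(x, ".//foo:baz", ns)
bar_baz <- xml_find_all(x, ".//bar:baz", ns)

length(foo_baz)
length(bar_baz)
```

Unlike stripping namespaces, this leaves the document untouched, so you can still tell the two baz elements apart.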
If you just want to avoid the hassle of namespaces altogether, we have a new nuclear option: xml_ns_strip():
xml_ns_strip(x)
x %>% xml_find_all(".//baz")
#> {xml_nodeset (2)}
#> [1] <baz/>
#> [2] <baz/>