Curl 0.9.2: tweaks and proxies for windows

[This article was first published on OpenCPU, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

opencpu logo

Version 0.9.2 of curl has been released to CRAN. The curl package implements a modern and flexible web client for R and is the foundation for the popular httr package. This update includes mostly tweaks for Windows.

Faster downloading

Alex Deng from Microsoft had diagnosed a problem with curl_fetch_memory (which is used by httr) being slower than expected on Windows. After some testing it turned out that the implemenation of realloc (to grow the buffer that holds downloaded data) is poorly optimized on Windows. It basically copies the entire memory block every time the size is increased, which results in a lot of copying for large downloads.

The new release includes a tweak to increase the buffer size exponentially, which solves the problem. This fix is wrapped in an #ifdef _WIN32 because usually the operating system does a better job in optimizing memory allocation than the programmer. But Windows needs a little help sometimes.

Updated libcurl

This release uses the latest build of libcurl and its dependencies from the rwinlib repository. These include:

  • libcurl 7.43.0
  • openssl 1.0.2d
  • libssh2 1.6.0
  • libiconv 1.14-5
  • libidn 1.31-1

The libcurl changelog lists the new features and bug fixes from this release.

Working with proxies

The new version includes two functions specifically for Windows to lookup system proxy settings. This can be used to configure curl to use the same proxy server, which is required to connect to the internet on some networks.

The ie_proxy_info function looks up your current proxy settings as configured in Internet Explorer. In the case of a dynamic proxy, the ie_get_proxy_for_url function shows if and which proxy should be used to connect to a particular URL. If your have an “automatic configuration script” this involves downloading and executing a PAC file.

You should be able to use address returned by ie_get_proxy_for_url as the proxy option in the curl handle to automatically use the correct proxy server for a given URL. However I do not have access to a network with a proxy server so I cannot actually test this feature. If you are on such a network, please help testing this feature.

curl_proxy <- function(url, verbose = TRUE){
  proxy <- ie_get_proxy_for_url(url)
  h <- new_handle(verbose = verbose, proxy = proxy)
  curl(url, handle = h)
}

con <- curl_proxy("https://httpbin.org/get")
readLines(con)

I also created a gist with some more details to test this feature. If it doesn’t work immediately, try fiddling around with some of the other libcurl proxy options and let me know what works!

To leave a comment for the author, please follow the link and comment on their blog: OpenCPU.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)