My employer decided to switch to PIWIK PRO, too. So I was looking for a way to access the data PIWIK PRO collects and process it with R. When we used Google Analytics as our web analytics tool, I used RGoogleAnalytics. I added some enhancements such as caching the data and splitting the requests into daily chunks to handle sampling issues with Google Analytics.
Unfortunately, I haven’t found any R package providing access to PIWIK PRO data. So I wrote my own: piwikproR.
Here I want to show you how to use it.
piwikproR isn’t available on CRAN yet, but installing it from GitHub is about as simple as installing from CRAN.
Before we can use the PIWIK PRO API, we have to generate API credentials.
Doing so gives us two strings: a client ID and a client secret.
With these two strings we can generate a token for the actual access. So let’s put the credentials into a list:
```r
library(piwikproR)

piwik_pro_credentials <- list(
  client_id = "CLIENT_ID",
  client_secret = "CLIENT_SECRET",
  url = "https://my_site.piwik.pro"
)

# Fetch token
token <- get_login_token(piwik_pro_credentials)
```
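To keep the secret out of scripts and version control, the credentials can also be read from environment variables. This is a minimal sketch in base R. The helper `read_piwik_pro_credentials` and the variable names `PIWIK_PRO_CLIENT_ID`, `PIWIK_PRO_CLIENT_SECRET` and `PIWIK_PRO_URL` are assumptions made up for this example; piwikproR does not require them.

```r
# Hypothetical helper, NOT part of piwikproR: read the credentials from
# environment variables (e.g. set in ~/.Renviron). The variable names are
# assumptions chosen for this example only.
read_piwik_pro_credentials <- function() {
  vars <- c(
    client_id     = "PIWIK_PRO_CLIENT_ID",
    client_secret = "PIWIK_PRO_CLIENT_SECRET",
    url           = "PIWIK_PRO_URL"
  )
  values <- Sys.getenv(vars, unset = NA)

  # Fail early if a variable is unset or empty
  unset_vars <- vars[is.na(values) | values == ""]
  if (length(unset_vars) > 0) {
    stop("Missing environment variables: ", paste(unset_vars, collapse = ", "))
  }

  # Return the same list structure as the hard-coded credentials
  stats::setNames(as.list(unname(values)), names(vars))
}

# Usage (requires the three variables to be set):
# piwik_pro_credentials <- read_piwik_pro_credentials()
# token <- get_login_token(piwik_pro_credentials)
```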
Now let’s define which columns we want to fetch. To do so, we build a tibble containing the column name and an optional transformation:
```r
columns <- tibble::tribble(
  ~column,      ~transformation,
  "timestamp",  "",
  "event_url",  "to_path",
  "page_views", ""
)
```
In the example above we get the date as the first column, the path part of each URL as the second column (only /some/path/to/the/site.html instead of the full URL), and the number of page_views as the last column.
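To make the effect of `to_path` more concrete, here is a rough local equivalent in base R. It is only an illustration: the real transformation is applied by PIWIK PRO on the server, and I am assuming it drops the scheme, host, query string and fragment. The helper `url_to_path` and the example.com URL are made up for this sketch.

```r
# Hypothetical helper, NOT part of piwikproR: keep only the path of a URL.
url_to_path <- function(url) {
  # Drop "scheme://host"
  path <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://[^/?#]*", "", url)
  # Drop query string and fragment (assumption about the server-side behaviour)
  path <- sub("[?#].*$", "", path)
  # A URL without any path maps to the root path
  ifelse(path == "", "/", path)
}

url_to_path("https://www.example.com/some/path/to/the/site.html")
# "/some/path/to/the/site.html"
```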
For further details take a look at the documentation at PIWIK PRO.
Optionally, we can pass a filter to the API call so that the server does the filtering.
Let’s say we’re only interested in page_views generated by desktop devices. So we build the following filter object:
```r
filters <- tibble::tribble(
  ~column,       ~operator, ~value,
  "device_type", "eq",      0
)
filters <- build_filter(filters, "and")
```
Adding more rows to filters adds more criteria.
Fetching the data
Now it’s time to fetch the data. We have to choose the date range and the actual website we’re fetching the data for:
```r
website_id <- "my_website_id"
start.date <- "2021-04-01"
end.date <- "2021-04-30"

query <- build_query(
  lubridate::ymd(start.date),
  lubridate::ymd(end.date),
  website_id,
  filters = filters,
  columns,
  max_lines = 0
)
data <- send_query(query, token, caching = TRUE, fetch_by_day = FALSE)
```
data is a tibble containing the specified columns.
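The name of the fetch_by_day argument suggests that a query can be split into one request per day, similar to the daily chunks mentioned at the beginning. The following base R sketch only illustrates that general idea and makes no claim about how piwikproR implements it internally. The helper `split_into_days` is hypothetical.

```r
# Hypothetical helper, NOT part of piwikproR: split a date range into
# one (start, end) pair per day.
split_into_days <- function(start_date, end_date) {
  days <- seq(as.Date(start_date), as.Date(end_date), by = "day")
  data.frame(start = days, end = days)
}

chunks <- split_into_days("2021-04-01", "2021-04-30")
nrow(chunks)  # 30, one row per day in April 2021

# Each row could then be turned into its own query and the results combined.
# Untested sketch, reusing the objects defined earlier:
# results <- lapply(seq_len(nrow(chunks)), function(i) {
#   q <- build_query(chunks$start[i], chunks$end[i], website_id,
#                    filters = filters, columns, max_lines = 0)
#   send_query(q, token, caching = TRUE, fetch_by_day = FALSE)
# })
# data <- dplyr::bind_rows(results)
```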
PIWIK PRO provides detailed documentation for their API at https://developers.piwik.pro/en/latest/custom_reports/index.html.