Introduction to AzureKusto

[This article was first published on Revolutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

By Hong Ooi and Alex Kyllo

This post is to announce the availability of AzureKusto, the R interface to Azure Data Explorer (internally codenamed “Kusto”), a fast, fully managed data analytics service from Microsoft. It is available from CRAN, or you can install the development version from GitHub via devtools::install_github("cloudyr/AzureKusto").

AzureKusto provides an interface (including DBI compliant methods for connecting to Kusto clusters and submitting Kusto Query Language (KQL) statements, as well as a dbplyr style backend that translates dplyr queries into KQL statements. On the administrator side, it extends the AzureRMR framework to allow for creating clusters and managing database principals.

Connecting to a cluster

To connect to a Data Explorer cluster, call the kusto_database_endpoint() function. Once you are connected, call run_query() to execute queries and command statements.

library(AzureKusto)

## Connect to a Data Explorer cluster with (default) device code authentication
Samples <- kusto_database_endpoint(
    server="https://help.kusto.windows.net",
    database="Samples")

res <- run_query(Samples,
    "StormEvents | summarize EventCount = count() by State | order by State asc")
head(res)

##            State EventCount
## 1        ALABAMA       1315
## 2         ALASKA        257
## 3 AMERICAN SAMOA         16
## 4        ARIZONA        340
## 5       ARKANSAS       1028
## 6 ATLANTIC NORTH        188

# run_query can also handle command statements, which begin with a '.' character
res <- run_query(Samples, ".show tables | count")
res[[1]]

##   Count
## 1     5

dplyr Interface

The package also implements a dplyr-style interface for building a query upon a tbl_kusto object and then running it on the remote Kusto database and returning the result as a regular tibble object with collect(). All the standard verbs are supported.

library(dplyr)
StormEvents <- tbl_kusto(Samples, "StormEvents")
q <- StormEvents %>%
    group_by(State) %>%
    summarize(EventCount=n()) %>%
    arrange(State)

show_query(q)
## <KQL> database('Samples').['StormEvents']
## | summarize ['EventCount'] = count() by ['State']
## | order by ['State'] asc

collect(q)
## # A tibble: 67 x 2
##    State          EventCount
##    <chr>               <dbl>
##  1 ALABAMA              1315
##  2 ALASKA                257
##  3 AMERICAN SAMOA         16
## ...

DBI interface

AzureKusto implements a subset of the DBI specification for interfacing with databases in R.

The following methods are supported:

  • Connections: dbConnect, dbDisconnect, dbCanConnect
  • Table management: dbExistsTable, dbCreateTable, dbRemoveTable, dbReadTable, dbWriteTable
  • Querying: dbGetQuery, dbSendQuery, dbFetch, dbSendStatement, dbExecute, dbListFields, dbColumnInfo

It should be noted, though, that Data Explorer is quite different to the SQL databases that DBI targets. This affects the behaviour of certain DBI methods and renders other moot.

library(DBI)

Samples <- dbConnect(AzureKusto(),
                     server="https://help.kusto.windows.net",
                     database="Samples")

dbListTables(Samples)
## [1] "StormEvents"       "demo_make_series1" "demo_series2"     
## [4] "demo_series3"      "demo_many_series1"

dbExistsTable(Samples, "StormEvents")
##[1] TRUE

dbGetQuery(Samples, "StormEvents | summarize ct = count()")
##      ct
## 1 59066

If you have any questions, comments or other feedback, please feel free to open an issue on the GitHub repo.

And one more thing...

As of Build 2019, Data Explorer can also run R (and Python) scripts in-database. For more information on this feature, currently in public preview, see the Azure blog and the documentation article.

 

 

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)