
Series of Azure Databricks posts:

Yesterday we worked with notebooks and learned how to read data using them.

Today we will check Databricks CLI and look into how you can use CLI to upload (copy) files from your remote server to DBFS.

Databricks CLI is a command-line interface (CLI) that provides an easy-to-use interface to the Databricks platform. It belongs to the family of Databricks developer tools and is easy to set up and straightforward to use. With the CLI you can automate many everyday tasks.

1. Installing the CLI

Using Python 3.6 (or above), run the following pip command in CMD:

pip3 install databricks-cli


2. Creating a personal access token

But before using the CLI, a personal access token needs to be created for authentication.

On your Azure Databricks workspace home screen, go to settings:

Click on Generate New Token and, in the dialog window, give the token a name and a lifetime.

After the token is generated, make sure to copy it, because you will not be able to see it again later. A token can be revoked when needed; otherwise it has an expiry date (in my case 90 days). So make sure to remember to renew it before the lifetime period ends!

3. Working with the CLI

Go back to CMD and run the following:

databricks --version


will give you the current version you are running. After that, let's configure the connection.

databricks configure --token


and you will be prompted to enter two pieces of information.

The host is available in your browser. Go to the Azure Databricks tab in your browser and copy the URL:

And the token is the one that was generated for you in step two. The token should look like: dapib166345f2938xxxxxxxxxxxxxxc.

Once you enter both, the connection is set!
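Under the hood, databricks configure --token writes your credentials to a profile file at ~/.databrickscfg. For non-interactive use (scripts, CI jobs), the CLI also reads the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables. A rough sketch of both options (the host URL and token below are placeholders, not real values):

```shell
# ~/.databrickscfg -- created by `databricks configure --token`
# [DEFAULT]
# host = https://adb-1234567890123456.7.azuredatabricks.net
# token = dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX

# Alternative for scripts: environment variables instead of the config file
export DATABRICKS_HOST="https://adb-1234567890123456.7.azuredatabricks.net"
export DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
```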

By using bash commands, you can now work with DBFS from your local machine or server using the CLI. For example:

databricks fs ls


will list all the files in the root folder of DBFS in your Azure Databricks workspace.

Databricks has already aliased the databricks fs command to simply dbfs. Essentially, the following commands are equivalent:

databricks fs ls
dbfs ls
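Beyond ls, the DBFS CLI offers a handful of other subcommands worth knowing. A quick tour, with example paths of my own choosing:

```shell
dbfs mkdirs dbfs:/FileStore/demo                       # create a directory
dbfs cp --overwrite local.txt dbfs:/FileStore/demo/local.txt   # copy, overwriting if it exists
dbfs cp --recursive ./data dbfs:/FileStore/demo/data   # copy a whole local folder
dbfs mv dbfs:/FileStore/demo/local.txt dbfs:/FileStore/demo/renamed.txt  # move/rename
dbfs rm --recursive dbfs:/FileStore/demo               # remove a folder and its contents
```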


so using the DBFS CLI means, in other words, using the Databricks File System CLI. And with this, we can start copying a file. Copying from my local machine to Azure Databricks should look like:

dbfs cp /mymachine/test_dbfs.txt dbfs:/FileStore/file_dbfs.txt


My complete bash code (as seen on the screenshot) is:

pwd
touch test_dbfs.txt
dbfs cp test_dbfs.txt dbfs:/FileStore/file_dbfs.txt


And after refreshing the data in my Databricks workspace, you can see that the file is there. The commands pwd and touch are here merely for demonstration.
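You can also verify the upload from the CLI itself, without opening the workspace UI (using the same example paths as above):

```shell
dbfs ls dbfs:/FileStore                               # the uploaded file should be listed here
dbfs cp dbfs:/FileStore/file_dbfs.txt ./roundtrip.txt # download it back as a round-trip check
```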

This approach can be heavily automated for daily data loads to Azure Databricks, delta uploads, data migration, or any other data engineering and data movement task. Also note that the Databricks CLI is a powerful tool with broader uses.
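As a small sketch of such automation (the folder layout, file names, and schedule below are hypothetical, just to show the shape of a daily load):

```shell
#!/usr/bin/env bash
# Hypothetical daily-load sketch: upload today's CSV exports to DBFS.
set -euo pipefail

SRC_DIR="/data/exports/$(date +%Y-%m-%d)"      # assumed local export folder
DEST="dbfs:/FileStore/daily/$(date +%Y-%m-%d)" # assumed DBFS target folder

dbfs mkdirs "$DEST"
for f in "$SRC_DIR"/*.csv; do
    dbfs cp --overwrite "$f" "$DEST/$(basename "$f")"
done
```

Scheduled via cron (for example 0 2 * * * to run at 2 a.m.), this gives you a simple nightly upload pipeline.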

Tomorrow we will check how to connect Azure Blob storage with Azure Databricks and how to read data from Blob Storage in Notebooks.

The complete set of code and notebooks will be available in the GitHub repository.

Happy Coding and Stay Healthy!