Databricks on Azure

Before You Start

Ensure Unity Catalog is enabled for your Databricks instance. For details, see Getting Started with Unity Catalog.

To connect Databricks to Select Star, you will need:

  • A Databricks instance on Azure. For details, see Databricks' documentation

  • Account admin permissions on the Databricks instance

  • Workspace admin permissions on the Databricks instance

Complete all of the following steps to see Databricks metadata, lineage, and popularity in Select Star:

1. Create a Service User in Databricks

Account admins can add users to the Databricks account using the account console or the SCIM Account API. These instructions focus on using the account console approach.

To add a service user to the account using the account console:

  1. As an account admin, log in to the account console.

  2. Click User management.

  3. On the Users tab, click Add user.

  4. Enter an email address, first name, and last name for the service user.

  5. Click Add.

💡 To use a service user, you must be able to authenticate as that user. Depending on the authentication method configured for your Databricks account (e.g. SAML), you may also need to create the service user in your corporate identity provider, such as Microsoft Entra ID.
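For teams that prefer the SCIM Account API mentioned above over the account console, the request body might look like the following. This is a minimal sketch, not the documented Databricks procedure: it only builds a standard SCIM 2.0 User resource (attribute names from the SCIM core schema), and the helper name build_scim_user and the example email are hypothetical. The account-level endpoint and its authentication are not shown here; take them from the Databricks API reference.

```python
import json

# Schema URN for a SCIM 2.0 User resource (defined by the SCIM core schema).
SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"


def build_scim_user(email: str, first_name: str, last_name: str) -> dict:
    """Return a SCIM 2.0 User resource describing the service user.

    Hypothetical helper: it mirrors the email, first name, and last name
    fields entered in the account console.
    """
    return {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": email,
        "name": {"givenName": first_name, "familyName": last_name},
        "displayName": f"{first_name} {last_name}",
    }


# Example values are placeholders, not real accounts.
payload = build_scim_user("selectstar-svc@example.com", "Select Star", "Service")
print(json.dumps(payload, indent=2))
```

The printed JSON is the body you would send when creating the user; the console steps above achieve the same result without any API calls.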

2. Assign the Service User to a Workspace Using the Account Console

Account admins can add service users to identity-federated workspaces using the following:

  • The account console

  • The Workspace Assignment API

The following instructions focus on using the account console approach.

To add a service user to a workspace using the account console, the workspace must be enabled for identity federation.

  1. As an account admin, log in to the account console.

  2. Click Workspaces.

  3. On the Permissions tab, click Add permissions.

  4. Search for and select the service user, assign the workspace Admin permission level, and click Save.

These are the minimum permissions required for Select Star to collect basic metadata and query history. Query history is also used to generate Data Lineage.

3. Grant SQL and Workspace Access to the Service User

To grant SQL Warehouse access for a service user using the workspace admin console, the workspace must be enabled for identity federation.

  1. As a workspace admin, log in to the Databricks workspace.

  2. Click your username in the top bar of the Databricks workspace and select Admin Settings.

  3. Go to the Identity and access tab, and under Users click Manage.

  4. On the User tab, click the service user that was created in the previous steps.

  5. Select the checkboxes for Databricks SQL access and Workspace access, and click Update.

4. Grant Service Users Permissions to the Catalog

  1. As a workspace admin, log in to a workspace that is linked to the metastore.

  2. Click Catalog.

  3. Click the catalog you want Select Star to access, and select Permissions.

  4. Click Grant.

  5. Select the service user (or a group that contains it), set Privilege presets to Data Reader, make sure the USE CATALOG, USE SCHEMA, and SELECT checkboxes are selected, and click Grant.
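The same three privileges can also be expressed as SQL GRANT statements and run by an admin in a SQL editor. The sketch below only generates the statement text; it assumes the privileges are granted at the catalog level so that schemas and tables inherit them, and the catalog name and principal shown are hypothetical placeholders. The helper names build_grants and quote_identifier are illustrative, not part of any Databricks or Select Star tooling.

```python
from typing import List

# The privileges selected in the Grant dialog above.
PRIVILEGES = ("USE CATALOG", "USE SCHEMA", "SELECT")


def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_grants(catalog: str, principal: str) -> List[str]:
    """Return one GRANT statement per privilege for the given catalog."""
    return [
        f"GRANT {privilege} ON CATALOG {quote_identifier(catalog)} "
        f"TO {quote_identifier(principal)}"
        for privilege in PRIVILEGES
    ]


# Placeholder catalog and service user; replace with your own values.
for statement in build_grants("main", "selectstar-svc@example.com"):
    print(statement + ";")
```

Repeat for each catalog you plan to load into Select Star.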

5. Generate an Access Token

To authenticate a service user to Databricks APIs, create an access token for that user.

  1. As a service user, log in to a workspace.

  2. Click your username in the top bar of the Databricks workspace and select Admin Settings. Ensure that the visible username is the service user you created in the previous steps.

  3. Go to the Developer tab, and under Access tokens click Manage.

  4. Click Generate new token and fill in the form. Once it is submitted, copy the access token and store it securely for later use.
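Before pasting the token into Select Star, it can help to confirm that it really belongs to the service user. One way, sketched below under assumptions, is to call the workspace SCIM "Me" endpoint with the token as a bearer credential and check the returned userName. The helper names, the placeholder workspace URL, and the placeholder token are hypothetical; the script as written only builds and prints the request, and fetch_token_owner must be called explicitly to contact the workspace.

```python
import json
import urllib.request


def build_whoami_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the current user's SCIM record."""
    url = workspace_url.rstrip("/") + "/api/2.0/preview/scim/v2/Me"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_token_owner(workspace_url: str, token: str) -> str:
    """Send the request and return the userName that owns the token."""
    request = build_whoami_request(workspace_url, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["userName"]


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute your workspace URL and token.
    demo = build_whoami_request(
        "https://adb-1234567890123456.7.azuredatabricks.net", "<access-token>"
    )
    print(demo.get_method(), demo.full_url)
```

If fetch_token_owner returns a name other than the service user's email, the token was probably generated while logged in as a different user.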

6. Connect Databricks to Select Star

Go to the Select Star Settings. Click Data in the sidebar, then + Add to create a new Data Source.

Choose Databricks in the Source Type dropdown and provide the following information:

Display Name: This value is Databricks by default, but you can override it if desired.

Workspace URL: This is the address of the workspace. It should include the full host name, in the form <identifier>.azuredatabricks.net.

Access Token: This is the access token generated in step 5. Select Star uses it to authenticate to Databricks on Azure.
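Workspace URLs copied from a browser often carry a path or query string. The sketch below shows one way to reduce such a value to the <identifier>.azuredatabricks.net host before entering it. It assumes that an https:// prefix is acceptable in the Workspace URL field, which this guide does not state; the function name and sample URL are hypothetical.

```python
from urllib.parse import urlsplit

AZURE_SUFFIX = ".azuredatabricks.net"


def normalize_workspace_url(raw: str) -> str:
    """Return https://<identifier>.azuredatabricks.net, or raise ValueError."""
    candidate = raw.strip()
    # urlsplit only recognizes the host when a scheme is present.
    if "://" not in candidate:
        candidate = "https://" + candidate
    host = (urlsplit(candidate).hostname or "").lower()
    # Require a non-empty identifier in front of the Azure Databricks suffix.
    if not host.endswith(AZURE_SUFFIX) or len(host) <= len(AZURE_SUFFIX):
        raise ValueError(f"not an Azure Databricks workspace address: {raw!r}")
    return f"https://{host}"


# Placeholder workspace address copied with a trailing query string.
print(normalize_workspace_url("adb-1234567890123456.7.azuredatabricks.net/?o=1234567890123456"))
```

An address on any other domain raises ValueError, which catches the common mistake of pasting the account console URL instead of the workspace URL.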

7. Select Catalogs and Schemas to Sync

After you fill in the information, you'll be asked to select the catalogs you'd like to load into Select Star.

💡 Select Star will not read queries or metadata, or generate lineage, for catalogs, schemas, or tables that are not loaded. Load all data for which you expect to see lineage.

You can change the catalogs and schemas you have loaded if needed.

Select the catalogs and click Next.

For each catalog you selected, you'll be able to select the schemas.

Your metadata should start loading automatically. Please allow 24-48 hours for popularity and lineage to be fully generated.

When the sync is complete, you'll be able to explore Databricks in Select Star.

See the link below for more information on Databricks in Select Star.

Getting Started: Databricks