# Databricks on Azure

## **Before you start**

{% hint style="info" %}
Ensure Unity Catalog is enabled for your Databricks instance. For details, see [Getting Started with Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/get-started.html).
{% endhint %}

To connect Databricks to Select Star, you will need:

* A Databricks instance on Azure. For details, see [Databricks' documentation](https://www.databricks.com/product/azure)
* Account admin permissions on the Databricks instance
* Workspace admin permissions on the Databricks instance

Complete all of the following steps to see Databricks metadata, lineage, and popularity in Select Star:

1. [Create a service user in Databricks](#id-1.-create-a-service-user-in-databricks)
2. [Assign the service user to a workspace using the account console](#id-2.-assign-a-service-user-to-a-workspace-using-the-account-console)
3. [Grant SQL and Workspace access to the service user](#id-3.-grant-sql-and-workspace-access-for-a-service-user)
4. [Grant service user permissions to the catalog](#id-4.-grant-permissions-to-a-catalog-for-a-service-user)
5. [Grant permission to a workspace for a service user](#id-5.-grant-permission-to-a-workspace-for-a-service-user)
6. [Generate an access token](#id-6.-generate-an-access-token)
7. [Configure System tables lineage (Recommended)](#id-7.-configure-system-tables-lineage-recommended)
8. [Connect Databricks to Select Star](#id-8.-connect-databricks-to-select-star)
9. [Choose Catalogs and Schemas](#id-9.-choose-catalogs-and-schemas)

## 1. Create a Service User in Databricks

Account admins can add users to the Databricks account using the account console or the SCIM Account API. These instructions focus on using the account console approach.

To add a service user to the account using the account console:

1. As an account admin, log in to the [account console](https://accounts.azuredatabricks.net/).
2. Click **User management**.
3. On the **Users** tab, click **Add user**.
4. Enter any email, first name and last name for the service user.
5. Click **Add**.

{% hint style="info" %}
💡 To use a service user, you must be able to authenticate as that user. Depending on the authentication method configured for your Databricks account (e.g. SAML), you may also need to create the service user in your corporate identity provider, such as Microsoft Entra ID.
{% endhint %}
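For teams that prefer the SCIM Account API mentioned above over the account console, the request might look roughly like the following. This is a minimal sketch, not a definitive implementation: the function name, `account_id`, and `admin_token` are placeholders introduced for illustration, and the endpoint path is an assumption based on the Databricks account-level SCIM API, so check it against the Databricks API reference before use. The function only builds the request; it does not send it.

```python
import json
import urllib.request

# Assumption: the Azure Databricks account console host used earlier in this guide.
ACCOUNT_HOST = "https://accounts.azuredatabricks.net"


def build_create_user_request(account_id, admin_token, email, given_name, family_name):
    """Build (but do not send) a SCIM request that creates an account-level user."""
    url = f"{ACCOUNT_HOST}/api/2.0/accounts/{account_id}/scim/v2/Users"
    payload = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": email,
        "name": {"givenName": given_name, "familyName": family_name},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Placeholder: whatever bearer token your account admin identity uses.
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/scim+json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The response should include the new user's ID, which is useful if you later assign the user to a workspace through an API.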

## 2. Assign the Service User to a Workspace Using the Account Console

Account admins can add service users to [identity-federated workspaces](https://docs.databricks.com/administration-guide/users-groups/index.html#assign-users-to-workspaces) using the following:

* The account console
* The Workspace Assignment API

The following instructions focus on using the account console approach.

To add a service user to a workspace using the account console, the workspace must be enabled for identity federation.

1. As an account admin, log in to the [account console](https://accounts.azuredatabricks.net/).
2. Click **Workspaces**.
3. On the **Permissions** tab, click **Add permissions**.
4. Search for and select the service user, assign the workspace **Admin** permission level, and click **Save**.

These are the minimum permissions required for Select Star to collect basic metadata and query history. Query history is also used to generate [Data Lineage](/features/lineage.md).
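The same assignment can in principle be made through the Workspace Assignment API listed above. The sketch below is illustrative only: the function name and all IDs are hypothetical, and the endpoint path and `ADMIN` permission value are assumptions based on the Databricks account API, so verify them against the Databricks API reference. It builds the request without sending it.

```python
import json
import urllib.request


def build_workspace_assignment_request(account_id, workspace_id, principal_id, admin_token):
    """Build (but do not send) a request assigning a principal to a workspace as admin."""
    url = (
        f"https://accounts.azuredatabricks.net/api/2.0/accounts/{account_id}"
        f"/workspaces/{workspace_id}/permissionassignments/principals/{principal_id}"
    )
    # Assumption: "ADMIN" corresponds to the workspace Admin level chosen in the console.
    body = {"permissions": ["ADMIN"]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

Here `principal_id` stands for the numeric ID of the service user in the Databricks account, not the email address.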

## 3. Grant SQL and Workspace Access to the Service User

To grant SQL Warehouse access for a service user using the workspace admin console, the workspace must be enabled for identity federation.

1. As a workspace admin, log in to the Databricks workspace.
2. Click your username in the top bar of the Databricks workspace and select **Admin Settings**.
3. Go to the **Identity and access** tab, and under **Users** click **Manage**.
4. On the **User** tab, click the service user that was created in the previous steps.
5. Select the checkbox for **Databricks SQL access** and **Workspace access**, and click **Update**.

   ![Entitlements for service user](/files/eDivgDOzA25CMziy2wz4)
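If you retrieve the service user as a SCIM resource (for example, from the workspace SCIM API), a small check like the one below can confirm that both entitlements are present. This is a sketch under assumptions: the entitlement values `databricks-sql-access` and `workspace-access` and the shape of the `entitlements` field are taken from the Databricks SCIM user schema as we understand it, and the function name is hypothetical.

```python
# Assumption: SCIM entitlement values matching the two checkboxes in the UI.
REQUIRED_ENTITLEMENTS = {"databricks-sql-access", "workspace-access"}


def missing_entitlements(scim_user):
    """Return the required entitlements that a SCIM user resource does not have."""
    granted = {item.get("value") for item in scim_user.get("entitlements", [])}
    return sorted(REQUIRED_ENTITLEMENTS - granted)


# Example: a user that only has workspace access so far.
example_user = {
    "userName": "svc-selectstar@example.com",
    "entitlements": [{"value": "workspace-access"}],
}
print(missing_entitlements(example_user))  # ['databricks-sql-access']
```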

## 4. Grant Service Users Permissions to the Catalog

1. As a workspace admin, log in to a workspace that is linked to the metastore.
2. Click **Catalog**.
3. Click the catalog you want to grant access to, then select **Permissions**.

   <figure><img src="/files/F3hbd3nNazoa8oEFjFCE" alt=""><figcaption><p>Catalog permissions in the Catalog Explorer UI</p></figcaption></figure>
4. Click **Grant**.
5. Select the user or group, set **Privilege presets** to **Data Reader**, make sure the **USE CATALOG**, **USE SCHEMA**, and **SELECT** checkboxes are selected, and click **Grant**.

![Privileges for service user or User groups](/files/9n4uPAAMpGkxv15vNe0T)
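If you manage Unity Catalog privileges with SQL rather than the Catalog Explorer UI, the same three privileges can be expressed as a `GRANT` statement. The helper below is a minimal sketch that only builds the statement text; the function names are hypothetical, and the catalog and principal in the example are placeholders. Run the resulting statement in a SQL editor or notebook as a user who is allowed to grant on the catalog.

```python
def quote_identifier(name):
    """Wrap a name in backticks, doubling any backticks it already contains."""
    return "`" + name.replace("`", "``") + "`"


def catalog_grant_statement(catalog, principal):
    """Build a GRANT statement for the privileges selected in the UI steps above."""
    return (
        "GRANT USE CATALOG, USE SCHEMA, SELECT "
        f"ON CATALOG {quote_identifier(catalog)} "
        f"TO {quote_identifier(principal)};"
    )


# Placeholder catalog and service user, for illustration only.
print(catalog_grant_statement("main", "svc-selectstar@example.com"))
```

Repeat the statement for each catalog you plan to load into Select Star.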

## 5. Grant Permission to a Workspace for a Service User

This step is required to show notebooks in the catalog and notebook lineage.

1. Log in to a workspace that is linked to the metastore.
2. Click **Workspace** and select the top-level folder.
3. Click the **Share** button.

   <figure><img src="/files/R9iUdxP4tLhhddhtgQTi" alt=""><figcaption><p>Folder permissions in the Workspace explore UI</p></figcaption></figure>
4. Select the user or group, choose the **Can view** permission, and click **Add**.

   <figure><img src="/files/KlTjiC5laAMMf2UGBeiK" alt=""><figcaption><p>Permission grant in Workspace share</p></figcaption></figure>

## 6. Generate an Access Token

To authenticate a service user to Databricks APIs, create an access token for it.

1. As a **service user**, log in to a workspace.
2. Click your username in the top bar of the Databricks workspace and select **Admin Settings**. Ensure that the visible username is the service user you created in the previous steps.
3. Go to the **Developer** tab, and under **Access tokens** click **Manage**.
4. Click **Generate new token** and fill out the form. After you submit it, copy the access token and store it securely; you will need it when connecting Databricks to Select Star.
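Before handing the token to Select Star, it can be worth confirming that it authenticates as the service user. The sketch below builds a request to the workspace SCIM "Me" endpoint using the token as a bearer credential. Treat it as an illustration under assumptions: the function name is hypothetical, the `DATABRICKS_TOKEN` environment variable is just one convenient place to keep the token, and the endpoint path should be checked against the Databricks API reference.

```python
import os
import urllib.request


def build_whoami_request(workspace_url, token=None):
    """Build (but do not send) a request that asks the workspace who the token belongs to."""
    # Assumption: the token is kept in an environment variable rather than in code.
    token = token or os.environ["DATABRICKS_TOKEN"]
    url = workspace_url.rstrip("/") + "/api/2.0/preview/scim/v2/Me"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

Sending the request with `urllib.request.urlopen` and reading `userName` from the JSON response should show the service user's email. If it shows your own account, the token was generated under the wrong login.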

## **7. Configure System tables lineage (Recommended)**

{% hint style="info" %}
💡 This section is optional but recommended. System tables lineage provides better performance and scalability by using Databricks system tables instead of individual API calls. If you skip this section, Select Star will use API lineage collection.
{% endhint %}

System tables lineage requires additional permissions beyond the basic setup. These permissions allow Select Star to query Databricks system tables that contain lineage metadata, without accessing your actual data.

### **Grant SQL Warehouse access permissions**

The service user needs permission to use a specific SQL Warehouse for executing lineage queries.

1. In your Databricks workspace, go to **SQL Warehouses**.
2. Select the SQL Warehouse you want to use for Select Star.
3. Click the **Permissions** button.
4. Click **Add** and search for your service user.
5. Grant **Can use** permission and click **Add**.

<figure><img src="/files/wiJWUOLfKXMcf5I7W8gY" alt=""><figcaption><p>Grant CAN USE permission on SQL Warehouse</p></figcaption></figure>

{% hint style="info" %}
💡 Note the **Warehouse ID** shown on the SQL Warehouse details page. You will need it when connecting to Select Star.
{% endhint %}

<figure><img src="/files/dG1MClQn0ycInsY1a4Kd" alt=""><figcaption><p>SQL Warehouse ID location</p></figcaption></figure>
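If you only have the warehouse's HTTP path on hand (the value shown in the warehouse's connection details), the Warehouse ID is usually its last segment. The helper below is a small sketch of that extraction; the function name is hypothetical, and the `/sql/1.0/warehouses/<id>` path shape is an assumption about how Databricks formats this value.

```python
import re
from typing import Optional

# Assumption: HTTP paths for SQL Warehouses look like /sql/1.0/warehouses/<id>.
_HTTP_PATH = re.compile(r"^/sql/1\.0/warehouses/([0-9A-Za-z]+)/?$")


def warehouse_id_from_http_path(http_path: str) -> Optional[str]:
    """Return the warehouse ID from an HTTP path, or None if the path has another shape."""
    match = _HTTP_PATH.match(http_path.strip())
    return match.group(1) if match else None


# Placeholder ID, for illustration only.
print(warehouse_id_from_http_path("/sql/1.0/warehouses/abc123def456"))  # abc123def456
```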

### **Grant system.access schema permissions**

The service user needs permissions to read lineage data from Databricks system tables.

1. In your Databricks workspace, go to **Catalog**.
2. Select the **system** catalog.
3. Select the **access** schema.
4. Go to the **Permissions** tab.
5. Click **Grant** and search for your service user.
6. Select the **USE SCHEMA** and **SELECT** privileges.
7. Click **Grant**.

<figure><img src="/files/rNlPL4ja6IiAP5Yz0miR" alt=""><figcaption><p>Grant permissions on system.access schema</p></figcaption></figure>

### **Ensure SQL access entitlement**

Verify that your service user has the SQL access entitlement enabled:

1. In your Databricks workspace, click your username and select **Admin Settings**.
2. Go to the **Identity and access** tab, and under **Users** click **Manage**.
3. Click on your service user.
4. Ensure **Databricks SQL access** is checked and click **Update** if needed.

<figure><img src="/files/BCSwWnSrFD0EUeQyUwaz" alt=""><figcaption><p>Enable SQL access for service user</p></figcaption></figure>
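Once the warehouse permission, the `system.access` grants, and the SQL entitlement are all in place, a quick smoke test is to run a small query against a lineage system table as the service user. The sketch below builds such a request for the Databricks SQL Statement Execution API. It is illustrative only: the function name is hypothetical, the query is not the one Select Star runs, and the endpoint and column names are assumptions to verify against the Databricks documentation.

```python
import json
import urllib.request

# Illustrative query only; assumes the system.access.table_lineage table and these columns.
LINEAGE_SMOKE_TEST = (
    "SELECT source_table_full_name, target_table_full_name "
    "FROM system.access.table_lineage LIMIT 1"
)


def build_lineage_check_request(workspace_url, token, warehouse_id):
    """Build (but do not send) a request that runs the smoke-test query on a warehouse."""
    body = {
        "warehouse_id": warehouse_id,
        "statement": LINEAGE_SMOKE_TEST,
        "wait_timeout": "30s",  # wait up to 30 seconds for a synchronous result
    }
    return urllib.request.Request(
        workspace_url.rstrip("/") + "/api/2.0/sql/statements/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A permission error in the response typically points at one of the three grants in this section; a successful response, even with zero rows, suggests the service user can read the lineage tables through that warehouse.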

## **8. Connect Databricks to Select Star**

Go to the Select Star **Settings**. Click **Data** in the sidebar, then **+ Add** to create a new Data Source.

<figure><img src="/files/9BYfkitHCNhECWwMfX9v" alt=""><figcaption></figcaption></figure>

Choose **Databricks** in the Source Type dropdown and provide the following information:

<figure><img src="/files/GEGxuKC1kjv42prllmDR" alt=""><figcaption></figcaption></figure>

**Display Name:** This value is `Databricks` by default, but you can override it if desired.

**Workspace URL:** The address of the Databricks workspace. It should include `<identifier>.azuredatabricks.net`.

**Access Token:** The access token generated in Step 6, which Select Star uses to authenticate to Databricks on Azure.

**Lineage Method:** Choose between System tables (recommended) or API lineage collection.

**SQL Warehouse ID:** Required when using System tables lineage. This is the Warehouse ID noted in Step 7. This field is not used with API lineage.
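The field rules above can be summarized as a small pre-flight check you might run on your own values before filling in the form. Everything in this sketch is hypothetical: the class, field names, and the `"system_tables"` / `"api"` labels are illustrative and are not part of any Select Star API.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class ConnectionSettings:
    """Hypothetical container mirroring the fields of the Select Star form."""
    workspace_url: str
    access_token: str
    lineage_method: str = "system_tables"  # or "api"
    warehouse_id: Optional[str] = None
    display_name: str = "Databricks"


def validate(settings: ConnectionSettings) -> List[str]:
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    url = settings.workspace_url
    host = urlparse(url if "://" in url else "https://" + url).hostname or ""
    if not host.endswith(".azuredatabricks.net"):
        problems.append("workspace_url should include <identifier>.azuredatabricks.net")
    if not settings.access_token:
        problems.append("access_token is required")
    if settings.lineage_method == "system_tables" and not settings.warehouse_id:
        problems.append("warehouse_id is required for System tables lineage")
    if settings.lineage_method == "api" and settings.warehouse_id:
        problems.append("warehouse_id is not used with API lineage")
    return problems
```

For example, `validate(ConnectionSettings("adb-123.4.azuredatabricks.net", "token"))` reports the missing warehouse ID, because System tables lineage is the default in this sketch.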

## **9. Choose Catalogs and Schemas**

After you fill in the information, you'll be asked to select the catalogs you'd like to load into Select Star.

{% hint style="info" %}
💡 Select Star will not read queries or metadata, or generate lineage, for catalogs, schemas, or tables that are not loaded. Load all data for which you expect to see lineage.
{% endhint %}

You can [change the catalogs and schemas](https://docs.selectstar.com/data-source-management/manage-data-sources#configure-a-data-source) you have loaded if needed.

Select the catalogs and click **Next**.

<figure><img src="/files/RapsV8PCuymb1MAfXXAL" alt=""><figcaption></figcaption></figure>

For each catalog you selected, you'll be able to select the schemas.

<figure><img src="/files/EOScBnNVbuQhqqs2OmoU" alt=""><figcaption></figcaption></figure>

Your metadata should start loading automatically. Allow 24-48 hours for popularity and lineage to be fully generated.

When the sync is complete, you'll be able to explore Databricks in Select Star.

See the link below for more information on Databricks in Select Star.

{% content-ref url="/pages/KGlgaHmYbTKOq1TVMuVR" %}
[Getting Started: Databricks](/learning-data/getting-started-databricks.md)
{% endcontent-ref %}

