Apache Airflow (beta)

Follow these steps to connect your Apache Airflow to Select Star (via OpenLineage).


Before you start

To connect Apache Airflow to Select Star, you will need...

  • Permission to install and update packages in your Airflow environment

Select Star does not need any direct permissions on your Airflow instance, but you will need to install a Python package and configure an environment variable (or config entry) in your Airflow environment.

Complete the following steps to connect Apache Airflow to Select Star.

Note that Select Star does not connect to Apache Airflow directly. Instead, we connect via OpenLineage, an open platform for the collection and analysis of data lineage. It tracks metadata about your Apache Airflow datasets, DAGs, and DAG Runs, and sends that metadata to Select Star. Airflow DAGs will not appear in the catalog until metadata is received and ingestion has run.
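For reference, each OpenLineage event is a small JSON document describing a run of a job (here, an Airflow task). The sketch below shows the rough shape of such an event; all field values are illustrative placeholders, not real output from Airflow or Select Star.

{
  "eventType": "COMPLETE",
  "eventTime": "2025-01-01T00:00:00.000Z",
  "run": { "runId": "0195c1c5-5e9e-7c43-a2f1-7f0e4b2d9c11" },
  "job": { "namespace": "airflow", "name": "example_dag.example_task" },
  "inputs": [],
  "outputs": [],
  "producer": "https://github.com/apache/airflow/tree/providers-openlineage"
}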

1. Create a new Data Source in Select Star

Go to the Select Star Settings. Click Data in the sidebar, then + Add to create a new Data Source.

Fill in the form with the required information:

  • Display Name - This value is Apache Airflow by default, but you can override it.

  • Source Type - Choose Apache Airflow from the dropdown.

  • Base URL - The URL of your Apache Airflow instance. For example, http://airflow.example.com.

Click Save to proceed.

On the next screen, you will see the API Token, the Events Endpoint, and the Events URL. You will need these in the next steps to configure your Apache Airflow environment.

  • API Token - This is a secret key that Select Star will use to authenticate the traffic coming from your Apache Airflow instance.

  • Events Endpoint - This is the Select Star endpoint where your Apache Airflow instance will send OpenLineage events, containing the metadata about your DAGs, DAG Runs, and datasets.

  • Events URL - This is the Select Star Base URL where your Apache Airflow instance will send OpenLineage events.
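Optionally, you can sanity-check these credentials before configuring Airflow by sending a test event by hand. The following is a sketch only: it assumes the OpenLineage HTTP transport's api_key auth style (a Bearer token header) and that events are POSTed to the Events URL joined with the Events Endpoint; replace the placeholders with your actual values.

curl -X POST "<EVENTS_URL_PROVIDED_BY_SELECT_STAR>/<EVENTS_ENDPOINT_PROVIDED_BY_SELECT_STAR>" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY_PROVIDED_BY_SELECT_STAR>" \
  -d '{"eventType": "COMPLETE", "eventTime": "2025-01-01T00:00:00Z", "run": {"runId": "0195c1c5-5e9e-7c43-a2f1-7f0e4b2d9c11"}, "job": {"namespace": "airflow", "name": "manual_smoke_test"}, "producer": "manual-test"}'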

2. Configure Apache Airflow

Install OpenLineage provider

Install the provider package or add the following line to your requirements file (usually requirements.txt):

apache-airflow-providers-openlineage==1.10.0
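If you manage packages directly rather than through a requirements file, the equivalent install command (pinning the same version as above) is:

pip install "apache-airflow-providers-openlineage==1.10.0"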

Transport setup

  1. Self-hosted Apache Airflow

Provide a Transport configuration so that OpenLineage knows where to send the events. Keep the API Token, Events Endpoint, and Events URL from the previous step handy.

  • Within the airflow.cfg file:

[openlineage]
transport = {"type": "http", "url": "<EVENTS_URL_PROVIDED_BY_SELECT_STAR>", "endpoint": "<EVENTS_ENDPOINT_PROVIDED_BY_SELECT_STAR>", "auth": {"type": "api_key", "api_key": "<API_KEY_PROVIDED_BY_SELECT_STAR>"}}

  • or with the AIRFLOW__OPENLINEAGE__TRANSPORT environment variable:

AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "<EVENTS_URL_PROVIDED_BY_SELECT_STAR>", "endpoint": "<EVENTS_ENDPOINT_PROVIDED_BY_SELECT_STAR>", "auth": {"type": "api_key", "api_key": "<API_KEY_PROVIDED_BY_SELECT_STAR>"}}'

  2. Amazon Managed Workflows for Apache Airflow (MWAA)

For Amazon MWAA, installing OpenLineage works the same way; however, the transport is configured through a plugin.

First, create an env_var_plugin.py file and paste the following code:

from airflow.plugins_manager import AirflowPlugin
import os

# Set the OpenLineage provider settings through environment variables
# at plugin import time.
os.environ["AIRFLOW__OPENLINEAGE__NAMESPACE"] = "airflow"
os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"] = '''{
  "type": "http",
  "url": "<EVENTS_URL_PROVIDED_BY_SELECT_STAR>",
  "endpoint": "<EVENTS_ENDPOINT_PROVIDED_BY_SELECT_STAR>",
  "auth": {
    "type": "api_key",
    "api_key": "<API_KEY_PROVIDED_BY_SELECT_STAR>"
  }
}'''
# Clear these so no config file or operator exclusions override the transport above.
os.environ["AIRFLOW__OPENLINEAGE__CONFIG_PATH"] = ""
os.environ["AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS"] = ""


class EnvVarPlugin(AirflowPlugin):
    # Registering an AirflowPlugin ensures this module is imported, and the
    # environment variables are set, before tasks run.
    name = "env_var_plugin"

If you already have a plugins.zip file, add env_var_plugin.py to it. Otherwise, you can create one by running:

zip plugins.zip env_var_plugin.py

Update the plugins in your MWAA environment by following these steps:

  • Upload plugins.zip to the S3 bucket associated with your MWAA environment (see the CLI sketch after this list).

  • Go to your MWAA environment.

  • Click Edit.

  • Scroll to the DAG code in Amazon S3 section.

  • Under Plugins file, choose your plugins.zip file and set the version to the latest.

Note: do the same for the requirements.txt file if you updated it earlier.
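If you prefer the command line over the console, the same update can be sketched with the AWS CLI; the bucket and environment names below are placeholders for your own:

aws s3 cp plugins.zip s3://<your-mwaa-bucket>/plugins.zip

aws mwaa update-environment \
  --name <your-mwaa-environment> \
  --plugins-s3-path plugins.zip \
  --plugins-s3-object-version <latest-version-id>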

The environment will then update itself by downloading and installing the plugin. It may take a while for the changes to take effect.

That's it! OpenLineage events should be sent to Select Star when your DAGs run.

3. Sync Metadata in Select Star

After you have configured your Apache Airflow environment, make sure to trigger your healthcheck DAGs (a minimal example is sketched below). This will send OpenLineage events to Select Star and help you verify that the integration is working correctly.
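If you do not have a healthcheck DAG yet, the following is a minimal sketch; the DAG and task ids are placeholders of our choosing. Running any task should emit OpenLineage START/COMPLETE events through the provider, so a single BashOperator is enough.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal manual-trigger DAG; running it once should send OpenLineage
# events to Select Star through the configured transport.
with DAG(
    dag_id="select_star_healthcheck",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # manual trigger only (Airflow 2.4+ "schedule" syntax)
    catchup=False,
):
    BashOperator(task_id="ping", bash_command="echo 'OpenLineage healthcheck'")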

Afterwards, go to the Select Star Settings and click Data in the sidebar. Click the Sync metadata button on your Apache Airflow Data Source.

Note that Select Star does not connect to Apache Airflow directly. The lineage and DAG metadata will therefore be available in Select Star only after your DAGs have run, the OpenLineage events have been received, and ingestion has completed.

If you want to examine OpenLineage events without sending them anywhere, you can set up a ConsoleTransport instead. The events will end up in the task logs:

[openlineage]
transport = {"type": "console"}

Make sure to replace <EVENTS_URL_PROVIDED_BY_SELECT_STAR>, <EVENTS_ENDPOINT_PROVIDED_BY_SELECT_STAR>, and <API_KEY_PROVIDED_BY_SELECT_STAR> with the actual values provided by Select Star in Step 1.

For more details on using the OpenLineage integration with Apache Airflow, please read the official Airflow documentation.
