Data Lineage

Lineage is an important part of understanding your data ecosystem. On this page, you will learn how to:

  • Understand how different parts of your data relate to each other

  • Evaluate the impact of changes to upstream or downstream data sources

  • Filter your data lineage by Data Type and Search Term, or exclude a Search Term

Watch this video for detailed information on using the lineage feature, or read on for an overview of Lineage and its features.

Lineage

Select Star can show you column-level lineage for your data assets. The lineage view shows where data comes from and where it flows, so you can find the dependencies of each table, column, or dashboard and see how changes to your assets would affect your data environment.

When you connect a data source to Select Star, lineage is generated automatically by parsing the SQL statements that have run in your data source.
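To illustrate the general idea of deriving lineage from query history, here is a deliberately simplified sketch that extracts table-level (source, target) edges from `CREATE TABLE/VIEW ... AS SELECT` and `INSERT INTO ... SELECT` statements using regular expressions. This is not Select Star's parser, which is not described on this page; real SQL (and column-level lineage in particular) needs a proper SQL parser. The function `table_edges` and the sample queries are hypothetical.

```python
import re

# Hypothetical, simplified extractor. The target is the object after
# CREATE [OR REPLACE] TABLE/VIEW or INSERT INTO; sources follow FROM or JOIN.
TARGET_RE = re.compile(
    r"^\s*(?:CREATE\s+(?:OR\s+REPLACE\s+)?(?:TABLE|VIEW)|INSERT\s+INTO)\s+([\w.]+)",
    re.IGNORECASE,
)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def table_edges(sql: str) -> set[tuple[str, str]]:
    """Return (source, target) table pairs found in one SQL statement."""
    target = TARGET_RE.search(sql)
    if target is None:
        return set()  # a plain SELECT writes nothing, so no edge in this sketch
    target_name = target.group(1).lower()
    return {
        (source.lower(), target_name)
        for source in SOURCE_RE.findall(sql)
        if source.lower() != target_name
    }


query_history = [
    "CREATE TABLE analytics.orders_daily AS "
    "SELECT o.order_date, SUM(o.amount) FROM raw.orders o "
    "JOIN raw.customers c ON o.customer_id = c.id GROUP BY 1",
    "INSERT INTO analytics.revenue SELECT * FROM analytics.orders_daily",
    "SELECT * FROM analytics.revenue",
]

edges: set[tuple[str, str]] = set()
for statement in query_history:
    edges |= table_edges(statement)

for source, target in sorted(edges):
    print(f"{source} -> {target}")
```

Because the edges are accumulated across every statement in the history, lineage can chain through intermediate tables (here `raw.orders` to `analytics.orders_daily` to `analytics.revenue`).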

Select Star shows four different views of lineage out of the box:

  1. Upstream: Shows the immediate upstream dependencies in a tree hierarchy

  2. Downstream: Shows the immediate downstream dependencies in a tree hierarchy

  3. Downstream Dashboards: Shows all downstream dashboard dependencies, with extended information such as Top User and dashboard Popularity.

  4. Explore: Shows an advanced lineage graph that allows you to navigate the flow of data at the column level.
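The difference between the "immediate" Upstream/Downstream views and the Downstream Dashboards view can be pictured with a small directed graph. The sketch below assumes lineage is stored as hypothetical (source, target) edges and marks dashboards with a made-up `dashboard:` prefix; immediate dependencies are one hop away, while downstream dashboards are collected at any depth with a breadth-first search.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source, target) means data flows from source to target.
EDGES = [
    ("raw.orders", "analytics.orders_daily"),
    ("raw.customers", "analytics.orders_daily"),
    ("analytics.orders_daily", "analytics.revenue"),
    ("analytics.revenue", "dashboard:Revenue Overview"),
    ("analytics.orders_daily", "dashboard:Daily Orders"),
]

downstream: dict[str, set[str]] = defaultdict(set)
upstream: dict[str, set[str]] = defaultdict(set)
for source, target in EDGES:
    downstream[source].add(target)
    upstream[target].add(source)


def immediate_upstream(asset: str) -> set[str]:
    """Assets exactly one hop upstream of `asset`."""
    return set(upstream[asset])


def immediate_downstream(asset: str) -> set[str]:
    """Assets exactly one hop downstream of `asset`."""
    return set(downstream[asset])


def downstream_dashboards(asset: str) -> set[str]:
    """Every dashboard reachable downstream of `asset`, at any depth (BFS)."""
    seen: set[str] = set()
    queue = deque([asset])
    while queue:
        for child in downstream[queue.popleft()]:
            if child not in seen:  # `seen` also guards against cycles
                seen.add(child)
                queue.append(child)
    return {a for a in seen if a.startswith("dashboard:")}


print(sorted(immediate_upstream("analytics.orders_daily")))  # ['raw.customers', 'raw.orders']
print(sorted(downstream_dashboards("raw.orders")))  # ['dashboard:Daily Orders', 'dashboard:Revenue Overview']
```

Note that `raw.orders` has no direct dashboard consumer, yet both dashboards depend on it transitively; that transitive view is what makes impact analysis of upstream changes possible.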

You can also access lineage through our REST API, or click a column in the Column view to explore its lineage in more depth.

Lineage Graph

The Lineage graph shows the Upstream Sources and Downstream Targets of a data asset. You can explore the graph by (1) clicking items in the tree hierarchy on the left-hand side or (2) clicking individual nodes and columns.

Please note that not all columns are shown in the lineage graph. Select Star only shows columns that have lineage; a column without any lineage does not appear in the graph.
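As a rough illustration of that rule, the sketch below keeps only the columns that appear at either end of at least one column-level lineage edge. The edge list, table schemas, and `columns_shown` helper are all hypothetical.

```python
# Hypothetical column-level edges: ((source_table, column), (target_table, column)).
COLUMN_EDGES = [
    (("raw.orders", "amount"), ("analytics.orders_daily", "total_amount")),
    (("raw.orders", "order_date"), ("analytics.orders_daily", "order_date")),
]

# Hypothetical full schemas, including columns that have no lineage.
TABLE_COLUMNS = {
    "raw.orders": ["id", "amount", "order_date", "notes"],
    "analytics.orders_daily": ["order_date", "total_amount"],
}


def columns_shown(table: str) -> list[str]:
    """Columns of `table` that sit on either end of some lineage edge."""
    linked = {col for edge in COLUMN_EDGES for (tbl, col) in edge if tbl == table}
    # Preserve the table's own column order.
    return [c for c in TABLE_COLUMNS[table] if c in linked]


print(columns_shown("raw.orders"))  # ['amount', 'order_date']
```

Here `id` and `notes` exist in `raw.orders` but would not be drawn, because no edge references them.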

📊 Make sure Dashboards is checked at the bottom of the lineage graph if you want to see downstream dashboards or reports.

💡 Want to keep track of all the tables you've clicked on? Auto-close unrelated assets is checked by default to keep your lineage loading quickly. If you'd like to keep everything you've clicked on open in the graph, uncheck this option.

Hover over any of the icons in the graph to show what will happen if you click on them.

Search is also available within the Lineage Graph, which makes it easier to explore tables with a large number of columns. To show the search:

  1. Pin any given node by clicking it

  2. Click on the magnifying glass icon

  3. Type the term you are looking for

Search is available wherever you see a magnifying glass icon. To search more widely across the whole graph, use the tree hierarchy, which searches through all the nodes.
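The two search scopes can be contrasted with a small sketch: the magnifying-glass search looks at the columns of one pinned node, while the tree-hierarchy search looks across every node in the graph. The graph contents and function names are hypothetical, and case-insensitive substring matching is an assumption; the actual matching rules are not documented on this page.

```python
# Hypothetical graph contents: node name -> column names.
GRAPH = {
    "raw.orders": ["id", "customer_id", "amount", "order_date"],
    "raw.customers": ["id", "email", "created_at"],
    "analytics.orders_daily": ["order_date", "total_amount"],
}


def search_pinned_node(node: str, term: str) -> list[str]:
    """Magnifying-glass search: columns of a single pinned node matching `term`."""
    return [c for c in GRAPH[node] if term.lower() in c.lower()]


def search_tree(term: str) -> list[str]:
    """Tree-hierarchy search: all nodes in the graph whose name matches `term`."""
    return sorted(n for n in GRAPH if term.lower() in n.lower())


print(search_pinned_node("raw.orders", "ID"))  # ['id', 'customer_id']
print(search_tree("orders"))  # ['analytics.orders_daily', 'raw.orders']
```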

Filtering Data Lineage

To narrow down results in your upstream or downstream lineage, you can filter by:

  • Data Type

  • Search Term

  • Exclude Search Term

Note that you can combine the Data Type filter with either Search Term or Exclude Search Term, but Search Term and Exclude Search Term cannot be applied at the same time.
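The layering rule above can be expressed as a small filter function. This is a sketch, not Select Star's implementation: it assumes Data Type means the asset type (for example table, view, or dashboard), assumes case-insensitive substring matching for the term filters, and the `LineageItem` class and `filter_lineage` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LineageItem:
    name: str
    data_type: str  # hypothetical values such as "table", "view", "dashboard"


def filter_lineage(
    items: Iterable[LineageItem],
    data_types: Optional[set[str]] = None,
    search_term: Optional[str] = None,
    exclude_term: Optional[str] = None,
) -> list[LineageItem]:
    """Layer an optional Data Type filter with at most one term filter."""
    if search_term and exclude_term:
        # Mirrors the documented rule: the two term filters are mutually exclusive.
        raise ValueError("Search Term and Exclude Search Term cannot be applied together")
    result = list(items)
    if data_types:
        result = [i for i in result if i.data_type in data_types]
    if search_term:
        result = [i for i in result if search_term.lower() in i.name.lower()]
    if exclude_term:
        result = [i for i in result if exclude_term.lower() not in i.name.lower()]
    return result


lineage = [
    LineageItem("analytics.revenue", "table"),
    LineageItem("analytics.revenue_by_region", "view"),
    LineageItem("Revenue Overview", "dashboard"),
    LineageItem("raw.orders", "table"),
]

# Data Type + Search Term: tables whose name contains "revenue".
print([i.name for i in filter_lineage(lineage, data_types={"table"}, search_term="revenue")])
# Exclude Search Term on its own.
print([i.name for i in filter_lineage(lineage, exclude_term="revenue")])
```

Raising an error for the invalid combination, rather than silently picking one filter, keeps the mutual-exclusion rule explicit.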

Open your 🔍 Filtering Options to get started. From there, you can search by term, search by term while also filtering by Data Type, or exclude a search term.

Types of Data Propagation

When talking about lineage, we say that data is propagated downstream to other data assets (tables, views, dashboards, etc.). Data can be propagated in the following ways:

  • AS IS: The data in the target is identical in value and format to that in the source.

  • AGGREGATED: The data in the target has been aggregated, so its values may differ from those at the source.

  • TRANSFORMED: The data in the target has been transformed, so its format and values may differ from those at the source.

When calculating lineage between your assets, we also automatically classify downstream propagation. You can see how a column is propagated by editing the column tags. Learn more about tagging in Tag Management.
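To make the three propagation types more concrete, here is a heuristic sketch that classifies a target column by the SELECT expression that produces it. Select Star's actual classification logic is not documented here; the regular expressions, the short list of aggregate functions, and the `classify` function are assumptions chosen for illustration.

```python
import re
from enum import Enum


class Propagation(Enum):
    AS_IS = "AS IS"
    AGGREGATED = "AGGREGATED"
    TRANSFORMED = "TRANSFORMED"


# Assumed heuristics: a bare column reference is copied as is; an expression
# starting with a common aggregate function is aggregated; anything else is transformed.
BARE_COLUMN_RE = re.compile(r"^\s*[\w.]+\s*$")
AGGREGATE_RE = re.compile(r"^\s*(SUM|COUNT|AVG|MIN|MAX)\s*\(", re.IGNORECASE)


def classify(expression: str) -> Propagation:
    """Classify how a target column's SELECT expression propagates its source."""
    if BARE_COLUMN_RE.match(expression):
        return Propagation.AS_IS  # e.g. o.amount
    if AGGREGATE_RE.match(expression):
        return Propagation.AGGREGATED  # e.g. SUM(o.amount)
    return Propagation.TRANSFORMED  # e.g. casts, arithmetic, string functions


for expr in ["o.amount", "SUM(o.amount)", "CAST(o.amount AS VARCHAR)"]:
    print(expr, "->", classify(expr).value)
```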

Lineage FAQs

How often does lineage refresh?

Lineage refreshes approximately every 24 hours, after metadata sync is complete.

How does Select Star detect updates to lineage?

Select Star looks at DDL statements (used to build and modify the structure of your database) and DML statements (used to query and modify the data in your tables) to identify the lineage of your data.

Select Star adds new relationships to lineage based on both DDL (e.g. CREATE) and DML (e.g. INSERT/UPDATE) statements, but only removes lineage relationships when a new DDL statement is detected.
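The add/remove rule above can be sketched as an update function over a map from each target to its upstream sources. Treating a new DDL statement as replacing the target's previous upstream set is an assumption about how removal works, and `apply_statement` is a hypothetical name; the sketch only shows that DML can add relationships but never drop them.

```python
def apply_statement(
    upstream: dict[str, set[str]], target: str, sources: set[str], kind: str
) -> None:
    """Update the upstream set of `target` in place from one parsed statement."""
    if kind == "DML":
        # INSERT/UPDATE: new relationships are added; existing ones are kept.
        upstream.setdefault(target, set()).update(sources)
    elif kind == "DDL":
        # CREATE (re)defines the object: assumed to replace the previous upstream
        # set, which is the only path by which relationships are removed.
        upstream[target] = set(sources)
    else:
        raise ValueError(f"unknown statement kind: {kind!r}")


edges: dict[str, set[str]] = {}
apply_statement(edges, "analytics.revenue", {"raw.orders"}, "DML")
apply_statement(edges, "analytics.revenue", {"raw.refunds"}, "DML")
print(sorted(edges["analytics.revenue"]))  # ['raw.orders', 'raw.refunds']
apply_statement(edges, "analytics.revenue", {"analytics.orders_daily"}, "DDL")
print(sorted(edges["analytics.revenue"]))  # ['analytics.orders_daily']
```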
