Data Lineage
Last updated
Lineage is an important part of understanding your data ecosystem. On this page, you will learn how to:
Understand how different parts of your data relate to each other
Evaluate the impact of changes to upstream or downstream data sources
Filter your data lineage by Data Type, by Search Term, or by excluding a Search Term
Watch this video for detailed information on using the lineage feature, continue reading for an overview, or skip to the next section to read about Lineage in more detail.
Select Star can show you column-level lineage for your data assets. The lineage view shows where data comes from and where it flows to, so you can find the dependencies of each table, column, or dashboard and see how changes to your assets would impact your data environment.
When you connect a data source to Select Star, Lineage is automatically generated by parsing the SQL statements that ran in your data source.
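To make the idea of deriving lineage from SQL concrete, here is a minimal sketch in Python. It is not Select Star's parser: the function name `extract_table_lineage`, the regular expressions, and the sample table names are all assumptions made for illustration, and the sketch only handles simple single-statement cases at the table level.

```python
import re


def extract_table_lineage(sql: str) -> list[tuple[str, str]]:
    """Return (source_table, target_table) edges for one simple statement.

    Hypothetical helper for illustration only. It recognizes
    INSERT INTO and CREATE [OR REPLACE] TABLE/VIEW targets, and
    treats every table named after FROM or JOIN as a source.
    """
    target_match = re.search(
        r"\b(?:INSERT\s+INTO|CREATE\s+(?:OR\s+REPLACE\s+)?(?:TABLE|VIEW))\s+([\w.]+)",
        sql,
        flags=re.IGNORECASE,
    )
    if target_match is None:
        return []  # a plain SELECT writes nowhere, so it adds no edges
    target = target_match.group(1)

    sources: list[str] = []
    for name in re.findall(r"\b(?:FROM|JOIN)\s+([\w.]+)", sql, flags=re.IGNORECASE):
        # Skip self-references and duplicates while keeping first-seen order.
        if name != target and name not in sources:
            sources.append(name)
    return [(source, target) for source in sources]


sample_sql = """
CREATE TABLE analytics.daily_orders AS
SELECT o.order_date, c.region, SUM(o.amount) AS total
FROM raw.orders o
JOIN raw.customers c ON o.customer_id = c.id
GROUP BY o.order_date, c.region
"""
edges = extract_table_lineage(sample_sql)
```

In this sketch, `edges` holds one edge from each of `raw.orders` and `raw.customers` to `analytics.daily_orders`. A production parser would need a real SQL grammar to handle subqueries, CTEs, and column-level mapping.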
There are four different views of lineage available out of the box:
Upstream: Shows the immediate upstream dependencies in a tree hierarchy
Downstream: Shows the immediate downstream dependencies in a tree hierarchy
Downstream Dashboards: Shows all downstream dashboard dependencies with extended information such as Top User or dashboard Popularity.
Explore: Shows an advanced lineage graph that lets you navigate the flow of data at a column level.
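The difference between immediate dependencies and the full set of affected assets can be sketched as a small graph traversal. The code below is an illustration under assumptions, not how Select Star stores or queries lineage: the edge list, asset names, and the helpers `build_index` and `reachable` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges as (source, target) pairs.
EDGES = [
    ("raw.orders", "analytics.daily_orders"),
    ("raw.customers", "analytics.daily_orders"),
    ("analytics.daily_orders", "dashboard.revenue"),
]


def build_index(edges):
    """Index edges in both directions for upstream and downstream lookups."""
    upstream = defaultdict(list)
    downstream = defaultdict(list)
    for source, target in edges:
        downstream[source].append(target)
        upstream[target].append(source)
    return upstream, downstream


def reachable(start, neighbors):
    """Breadth-first walk returning every asset reachable from start."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


upstream, downstream = build_index(EDGES)

# Immediate upstream dependencies of one asset.
immediate_upstream = upstream["analytics.daily_orders"]

# Everything that could be affected by a change to raw.orders.
impacted = reachable("raw.orders", downstream)
```

Here `immediate_upstream` lists the two raw tables, while `impacted` also includes the dashboard that sits two hops downstream of `raw.orders`.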
There are many ways to see lineage: check out our REST API, or click on a column from the Column view to start exploring more advanced ways.
The Lineage graph shows the Upstream Sources and Downstream Targets of the data asset. You can explore the graph by (1) clicking on the tree hierarchy displayed on the left-hand side, or (2) clicking on each of the nodes and columns.
Please note that not all columns are shown in the lineage graph. Select Star only shows columns that have lineage, so a column with no lineage does not appear in the graph.
Search is also available within the Lineage Graph, which makes it easier to explore tables that have a large number of columns. Follow the steps below to open the search.
Pin any given node by clicking it
Click on the magnifying glass icon
Type the term you are looking for
The search is available wherever you can find a magnifying glass icon. If you want to do a wider search through the whole graph, you can use the tree hierarchy to search through all the nodes.
If you need to narrow down results, use some of our filtering features on your upstream or downstream lineage:
Filter by:
Data Type
Search Term
Exclude Search Term
Note that you can combine the Data Type filter with either Search Term or Exclude Search Term, but Search Term and Exclude Search Term cannot be applied at the same time.
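The layering rule can be expressed as a short sketch. Everything here is an assumption for illustration: the `LineageNode` class, the `filter_nodes` function, the example data types, and the choice of case-insensitive substring matching are not taken from Select Star.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineageNode:
    name: str
    data_type: str  # hypothetical values such as "table" or "dashboard"


def filter_nodes(
    nodes: list[LineageNode],
    data_type: Optional[str] = None,
    search_term: Optional[str] = None,
    exclude_term: Optional[str] = None,
) -> list[LineageNode]:
    """Apply Data Type plus at most one of Search Term / Exclude Search Term."""
    if search_term is not None and exclude_term is not None:
        raise ValueError("Search Term and Exclude Search Term cannot be combined")

    result = []
    for node in nodes:
        lowered = node.name.lower()
        if data_type is not None and node.data_type != data_type:
            continue
        if search_term is not None and search_term.lower() not in lowered:
            continue
        if exclude_term is not None and exclude_term.lower() in lowered:
            continue
        result.append(node)
    return result


nodes = [
    LineageNode("analytics.daily_orders", "table"),
    LineageNode("analytics.daily_refunds", "table"),
    LineageNode("Revenue Overview", "dashboard"),
]

# Data Type layered with Search Term.
tables_matching_orders = filter_nodes(nodes, data_type="table", search_term="orders")

# Data Type layered with Exclude Search Term.
tables_without_orders = filter_nodes(nodes, data_type="table", exclude_term="orders")
```

Passing both `search_term` and `exclude_term` raises a `ValueError`, which mirrors the restriction that the two cannot be layered.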
Open your 🔍 Filtering Options to get started:
From there, you can Search by Term:
Search by Term and Filter by Data Type:
And Exclude Search Term:
When talking about lineage, we say that data is propagated downwards to a downstream data asset (another table, view, dashboard, etc.). Data can be propagated as follows:
AS IS: The data in the target is identical in value and format to that in the source.
AGGREGATED: The data in the target has been aggregated, and the value in the target may be different from the one at the source.
TRANSFORMED: The data in the target has been transformed, and the format and values might be different from the ones at the source.
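As a rough illustration of the three categories, the sketch below classifies the SQL expression that produces a target column. This is a simplified heuristic written for this page, not Select Star's classification logic: the `Propagation` enum, the `classify` function, and the short list of aggregate functions are assumptions.

```python
import re
from enum import Enum


class Propagation(Enum):
    AS_IS = "AS IS"
    AGGREGATED = "AGGREGATED"
    TRANSFORMED = "TRANSFORMED"


# Assumed, non-exhaustive list of aggregate functions.
AGGREGATES = ("SUM", "COUNT", "AVG", "MIN", "MAX")
AGGREGATE_CALL = re.compile(r"\b(?:%s)\s*\(" % "|".join(AGGREGATES), re.IGNORECASE)


def classify(expression: str) -> Propagation:
    """Classify how a target column's defining expression propagates data."""
    expr = expression.strip()
    # A bare (optionally qualified) column reference copies the value unchanged.
    if re.fullmatch(r"[\w.]+", expr):
        return Propagation.AS_IS
    # Any call to a known aggregate function marks the column as aggregated.
    if AGGREGATE_CALL.search(expr):
        return Propagation.AGGREGATED
    # Anything else (casts, arithmetic, string functions) changes the data.
    return Propagation.TRANSFORMED


as_is = classify("o.order_date")
aggregated = classify("SUM(o.amount)")
transformed = classify("UPPER(c.region)")
```

In this sketch a direct column reference is AS IS, an expression containing an aggregate call is AGGREGATED, and every other expression is TRANSFORMED.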
When calculating lineage between your assets, we also automatically classify downstream propagation. You can see how a column is propagated by editing the column tags. Learn more about tagging in Tag Management.
Lineage refreshes approximately every 24 hours, after metadata sync is complete.
Select Star looks at DDL statements (used to build and modify the structure of your database) and DML statements (used to query and modify the data in your tables) to identify the lineage of your data.
Select Star adds new relationships to lineage based on both DDL (e.g. CREATE) and DML (e.g. INSERT/UPDATE) statements, but it only removes lineage relationships when a new DDL statement is detected.
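One way to picture this rule is the sketch below, where lineage is a mapping from each target to its set of sources. It rests on an assumption that is not stated above: a new DDL statement is modeled as replacing the target's recorded sources, which is how stale relationships get removed. The function names and the keyword sets (limited to the statement types named in this page) are hypothetical.

```python
# Only the statement types named in this page; real coverage is likely broader.
DDL_KEYWORDS = {"CREATE"}
DML_KEYWORDS = {"INSERT", "UPDATE"}


def statement_kind(sql: str) -> str:
    """Return "DDL", "DML", or "OTHER" based on the first keyword."""
    words = sql.split()
    first_word = words[0].upper() if words else ""
    if first_word in DDL_KEYWORDS:
        return "DDL"
    if first_word in DML_KEYWORDS:
        return "DML"
    return "OTHER"


def apply_statement(lineage: dict, target: str, sources: list, sql: str) -> dict:
    """Update lineage (target -> set of sources) for one observed statement."""
    kind = statement_kind(sql)
    if kind == "DDL":
        # Assumption: DDL redefines the asset, so old relationships are dropped.
        lineage[target] = set(sources)
    elif kind == "DML":
        # DML only ever adds relationships; nothing is removed.
        lineage.setdefault(target, set()).update(sources)
    return lineage


lineage: dict = {}
apply_statement(lineage, "t", ["a", "b"], "CREATE TABLE t AS SELECT * FROM a JOIN b ON a.id = b.id")
apply_statement(lineage, "t", ["c"], "INSERT INTO t SELECT * FROM c")
after_dml = set(lineage["t"])

apply_statement(lineage, "t", ["a"], "CREATE OR REPLACE TABLE t AS SELECT * FROM a")
after_ddl = set(lineage["t"])
```

After the INSERT, `t` depends on `a`, `b`, and `c`; only the later CREATE statement removes the relationships to `b` and `c`.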