Manual setup

This guide provides comprehensive instructions for manually setting up Redshift integration with Select Star. The setup process includes creating an IAM role, configuring Redshift logging, managing databases and users, and establishing network access. Follow these steps carefully to ensure a successful integration.

Use this manual process if automatic integration using the AWS CloudFormation template is not possible. However, it is strongly recommended to use the AWS CloudFormation template setup to minimize manual errors and ensure all necessary steps are completed efficiently.

1. Create a new data source

1. Go to the Select Star Settings. Click Data in the sidebar, then + Add to create a new Data Source.

2. Fill in the required information:

  • zSource Type:** Select "Redshift"

  • Display Name: This value is Redshift by default, but you can override it if desired.

  • Cluster Name: The name of your AWS Redshift cluster in the AWS management console. Also known as "Cluster identifier" by AWS.

  • Database: The name of the database in AWS Redshift you've given us access to.

  • AWS Region: ID of the AWS region where the cluster was created. For example us-east-2, us-west-1, eu-central-1.

3. Click Connect.

2. Access AWS Console

  1. Log into the AWS Management Console

    • Ensure you have the necessary permissions to manage Redshift clusters and VPC.

3. Check Cluster State

  1. Ensure the Cluster is Available

    • The cluster should be publicly accessible. Modify the cluster if necessary.

    • For instances operating within a non-publicly accessible environment, such as an AWS VPC, please refer to our guide on Integrating Private Network Data Sources for detailed instructions and best practices.

4. Logging Configuration

  1. In the “Properties” Tab Scroll to the “Database Configurations” Section

  2. Enable Audit Logging:

    • Use the "Edit" button to enable audit logging by specifying an S3 bucket for logs.

For details, refers to AWS documentation about database audit logging.

5. Cluster Parameter Group

  1. If the Current Parameter Group is Default, Create a New Custom Parameter Group

    • Base it on the family of the existing parameter group.

  2. Apply This New Parameter Group to Your Cluster

For details, refers to AWS documentation about managing parameter groups using the console.

6. Enable User Activity Logging

  1. Modify the Custom Parameter Group

    • Set the enable_user_activity_logging parameter to true.

7. Reboot Cluster

  1. Reboot the Cluster if Changes Require It

    • Operations such as switching the parameter group or modifying its parameters may require a restart to apply.

    • If required by your organization's management policy, ensure to schedule downtime before restarting the cluster.

8. Database and User Management

  1. Go to the “Query Editor” in Redshift

  2. Run SQL Commands to Create Users or Grant Permissions:

    CREATE USER selectstar WITH PASSWORD DISABLE syslog ACCESS UNRESTRICTED;
    GRANT SELECT ON SVV_TABLE_INFO TO selectstar;
    GRANT SELECT ON SVV_TABLES TO selectstar;
    GRANT SELECT ON SVV_COLUMNS TO selectstar;
    GRANT SELECT ON STL_QUERYTEXT TO selectstar;
    GRANT SELECT ON STL_DDLTEXT TO selectstar;
    GRANT SELECT ON STL_QUERY TO selectstar;

    Retry operation in context of each database which should be ingested in SelectStar

9. IAM Role Management

1. Navigate to the AWS IAM Console 2. Create a new IAM Role 3. Note ARN of AWS IAM Role

  • It will be required in Select Star UI. 4. Use In-line Permission Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "redshift:DescribeClusters",
                "redshift:DescribeLoggingStatus",
                "redshift:ListSchemas",
                "redshift:ListTables",
                "redshift:ListDatabases",
                "redshift:ExecuteQuery",
                "redshift:FetchResults",
                "redshift:CancelQuery",
                "redshift:DescribeQuery",
                "redshift:DescribeTable",
                "redshift:ViewQueriesFromConsole"
            ],
            "Resource": "arn:aws:redshift:{region_id}:{account_id}:cluster:{cluster_name}",
            "Effect": "Allow"
        },
        {
            "Action": [
                "redshift:GetClusterCredentials"
            ],
            "Resource": "arn:aws:redshift:{region_id}:{account_id}:dbuser:{cluster_name}/selectstar",
            "Effect": "Allow"
        },
        {
            "Action": [
                "redshift:GetClusterCredentials"
            ],
            "Resource": [
                "arn:aws:redshift:{region_id}:{account_id}:dbname:{cluster_name}/*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "redshift-data:ExecuteStatement",
                "redshift-data:ListDatabases",
                "redshift-data:ListSchemas",
                "redshift-data:ListTables",
                "redshift-data:DescribeTable"
            ],
            "Resource": "arn:aws:redshift:{region_id}:{account_id}:cluster:{cluster_name}",
            "Effect": "Allow",
            "Sid": "DataAPIPermissions"
        },
        {
            "Action": [
                "redshift-data:GetStatementResult",
                "redshift-data:CancelStatement",
                "redshift-data:DescribeStatement",
                "redshift-data:ListStatements"
            ],
            "Resource": "*",
            "Effect": "Allow",
            "Sid": "DataAPIIAMSessionPermissionsRestriction"
        },
        {
            "Action": [
                "s3:GetLifecycleConfiguration",
                "s3:GetBucketTagging",
                "s3:GetInventoryConfiguration",
                "s3:GetObjectVersionTagging",
                "s3:ListBucketVersions",
                "s3:GetBucketLogging",
                "s3:GetBucketPolicy",
                "s3:GetBucketOwnershipControls",
                "s3:GetBucketPublicAccessBlock",
                "s3:GetBucketPolicyStatus",
                "s3:ListBucketMultipartUploads",
                "s3:GetBucketVersioning",
                "s3:GetBucketAcl",
                "s3:ListMultipartUploadParts",
                "s3:GetObject",
                "s3:GetBucketLocation",
                "s3:GetObjectVersion",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{bucket}",
                "arn:aws:s3:::{bucket}/*"
            ],
            "Effect": "Allow",
            "Sid": "ListObjectsInBucket"
        }
    ]
}

Replace placeholders like {cluster_name}, {region_id}, {account_id}, {bucket}, etc., with your specific details.

5. Use for AWS IAM Role Assume Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "{select_star_principal_id}"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "{external_id}"
                }
            }
        }
    ]
}

Replace placeholders like {select_star_principal_id} and {external_id} with values specific for your data source instance. Please contact Select Star support to obtain values after creating the data source. They are also encoded in the link "Open CloudFormation" in Select Star UI.

10. Network Management

To create an AWS VPC ingress rule for your Redshift cluster to allow access from specific IP addresses of the Select Star platform (3.23.108.85/32 and 3.20.56.105/32) via the AWS Management Console, follow these steps:

  1. Navigate to the VPC Console: In the AWS Management Console, go to "Services" and select "VPC".

  2. Open Security Groups: In the VPC dashboard, locate and click on "Security Groups" in the left-hand navigation pane.

  3. Find the Redshift Cluster’s Security Group: If you know the Security Group ID associated with your Redshift cluster, find it directly. Otherwise, navigate to the Redshift dashboard, select your cluster, and note the Security Group under its properties.

  4. Select the Security Group: Click on the relevant Security Group for your Redshift cluster.

  5. Switch to the “Inbound Rules” Tab: Click on the “Inbound rules” tab.

  6. Click on “Edit Inbound Rules”: Click the "Edit inbound rules" button.

  7. Add a Rule for the First IP:

    • Click “Add Rule”.

    • For “Type”, select “Redshift” (default port is TCP 5439).

    • Select "Custom" in "Source" and enter 3.23.108.85/32.

    • Optionally, add a description.

  8. Add a Rule for the Second IP:

    • Click “Add Rule” again.

    • Repeat the same steps but enter 3.20.56.105/32 in “Source”.

  9. Review and Save the Rules: Ensure the rules are correctly set up, then click “Save rules” to apply the new inbound rules.

Always confirm that you are modifying the correct security group associated with your Redshift cluster to avoid any unintended access issues.

11. Confirm Authorization

1. Return to Select Star. You should see a form that allows you to provide the "Role ARN". Fill in the required information:

  • Role ARN: Identifier of the AWS IAM Role to use by Select Star. You'll see this after creating the new IAM Role.

2. Click Connect.

12. Choose Databases and Schemas

After you fill in the information, you'll be asked to select the databases you'd like to load into Select Star.

Select Star will not read queries or metadata or generate lineage for databases, schemas, or tables that are not loaded. Please load all data for which you expect to see lineage.

You can change the databases and schemas you have loaded if needed.

Select the database and click Next.

For each database you selected, you'll be able to select the schemas.

Your metadata should start loading automatically. Please allow 24-48 hours to completely generate popularity and lineage.

When the sync is complete, you'll be able to explore Redshift in Select Star.

Notes

  • Replace placeholders like {cluster}, {role}, {bucket}, {db}, etc., with your specific details.

  • Some steps may require waiting for the cluster to become available after changes.

  • Always verify changes and ensure they align with your infrastructure and security policies.

Last updated