AWS RDS PostgreSQL

Follow these steps to connect your AWS RDS PostgreSQL to Select Star.

Before you start

To connect AWS RDS PostgreSQL to Select Star, you will need...

  • access to AWS User with with permissions to deploy CloudFormation, modify IAM and AWS RDS cluster

  • access to admin user of AWS RDS

Select Star requires only minimal metadata access to AWS RDS. The granted permissions are defined in CloudFormation template:

For instances operating within a non-publicly accessible environment, such as an AWS VPC private subnet, please refer to our guide on Integrating Private Network Data Sources for detailed instructions and best practices.

1. Create PostgreSQL user

Connect to the PostgreSQL database using an administrative user account and create a new user (service account) for the integration, by executing SQL query:

CREATE USER selectstar WITH encrypted password 's313ctst8r'

Replace s313ctst8r with strong and secure password.

Then, it is necessary to grant permissions to selected databases and schemas. To do this, run the following query individually in the context of the selected database for each schema:

GRANT USAGE ON SCHEMA <schema_name> TO selectstar;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO selectstar;
ALTER DEFAULT PRIVILEGES IN SCHEMA <schema_name> GRANT SELECT ON TABLES TO selectstar;

Replace <schema_name> with the expected schema name and repeat for each databases & schemas.

2. Ensure network connectivity

To establish a connection between Select Star and your RDS instance, it is essential that your RDS instance is accessible from the following IP addresses:

  • 3.23.108.85

  • 3.20.56.105

If your RDS instance is protected by a firewall, you'll need to add these two IP addresses to your whitelist to allow for the connection. If you encounter any challenges or require further assistance in adapting these configurations to your specific network topology, please don't hesitate to reach out to our technical support team for expert guidance and solutions.

3. Enable query logging

To be able to generate lineage and popularity, we need to have access to a log of all queries performed on the cluster via AWS CloudWatch. Enabling query logging for an Amazon RDS instance and sending the logs to AWS CloudWatch involves several steps. Here's a step-by-step guide to achieve this using the AWS Management Console:

1. Sign in to AWS Console: Log in to your AWS Management Console using your credentials.

2. Open RDS Dashboard: Navigate to the Amazon RDS service in the AWS Management Console.

3. Create a New Parameter Group:

  • In the RDS dashboard, click on "Parameter groups" in the left-hand navigation pane.

  • Click the "Create parameter group" button.

  • Provide a name for the new parameter group, e.g., "CustomRDSParameterGroup."

  • In the "Family" dropdown, select the appropriate DB engine family.

  • Provide a description for the parameter group (optional).

  • Click the "Create" button to create the new parameter group.

4. Edit the Parameter Group:

  • In the parameter group list, find your newly created parameter group, "CustomRDSParameterGroup," and click on its name.

  • In the "Parameter group details" page, find the "Parameters" tab.

  • Click the "Edit parameters" button.

5. Set log_min_duration_statement and log_statement parameters:

  • In the "Modifiable parameters" page, you can search for parameters. In the search box, type "log_min_duration_statement" and "log_statement" one by one.

  • For parameter log_min_duration_statement set value to 0 (to log all statements, regardless of duration).

  • For parameter log_statement set value to all (to log all SQL statements).

  • After setting these parameters, click the "Save changes" button.

6. Select the AWS RDS instance: Choose the Amazon RDS instance for which you want to enable query logging.

7. Modify the DB instance: In the instance details page, click the "Modify" button to make changes to the instance configuration.

8. Enable the Query Logging Parameter:

  • In the "Modify DB instance" page, find the "Log exports" section. Look for the "PostgreSQL log" parameter. Set this parameter to "enabled."

  • In the "Database options" section, look "DB parameter group" parameter and Select the custom parameter group you created, "CustomRDSParameterGroup" from the dropdown.

9. Apply the Changes: Scroll down to the bottom of the "Modify DB instance" page and click "Continue."

10. Review and Apply Changes: Review the changes you're about to make and click "Apply immediately" if you want the changes to take effect immediately. Otherwise, choose a maintenance window for applying the changes. Click "Continue."

11. Monitor the Update: The changes will be applied to your RDS instance. You can monitor the progress on the "Databases" page in the RDS dashboard.

12. Verify Query Logging: After the changes have been applied, query logging will be enabled for your RDS instance, and the logs will be sent to CloudWatch. You can access these logs by navigating to the CloudWatch Logs section of the AWS Management Console. Before accepting credentials, we verify whether the query log has been configured, so it is important that some queries have already been logged.

4. Create a new data source

1. Go to the Select Star Settings. Click Data in the sidebar, then + Add to create a new Data Source.

2. Fill form in the required information:

  • Source Type: Select "PostgreSQL"

  • Display Name: This value is PostgreSQL by default, but you can override it if desired.

  • Hostname: The public hostname of your instance.

  • Port: The port used to connect. By default is 5432, but you can adjust it if required.

  • Username: The PostgreSQL user name to connect. In our examples of SQL queries above, we use selectstar.

  • Password: The password of the PostgreSQL user.

  • DB Name: Any database we have access to that is used to initiate a first connection.

3. Click Connect.

5. Create CloudFormation stack

Select Star recommends use of AWS CloudFormation to setup integration, which allows you to make necessary changes to the RDS instance environment in a automatic, transparent, safe and auditable manner.

AWS CloudFormation will create AWS resources to enable us to access AWS CloudWatch logs.

The source code of the CloudFormation template along with build scripts and real-time logs of the continuous deployment system is available on our public repository selectstar/cloudformation-templates hosted on Github, to be freely audited.

You can pass the link to CloudFormation to the infrastructure team to enable the integration to be created.

1. A simple form will be displayed in Select Star. In option Source Type select "AWS RDS PostgreSQL".

2. Click Open CloudFormation. A new window will open to proceed to the creation of a CloudFormation stack by AWS Management Console. Make sure you are logged into the AWS account in which the RDS instance is hosted.

3. The Create Stack form will be displayed. Some of the values will be filled in by default, under Parameters, enter:

  • RDS instance name: Use the Amazon RDS instance name

4. Review the information and under Capabilities choose "I acknowledge that AWS CloudFormation might create IAM resources".

5. Choose Create stack.

6. Wait until the stack changes it status to "CREATE_COMPLETE" from "CREATE_IN_PROGRESS" in tab "Stack info". The operation should take up to 5 minutes. You need to refresh tab to see the progress.

7. After completing stack creation, the Role ARN is available from the "Outputs". Copy and save the RoleArn for later use.

6. Confirm authorization

1. Return to Select Star. You should see a form that allows you to provide "Role ARN" . Fill form in the required information:

  • Role ARN: Identifier of AWS IAM Role to use by Select Star.

  • RDS instance name: Name of the AWS RDS instance.

2. Click Connect.

7. Choose databases and schemas

After you fill in the information, you'll be asked to select the databases you'd like to load into Select Star.

Select Star will not read queries or metadata or generate lineage for databases, schemas, or tables that are not loaded. Please load all data for which you expect to see lineage.

You can change the databases and schemas you have loaded if needed.

Select the database and click Next.

For each database you selected, you'll be able to select the schemas.

Your metadata should start loading automatically. Please allow 24-48 hours to completely generate popularity and lineage.

When the sync is complete, you'll be able to explore PostgreSQL in Select Star.

Optional steps

The AWS RDS logging setup may in some cases result in sensitive information being recorded and stored in Amazon CloudWatch. To assess impact, risk, and take precaution and remediation steps corresponding to your organization's workload characteristics, refer to relevant section of AWS RDS service documentation and AWS support if required.

Last updated