Configure Existing Redshift Cluster for Datacoral

As part of the Datacoral installation, you can either spin up a new Redshift cluster or use an existing Redshift cluster as the warehouse. Click here for instructions on creating a new Datacoral installation with your existing Redshift cluster directly through the Amazon Redshift Console.

Note that if you want to use an existing Redshift cluster, you must make sure its networking configuration is set up appropriately: services within the Datacoral VPC must be able to connect to the VPC that the Redshift cluster is in. In addition, Datacoral does not provide additional management capabilities for an existing cluster, such as:

  1. WLM Management
  2. Managed resizes
  3. Query management

To use an existing Redshift cluster with your Datacoral installation, follow the steps below:

Step 1: Create the datacoral user

Execute the following commands in Redshift as the master user or another privileged user:

create group datacoral;
create user datacoral password '<set password for datacoral>';

Refer to this link to set a password that meets the Redshift password policy.
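To check a candidate password before running create user, the policy can be encoded in a small function. This is a minimal Python sketch, assuming the plain-text password rules documented for Redshift CREATE USER at the time of writing: 8 to 64 characters; at least one uppercase letter, one lowercase letter, and one digit; only ASCII codes 33 to 126, excluding ', ", \, / and @. Confirm these against the linked policy. The helpers is_valid_redshift_password and generate_redshift_password are hypothetical and not part of Datacoral or Redshift.

```python
import secrets

# Assumed Redshift rule: printable ASCII 33-126 (so no space), minus these characters.
FORBIDDEN = set("'\"\\/@")
ALLOWED = {chr(code) for code in range(33, 127)} - FORBIDDEN


def is_valid_redshift_password(pw: str) -> bool:
    """Return True if pw satisfies the assumed Redshift plain-text password rules."""
    if not 8 <= len(pw) <= 64:
        return False
    # Restrict to the allowed ASCII set first, so the case checks below are ASCII-only.
    if not all(c in ALLOWED for c in pw):
        return False
    return (
        any(c.isupper() for c in pw)
        and any(c.islower() for c in pw)
        and any(c.isdigit() for c in pw)
    )


def generate_redshift_password(length: int = 24) -> str:
    """Generate a random password that passes is_valid_redshift_password."""
    if not 8 <= length <= 64:
        raise ValueError("length must be between 8 and 64")
    alphabet = sorted(ALLOWED)
    # Rejection sampling: draw until the upper/lower/digit requirements are met.
    while True:
        pw = "".join(secrets.choice(alphabet) for _ in range(length))
        if is_valid_redshift_password(pw):
            return pw
```

Because the single quote is excluded, a generated password can be pasted directly into the '<set password for datacoral>' literal without escaping.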

Step 2: Grant privileges to the datacoral user

Execute the following commands in Redshift as the master user or another privileged user:

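-- Add the datacoral user to the datacoral group
alter group datacoral add user datacoral;
-- Allow the group to create schemas in the warehouse database
grant create on database <databasename> to group datacoral;
-- Read access to Redshift system tables and views
grant select on svv_table_info to group datacoral;
grant select on stv_tbl_perm to group datacoral;
grant select on stl_analyze to group datacoral;
-- Tables and functions that datacoral creates from now on are readable/executable by all users
alter default privileges for user datacoral grant select on tables to public;
alter default privileges for user datacoral grant execute on functions to public;
-- Allow the group to create Python UDFs
grant usage on language plpythonu to group datacoral;
-- Read access to the pg_catalog schema
grant usage on schema pg_catalog to datacoral;
grant select on all tables in schema pg_catalog to datacoral;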
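If you set up several clusters, or want to avoid typos when filling in <databasename>, the Step 2 statements can be rendered from a template. This is a minimal Python sketch; render_step2 is a hypothetical helper, not part of Datacoral, and the identifier check is a conservative ASCII-only subset of Redshift's identifier rules (a letter or underscore, then letters, digits, underscores or dollar signs, up to 127 characters), which also keeps arbitrary text out of the generated SQL.

```python
import re

# The Step 2 statements, with the database name left as a {db} placeholder.
STEP2_TEMPLATE = [
    "alter group datacoral add user datacoral;",
    "grant create on database {db} to group datacoral;",
    "grant select on svv_table_info to group datacoral;",
    "grant select on stv_tbl_perm to group datacoral;",
    "grant select on stl_analyze to group datacoral;",
    "alter default privileges for user datacoral grant select on tables to public;",
    "alter default privileges for user datacoral grant execute on functions to public;",
    "grant usage on language plpythonu to group datacoral;",
    "grant usage on schema pg_catalog to datacoral;",
    "grant select on all tables in schema pg_catalog to datacoral;",
]

# Conservative check for an unquoted ASCII identifier (assumption, see lead-in).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]{0,126}")


def render_step2(database: str) -> str:
    """Return the Step 2 grant script with <databasename> filled in."""
    if not IDENTIFIER.fullmatch(database):
        raise ValueError(f"unexpected database name: {database!r}")
    return "\n".join(stmt.format(db=database) for stmt in STEP2_TEMPLATE)


if __name__ == "__main__":
    # "dev" is a placeholder; use the name of your warehouse database.
    print(render_step2("dev"))
```

The output can be pasted into a SQL client connected as the master or another privileged user.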

Step 3: Set up connectivity to your existing Redshift cluster

You can allow Datacoral to connect to your existing Redshift cluster using one of the options below:

  1. Add Datacoral's Elastic IP to your Redshift cluster's security group. Click here to see detailed instructions for this option.
OR
  2. VPC peering
    • Set up VPC peering between the Datacoral VPC and the Redshift VPC
    • Enable outbound rules to the Redshift port from the Datacoral VPC
    • Enable inbound rules on the Redshift cluster's security group that allow traffic to the Redshift port from the Datacoral VPC
    • Add route table entries for the peering connection to the relevant subnets' route tables in both VPCs, and update security groups as applicable
OR
  3. For a more advanced configuration that allows access to your Redshift cluster through a copy role, follow the steps here.
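If you choose VPC peering, it is worth confirming first that the two VPCs' CIDR blocks do not overlap, since AWS does not allow peering VPCs with overlapping IPv4 CIDR blocks. This is a minimal sketch using Python's standard ipaddress module; find_overlaps is a hypothetical helper, and you would supply the CIDR blocks listed for each VPC in the AWS VPC console.

```python
import ipaddress


def find_overlaps(datacoral_cidrs, redshift_cidrs):
    """Return every (datacoral_cidr, redshift_cidr) pair whose address ranges overlap.

    A VPC can have more than one CIDR block, so both arguments are lists.
    """
    overlaps = []
    for a in datacoral_cidrs:
        for b in redshift_cidrs:
            if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b)):
                overlaps.append((a, b))
    return overlaps
```

For example, find_overlaps(["10.0.0.0/16"], ["172.31.0.0/16"]) returns an empty list, so those two (placeholder) VPCs could be peered; any pair it returns would need to be resolved before peering.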

Step 4: Follow the Datacoral onboarding flow

Once the steps above are complete, provide the credentials for your Redshift cluster in the onboarding flow.
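Before entering credentials, you may want to confirm from a host in the Datacoral VPC that the cluster endpoint accepts TCP connections on the Redshift port. This is a minimal Python sketch; can_reach is a hypothetical helper, and it assumes the Redshift default port 5439 unless your cluster uses a different one. It only tests the network path (security groups, routes, peering), not the datacoral user's credentials.

```python
import socket

REDSHIFT_DEFAULT_PORT = 5439  # Redshift's default port; pass your cluster's port if you changed it


def can_reach(host: str, port: int = REDSHIFT_DEFAULT_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False
```

For example, can_reach("examplecluster.abc123.us-east-1.redshift.amazonaws.com") (a placeholder endpoint) returning False usually points to a missing security group rule or route rather than a credentials problem.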

[Optional] Step 5: Enable Redshift Logging

All Redshift queries that Datacoral uses for monitoring, along with their outputs, can be logged to an S3 bucket. See the steps here to set this up yourself.