1. Identify your database credentials
Use an existing database user OR create a new user for Datacoral. If creating a new user, please run the following statement on your PostgreSQL database, replacing
datacoral-password with a password of your choice.
Encrypt password (recommended)
All customer data is encrypted at rest as well as in motion using customer managed KMS keys. Datacoral’s cross-account role does not have decrypt permissions on the KMS keys. However, we do recommend using pre-encrypted credentials, as it allows for you to source control configuration files to automate installations for Data Ops and share configurations without compromising security. Please refer to the instructions on encrypting your password before adding them to encrypt your passwords.
Then grant the
datacoral user read-only access to all tables you need to replicate. If the tables you want to replicate are in a separate schema other than
public, please change the commands below appropriately. The last command makes sure that any future tables will be accessible to
datacoral. If you have several schemas and you wish to grant access to all of them, you will need to repeat these three commands for each schema.
Restricted table permissions (optional)
To grant access only to a subset of tables in a schema for security reasons, follow the below instructions
- Revoke access to the schema and ensure
datacoraluser has access to the schema itself:
- Repeat this command for each table you wish to include:
Restricted column-level permissions (optional)
To grant access only to some columns in a table, first make sure you have removed any previously given permission to all columns in that table:
Then give permission to the specific columns, e.g. columnA and columnB:
2. Enable logical decoding at source
- Edit parameter group to modify parameters (such as
rds.logical_replication) directly in RDS Console (Assuming Postgres is an AWS RDS Instance as well).
- Default parameter group cannot be modified, you will need to create a new parameter group for your Postgres version, and then edit the parameter group (rds.logical_replication)
- Modify Postgres instance to use this new parameter group, and check “apply immediately” box.
- Reboot the instance to have the parameter take effect.
Note : As part of applying rds.logical_replication parameter, RDS internally sets the
3. Provide rds_replication privileges to Datacoral user
The database user (e.g. “datacoral”) who will access the data should have access to rds_replication.
Check Datacoral user privilege
- Login to source database using same user credentials that is used by slice e.g,
- Execute the following query and verify that
datacoralis a member of
4. Make sure your database is addressable from your Datacoral VPC
Database accessible via the internet
If your database is hosted on AWS and is not publicly accessible, need to authorize ingress for Datacoral's elastic IP to the db security group. You can do this in the AWS RDS console or via AWS CLI:
Here are the steps to add Datacoral's elastic IP to the db security group through the AWS console
- Click on this link to access AWS RDS home and select the appropriate AWS region click on the source PostgreSQL database
- You will land on the below page Scroll down to find the Security group rules and click on the Security Group
- On Security group page, go to Inbound rules tab and click on Edit inbound rules
- On the Edit inbound rule page, click on Add rule
- Add the Type as PostgreSQL or Custom TCP
- Add the port as 5432
- Source as custom
- Copy paste the elastic IP
A CIDR block is expected here, so please add '/32' at the end of Datacoral elastic IP before pasting. For example, if the elastic IP address is 220.127.116.11, then please paste 18.104.22.168/32
and click on Save rules
Database accessible via VPC Peering
If you don't want any traffic from your database to go over the internet, you can setup VPC Peering between the Datacoral Installation's VPC and your database's VPC. See VPC peering for how to set it up.
If your Postgres Database is in AWS, you would have to modify the subnet group and security groups so that network connections are allowed to the Datacoral VPC.
- For the RDS subnet group, the Network ACL should allow inbound and outbound connection to the Datacoral VPC.
- For the RDS security group, we would need an inbound and outbound rule to/from the Datacoral VPC.
If your database is hosted outside of AWS, please follow the appropriate networking setup to allow for ingress traffic for Datacoral to access your database. Please note that you will have to do this for both master as well as the replica.
5. Handling TOAST columns (optional)
TOAST is a mechanism in PostgreSQL that allows database administrators to keep physical data rows from exceeding the size of a data block. Having tables with "TOASTed" columns does have implications on data replication using Change Data Capture (as has been noted before).
If the actual value of "TOASTed" columns is important for replication into the data warehouse, then the "replica identity" for the table has to be set to
FULL. This can be done by running:
Alternatively, if the "TOASTed" column is part of an index (or can be used in an index), the data can be replicated successfully by setting the "replica identity" to INDEX. Commands for this:
If this is not performed, we will have
NULL or placeholder values show up in the column instead. As a warning, setting "replica identity" to
FULL might increase the disk space consumed, since all the columns are being written to WAL all the time.