PostgreSQL CDC Connector Prerequisites
Prerequisites
1. Identify your database credentials
Use an existing database user OR create a new user for Datacoral. If creating a new user, please run the following statement on your PostgreSQL database, replacing datacoral-password
with a password of your choice.
Create user
Encrypt password (recommended)
All customer data is encrypted at rest as well as in motion using customer managed KMS keys. Datacoral’s cross-account role does not have decrypt permissions on the KMS keys. However, we do recommend using pre-encrypted credentials, as it allows for you to source control configuration files to automate installations for Data Ops and share configurations without compromising security. Please refer to the instructions on encrypting your password before adding them to encrypt your passwords.
Grant permissions
Then grant the datacoral
user read-only access to all tables you need to replicate. If the tables you want to replicate are in a separate schema other than public
, please change the commands below appropriately. The last command makes sure that any future tables will be accessible to datacoral
. If you have several schemas and you wish to grant access to all of them, you will need to repeat these three commands for each schema.
Restricted table permissions (optional)
To grant access only to a subset of tables in a schema for security reasons, follow the below instructions
- Revoke access to the schema and ensure
datacoral
user has access to the schema itself:
- Repeat this command for each table you wish to include:
Restricted column-level permissions (optional)
To grant access only to some columns in a table, first make sure you have removed any previously given permission to all columns in that table:
Then give permission to the specific columns, e.g. columnA and columnB:
2. Enable logical decoding at source
- Edit parameter group to modify parameters (such as
rds.logical_replication
) directly in RDS Console (Assuming Postgres is an AWS RDS Instance as well). - Default parameter group cannot be modified, you will need to create a new parameter group for your Postgres version, and then edit the parameter group (rds.logical_replication)
- Modify Postgres instance to use this new parameter group, and check “apply immediately” box.
- Reboot the instance to have the parameter take effect.
Note : As part of applying rds.logical_replication parameter, RDS internally sets the wal_level
, max_wal_senders
, max_replication_slots
and max_connections
parameters.
3. Provide rds_replication privileges to Datacoral user
The database user (e.g. “datacoral”) who will access the data should have access to rds_replication.
Check Datacoral user privilege
- Login to source database using same user credentials that is used by slice e.g,
datacoral
- Execute the following query and verify that
datacoral
is a member ofrds_replication
4. Make sure your database is addressable from your Datacoral VPC
Database accessible via the internet
If your database is hosted on AWS and is not publicly accessible, need to authorize ingress for Datacoral's elastic IP to the db security group. You can do this in the AWS RDS console or via AWS CLI:
Database accessible via VPC Peering
If you don't want any traffic from your database to go over the internet, you can setup VPC Peering between the Datacoral Installation's VPC and your database's VPC. See VPC peering for how to set it up.
If your Postgres Database is in AWS, you would have to modify the subnet group and security groups so that network connections are allowed to the Datacoral VPC.
- For the RDS subnet group, the Network ACL should allow inbound and outbound connection to the Datacoral VPC.
- For the RDS security group, we would need an inbound and outbound rule to/from the Datacoral VPC.
Other
If your database is hosted outside of AWS, please follow the appropriate networking setup to allow for ingress traffic for Datacoral to access your database. Please note that you will have to do this for both master as well as the replica.
5. Handling TOAST columns (optional)
TOAST is a mechanism in PostgreSQL that allows database administrators to keep physical data rows from exceeding the size of a data block. Having tables with "TOASTed" columns does have implications on data replication using Change Data Capture (as has been noted before).
If the actual value of "TOASTed" columns is important for replication into the data warehouse, then the "replica identity" for the table has to be set to FULL
. This can be done by running:
Alternatively, if the "TOASTed" column is part of an index (or can be used in an index), the data can be replicated successfully by setting the "replica identity" to INDEX. Commands for this:
If this is not performed, we will have NULL
or placeholder values show up in the column instead. As a warning, setting "replica identity" to FULL
might increase the disk space consumed, since all the columns are being written to WAL all the time.
Next steps
- Setup connector in UI
- Initiate historic sync