PostgreSQL CDC Connector Prerequisites

Prerequisites

1. Identify your database credentials

Use an existing database user OR create a new user for Datacoral. If creating a new user, please run the following statement on your PostgreSQL database, replacing datacoral-password with a password of your choice.

Create user

CREATE USER datacoral PASSWORD 'datacoral-password';

Encrypt password (recommended)

All customer data is encrypted at rest as well as in motion using customer managed KMS keys. Datacoral’s cross-account role does not have decrypt permissions on the KMS keys. However, we do recommend using pre-encrypted credentials, as it allows for you to source control configuration files to automate installations for Data Ops and share configurations without compromising security. Please refer to the instructions on encrypting your password before adding them to encrypt your passwords.

Grant permissions

Then grant the datacoral user read-only access to all tables you need to replicate. If the tables you want to replicate are in a separate schema other than public, please change the commands below appropriately. The last command makes sure that any future tables will be accessible to datacoral. If you have several schemas and you wish to grant access to all of them, you will need to repeat these three commands for each schema.

GRANT USAGE ON SCHEMA "public" TO datacoral;
GRANT SELECT ON ALL TABLES IN SCHEMA "public" TO datacoral;
ALTER DEFAULT PRIVILEGES IN SCHEMA "public" GRANT SELECT ON TABLES TO datacoral;

Restricted table permissions (optional)

To grant access only to a subset of tables in a schema for security reasons, follow the below instructions

  1. Revoke access to the schema and ensure datacoral user has access to the schema itself:
GRANT USAGE ON SCHEMA "schema_name" TO datacoral;
ALTER DEFAULT PRIVILEGES IN SCHEMA "schema_name" REVOKE SELECT ON TABLES FROM datacoral;
REVOKE SELECT ON ALL TABLES IN SCHEMA "schema_name" FROM datacoral;
  1. Repeat this command for each table you wish to include:
GRANT SELECT ON "schema_name"."table_name" TO datacoral;

Restricted column-level permissions (optional)

To grant access only to some columns in a table, first make sure you have removed any previously given permission to all columns in that table:

REVOKE SELECT ON "schema_name"."table_name" FROM datacoral;

Then give permission to the specific columns, e.g. columnA and columnB:

GRANT SELECT ("column_one", "column_two") ON "schema_name"."table_name" TO datacoral;

2. Enable logical decoding at source

  • Edit parameter group to modify parameters (such as rds.logical_replication) directly in RDS Console (Assuming Postgres is an AWS RDS Instance as well).
  • Default parameter group cannot be modified, you will need to create a new parameter group for your Postgres version, and then edit the parameter group (rds.logical_replication)
  • Modify Postgres instance to use this new parameter group, and check “apply immediately” box.
  • Reboot the instance to have the parameter take effect.

Note : As part of applying rds.logical_replication parameter, RDS internally sets the wal_level, max_wal_senders, max_replication_slots and max_connections parameters.

3. Provide rds_replication privileges to Datacoral user

The database user (e.g. “datacoral”) who will access the data should have access to rds_replication.

grant rds_replication to datacoral;

Check Datacoral user privilege

  • Login to source database using same user credentials that is used by slice e.g, datacoral
  • Execute the following query and verify that datacoral is a member of rds_replication
SELECT
r.rolname,
r.rolsuper,
r.rolinherit,
r.rolcreaterole,
r.rolcreatedb,
r.rolcanlogin,
r.rolconnlimit,
r.rolvaliduntil,
ARRAY
( SELECT b.rolname
FROM pg_catalog.pg_auth_members m
JOIN pg_catalog.pg_roles b
ON (m.roleid = b.oid)
WHERE m.member = r.oid
)as memberof
,r.rolreplication
,r.rolbypassrls
FROM pg_catalog.pg_roles r
WHERE r.rolname = 'datacoral'
ORDER BY 1

4. Make sure your database is addressable from your Datacoral VPC

Database accessible via the internet

If your database is hosted on AWS and is not publicly accessible, need to authorize ingress for Datacoral's elastic IP to the db security group. You can do this in the AWS RDS console or via AWS CLI:

aws ec2 authorize-security-group-ingress \
--group-id <security group associated with rds instance> \
--protocol tcp \
--port <db port> \
--cidr <elastic ip>

Database accessible via VPC Peering

If you don't want any traffic from your database to go over the internet, you can setup VPC Peering between the Datacoral Installation's VPC and your database's VPC. See VPC peering for how to set it up.

If your Postgres Database is in AWS, you would have to modify the subnet group and security groups so that network connections are allowed to the Datacoral VPC.

  • For the RDS subnet group, the Network ACL should allow inbound and outbound connection to the Datacoral VPC.
  • For the RDS security group, we would need an inbound and outbound rule to/from the Datacoral VPC.

Other

If your database is hosted outside of AWS, please follow the appropriate networking setup to allow for ingress traffic for Datacoral to access your database. Please note that you will have to do this for both master as well as the replica.

5. Handling TOAST columns (optional)

TOAST is a mechanism in PostgreSQL that allows database administrators to keep physical data rows from exceeding the size of a data block. Having tables with "TOASTed" columns does have implications on data replication using Change Data Capture (as has been noted before).

If the actual value of "TOASTed" columns is important for replication into the data warehouse, then the "replica identity" for the table has to be set to FULL. This can be done by running:

ALTER TABLE <schema_name.table_name> REPLICA IDENTITY FULL;

Alternatively, if the "TOASTed" column is part of an index (or can be used in an index), the data can be replicated successfully by setting the "replica identity" to INDEX. Commands for this:

ALTER TABLE <schema_name.table_name> ALTER COLUMN <toasted_column> SET NOT NULL;
CREATE UNIQUE INDEX <toast_col_idx_name> ON <schema_name.table_name>(<toasted_column>);
ALTER TABLE <schema_name.table_name> REPLICA IDENTITY USING INDEX <toast_col_idx_name>;

If this is not performed, we will have NULL or placeholder values show up in the column instead. As a warning, setting "replica identity" to FULL might increase the disk space consumed, since all the columns are being written to WAL all the time.

Next steps