PgLogicalDecoding is Datacoral’s PostgresSQL Change Data Capture (CDC) slice that leverages the Write-Ahead-Logs (WAL) records and derive the changes happening at the source. This is particularly useful to capture changes including Hard Deletes of rows from tables.
The slice fetches inserted and updated rows as well as IDs of hard deleted rows. The slice DELETES all rows with the IDs in the changed row set and copies all the changed rows into the table (Note that while IDs of Deleted rows are present in the changed rows set, they do not get inserted back).
- Identify your database credentials
- Make sure your database is addressable from your Datacoral VPC
- Enable logical decoding at the source
- Provide rds_replication privileges to Datacoral user
- Connect to Master
- Connect to Slave
1. Identify your database credentials
- Use an existing database user OR create a new user for Datacoral. If creating a new user, please run the following statement on your PostgreSQL database, replacing
datacoral-passwordwith a password of your choice.
CREATE USER datacoral PASSWORD 'datacoral-password';
Encrypt password (recommended)
All customer data is encrypted at rest as well as in motion using customer managed KMS keys. Datacoral’s cross-account role does not have decrypt permissions on the KMS keys. However, we do recommend using pre-encrypted credentials, as it allows for you to source control configuration files to automate installations for Data Ops and share configurations without compromising security. Please refer to the instructions on encrypting your password before adding them to encrypt your passwords.
- Then grant the
datacoraluser read-only access to all tables you need to replicate. If the tables you want to replicate are in a separate schema other than
public, please change the commands below appropriately. The last command makes sure that any future tables will be accessible to
datacoral. If you have several schemas and you wish to grant access to all of them, you will need to repeat these three commands for each schema.
GRANT USAGE ON SCHEMA "public" TO datacoral; GRANT SELECT ON ALL TABLES IN SCHEMA "public" TO datacoral; ALTER DEFAULT PRIVILEGES IN SCHEMA "public" GRANT SELECT ON TABLES TO datacoral;
Restricted table permissions (optional)
To grant access only to a subset of tables in a schema for security reasons, follow the below instructions
- Revoke access to the schema and ensure
datacoraluser has access to the schema itself:
GRANT USAGE ON SCHEMA "schema_name" TO datacoral; ALTER DEFAULT PRIVILEGES IN SCHEMA "schema_name" REVOKE SELECT ON TABLES FROM datacoral; REVOKE SELECT ON ALL TABLES IN SCHEMA "schema_name" FROM datacoral;
- Repeat this command for each table you wish to include:
GRANT SELECT ON "schema_name"."table_name" TO datacoral;
Restricted column-level permissions (optional)
To grant access only to some columns in a table, first make sure you have removed any previously given permission to all columns in that table:
REVOKE SELECT ON "schema_name"."table_name" FROM datacoral;
Then give permission to the specific columns, e.g. columnA and columnB:
GRANT SELECT ("column_one", "column_two") ON "schema_name"."table_name" TO datacoral;
2. Make sure your database is addressable from your Datacoral VPC
If your database is hosted on AWS and is not publicly accessible, need to authorize ingress for Datacoral's elastic IP to the db security group. You can do this in the AWS RDS console or via AWS CLI:
aws ec2 authorize-security-group-ingress \ --group-id <security group associated with rds instance> \ --protocol tcp \ --port <db port> \ --cidr <elastic ip>
If your database is hosted outside of AWS, please follow the appropriate networking setup to allow for ingress traffic for Datacoral to access your database. Please note that you will have to do this for both master as well as slave db.
3. Enable logical decoding at source
- Edit parameter group to modify parameters (such as
rds.logical_replication) directly in RDS Console (Assuming Postgres is AWS RDS Instance).
- Default parameter group cannot be modified, you will need to create a new parameter group for your Postgres version, and then edit the parameter group (rds.logical_replication)
- Modify Postgres instance to use this new parameter group, and check “apply immediately” box.
- Reboot the instance to have the parameter take effect.
Note : As part of applying rds.logical_replication parameter, RDS internally sets the
4. Provide rds_replication privileges to Datacoral user
The database user (e.g. “datacoral”) who will access the data should have access to rds_replication.
5. Check Datacoral user privilege
- Login to source database using same user credentials that is used by slice e.g,
- Execute the following query and verify that
datacoralis a member of
SELECT r.rolname, r.rolsuper, r.rolinherit, r.rolcreaterole, r.rolcreatedb, r.rolcanlogin, r.rolconnlimit, r.rolvaliduntil, ARRAY ( SELECT b.rolname FROM pg_catalog.pg_auth_members m JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid) WHERE m.member = r.oid) as memberof , r.rolreplication , r.rolbypassrls FROM pg_catalog.pg_roles r WHERE r.rolname = 'datacoral' ORDER BY 1
6. Connect to Master
Below details will be needed to connect to Master Database -
- Encrypted Password
- List of tables to be synced
7. Connect to Slave
Below details will be needed to connect to Slave Database -
- Encrypted Password
- List of tables to be synced
Add the Slice
we currently do not support adding the slice through our CLI or the UI. Please contact firstname.lastname@example.org to get this slice added to your installation once you go through the list of steps 1 through 7 listed above.
Advanced configuration can be enabled through describing and updating a slice
Once PgLogicalDecoding slice has been set up, data will start flowing from Write Ahead Log to Redshift -
- Changed data (Insert / Update / Delete) will first flow into S3 on a continuous basis
- On an hourly basis, this data will be stored into slot table in Redshift
- On completion of storing in slot table, final changes will be applied to corresponding tables
- Usually, Slot table will have a retention of a few days so that just the change log does not consume too much space. All change logs will be available in s3. Retention is a configurable value and we can set up the number of days change logs are retained
In parallel, we will do One Time Full Table sync from slave for all historical data. This action is performed on Slave so that there is no additional load on Master.
Monitoring and Failure Handling
Datacoral monitors for slot size and data freshness -
when slot size grows to some extent - automatically recover via a release valve mechanism involving AWS Batch when slot size is going out of hand - delete slot, redo snapshot from slave, and then start capturing changes
Mealpal can set up alerts around slot sizes so that when there is an increase in slot sizes, you can action appropriately too. As this is an action on Master database, a close monitoring and Failure recovery will be key.
For reference -
How to check replication slots -
- Login to source database
- Execute following query :
SELECT * FROM pg_replication_slots
How to drop replication slots -
Login to source database Execute following query:
Data stored in Amazon S3 is partitioned by date and time in the following bucket
Schema - schema name will be same as a slice-name.
Tables produced by the slice will have the same name as the source database with
mv_ as the prefix.
- PostgreSQL Logical Decoding 9.6.10 and above
- Hosted flavors
- Generic PostgreSQL
- Amazon Aurora
- Amazon RDS
- Google Cloud MySQL and Azure databases addressable from your AWS VPC