Database Connector Overview
Overall Capabilities of Database Connector
- Introspection of a schema to discover tables to extract
- Application of heuristics to determine the appropriate mode for extraction. Ex. small tables are automatically setup as snapshot & tables with an indexed timestamp attribute
- Schema change detection & propagation of schema changes. Dropped tables and columns are not propagated, by design.
- Whitelist and blacklist of tables. Helpful to filter out temp or sensitive tables from replicating to the warehouse. By default the slice replicates all the tables in a schema.
- Column list for a table to replicate. By default, the slice extracts all the columns in a table. Useful when sensitive columns should not be * replicated
- Pagination for large tables using primary key or a date column.
- Column-level SQL based transformation at the source for Snapshot & Incremental modes. Ex. conversion of timestamp columns to standard format
- Filtering rows at the source
- Ordering rows by columns
Extract modes
The Database connector provides three modes to extract data from Postgres. When the slice is bootstrapped, it performs an auto detection of snapshot vs. incremental mode based on presence of primary key and an indexed timestamp column.
- Snapshot: to extract all the data, every time. Useful with small tables (<100K rows) OR when hard deletes occur and Change Data Capture (CDC) cannot be used.
- Incremental: to extract only data changed since the last pull. During slice addition, backfill can be initiated to perform a one-time fetch of all data
- Change Data Capture (CDC): to extract and propagate every change that occurs in the source. Useful to propagate hard deletes and/or to build a full audit log of all changes valuable in security, compliance or data debugging Auto detection of snapshot vs. incremental mode based on presence of primary key and an indexed timestamp column
Load modes
Three load constructs into your warehouse
- Replace: to replace the data in the warehouse table with the latest fetched version.
- Append: to append the newly fetched data to an existing warehouse table.
- Merge: to update or delete existing records and add new records into an existing warehouse table.
Replace | Append | Merge | |
---|---|---|---|
Snapshot | Always represents current view of the source | Allows for full horizontal view of the table over time. Helpful for trending of data over time | Merges only the changes into the warehouse table preserving hard deletes |
Incremental | Not Applicable | Allows for full horizontal view of the rows modified over time based based on the schedule of the extracts | Merges only the changes into the warehouse table preserving hard deletes |
CDC | Not Applicable | Allows for full audit view of every modification over time | Always represents current view of the source |
Types of tables
Warehouse tables can be setup as Regular or Partitioned
- Regular: Normal tables that are ideal for small to medium sizes and do not need periodic pruning of old rows via retention rules
- Partitioned: Large tables that require efficient periodic pruning of old rows via retention rules
Replace | Append | Merge | |
---|---|---|---|
Regular | Supported | Supported | Supported |
Partitioned | Not Applicable | Supported | Not Applicable |
Change Data Capture Support
The following database connector support Change Data Capture mode
- MySQL
- Postgres