Database Connector Overview

Overall Capabilities of Database Connector

  • Introspection of a schema to discover tables to extract
  • Application of heuristics to determine the appropriate mode for extraction. Ex. small tables are automatically setup as snapshot & tables with an indexed timestamp attribute
  • Schema change detection & propagation of schema changes. Dropped tables and columns are not propagated, by design.
  • Whitelist and blacklist of tables. Helpful to filter out temp or sensitive tables from replicating to the warehouse. By default the slice replicates all the tables in a schema.
  • Column list for a table to replicate. By default, the slice extracts all the columns in a table. Useful when sensitive columns should not be * replicated
  • Pagination for large tables using primary key or a date column.
  • Column-level SQL based transformation at the source for Snapshot & Incremental modes. Ex. conversion of timestamp columns to standard format
  • Filtering rows at the source
  • Ordering rows by columns

Extract modes

The Database connector provides three modes to extract data from Postgres. When the slice is bootstrapped, it performs an auto detection of snapshot vs. incremental mode based on presence of primary key and an indexed timestamp column.

  • Snapshot: to extract all the data, every time. Useful with small tables (<100K rows) OR when hard deletes occur and Change Data Capture (CDC) cannot be used.
  • Incremental: to extract only data changed since the last pull. During slice addition, backfill can be initiated to perform a one-time fetch of all data
  • Change Data Capture (CDC): to extract and propagate every change that occurs in the source. Useful to propagate hard deletes and/or to build a full audit log of all changes valuable in security, compliance or data debugging Auto detection of snapshot vs. incremental mode based on presence of primary key and an indexed timestamp column

Load modes

Three load constructs into your warehouse

  • Replace: to replace the data in the warehouse table with the latest fetched version.
  • Append: to append the newly fetched data to an existing warehouse table.
  • Merge: to update or delete existing records and add new records into an existing warehouse table.
ReplaceAppendMerge
SnapshotAlways represents current view of the sourceAllows for full horizontal view of the table over time. Helpful for trending of data over timeMerges only the changes into the warehouse table preserving hard deletes
IncrementalNot ApplicableAllows for full horizontal view of the rows modified over time based based on the schedule of the extractsMerges only the changes into the warehouse table preserving hard deletes
CDCNot ApplicableAllows for full audit view of every modification over timeAlways represents current view of the source

Types of tables

Warehouse tables can be setup as Regular or Partitioned

  • Regular: Normal tables that are ideal for small to medium sizes and do not need periodic pruning of old rows via retention rules
  • Partitioned: Large tables that require efficient periodic pruning of old rows via retention rules
ReplaceAppendMerge
RegularSupportedSupportedSupported
PartitionedNot ApplicableSupportedNot Applicable

Change Data Capture Support

The following database connector support Change Data Capture mode

  • MySQL
  • Postgres