Datacoral's Data and Metadata Events

The Datacoral platform is built as an event-driven framework: its entire orchestration runs as a data/metadata event loop, and services communicate with each other by sending and receiving events. Broadly speaking, two kinds of data-related events are produced:

  • Data Events
  • Metadata Events

Data Events

Each data event corresponds to a batch of data and there are different kinds of data events:

  • A batch of data fetched by an ingest connector that was written to S3
  • A batch of ingested data loaded into a warehouse table (Redshift, Snowflake, etc.)
  • A batched update of a destination warehouse table through a SQL-based or Python transformation (Materialized Views)
  • A batch of data published to an external destination object/table (such as Postgres or Salesforce)

Each data event is a JSON object that contains:

  • Event Type: data
  • Data Event Type: s3, redshift, glue, athena, snowflake, publish
  • Status: SUCCESS, FAILURE
  • Other information: Timelabel; start, end, update timestamps for the data events
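The fields above can be checked and summarized by a consumer roughly as follows. This is a minimal sketch, not Datacoral's own client code: the helper name `summarize_data_event` is hypothetical, the key names are taken from the example event in this section, and the timestamp and timelabel layouts are assumptions inferred from that example.

```python
import json
from datetime import datetime

# Allowed values as listed in the field description above.
DATA_EVENT_TYPES = {"s3", "redshift", "glue", "athena", "snowflake", "publish"}
STATUSES = {"SUCCESS", "FAILURE"}

# Assumed layouts, inferred from the example event in this section.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S:%f %z"
TIMELABEL_FORMAT = "%Y%m%d%H%M%S"


def summarize_data_event(raw: str) -> dict:
    """Validate a data event and return a small summary (hypothetical helper)."""
    event = json.loads(raw)
    if event.get("event_type") != "data":
        raise ValueError(f"not a data event: {event.get('event_type')!r}")
    if event.get("type") not in DATA_EVENT_TYPES:
        raise ValueError(f"unknown data event type: {event.get('type')!r}")
    if event.get("status") not in STATUSES:
        raise ValueError(f"unknown status: {event.get('status')!r}")

    start = datetime.strptime(event["startTime"], TIMESTAMP_FORMAT)
    end = datetime.strptime(event["endTime"], TIMESTAMP_FORMAT)
    return {
        "target": f'{event["schema"]}.{event["table"]}',
        "timelabel": datetime.strptime(event["timelabel"], TIMELABEL_FORMAT),
        "succeeded": event["status"] == "SUCCESS",
        "duration_seconds": (end - start).total_seconds(),
    }


# Same values as the example event in this section, without the comments.
raw_event = json.dumps({
    "event_type": "data",
    "type": "redshift",
    "schema": "sample_schema",
    "table": "sample_table_name",
    "timelabel": "20200303112000",
    "status": "SUCCESS",
    "startTime": "2020-03-03 11:25:00:000 +00:00",
    "updateTime": "2020-03-03 11:25:47:445 +00:00",
    "endTime": "2020-03-03 11:25:47:445 +00:00",
})

summary = summarize_data_event(raw_event)
print(summary["target"], summary["succeeded"], summary["duration_seconds"])
```

Rejecting unknown type or status values early keeps a malformed event from being silently treated as a failed batch.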

An example data event is shown below:

{
  "event_type": "data",
  "type": "redshift", // One of s3, redshift, glue, athena, snowflake or publish
  "schema": "sample_schema", // Destination schema name
  "table": "sample_table_name", // Destination table name
  "timelabel": "20200303112000", // Timelabel
  "status": "SUCCESS", // SUCCESS or FAILURE status
  "startTime": "2020-03-03 11:25:00:000 +00:00", // Event start time
  "updateTime": "2020-03-03 11:25:47:445 +00:00", // Event last update time
  "endTime": "2020-03-03 11:25:47:445 +00:00" // Event end time
}

Metadata Events

In addition to replicating data changes from a source into a destination, Datacoral also handles schema changes at the source. For example, new tables might be added to a database such as Postgres, or a custom field might be added to an object in Salesforce. Datacoral services send metadata events when schema changes are detected at the source and when they are applied to a destination. Consequently, there are two types of metadata events:

  1. Change Detection Event: Indicates that Datacoral sensors have detected a change at the data source. One event is sent per change detected.
  2. Change Propagation Event: Indicates that the change has been applied to the destination. One event is sent per change applied in each destination (a Datacoral connector can have multiple destination warehouses).

Each metadata event contains the following fields:

  • Event Type: metadata
  • Type: datasource, or the destination data warehouse (such as redshift)
  • Schema: Connector name or the schema name in the data warehouse
  • Table: Table name at source or at destination
  • Timelabel: See the Timelabels documentation
  • Operation Source: user_initiated (if the change was caused by the user updating the connector configuration) or automated (if Datacoral sensors detected a change that happened at the source)
  • Column Name: Name of the column, if a column was added, removed, or updated
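A consumer may want to tell the two metadata event types apart. The sketch below does this from the Type field alone. It rests on an assumption that the text does not state outright: that a type of `datasource` marks a Change Detection Event and any other type (a destination warehouse such as `redshift`) marks a Change Propagation Event. The function name and the returned labels are hypothetical, and the key names follow the example metadata event in this section.

```python
def classify_metadata_event(event: dict) -> str:
    """Label a metadata event as detection or propagation (hypothetical helper)."""
    if event.get("event_type") != "metadata":
        raise ValueError(f"not a metadata event: {event.get('event_type')!r}")
    # Assumption: "datasource" means the change was detected at the source;
    # any other value names the destination warehouse the change was applied to.
    if event["type"] == "datasource":
        return "change_detection"
    return "change_propagation"


detected = {
    "event_type": "metadata",
    "type": "datasource",
    "schema": "sample_schema",
    "table": "sample_table_name",
    "operation_source": "automated",
}
applied = dict(detected, type="redshift")

print(classify_metadata_event(detected))
print(classify_metadata_event(applied))
```

With several destination warehouses configured for one connector, a single detected change would be followed by one propagation event per destination, so a consumer waiting for "fully applied" would need to collect one such event for each warehouse.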

An example metadata event is shown below:

{
  "event_type": "metadata",
  "type": "redshift", // Change applied to Redshift
  "schema": "sample_schema", // Destination schema name
  "table": "sample_table_name", // Destination table name
  "timelabel": "20200303112000", // Timelabel
  "operation": "ADD", // Table was added at source
  "operation_source": "automated", // Datacoral detected change
  "status": "SUCCESS" // SUCCESS or FAILURE status
}

Subscribing to Events

Customers can subscribe to Datacoral's data and metadata events via: