How to Add Self-Referential Transformations
Datacoral allows transformations to depend on their own prior outputs. This feature is enabled by specifying the 'self-referential' flag in the transformation definition.
Note: This feature supports Amazon Redshift and Amazon Athena destination warehouses.
Self-referential transformations have multiple use cases:
- Setting retention
- Performing cumulative aggregates without making copies
- Deleting table contents based on conditions
Here is an example illustrating how to add a self-referential transformation that deletes data older than 3 months.
Step 1: Create a non-Datacoral connector
Create a non-Datacoral connector named input_schema and add a loadunit called input_table that syncs every 5 minutes.
Contact support at support@datacoral.co to add the non-Datacoral connector.
Step 2: Create the table in Redshift
The input_table will have one row per month.
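A minimal sketch of such a table, assuming a hypothetical layout with an id column and a created_at timestamp (neither column name comes from Datacoral; substitute your own). It inserts one row per month for the past six months, so that some rows fall inside the retention window and some fall outside it:

```sql
-- Hypothetical table layout; column names are assumptions for illustration.
CREATE TABLE IF NOT EXISTS input_schema.input_table (
    id         INTEGER,
    created_at TIMESTAMP
);

-- One row per month, going back six months from the current date.
INSERT INTO input_schema.input_table (id, created_at) VALUES
    (1, DATEADD(month, -1, GETDATE())),
    (2, DATEADD(month, -2, GETDATE())),
    (3, DATEADD(month, -3, GETDATE())),
    (4, DATEADD(month, -4, GETDATE())),
    (5, DATEADD(month, -5, GETDATE())),
    (6, DATEADD(month, -6, GETDATE()));
```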
Step 3: Create the transformation schema
Create the transformation schema using the following query.
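A minimal sketch of such a query, assuming the schema is named mv_schema (inferred from the mv_schema.mv_mvname table that the refreshed transformation writes to):

```sql
-- Schema name is inferred from the destination table mv_schema.mv_mvname.
CREATE SCHEMA IF NOT EXISTS mv_schema;
```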
Step 4: Create the self-referential transformation
You can now create the self-referential transformation using a DPL file (mv_schema/mvname.dpl) like the one below:
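As a rough sketch of the query such a file might contain, the statement below reads from the transformation's own output table, mv_schema.mv_mvname, and keeps only recent rows. The created_at column is hypothetical, and the 90-day window is an assumption chosen to match the retention period described at the end of this guide. The Datacoral-specific annotations that belong in the DPL file (such as the self-referential flag and historical-sync-query) are omitted, since their exact syntax is not covered here.

```sql
-- Self-referential retention query (sketch): the transformation selects from
-- its own previous output and drops rows older than 90 days.
-- created_at is a hypothetical column name.
SELECT *
FROM mv_schema.mv_mvname
WHERE created_at >= DATEADD(day, -90, GETDATE());
```

Because the query's only source is its own output table, each refresh can only narrow the table's contents; new rows have to arrive through a separate load.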
Then, use the Datacoral CLI to create the transformation:
Note: historical-sync-query is currently used to figure out the schema of the destination table in order to create it and handle self-referential transformations. Going forward:
- The sync query will be used to perform historical sync as soon as the transformation is added
- There will be CLI commands to trigger a historical sync when needed
For now, please use the following query to insert the contents of input_schema.input_table into the matview (mvname):
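A minimal sketch of that insert, assuming the matview's backing table is mv_schema.mv_mvname and that its columns match those of the input table in number, order, and type:

```sql
-- Seed the matview table with the current contents of the input table.
-- Assumes both tables share the same column layout.
INSERT INTO mv_schema.mv_mvname
SELECT *
FROM input_schema.input_table;
```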
When the transformation is refreshed, the mv_schema.mv_mvname table in Redshift will contain only the past 90 days' worth of data.