Checking updates to Transformations
Given that Datacoral makes it very easy to build transformations by just writing SQL, analysts can just focus on business logic of the data rather than the underlying plumbing. The most complicated part of managing the plumbing is when transformations have to be changed causing the transformation graph to get modified. When workflow managers are used to build pipelines, it is the responsibility of the analyst to make sure that changes to one transformation do not break downstream transformations.
Datacoral automatically verifies that changes to transformations wont cause any issues downstream. There are several areas where Datacoral's guardrails catch potentially critical bugs from being pushed to even testing environments.
The most critical check done statically is the one for downstream transformations. Datacoral provides the exact set of transformations that might get affected by the changes being performed. Analsyts can make changes to the affected transformations as well, which might in turn affect other downstream transformations. This check in and of itself catches several critical errors.
When updating a transformation, an analyst could be modifying different attributes of the transformation. Datacoral provides guardrails to make sure that the modifications dont break the pipeline.
The actual details of the CLI commands used to update the transformations are here. In this document, we will consider an example and go over the different guardrails provided by Datacoral.
Compile time checks
First off, Datacoral offers
- Compile-time type checks for changes to transformations, catching bugs early
- Ability to apply all related changes to the transformation graph in one shot
- This means that analysts can be confident that the changes they are making are consistent and wont break the pipeline.
Furthermore, with Datacoral's Data Quality Check feature, runtime checks can also be used to make sure that changes dont break anything.
For example the following command will provide visibility to analyst about potential downstream transformations that might need modifications based on the changes already made.
Steps to perform updates
In this section, we will go over an example of how Datacoral makes it easy for analysts to make consistent changes to their transformations. The details of the CLI commands are here.
Let us consider the following example where
salesforce.opportunity_cohort is downstream of
salesforce.opportunity_latest to understand how matview update works.
Step 1: Modify the transformation definitions
We are required to perform the following update operations
- Modify dependencies of
salesforce.accountin the from clause.
The transformation definitions can be modified as shown
Step 2: Add the updated definitions to the local staging area
Add these dpl files to local staging area by running following commands. Think of this step as similar to
git add <file-path> in the git development flow, where are all the updated dpl files are staged in a local manifest file.
You can also inspect the files that have been added to the local staging area by running the following command
Step 3: Stage the updated tranformations to be pushed to production
Push the updated changes from local staging area to a staging area in production by running the following command. Datacoral computes a safe timelabel at which these changes can be applied.
Matview update stage has multiple options. Please refer to CLI help section.
That's it. You don't have to worry about an
Restrictions on MV updates
There are several restrictions on the kinds of updates that are allowed in MVs. These restrictions are essential to protect the integrity of data in the MV being updated as well as all of its downstream MVs. Currently update-set can be used to do following updates to MV:
- Only MVs with
replacecan be updated by changing the SQL query. MVs with
mergeupdate modes cannot be changed.
- Schedule changes are not allowed
- Changing the
update-modeis not allowed
- Changing the
table-typeis not allowed
- Changing the table configuration (like dist key, sort keys) is not allowed
- Changing dependencies which cause invalid DAGs are not allowed
There are several restrictions on the kinds of updates that are allowed in MVs. These restrictions are essential to protect the integrity of data in the MV being updated as well as all of its downstream MVs.