Currently, the supported organize slices are Managed Redshift, Athena, and Snowflake. Within each of these, we provide the ability to create and manage multiple levels of transformations.
Types of transformations
Transformations are written as SQL queries in the dialect of the underlying query engine.
The results of the transformations are written into a destination table. The destination table can be one of two types:
- Regular table
- Partitioned table
How the results of the transformation are written into the destination table is determined by the update-mode attribute. The update mode can be one of:
- replace - this mode replaces all the rows of the destination table with the results of the transformation query. This is a wipe-and-load operation.
- append - in this mode, the transformation query result rows are inserted into the destination table; rows already in the destination table are not updated.
- merge - in this mode, the transformation query result can contain rows indicating that destination table rows should be inserted, updated, or even deleted. This mode allows for efficient incremental updates to destination tables.
For regular tables, all update modes are allowed. For partitioned tables, only append mode is allowed.
You will also need to define the frequency of the transformation, and the schema and name of the destination table.
Destination table configuration options such as sort keys and distribution keys are optional parameters.
Create a transformation
Use a DPL file
Creating an MV is as easy as writing an SQL statement. Use a DPL file to create an MV with the command below.
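For illustration only, a command of roughly this shape creates the MV (the subcommand name and flag are assumptions; consult the CLI help for the actual syntax):

```bash
# Hypothetical invocation; "matview-create" and "--dpl-file" are assumed names.
datacoral organize matview-create --dpl-file ./analytics.user_order_counts.dpl
```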
A DPL file has the following format:
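As a rough, hypothetical sketch (the attribute names below are assumptions, not the actual DPL syntax), a DPL file pairs the transformation's SQL query with its destination and scheduling attributes:

```sql
-- Hypothetical DPL sketch; attribute names and syntax are assumptions.
--   matview-schema: analytics          (destination schema)
--   matview-name:   user_order_counts  (destination table name)
--   update-mode:    replace            (replace | append | merge)
--   frequency:      hourly             (refresh schedule)
SELECT user_id,
       COUNT(*) AS order_count
FROM mysqltest.orders
GROUP BY user_id;
```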
Note: You can only specify an existing schema name as the value of the destination schema attribute.
You can choose to create your own schemas as well. You will need to ensure that the Datacoral platform can manage the materialized views created in that schema. This requires granting specific privileges to the datacoral user (which was created when the Datacoral platform was installed in your AWS account). Once you have done that, you will be able to specify the new schema name as the destination schema.
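For example, on Redshift the grants might look like the following (a minimal sketch: my_schema is a placeholder, and the exact set of privileges your installation requires may differ):

```sql
-- Illustrative only: let the datacoral user manage objects in a custom schema.
GRANT USAGE, CREATE ON SCHEMA my_schema TO datacoral;
-- Also cover tables created later in the schema.
ALTER DEFAULT PRIVILEGES IN SCHEMA my_schema GRANT ALL ON TABLES TO datacoral;
```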
Use CLI args
The file-based matview management flow allows users to update one or more matview definitions in one shot using an update-set. An update-set is created using a matview update-set manifest file. An update-set manifest file is a text file stored on a local disk. It contains the following information for each matview being updated:
- the action type for the matview update: "update" is the only action type currently supported; more action types will be added in future releases;
- the path to the DPL file (DPL stands for Data Programming Language) describing the updates that need to be made to the matview.
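As a hypothetical illustration (the actual manifest format may differ), each entry pairs an action type with a DPL file path:

```text
# One entry per matview: <action type> <path to DPL file>
update /path/to/dpls/analytics.user_order_counts.dpl
update /path/to/dpls/analytics.daily_revenue.dpl
```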
Create an update set locally
update-set - Path to the update-set manifest. If this option is not specified, a default update-set manifest file is used.
dpl-file - Path to the DPL file describing the matview update.
- Each update-set manifest can contain only one update action for each MV.
- Each update-set manifest can contain update actions for multiple different MVs.
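A hypothetical invocation (the subcommand name is an assumption; the flags mirror the options above):

```bash
# Illustrative only; "matview-update-set-create" is an assumed subcommand name.
# Use --update-set for a manifest, or --dpl-file for a single matview update.
datacoral organize matview-update-set-create --update-set ./update-set.manifest
```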
See Updating Transformations for a detailed description of the different types of updates that are allowed and the types that are not.
Describe one MV
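A hypothetical command shape (the name and flag are assumptions):

```bash
# Illustrative only; "matview-describe" is an assumed subcommand name.
datacoral organize matview-describe --matview-name analytics.user_order_counts
```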
List all MVs
Get the list of all the materialized views in Redshift.
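For illustration (the subcommand and flag names are assumptions):

```bash
# Illustrative only; list all MVs in the redshift organize slice.
datacoral organize matview-list --slice-name redshift
```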
Download all MVs
download-dir - Path to the matview download directory. This directory should be empty unless the --overwrite option is specified.
overwrite - Overwrite the contents of the matview download directory if it is not empty.
- Each MV will be downloaded into its own DPL file.
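For illustration (the subcommand name is an assumption; the flags mirror the options above):

```bash
# Illustrative only; downloads each MV into its own DPL file under the directory.
datacoral organize matview-download --download-dir ./matviews --overwrite
```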
Understanding MV dependencies
List upstream dependencies
List downstream dependents
List dependents in graph form
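Hypothetical command shapes for the three dependency views above (all names are assumptions):

```bash
# Illustrative only; subcommand names are assumed.
datacoral organize matview-upstream   --matview-name analytics.user_order_counts
datacoral organize matview-downstream --matview-name analytics.user_order_counts
datacoral organize matview-downstream --matview-name analytics.user_order_counts --graph
```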
Describe a version of an MV
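A hypothetical shape (the --version flag is an assumption), adding a version selector to the describe command:

```bash
# Illustrative only; describe a specific version of an MV.
datacoral organize matview-describe --matview-name analytics.user_order_counts --version 2
```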
Pause and resume
Get status of an MV
Get the status of the refreshes on the MVs over a given time period.
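A hypothetical invocation (the subcommand name and timelabel format are assumptions):

```bash
# Illustrative only; fetch refresh status between two timelabels.
datacoral organize matview-stats --matview-name analytics.user_order_counts \
  --start-timelabel 20200101000000 --end-timelabel 20200102000000
```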
Pause an MV
Resume an MV
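Hypothetical shapes for the two commands above (names are assumptions):

```bash
# Illustrative only; pause and resume refreshes of a single MV.
datacoral organize matview-pause  --matview-name analytics.user_order_counts
datacoral organize matview-resume --matview-name analytics.user_order_counts
```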
Pause refreshes of all MVs
A good brute-force tool to use if your Redshift cluster is in imminent danger of being overloaded.
Resume refreshes of all MVs
Go back to normal operations.
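Hypothetical shapes for pausing and resuming everything (the --all flag is an assumption):

```bash
# Illustrative only; pause all refreshes, then resume normal operations.
datacoral organize matview-pause  --all
datacoral organize matview-resume --all
```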
Pause and resume downstream MVs
There are cases where you are trying to debug or update an MV that is an internal node in your DAG of MVs. In these cases, the safest way to apply the updates is to first pause refreshes of all the downstream MVs and publishers of the MV being updated. With this command, a given MV or a list of MVs can first be brought to a good state before their downstream MVs or publishers are resumed.
Note: For Redshift MVs, you will need to specify the mv_ prefix in the MV name, unlike in other commands.
Pause all downstream MVs and publishers of a given list of tables
The given list of tables can be either loadunits or MVs themselves. This command pauses all MVs and publishers that are downstream of the provided tables.
Resume all downstream MVs and publishers of a given list of tables
The given list of tables can be either loadunits or MVs themselves. This command resumes all MVs and publishers that are downstream of the provided tables.
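Hypothetical shapes for the downstream pause/resume commands (the subcommand and flag names are assumptions); note the mv_ prefix for Redshift MVs:

```bash
# Illustrative only; pause, then resume, everything downstream of the listed tables.
datacoral organize matview-pause-downstream  --table-names mysqltest.mv_levelone
datacoral organize matview-resume-downstream --table-names mysqltest.mv_levelone
```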
Note: If the resume is done on a future timelabel, the missing timelabels will not get reprocessed.
For example, when an MV mysqltest.mv_leveltwo depends on an MV mysqltest.mv_levelone, pausing the downstream of mysqltest.mv_levelone pauses mysqltest.mv_leveltwo. This can be verified by checking the currentStatus of the downstream MV, as shown below; pauseTimelabel indicates the timelabel from which the MV is paused.
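A hypothetical sketch of the paused status (only currentStatus and pauseTimelabel come from this document; the surrounding field names, the status value, and the timelabel format are assumptions):

```json
{
  "matviewName": "mysqltest.mv_leveltwo",
  "currentStatus": "PAUSED",
  "pauseTimelabel": "20200101000000"
}
```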
When you resume the downstreams of an MV, the status can be similarly verified.
Reprocess MVs using a reprocess.json file
This can be used to reprocess timelabels of one or more MVs. The reprocess.json file looks like the following:
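A hypothetical sketch of reprocess.json (the actual schema may differ; the field names and timelabel format are assumptions):

```json
{
  "matviews": [
    {
      "matview-name": "mysqltest.mv_levelone",
      "timelabels": ["20200101000000", "20200102000000"]
    }
  ]
}
```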
Reprocess MVs without a reprocess.json file
- The command expects either --timelabel OR both --start-timelabel and --end-timelabel.
- The combination of --start-timelabel and --end-timelabel can be used to reprocess multiple timelabels. Providing the same value to --start-timelabel and --end-timelabel means that only one timelabel will be reprocessed.
- Providing --timelabel is equivalent to providing the same value to both --start-timelabel and --end-timelabel. (--timelabel will be deprecated in the future.)
- The command defaults --slice-name to redshift unless it is specified; the other organize slices (Athena and Snowflake) can be named instead.
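For example, the following two invocations would be equivalent (the subcommand name is an assumption; the flags are those described above, so check the CLI help for the actual syntax):

```bash
# Illustrative only; reprocess a single timelabel.
datacoral organize matview-reprocess --matview-name mysqltest.mv_levelone \
  --timelabel 20200101000000

# Equivalent form using the start/end flags with the same value:
datacoral organize matview-reprocess --matview-name mysqltest.mv_levelone \
  --start-timelabel 20200101000000 --end-timelabel 20200101000000
```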