How to Add Transformations

Create a Materialized View

Step 1. Populate the Configuration

There are two options for populating the configuration of a materialized view. Both work well and will be explained in the following sections.

Populate a DPL File

A DPL file represents the parameter configurations for a materialized view. Each file will have the format shown below. There are more optional paramters discussed in the Organize Commands Referrence too.

/**
* @datacoral
* @matview-schema <schema-name>
* @matview-name <view-name>
* @matview-type <view-type>
* @interval-in-minutes <interval-in-minutes>
* @sort-keys <sort-keys>
* @dist-key <dist-key>
* @slice-name <slice-name>
* @input-warehouse <input-warehouse-name>
* @storage-format <storage-format>
*/
<query>
  • schema-name: Name of an existing schema that the Datacoral platform has permission to manage
  • interval-in-minutes: The interval in minutes must be an element of the following set → 5, 10, 15, 20, 30, 60, 120, 180, 240, 360, 480, 720, or 1440
  • dist-key: Optional expression to specify the dist key for the materialized view. Example: diststyle key distkey(column1)
  • sort-key(s): Optional expression to specify sort keys for the materialized view. Example: compound sortkey(column1, column2)

A list of existing schemas in a Datacoral installation can be found by logging into the Datacoral app and clicking Organize in the left navigation bar.

Example

The following are fake values in an example view_test.dpl DPL file.

/**
* @datacoral
* @matview-schema john_doe
* @matview-name view_test1
* @matview-type regular
* @interval-in-minutes 15
* @dist-key distkey(column3)
* @sort-keys interleaved sortkey(column1, column2)
*/
select id, fullname from jane_doe.registers

Utilize CLI Arguments

The CLI command to create a materialized view also has options to pass the necessary paramters during creation of the view. An example of the formatting will be shown in the next step, but the same values must be gathered as in the above DPL file.

Step 2. Create the Materialized View

The new materialized view will be created in your installation by running one of the following commands.

Creating from DPL File

The command to create a materialized view from a DPL configuration file is seen below.

datacoral organize matview-create --dpl-file <mv_file_name>.dpl
  • mv_file_name: The path to the file in which the paramters were specified during step 1

Example

Using our example DPL file from the previous step,

datacoral organize matview-create --dpl-file view_test.dpl

Creating with CLI Arguments

The command to create a materialized view by passing arguments to the parameters is seen below.

datacoral organize matview-create --matview-schema <schema-name> --matview-name <view-name> \
--query <sql query defining view> \
--matview-type <regular|timeseries|timeseriesio|timeserieso> \
--dist-key <dist-key> \
--sort-keys <sort-key(s)> \
--interval-in-minutes <num-minutes>

Example

Using the same parameter values as in our DPL file, but altering the name of the materialized view so the two examples are unique, results in the command seen below.

datacoral organize matview-create --matview-schema jown_doe --matview-name view_test2 \
--query "select id, fullname from jane_doe.registers" \
--matview-type regular \
--dist-key "distkey(column3)" \
--sort-keys "interleaved sortkey(column1, column2)" \
--interval-in-minutes 15

Step 3. Verify the Materialized View

After creating a materialized view, execute the following command to verify the view was successfully created.

datacoral organize matview-stats --matview-schema <schema-name> --matview-name <view-name>
  • schema-name: Name of the schema used to create the materialized view

Example

For each materialized view created as examples thus far (both via DPL and via CLI args) the commends would respectively be as follows.

datacoral organize matview-stats --matview-schema john_doe --matview-name view_test1
datacoral organize matview-stats --matview-schema john_doe --matview-name view_test2

Troubleshooting

Failure in fetching dependencies

We have built a SQL parser to automatically detect data dependencies to simplify the cognitive load for analysts while creating transformations within a pipeline. This parser is built to handle all the SQL dialects that exist in different warehouses. However, we are coming across some cases where our parser fails even though the SQL is valid in the dialect of the underlying warehouse. In order to allow users to make progress even in these cases, we have offered a way for analysts to specify the table names that the SQL query includes. The requirement to specify these dependencies will be removed over time as we enhance the parser. In such cases, you can specify the list of such table dependencies using the dependencies field.

Example

/**
* @datacoral
* @slice-name snowflake
* @matview-schema jira
* @matview-name issue
* @matview-type regular
* @interval-in-minutes 15
*/
select parse_json(fields):status as status from jira.issue;

The SQL parser for the above query will fail. To be able to create the materialized view with the same query, we can specify the dependencies like below

/**
* @datacoral
* @slice-name snowflake
* @matview-schema jira
* @matview-name issue
* @matview-type regular
* @interval-in-minutes 15
* @dependencies
* [
* {
* "warehouseName": "snowflake",
* "schema": "jira",
* "name": "issue",
* "owner": "datacoral"
* }
* ]
*/
select parse_json(fields):status as status from jira.issue;

Update a Materialized View

The file-based management flow allows updating of one or more materialized views with a single command.

Step 1. Fetch the Materialized View Configuration

The existing parameters for a materialized view can be saved to a file by running the following command.

dtacoral organize matview-describe --matview-schema <schema-name> --matview-name <view-name> \
--output-format dpl > parameter-file.dpl
  • schema-name: Name of the schema used to create the materialized view
  • view-name: Name given that uniquely identifies the view
  • parameter-file: The path to the file in which the parameters will be saved

Example

Again, the materialized view created earlier will be used.

datacoral organize matview-describe --matview-schema john_doe --matview-name view_test1 \
--output-format dpl > view_test1.dpl

Step 2. Modify the Configurations

There are a few rules as to how a materialized view can be updated:

  • view-type must be "regular" in order to be updated
  • Only the SQL query can be edited or manipulated
  • New columns can be added after the last existing column in the query
  • Existing columns and column types cannot be edited or removed

Example

Below are the modifications to the original view_test1 materialized view. Notice the addition of the age column.

/**
* @datacoral
* @matview-schema john_doe
* @matview-name view_test1
* @matview-type regular
* @interval-in-minutes 15
* @dist-key distkey(column3)
* @sort-keys interleaved sortkey(column1, column2)
*/
select id, fullname, age from jane_doe.registers

Step 3. Create an Update Set

An update set is represented by a manifest file stored on a local disk. This manifest file contains two pieces of information for each materialized view in the update set:

  • The action type to be performed on the materialized view
  • The file path to the DPL file in which the updates are saved

NOTES:

  • The only action type currently available is "update"
  • Each update set manifest can only contain up to one action for a given materialized view
  • An update set manifest can contain actions for multiple different materialized views

The modified materialized view configuration can be added to the update set by running the following command.

datacoral organize matview-update-local --dpl-file <mv-file-name>.dpl
  • mv_file_name: The path to the file in which the configuration was saved during step 1

Example

In the example of the configuration in step 3, the following command will add the modified configuration to the update set.

datacoral organize matview-update-local --dpl-file view_test1.dpl

Step 4. Stage the Updates

The materialized view updates will be staged in your installation by running the following command.

datacoral organize matview-update-stage

Example

Rounding out the example of updating a materialized view is the command to stage the modifications.

datacoral organize matview-update-stage

Please note that in this example, the timelabel during which to stage these updates will be automatically computed. For more information regarding optional control parameters, please read the Organize CLI referrence.

Delete a Materialized View

A materialized view can be deleted from an installation by running the following command

datacoral organize matview-delete --matview-schema <schema-name> --matview-name <view-name>
  • schema-name: Name of the schema used to create the materialized view
  • view-name: Name that uniquely identifies the materialized view

Example

In order to delete the materialized views created over the course of this page, the commands below are run.

datacoral organize matview-delete --matview-schema john_doe --matview-name view_test1
datacoral organize matview-delete --matview-schema john_doe --matview-name view_test2