Materialized Views in Redshift

These tests assume that the MVs work correctly, so any errors are due to the CLI commands and aren't MV errors.

This series of commands will show the usage the following matview CLI commands:

  • matview-create
  • matview-delete
  • matview-describe
  • matview-stats
  • matview-list
  • matview-pause-all
  • matview-resume-all

We will create a timeseries table of fake data called datacoral_test.fake_events. The timeseries, timeseriesio, and regular table will be based on this table. The timeserieso table will be based on the regular MV.

Create a timeseries table to use in the tests:

BEGIN;
DROP SCHEMA IF EXISTS customer_cli_test CASCADE;
CREATE SCHEMA customer_cli_test;
CREATE TABLE customer_cli_test._fake_events_201703020000
(user_id INT,
event_name VARCHAR,
event_timestamp TIMESTAMP);
CREATE TABLE customer_cli_test.fake_events
(user_id INT,
event_name VARCHAR,
event_timestamp TIMESTAMP);
CREATE VIEW customer_cli_test.fake_events_view AS
SELECT * FROM customer_cli_test.fake_events
UNION ALL
SELECT * FROM customer_cli_test._fake_events_201703020000;
INSERT INTO customer_cli_test._fake_events_201703020000 VALUES
(1, 'page_load', '2017-03-01 06:58:56'),
(1, 'registration', '2017-03-01 12:18:04'),
(3, 'registration', '2017-03-01 12:06:14'),
(3, 'page_load', '2017-03-01 10:00:52'),
(3, 'click_button', '2017-03-01 14:38:32'),
(1, 'registration', '2017-03-01 23:22:21');
INSERT INTO customer_cli_test.fake_events VALUES
(4, 'page_load', '2017-03-02 10:00:52'),
(4, 'click_button', '2017-03-02 14:38:32'),
(2, 'page_load', '2017-03-02 06:58:56'),
(4, 'registration', '2017-03-02 12:18:04'),
(2, 'page_load', '2017-03-02 22:06:14');
END;

Now you have a series of timeseries tables. We need these two tables to make sure that the MVs we create don't pick up the wrong data somehow.

Create different kinds of MVs

Create a timeseriesio MV

SELECT
user_id,
event_timestamp
FROM
customer_cli_test.fake_events_view
WHERE
event_name = 'page_load'
/*
This comment is here just to keep track of what we'll set the sort and dist keys to in the CLI command below.
sort_keys - interleaved sortkey(user_id, event_timestamp)
dist_key - distkey(user_id)
*/

Save as timeseriesio_page_loads_mv.sql.

Run the following CLI command:

datacoral organize matview-create \
--matview-schema customer_cli_test \
--matview-name page_loads \
--matview-type timeseriesio \
--interval-in-minutes 1440 \
--query file://timeseriesio_page_loads_mv.sql \
--sort-keys 'interleaved sortkey(user_id, event_timestamp)' \
--dist-key 'distkey(user_id)'

Create a timeseries MV

SELECT
user_id,
event_timestamp AS reg_timestamp
FROM
customer_cli_test.fake_events_view
WHERE
event_name = 'registration'
/*
sort_keys - interleaved sortkey(user_id, reg_timestamp)
dist_key - distkey(user_id)
*/

Save as timeseries_registrations_mv.sql.

Run the following CLI command:

datacoral organize matview-create \
--matview-schema customer_cli_test \
--matview-name registrations \
--matview-type timeseries \
--interval-in-minutes 1440 \
--query file://timeseries_registrations_mv.sql \
--sort-keys 'interleaved sortkey(user_id, reg_timestamp)' \
--dist-key 'distkey(user_id)'

Create regular MV

SELECT
event_timestamp::DATE AS reg_date,
COUNT(user_id) AS registrations
FROM
customer_cli_test.fake_events_view
WHERE
event_name = 'registration'
AND
event_timestamp::DATE = '2017-03-02'::DATE
GROUP BY
1

Save as regular_daily_snapshot_mv.sql.

Run the following CLI command:

datacoral organize matview-create \
--matview-schema customer_cli_test \
--matview-name daily_snapshot \
--matview-type regular \
--interval-in-minutes 1440 \
--query file://regular_daily_snapshot_mv.sql \
--sort-keys 'interleaved sortkey(reg_date)' \
--dist-key 'distkey(reg_date)'

Create timeserieso MV

Initially the mv_company_metrics table will be identical to the mv_daily_snapshot table.

SELECT
reg_date,
registrations
FROM
customer_cli_test.mv_daily_snapshot;

Save as timeserieso_company_metrics_mv.sql.

Run the following CLI command:

datacoral organize matview-create \
--matview-schema customer_cli_test \
--matview-name company_metrics \
--matview-type timeserieso \
--interval-in-minutes 1440 \
--query file://timeserieso_company_metrics_mv.sql

Check that the datacoral organize matview-create commands above did actually work:

SELECT
tablename
FROM
pg_table_def
WHERE
schemaname = 'customer_cli_test';

Returns mv_page_loads, mv_registrations, mv_daily_snapshot, mv_company_metrics.

Check that the sort and dist keys were created correctly:

SELECT
sortkey1,
sortkey_num,
diststyle
FROM
SVV_TABLE_INFO
WHERE
"schema" = 'customer_cli_test'
AND
"table" = 'mv_registrations'

This should return sortkey1=reg_date, sortkey_num = 1, and diststyle=reg_date.

SELECT
sortkey1,
sortkey_num,
diststyle
FROM
SVV_TABLE_INFO
WHERE
"schema" = 'customer_cli_test'
AND
"table" = 'company_metrics'

This should return empty sortkey1, sortkey_num, and diststyle fields.

Check datacoral organize matview-list

datacoral organize matview-list

Returns

[{"matview-schema": "customer_cli_test",
"matview-name": "mv_page_loads",
"redshift-table-name": "customer_cli_test.mv_page_loads"},
{"matview-schema": "customer_cli_test",
"matview-name": "mv_registrations",
"redshift-table-name": "customer_cli_test.mv_registrations"},
{"matview-schema": "customer_cli_test",
"matview-name": "mv_daily_snapshot",
"redshift-table-name": "customer_cli_test.mv_daily_snapshot"},
{"matview-schema": "customer_cli_test",
"matview-name": "mv_company_metrics",
"redshift-table-name": "customer_cli_test.mv_company_metrics"}]

Check datacoral organize matview-describe

datacoral organize matview-describe \
--matview-schema customer_cli_test \
--matview-name page_loads

Returns

{
"definition-view-name": "def_mv_1440_timeseriesio_page_loads",
"sort-key": "interleaved sortkey(user_id, event_timestamp)",
"dist-key": "distkey(user_id)",
"interval-in-minutes": 1440,
"matview-type": "timeseriesio",
"matview-schema": "customer_cli_test",
"matview-name": "page_loads",
"matview-definition": "SELECT events.user_id, events.event_timestamp FROM customer_cli_test.fake_events_view events WHERE ((events.event_name)::text = 'page_load'::text)\n/*\nsort_keys - interleaved sortkey(user_id, event_timestamp)\ndist_key - distkey(user_id)\n*/;",
"redshift-table-name": "customer_cli_test.mv_page_loads",
}

Check datacoral organize matview-describe --dependencies

datacoral organize matview-describe \
--matview-schema customer_cli_test \
--matview-name page_loads \
--dependencies

Returns

[ { table-name: 'fake_events_view',
schema-name: 'customer_cli_test' } ]

Check datacoral organize matview-stats

datacoral organize matview-stats \
--matview-schema customer_cli_test \
--matview-name page_loads

Returns

datacoral organize matview-stats --matview-name daily_snapshot --matview-schema customer_cli_test --from 2018-06-01T00:00:00 --to 2018-07-01T00:00:00
[
{
"matview-schema": "customer_cli_test",
"matview-name": "mv_daily_snapshot",
"timelabel": "20180604000000",
"status": "WAITING_FOR_DEPENDENCIES",
"reason": "Materialized view is waiting for its dependencies to get updated. Dependencies already updated - {\"customer_cli_test.fake_events_view\":[\"20180604000000\",\"20180604003000\",\"20180604010000\",\"20180604013000\",\"20180604020000\",\"20180604023000\",\"20180604030000\",\"20180604033000\",\"20180604040000\",\"20180604043000\",\"20180604050000\",\"20180604053000\",\"20180604060000\",\"20180604063000\",\"20180604070000\",\"20180604073000\",\"20180604080000\",\"20180604083000\",\"20180604090000\",\"20180604093000\",\"20180604100000\",\"20180604103000\",\"20180604110000\",\"20180604113000\",\"20180604120000\",\"20180604123000\",\"20180604130000\",\"20180604133000\",\"20180604140000\",\"20180604143000\",\"20180604150000\",\"20180604153000\",\"20180604160000\",\"20180604163000\",\"20180604170000\",\"20180604173000\",\"20180604180000\"]} Dependencies to be updated - {\"customer_cli_test.fake_events_view\":[\"20180604183000\",\"20180604190000\",\"20180604193000\",\"20180604200000\",\"20180604203000\",\"20180604210000\",\"20180604213000\",\"20180604220000\",\"20180604223000\",\"20180604230000\",\"20180604233000\"]}"
},
{
"matview-schema": "customer_cli_test",
"matview-name": "mv_daily_snapshot",
"timelabel": "20180603000000",
"status": "SUCCESS",
"reason": "Materialized view successfully updated",
"start-time": "2018-06-04 00:02:06:386 +00:00",
"end-time": "2018-06-04 00:02:21:998 +00:00",
"duration (ms)": 15612
},
{
"matview-schema": "customer_cli_test",
"matview-name": "mv_daily_snapshot",
"timelabel": "20180602000000",
"status": "SUCCESS",
"reason": "Materialized view successfully updated",
"start-time": "2018-06-03 00:02:39:953 +00:00",
"end-time": "2018-06-03 00:14:30:959 +00:00",
"duration (ms)": 711006
},
{
"matview-schema": "customer_cli_test",
"matview-name": "mv_daily_snapshot",
"timelabel": "20180601000000",
"status": "SUCCESS",
"reason": "Materialized view successfully updated",
"start-time": "2018-06-02 00:01:56:345 +00:00",
"end-time": "2018-06-02 00:02:08:615 +00:00",
"duration (ms)": 12270
}
]

Check datacoral organize matview-pause-all

datacoral organize matview-pause-all

This and the resume-all commands will have to be tested around the update time. Or another method must be used.

Check datacoral organize matview-resume-all

datacoral organize matview-resume-all

Check datacoral organize matview-delete

datacoral organize matview-delete \
--matview-schema customer_cli_test \
--matview-name registrations

In Redshift the following commands should return zero results.

SELECT
tablename
FROM
pg_table_def
WHERE
schemaname = 'customer_cli_test'
AND
tablename = 'mv_registrations';
SELECT
*
FROM
datacoral.mv_configuration
WHERE
schema = 'customer_cli_test'
AND
mat_view_name = 'mv_registrations';

Clean up - delete the remaining test tables and drop the schema

datacoral organize matview-delete \
--matview-schema customer_cli_test \
--matview-name page_loads
datacoral organize matview-delete \
--matview-schema customer_cli_test \
--matview-name daily_snapshot
datacoral organize matview-refresh \
--matview-schema customer_cli_test \
--matview-name company_metrics

In Redshift:

DROP SCHEMA customer_cli_test CASCADE;