Materialized Views in Athena

This series of commands will show the usage the following matview CLI commands:

  • matview-create
  • matview-stats
  • matview-describe
  • matview-list
  • matview-pause
  • matview-resume
  • matview-delete

Note:

  • Only timeseriesio materialized views are supported in athena.
  • Currently we only support CSV and JSON storage formats.

We will create a table in Glue data catalog (GDC) and construct athena materialized view on top of it.

Create a table in Glue data catalog using athena query

CREATE EXTERNAL TABLE IF NOT EXISTS datacoral_secure_website.events (
`user_id` string,
`event_name` string,
`c` string
) PARTITIONED BY (
y string,
m string,
d string,
h string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1' ) LOCATION 's3://secure.datacoral/website/events/';

Create a timeseriesio MV using DPL file

/**
* @datacoral
* @matview-schema datacoral_secure_website
* @matview-name page_loads
* @matview-type timeseriesio
* @interval-in-minutes 5
* @slice-name athena
* @storage-format CSV
*/
SELECT user_id,
event_name,
event_name
FROM datacoral_secure_website.events account
WHERE event_name = 'page_load'
AND account.y = '{YYYY}'
AND account.m = '{MM}'
AND account.d = '{DD}'
AND account.h = '{HH}'
AND account.n = '{mm}';

Save as page_loads_mv.dpl.

Run the following CLI command:

datacoral organize matview-create \
--dpl-file page_loads_mv.dpl

Returns

Materialized view has been successfully created. Use the following command to know the status of the materialized view:
datacoral organize matview-stats --matview-schema datacoral_secure_website --matview-name page_loads --slice-name athena

Check datacoral organize matview-stats

datacoral organize matview-stats \
--matview-schema datacoral_secure_website \
--matview-name page_loads \
--slice-name athena

Returns

[
{
"matview-schema": "datacoral_secure_website",
"matview-name": "page_loads",
"timelabel": "20191101220000",
"status": "SUCCESS",
"reason": null,
"start-time": "2019-11-01 22:05:31:970 +00:00",
"end-time": "2019-11-01 22:05:32:640 +00:00",
"last-updated-at": "2019-11-01 22:05:32:640 +00:00",
"duration (ms)": 670,
"executionContext": {
"dependencies": {
"athena|datacoral_secure_website|events": {
"expectedTimelabels": [
"20191101220000"
],
"timelabelObjects": [
{
"timelabel": "20191101220000",
"success": true
}
],
"intervalInMinutes": 5
}
}
}
}
]

Check datacoral organize matview-describe

Returns JSON by default

datacoral organize matview-describe \
--matview-schema customer_cli_test \
--matview-name page_loads \
--slice-name athena
{
"interval-in-minutes": 5,
"slice-name": "athena",
"matview-type": "timeseriesio",
"matview-schema": "datacoral_secure_website",
"matview-name": "page_loads",
"matview-definition": "SELECT user_id, event_name, event_name FROM datacoral_secure_website.events account WHERE event_name = 'page_load' AND account.y = '{YYYY}' AND account.m = '{MM}' AND account.d = '{DD}' AND account.h = '{HH}' AND account.n = '{mm}';",
"currentStatus": "Running"
}

Returns DPL file

datacoral organize matview-describe \
--matview-schema customer_cli_test \
--matview-name page_loads \
--slice-name athena \
--output-format DPL
/**
* @datacoral
* @matview-schema datacoral_secure_website
* @matview-name page_loads
* @matview-type timeseriesio
* @interval-in-minutes 5
* @slice-name athena
* @storage-format CSV
*/
SELECT user_id,
event_name,
event_name
FROM datacoral_secure_website.events account
WHERE event_name = 'page_load'
AND account.y = '{YYYY}'
AND account.m = '{MM}'
AND account.d = '{DD}'
AND account.h = '{HH}'
AND account.n = '{mm}';

Check datacoral organize matview-list

datacoral organize matview-list --slice-name athena

Returns

[{"matview-schema": "datacoral_secure_website",
"matview-name": "page_loads"}]

Check datacoral organize matview-pause

datacoral organize matview-pause \
--matview-schema datacoral_secure_website \
--matview-name page_loads \
--slice-name athena

Returns

Successfully paused datacoral_secure_website.page_loads materialized view

Check datacoral organize matview-resume

datacoral organize matview-resume \
--matview-schema datacoral_secure_website \
--matview-name page_loads \
--slice-name athena

Returns

Successfully resumed datacoral_secure_website.page_loads materialized view

Check datacoral organize matview-delete

datacoral organize matview-delete \
--matview-schema datacoral_secure_website \
--matview-name page_loads \
--slice-name athena

Returns

Materialized view has been successfully deleted