Redshift Overview

Redshift is one of the most popular analytics databases largely because of its cost of deployment and administration, but with Redshift you lose a lot compared with a commercial or self-managed solution. For example, Redshift does not offer features found in other data warehousing products like materialized views and time series tables. Datacoral is able to add those features to Redshift without disrupting your workflow.

Datacoral Managed-Redshift gives you the features you wish Redshift had built-in. Create materialized views and time-series tables automatically using SQL, Python, or our CLI.

The Managed-Redshift slice can be used alone to improve your Redshift experience, or with the rest of the Datacoral stack which provides Collection Slices to gather data and Harness Slices to take advantage of the data:

Managed Redshift Diagram

Features

Time Series Tables

Time series tables provide a scalable way to handle time series data. Instead of storing all the data in a single table it is partitioned by timestamp in multiple tables. This helps ensure that the queries run fast and simplifies managing the retention of the time series data — old data can easily be deleted from Redshift and then retrieved from S3 if needed again - maintaining the hygiene of Redshift, improving performance.

Tables are partitioned with a view that unions them:

events_20170910
2017-09-10 01:00:00	page_view
2017-09-10 02:01:00	registration
2017-09-10 12:11:00	click

events_20170911
2017-09-11 01:00:00	purchase
2017-09-11 01:22:00	click
2017-09-11 16:04:00	page_view

events_20170912
2017-09-12 00:00:02	page_view
2017-09-12 13:57:00	click
2017-09-12 18:44:00	click

CREATE VIEW events AS
  SELECT * FROM events_20170910
  UNION ALL
  SELECT * FROM events_20170911
  UNION ALL
  SELECT * FROM events_20170912;

Create materialized views in Redshift

Datacoral's materialized views combine the speed of creating a view with the speed of querying a real table. In addition our software takes care of updating these tables so you don't have to worry about writing update scripts or maintaining an ETL infrastructure, you just define the update frequency when you create the materialized view.

Materialized views can be created and managed either direcly using SQL in Redshift or using Datacoral’s CLI.

Resize Redshift without downtime

Redshift’s native resizing requires a Redshift cluster be be brought down for hours to days depending on the size of the cluster, which leads many companies to not resize at all. Datacoral enables a zero downtime Redshift cluster resize.

Monitor Redshift for bad queries

In Redshift one bad query can bring down your entire cluster. Use the queryTimeLimit deploy parameter when deploying the Managed-Redshift slice to indicate when queries can be terminated.

Audit all Redshift queries

Redshift only maintains a small window of queries though often more is needed. Datacoral persists all queries in stl_* tables in S3 for querying using Redshift and other tools. This is particularly valuable with GDPR (General Data Protection Regulation) and other data governance - by knowing who has made queries you can figure out who has had access to certain datasets.

Regular automated vacuuming and analyzing of tables

Intelligently identifies tables that need to be vacuumed and analyzed, and performs these actions regularly to maintain the health of the cluster.

Hive integration

Datacoral also offers a Managed-Hive Slice. If both are running Datacoral enables tables to be unloaded from Redshift and brought into Hive and vice-versa. This helps automate analytics and machine learning workflows within the Datacoral system.

Self-service CLI interface.

The Managed-Redshift slice uses a CLI interface to enable you to manage your Redshift deployment. In addition, it can be used to define and/or change distribution and sort keys on existing tables and takes care of all dependents (both views and grants).

Features#

Time Series Tables#

Create materialized views in Redshift#

Resize Redshift without downtime#

Monitor Redshift for bad queries#

Audit all Redshift queries#

Regular automated vacuuming and analyzing of tables#

Hive integration#

Self-service CLI interface.#