Redshift is one of the most popular analytics databases largely because of its cost of deployment and administration, but with Redshift you lose a lot compared with a commercial or self-managed solution. For example, Redshift does not offer features found in other data warehousing products like materialized views and time series tables. Datacoral is able to add those features to Redshift without disrupting your workflow.
Datacoral Managed-Redshift gives you the features you wish Redshift had built-in. Create materialized views and time-series tables automatically using SQL, Python, or our CLI.
The Managed-Redshift slice can be used alone to improve your Redshift experience, or with the rest of the Datacoral stack which provides Collection Slices to gather data and Harness Slices to take advantage of the data:
Time Series Tables
Time series tables provide a scalable way to handle time series data. Instead of storing all the data in a single table it is partitioned by timestamp in multiple tables. This helps ensure that the queries run fast and simplifies managing the retention of the time series data — old data can easily be deleted from Redshift and then retrieved from S3 if needed again - maintaining the hygiene of Redshift, improving performance.
Tables are partitioned with a view that unions them:
CREATE VIEW events AS SELECT * FROM events_20170910 UNION ALL SELECT * FROM events_20170911 UNION ALL SELECT * FROM events_20170912;
Create materialized views in Redshift
Datacoral's materialized views combine the speed of creating a view with the speed of querying a real table. In addition our software takes care of updating these tables so you don't have to worry about writing update scripts or maintaining an ETL infrastructure, you just define the update frequency when you create the materialized view.
Resize Redshift without downtime
Redshift’s native resizing requires a Redshift cluster be be brought down for hours to days depending on the size of the cluster, which leads many companies to not resize at all. Datacoral enables a zero downtime Redshift cluster resize.
Monitor Redshift for bad queries
In Redshift one bad query can bring down your entire cluster. Use the
queryTimeLimit deploy parameter when deploying the Managed-Redshift slice to indicate when queries can be terminated.
Audit all Redshift queries
Redshift only maintains a small window of queries though often more is needed. Datacoral persists all queries in
stl_* tables in S3 for querying using Redshift and other tools. This is particularly valuable with GDPR (General Data Protection Regulation) and other data governance - by knowing who has made queries you can figure out who has had access to certain datasets.
Regular automated vacuuming and analyzing of tables
Intelligently identifies tables that need to be vacuumed and analyzed, and performs these actions regularly to maintain the health of the cluster.
Datacoral also offers a Managed-Hive Slice. If both are running Datacoral enables tables to be unloaded from Redshift and brought into Hive and vice-versa. This helps automate analytics and machine learning workflows within the Datacoral system.
Self-service CLI interface.
The Managed-Redshift slice uses a CLI interface to enable you to manage your Redshift deployment. In addition, it can be used to define and/or change distribution and sort keys on existing tables and takes care of all dependents (both views and grants).