Non-Datacoral Collect Slice

Overview

Some tables in your warehouse may not be populated by Datacoral, for example when they are loaded by other tools or manually. Datacoral allows you to include such tables as dependencies in materialized views by first declaring them as non-Datacoral tables. To do that, add a nondatacoral slice corresponding to the schema where the tables are populated.

You need one nondatacoral slice per schema, which keeps tables in different schemas logically separated. Each non-Datacoral table becomes a loadunit within the slice.

A schedule can be specified at the slice level or at the loadunit level. Once the slice is added, it simply creates SUCCESS timelabels for those loadunits based on the schedule. Because the slice has no visibility into the actual data within the tables, a SUCCESS timelabel carries no guarantee about the freshness of the data in the tables.

So, be aware of how often each table is updated by the external tool or manually, and specify an appropriate schedule for the nondatacoral slice.

Steps to add this slice to your installation

The steps to launch your slice are:

  1. Add the schedule
  2. Add the list of tables as loadunits and specify the datalayout
  3. Add the Non-Datacoral slice

Setup instructions

  1. Create parameters_file.json with all loadunits

  2. To get a template for the Non-Datacoral slice configuration, save the output of the describe --input-parameters command as follows:

datacoral collect describe --slice-type nondatacoral \
--input-parameters > nondatacoral_parameters_file.json

Input parameters:

  • schedule - in cron format (note: you can specify different schedules for different loadunits)
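
As a rough illustration of the schedule parameter, the sketch below assumes the standard five-field cron layout (minute, hour, day of month, month, day of week). The format your template expects may differ, so check nondatacoral_parameters_file.json before relying on it. The variable names and the count_fields helper are hypothetical and not part of the datacoral CLI.

```shell
#!/bin/sh
# Assumption: schedules use the standard five-field cron layout
# (minute hour day-of-month month day-of-week).
DAILY="0 0 * * *"     # once a day at 00:00, for a table refreshed nightly
HOURLY="0 * * * *"    # at minute 0 of every hour, for a table refreshed hourly

# count_fields prints the number of whitespace-separated fields in a schedule.
# Quoting "$1" keeps the shell from expanding "*" into file names.
count_fields() {
  printf '%s\n' "$1" | awk '{ print NF }'
}

echo "DAILY has $(count_fields "$DAILY") fields"
echo "HOURLY has $(count_fields "$HOURLY") fields"
```

Pick the expression that matches how often the external tool or manual process actually updates the table, since the slice cannot check freshness itself.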

Modify the nondatacoral_parameters_file.json file to add the loadunits you need.

  3. Add the slice

datacoral collect add --slice-type nondatacoral --slice-name <slice-name> --parameters-file <params-file>
  • slice-name: Name of your slice. A schema with your slice-name is automatically created in your warehouse
  • params-file: File path to your input parameters file, for example nondatacoral_parameters_file.json
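
The setup steps above can be sketched as a small script. It uses only the two commands shown in this section and prints them for review rather than running them. The helper functions and the slice name external_schema are hypothetical placeholders, not part of the datacoral CLI.

```shell
#!/bin/sh
# Hypothetical helpers that print the commands from the setup steps so they
# can be reviewed before being run against an installation.
PARAMS_FILE="nondatacoral_parameters_file.json"

# Step 2: command that saves the parameter template to PARAMS_FILE.
print_describe_cmd() {
  printf 'datacoral collect describe --slice-type nondatacoral --input-parameters > %s\n' "$PARAMS_FILE"
}

# Step 3: command that adds the slice; $1 is the slice name.
print_add_cmd() {
  printf 'datacoral collect add --slice-type nondatacoral --slice-name %s --parameters-file %s\n' "$1" "$PARAMS_FILE"
}

print_describe_cmd
echo "# edit $PARAMS_FILE to add your loadunits before running the next command"
print_add_cmd "external_schema"   # illustrative slice name; use your own
```

Run the printed commands one at a time, editing the parameters file between the describe and add steps.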

Supported load units

Notes

By default, the slice runs daily. If desired, you can change the slice configuration and specify different schedules for different loadunits.

Slice output

No data stored in AWS S3

AWS Redshift: Schema - the schema name is the same as the slice name. Tables - the loadunits specified in the slice. Note that these tables already exist and are readable by the datacoral user.

Questions? Interested?

If you have questions or feedback, feel free to reach out at hello@datacoral.co or request a demo.