Connector Commands

Use the Datacoral Collect CLI to add and manage Collect slices. With the CLI you can list the available Collect slice types, learn how each one works, and add or update slices. Here is a guide on how to get started with the Datacoral CLI.


Slice Types

List Slice Types

To list the slice types that Datacoral provides, use the list-slice-types command.

datacoral collect list-slice-types

Describe Slice Type

The describe command outputs what data a particular Collect slice type collects, what information is needed to hook it up and where to get that information (API keys, connection parameters, etc.), and where the data ends up in S3 and Redshift. Use the options to narrow the output, for example to only the input parameters or only the output parameters of the slice type.

datacoral collect describe --slice-type <slice-type> \
[--input-parameters] [--output-parameters] \
[--docs]

Example

datacoral collect describe --slice-type mysql --output-parameters
Datasource output resources:
S3 bucket: installation-name.datacoral/slice-name
S3 bucket keys for each load unit:
installation-name.datacoral/slice-name/loadunit1
installation-name.datacoral/slice-name/loadunit2
installation-name.datacoral/slice-name/loadunit1/y=year/m=month/d=day/h=hour/data.json
Redshift resources:
schema: slice-name,
tables: loadunit1, loadunit2
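
The key layout in the output above can be expressed as a small helper. This is only a sketch: `load_unit_key` is a hypothetical name, and the zero-padding of month, day, and hour is an assumption, because the describe output shows only the placeholders y=year/m=month/d=day/h=hour.

```python
from datetime import datetime, timezone


def load_unit_key(installation: str, slice_name: str, loadunit: str, ts: datetime) -> str:
    """Build the S3 path of one data file, following the layout printed by describe.

    Assumption: month, day and hour are zero-padded to two digits.
    """
    bucket = f"{installation}.datacoral"
    partition = f"y={ts:%Y}/m={ts:%m}/d={ts:%d}/h={ts:%H}"
    return f"{bucket}/{slice_name}/{loadunit}/{partition}/data.json"


if __name__ == "__main__":
    ts = datetime(2018, 1, 5, 19, tzinfo=timezone.utc)
    print(load_unit_key("installation-name", "my_mysql", "table1", ts))
```

The Redshift side mirrors the same names: the slice name becomes the schema and each load unit becomes a table.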

Format the parameter file

datacoral collect format --parameter-file <params-file> [--overwrite]

This command formats the contents of the parameter file so that it is consistent with the format produced by the describe command.

Add a collect slice

Add the slice and start ingesting data. You will need to specify a unique name for the slice in order to refer to it once it has been added.

datacoral collect add --slice-type <slice-type> --slice-name <slice-name> \
[--parameters-file <params-file>]

Step 1: Download the params template

datacoral collect describe --slice-type mysql --input-parameters > my_mysql.json

Step 2: Populate endpoint information

Note: The schedule parameter uses cron format. The finest supported granularity is 5 minutes. If you need to schedule more frequently, please contact support@datacoral.co.

{
  "database": "my_mysql_database",
  "host": "rds_...aws.com",
  "port": 3306,
  "username": "martha",
  "password": "marthas_password_123",
  "schedule": "0 0 * * *",
  "tableWhiteList": ["table1", "table2"],
  "tableBlackList": ["table3", "table4"]
}
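
Before adding the slice, it can help to check the parameters file locally. The sketch below is not part of the Datacoral CLI. It assumes a standard five-field cron expression (as in the example schedule) and reads the 5-minute limit as "no two runs less than 5 minutes apart", which is an interpretation of the note above. The function names are hypothetical. Only the minute field is inspected, so the computed gap is a lower bound.

```python
import json


def expand_minute_field(field: str) -> set[int]:
    """Expand a cron minute field such as "*/5", "0,30" or "10-20/5" into minutes 0-59."""
    minutes: set[int] = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if rng == "*":
            first, last = 0, 59
        elif "-" in rng:
            first, last = (int(x) for x in rng.split("-"))
        else:
            first = int(rng)
            last = 59 if step else first
        if not 0 <= first <= last <= 59:
            raise ValueError(f"minute field out of range: {part!r}")
        minutes.update(range(first, last + 1, step_n))
    return minutes


def min_gap_minutes(schedule: str) -> int:
    """Smallest gap in minutes between two runs, judged from the minute field only."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    minutes = sorted(expand_minute_field(fields[0]))
    gaps = [b - a for a, b in zip(minutes, minutes[1:])]
    gaps.append(minutes[0] + 60 - minutes[-1])  # wrap around to the next hour
    return min(gaps)


def load_params(text: str) -> dict:
    """Parse the parameters file strictly and check the schedule granularity."""
    params = json.loads(text)  # strict JSON: a trailing comma raises JSONDecodeError
    if min_gap_minutes(params["schedule"]) < 5:
        raise ValueError("schedule runs more often than every 5 minutes")
    return params
```

Strict parsing matters here because a trailing comma after the last key is a common slip when editing the template by hand, and `json.loads` rejects it.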

Step 3: Add the slice

Provide a slice name parameter. Note that the slice name uniquely identifies the added slice.

datacoral collect add --slice-type mysql --slice-name my_mysql \
--parameters-file my_mysql.json

List installed slices

Lists the slices installed in your installation.

datacoral <slice-category> list

Describe the added collect slice

datacoral describe --slice-name my_mysql

Download the configurations of all the collect slices

datacoral collect download --download-dir <download-dir> [--overwrite]

NOTES:

  • download-dir - Path to download directory. This directory should be empty unless --overwrite option has been specified.
  • overwrite - Overwrite content of the download directory if it is not empty.
  • Each collect slice will be downloaded into an individual json file.
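
Because each slice is written to its own JSON file, the downloaded directory is easy to inspect programmatically. The following is a minimal sketch, not a Datacoral tool. It assumes the files carry a .json extension and sit directly in the download directory. The file naming scheme is not documented here, so the result is keyed by file name rather than by slice name, and `load_slice_configs` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_slice_configs(download_dir: str) -> dict[str, dict]:
    """Read every *.json file in download_dir into a dict keyed by file name (no extension)."""
    configs = {}
    for path in sorted(Path(download_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            configs[path.stem] = json.load(f)
    return configs
```

For example, `load_slice_configs("./collect-configs")` would return one entry per downloaded file, which makes it straightforward to compare schedules or table lists across slices.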

Update a collect slice

You can update the configuration of the slice that has already been added.

Step 1: Fetch the slice configuration

datacoral describe --slice-name my_mysql > my_mysql.json

Step 2: Modify the configuration

In the example below, the tableWhiteList has been changed to a SQL LIKE-compatible pattern.

{
  "sliceName": "mysql",
  "database": "my_mysql_database",
  "host": "rds_...aws.com",
  "port": 3306,
  "username": "martha",
  "secretKey": "marthas_password_123",
  "schedule": "0 0 * * *",
  "tableWhiteList": ["tablepatternschema.%"],
  "tableBlackList": ["table3", "table4"]
}
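
To preview which table names a LIKE-style pattern would select, the pattern can be translated to a regular expression. This sketch illustrates standard SQL LIKE semantics only (% matches any run of characters, _ matches exactly one). How Datacoral itself evaluates tableWhiteList and tableBlackList is not described here, and the case-sensitive matching below is an assumption. The function names are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any run of characters, including none
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))


def like_match(pattern: str, name: str) -> bool:
    """Return True when the whole name matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(name) is not None
```

Note that an underscore in a pattern is itself a wildcard, so a pattern such as "table_" matches "table1" but also "tableX".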

Step 3: Update the slice

A. Update only the slice software

datacoral collect update --slice-name <slice-name>

B. Update only the configuration

datacoral collect update --slice-name <slice-name> --configuration-only [--parameters-file <params-file>]

C. Update slice software and the configuration

datacoral collect update --slice-name <slice-name> --parameters-file <params-file>

Example

datacoral collect update --slice-name my_mysql --configuration-only \
--parameters-file my_mysql.json

Remove a slice

datacoral remove --slice-name my_mysql

Pause and Resume slices

Pause a given slice

datacoral collect pause --slice-name <slice-name> \
[--loadunit <loadunit-name>] [--start-time <start-time>]

Pauses the data sync for the given slice. You can optionally specify the load units to pause, and the start time from which the data sync should be paused for the slice or load unit (e.g., --start-time "2018-01-05 19:14 +00:00"). When no load unit is specified, all load units of the slice are paused. When no start time is specified, it defaults to the current time.
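
When the start time is computed by a script, it has to be rendered in the same shape as the example value "2018-01-05 19:14 +00:00". The sketch below reproduces that shape only. Whether the CLI accepts other formats is not stated here, and `format_start_time` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def format_start_time(ts: datetime) -> str:
    """Render a timezone-aware datetime as 'YYYY-MM-DD HH:MM +HH:MM'."""
    if ts.utcoffset() is None:
        raise ValueError("timestamp must be timezone-aware")
    offset = ts.strftime("%z")  # e.g. '+0000' or '-0530', without a colon
    return f"{ts:%Y-%m-%d %H:%M} {offset[:3]}:{offset[3:5]}"


if __name__ == "__main__":
    # One hour from now, in UTC.
    print(format_start_time(datetime.now(timezone.utc) + timedelta(hours=1)))
```

The colon is inserted by hand because `%z` emits the offset as +0000 rather than +00:00.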

Resume a given slice

datacoral collect resume --slice-name <slice-name> \
[--loadunit <loadunit-name>] [--start-time <start-time>]

Resumes the data sync for a slice that was paused earlier. You can optionally specify the load units to resume, and the start time from which the data sync should be resumed for the slice or load unit (e.g., --start-time "2018-01-05 19:14 +00:00"). When no load unit is specified, all load units of the slice are resumed. When no start time is specified, it defaults to the current time.

Pause all slices

datacoral collect pauseall [--start-time <start-time>]

Pauses the data sync for all the slices in the installation. You can optionally specify the start time from which the data sync should be paused (e.g., --start-time "2018-01-05 19:14 +00:00"). When no start time is specified, it defaults to the current time.

Resume all slices

datacoral collect resumeall [--start-time <start-time>]

Resumes the data sync for all the paused slices in the installation. You can optionally specify the start time from which the data sync should be resumed (e.g., --start-time "2018-01-05 19:14 +00:00"). When no start time is specified, it defaults to the current time.

Reprocess Slice

There are a number of ways we can reprocess loadunits of a slice:

datacoral collect reprocess --slice-name <slice-name>

Use case                                                                                        <reprocess-command>
Reprocess only a subset of loadunits                                                            --loadunits
First timelabel for which reprocess should be carried out                                       --start-timelabel
Optional last timelabel for which reprocess should be carried out                               --end-timelabel
Optional flag to forcefully reprocess a successful timelabel                                    --force
Optional flag to reprocess as a single time-interval from 'start-timelabel' to 'end-timelabel'  --ignore-schedule

  • When '--loadunits' is not specified, all the loadunits in the slice are reprocessed.
  • When '--end-timelabel' is not specified, only the single timelabel specified by '--start-timelabel' is reprocessed.
  • '--ignore-schedule' is typically used to carry out a backfill over a wider window as a single time-interval instead of multiple invocations.

The reprocess commands

Reprocess all the loadunits in a slice for one timelabel

datacoral collect reprocess --slice-name <slice-name> --start-timelabel <YYYYMMDDHHmmss>

Reprocess specific loadunits in a slice for one timelabel

datacoral collect reprocess --slice-name <slice-name> --loadunits [loadunits] --start-timelabel <YYYYMMDDHHmmss>

Reprocess specific loadunits in a slice for all timelabels between start and end-timelabels (both inclusive)

datacoral collect reprocess --slice-name <slice-name> --loadunits [loadunits] --start-timelabel <YYYYMMDDHHmmss> --end-timelabel <YYYYMMDDHHmmss>
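
To get a feel for how many timelabels an inclusive start/end range covers, the labels can be enumerated locally. This is a sketch under a simplifying assumption: the real timelabels follow the slice's cron schedule, whereas the helper below uses a fixed step between labels. `timelabels_between` is a hypothetical name, and the time zone of timelabels is not stated here.

```python
from datetime import datetime, timedelta

TIMELABEL_FORMAT = "%Y%m%d%H%M%S"  # YYYYMMDDHHmmss


def timelabels_between(start: str, end: str, step: timedelta) -> list[str]:
    """List timelabels from start to end, both inclusive, spaced by a fixed step."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    current = datetime.strptime(start, TIMELABEL_FORMAT)
    last = datetime.strptime(end, TIMELABEL_FORMAT)
    labels = []
    while current <= last:
        labels.append(current.strftime(TIMELABEL_FORMAT))
        current += step
    return labels


if __name__ == "__main__":
    for label in timelabels_between("20180105000000", "20180105030000", timedelta(hours=1)):
        print(label)
```

With an hourly step, the range 20180105000000 to 20180105030000 yields four timelabels, because both ends are included.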

Reprocess specific loadunits in a slice as a single time-interval

datacoral collect reprocess --slice-name <slice-name> --loadunits [loadunits] --start-timelabel <YYYYMMDDHHmmss> --end-timelabel <YYYYMMDDHHmmss> --ignore-schedule