Historical Sync CLI Guide

Historical syncs can be triggered with the Datacoral CLI. The commands below cover triggering historical syncs and checking their status.

Historical sync command

$ datacoral collect historical-sync -h
Usage: historical-sync [options]
carry out various operations around historical-sync
Options:
--slice-name <slice-name> name of the deployed slice
--loadunits [loadunits] comma seperated list of one or more loadunits. Default is all loadunits of the slice
--status get the historical sync status of loadunit(s)
--detailed get detailed historical sync status of loadunit(s). Applicable only with '--status'
-h, --help output usage information

Picking the right backfill for your use case

  • Backfills are typically carried out when there is a data discrepancy between the source and the destination.

  • It is critical to choose the correct type of backfill for the type of discrepancy.

  • The four types of backfill, and the scenarios in which each should be used, are:

    1. FULL backfill - When the table to be backfilled is small, or when a FULL backfill has never completed successfully
    2. PARTIAL_TIME_WINDOW backfill - When data did not sync for a short time window and the source table is too large for a full backfill
    3. PARTIAL_PK_LIST backfill - When only a handful of primary keys are not correctly synced to the destination
    4. COLUMN backfill - When the issue affects only one or more columns of the table and the source table is too large for a full backfill
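
The selection rules in the list above can be sketched as a small Python helper. This is only an illustration: the function name, its parameters, and the order in which the checks are applied are assumptions made for this sketch and are not part of the Datacoral CLI.

```python
def choose_backfill_type(
    table_is_small: bool,
    full_backfill_succeeded_before: bool,
    only_few_primary_keys_affected: bool = False,
    only_some_columns_affected: bool = False,
    short_time_window_affected: bool = False,
) -> str:
    """Return one of the four backfill types described in the list above.

    The precedence of the checks is an assumption made for this sketch.
    """
    # FULL: the table is small, or a FULL backfill never completed successfully.
    if table_is_small or not full_backfill_succeeded_before:
        return "FULL"
    # PARTIAL_PK_LIST: only a handful of primary keys are out of sync.
    if only_few_primary_keys_affected:
        return "PARTIAL_PK_LIST"
    # COLUMN: the discrepancy is limited to one or more columns.
    if only_some_columns_affected:
        return "COLUMN"
    # PARTIAL_TIME_WINDOW: data is missing for a short time window.
    if short_time_window_affected:
        return "PARTIAL_TIME_WINDOW"
    # Fall back to FULL when no narrower case applies (assumption).
    return "FULL"


print(choose_backfill_type(False, True, short_time_window_affected=True))
```

The narrower types are checked only for large tables that already have a successful FULL backfill, which mirrors the conditions stated in the list.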

Trigger partial backfill/partial historical sync for the connector

Create a control file

The control YAML file must follow the format below. It can list multiple loadunits, each with its own backfill type.

Note: Backfill and historical sync are used interchangeably and mean the same thing.

# For full historical sync of the loadunit
- <loadunit_1>:
    - type: FULL
# For historical sync of particular columns in the loadunit
- <loadunit_2>:
    - type: COLUMN
    - columns:
        - <column_1>
        - <column_2>
# For historical sync of a particular time window in the loadunit
- <loadunit_3>:
    - type: PARTIAL_TIME_WINDOW
    - startTimestamp: <start_timestamp_value> # ex: 2021-01-10 17:45 +00:00
    - endTimestamp: <end_timestamp_value> # ex: 2021-01-10 18:45 +00:00
    - timestampCol: <timestamp_column_name> # ex: date
# For full historical sync for a given list of primary keys in the loadunit
- <loadunit_4>:
    - type: PARTIAL_PK_LIST
    - timestampCol: <timestamp_column_name> # ex: date
    - primaryKeys:
        - <primary_key_column_name>: <primary_key_column_value_1> # ex: id: 123456
        - <primary_key_column_name>: <primary_key_column_value_2> # ex: id: 456789
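
A control file in the format above could be generated rather than written by hand. The following Python sketch shows one way to do that. The function name, the loadunit and column names (`orders`, `customers`, `events`, `payments`, `email`, `phone`), and the exact nested indentation are assumptions for illustration. Only the keys (`type`, `columns`, `startTimestamp`, `endTimestamp`, `timestampCol`, `primaryKeys`) come from the format above.

```python
def render_control_file(entries: dict) -> str:
    """Render control-file text in the layout shown above.

    `entries` maps a loadunit name to a dict holding "type" plus the fields
    that type needs. The nested indentation is an assumption of this sketch.
    """
    lines = []
    for loadunit, spec in entries.items():
        lines.append(f"- {loadunit}:")
        for key, value in spec.items():
            if key == "columns":
                lines.append("    - columns:")
                lines.extend(f"        - {column}" for column in value)
            elif key == "primaryKeys":
                lines.append("    - primaryKeys:")
                # Each primary key is a (column_name, value) pair.
                lines.extend(f"        - {name}: {pk}" for name, pk in value)
            else:
                lines.append(f"    - {key}: {value}")
    return "\n".join(lines) + "\n"


control = render_control_file({
    "orders": {"type": "FULL"},
    "customers": {"type": "COLUMN", "columns": ["email", "phone"]},
    "events": {
        "type": "PARTIAL_TIME_WINDOW",
        "startTimestamp": "2021-01-10 17:45 +00:00",
        "endTimestamp": "2021-01-10 18:45 +00:00",
        "timestampCol": "date",
    },
    "payments": {
        "type": "PARTIAL_PK_LIST",
        "timestampCol": "date",
        "primaryKeys": [("id", 123456), ("id", 456789)],
    },
})
print(control)
```

The rendered text can then be written to a file and passed to the CLI with `--control-file`.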

Trigger partial backfill

datacoral collect historical-sync --slice-name <connector-name> --control-file <path-to-control-file>

Example

datacoral collect historical-sync --slice-name mysqlcdc_1610347509533 --control-file ~/Documents/scratchpad/ctrl3.yaml
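
When the command is launched from a script instead of an interactive shell, the same invocation can be assembled as an argument list. The Python sketch below assumes a hypothetical helper name and a hypothetical control file path (`/tmp/ctrl3.yaml`). The subcommand and flags are the ones shown above.

```python
import os
import shlex


def control_file_command(slice_name: str, control_file: str) -> list:
    """Build the argument list for a control-file driven historical sync."""
    # A shell expands "~" itself; expand it here because subprocess does not.
    path = os.path.expanduser(control_file)
    return [
        "datacoral", "collect", "historical-sync",
        "--slice-name", slice_name,
        "--control-file", path,
    ]


argv = control_file_command("mysqlcdc_1610347509533", "/tmp/ctrl3.yaml")
# Print the equivalent shell command for review before running it.
print(shlex.join(argv))
# To execute it: subprocess.run(argv, check=True)
```

Passing a list rather than a single string avoids shell quoting problems if the slice name or path contains spaces.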

Trigger full historical sync for all loadunits in the connector

The following command triggers historical syncs for all the tables in a connector.

datacoral collect historical-sync --slice-name <connector-name>

Example

datacoral collect historical-sync --slice-name cdc_aaa01

Output

2020-11-06T17:43:05-08:00 - info: 'historical-sync' request for loadunit 'table_110' has been submitted
2020-11-06T17:43:08-08:00 - info: 'historical-sync' request for loadunit 'table_119' has been submitted
...
2020-11-06T17:43:16-08:00 - info: 'historical-sync' request for loadunit 'table_112' has been submitted
2020-11-06T17:43:18-08:00 - info: 'historical-sync' request for loadunit 'table_111' has been submitted
2020-11-06T17:43:20-08:00 - info: 'historical-sync' request for loadunit 'table_114' has been submitted
2020-11-06T17:43:21-08:00 - info: 'historical-sync' request for loadunit 'table_113' has been submitted
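
To confirm which loadunits were accepted, the submission lines in the output above can be scanned with a regular expression. This Python sketch assumes the log line wording stays exactly as in the sample output. The function name is hypothetical.

```python
import re

# Matches the submission message shown in the sample output above.
SUBMITTED = re.compile(
    r"'historical-sync' request for loadunit '([^']+)' has been submitted"
)


def submitted_loadunits(cli_output: str) -> list:
    """Return the loadunit names, in order, that the CLI reported as submitted."""
    return SUBMITTED.findall(cli_output)


sample = (
    "2020-11-06T17:43:05-08:00 - info: 'historical-sync' request for "
    "loadunit 'table_110' has been submitted\n"
    "2020-11-06T17:43:08-08:00 - info: 'historical-sync' request for "
    "loadunit 'table_119' has been submitted\n"
)
print(submitted_loadunits(sample))
```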

Trigger full historical sync for specific loadunits in the connector

The following command triggers historical syncs for specific tables in a connector.

datacoral collect historical-sync --slice-name <connector-name> --loadunits <loadunits>

Example

datacoral collect historical-sync --slice-name cdc_aaa01 --loadunits table_110,table_114

Output

2020-11-06T18:43:05-08:00 - info: 'historical-sync' request for loadunit 'table_110' has been submitted
2020-11-06T18:43:08-08:00 - info: 'historical-sync' request for loadunit 'table_114' has been submitted
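
If the loadunit names come from a list in a script, the `--loadunits` value has to be joined into a single comma-separated string. The Python sketch below shows one way to build the argument list. The helper name is hypothetical, and stripping whitespace around names is a choice made for this sketch.

```python
def full_sync_command(slice_name: str, loadunits=None) -> list:
    """Build the argument list for a full historical sync."""
    argv = ["datacoral", "collect", "historical-sync", "--slice-name", slice_name]
    # Without --loadunits the CLI defaults to all loadunits of the slice.
    names = [name.strip() for name in (loadunits or []) if name.strip()]
    if names:
        argv += ["--loadunits", ",".join(names)]
    return argv


print(full_sync_command("cdc_aaa01", ["table_110", "table_114"]))
```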

Get status for all loadunits in a connector

datacoral collect historical-sync --slice-name <slice-name> --status

Example

datacoral collect historical-sync --slice-name cdc_aaa01 --status

Output

Obtained historical sync status for 'table_11'
...
Obtained historical sync status for 'table_113'
Writing out status object
[
  {
    "loadunit": "table_11",
    "status": "SUCCESS"
  },
  ...
  {
    "loadunit": "table_113",
    "status": "INPROGRESS"
  }
]
Done writing status object
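
The status object in the output above sits between two marker lines, so it can be extracted and parsed as JSON. The Python sketch below assumes the markers `Writing out status object` and `Done writing status object` appear exactly as in the sample and that the real output contains complete JSON (the `...` in the sample is an elision). The function name is hypothetical.

```python
import json

START_MARKER = "Writing out status object"
END_MARKER = "Done writing status object"


def parse_status_output(cli_output: str) -> list:
    """Extract the JSON status array printed between the two marker lines."""
    start = cli_output.index(START_MARKER) + len(START_MARKER)
    end = cli_output.index(END_MARKER, start)
    return json.loads(cli_output[start:end])


sample = """Obtained historical sync status for 'table_11'
Obtained historical sync status for 'table_113'
Writing out status object
[
{"loadunit": "table_11", "status": "SUCCESS"},
{"loadunit": "table_113", "status": "INPROGRESS"}
]
Done writing status object
"""
for entry in parse_status_output(sample):
    print(entry["loadunit"], entry["status"])
```

`str.index` raises `ValueError` when a marker is missing, which makes an unexpected output format fail loudly instead of returning an empty result.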

Get status for specific loadunits

datacoral collect historical-sync --slice-name <slice-name> --loadunits <loadunits> --status

Example

datacoral collect historical-sync --slice-name cdc_aaa01 --loadunits 'table_11, table_114, table_119' --status

Output

Obtained historical sync status for 'table_11'
Obtained historical sync status for 'table_114'
Obtained historical sync status for 'table_119'
Writing out status object
[
  {
    "loadunit": "table_11",
    "status": "SUCCESS"
  },
  {
    "loadunit": "table_114",
    "status": "INPROGRESS"
  },
  {
    "loadunit": "table_119",
    "status": "SUCCESS"
  }
]
Done writing status object
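
Because a historical sync can stay in `INPROGRESS` for a while, a script may want to poll the status until nothing is pending. The Python sketch below takes a caller-supplied `fetch_status` function (hypothetical) that returns the parsed status list. The polling interval and limit are arbitrary assumptions, and only the two status values seen in the sample output (`SUCCESS`, `INPROGRESS`) are known here.

```python
import time


def wait_for_historical_sync(fetch_status, poll_seconds=60.0, max_polls=120):
    """Poll until no loadunit reports INPROGRESS, then return the final statuses."""
    for _ in range(max_polls):
        statuses = fetch_status()
        pending = [s["loadunit"] for s in statuses if s["status"] == "INPROGRESS"]
        if not pending:
            return statuses
        time.sleep(poll_seconds)
    raise TimeoutError("historical sync still in progress after polling limit")


# Demo with canned responses standing in for real CLI calls.
canned = iter([
    [{"loadunit": "table_114", "status": "INPROGRESS"}],
    [{"loadunit": "table_114", "status": "SUCCESS"}],
])
print(wait_for_historical_sync(lambda: next(canned), poll_seconds=0))
```

In real use, `fetch_status` would run the `--status` command and parse its output.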

Get detailed historical sync status of all loadunits

datacoral collect historical-sync --slice-name <slice-name> --status --detailed

Example

datacoral collect historical-sync --slice-name cdc_aaa01 --status --detailed

Output

Obtained historical sync status for 'table_11'
Obtained historical sync status for 'table_110'
[
  {
    "loadunit": "table_11",
    "status": "SUCCESS",
    "targetWarehouses": [
      "redshift"
    ],
    "timelabels": [
      {
        "startTime": "2020-11-07 01:43:02:740 +00:00",
        "version": null,
        "timelabel": "20201107011500",
        "executionContext": {},
        "endTime": "2020-11-07 01:43 +00:00",
        "sliceName": "cdc_aaa01",
        "durationInMillis": 0,
        "updateTime": "2020-11-07 01:43 +00:00",
        "status": "SUCCESS",
        "key": "backfillmanager|cdc_aaa01|table_11",
        "reason": null,
        "operation": "BACKFILLMANAGER"
      },
      ...
    ]
  },
  ...
  {
    "loadunit": "table_114",
    "status": "INPROGRESS",
    "targetWarehouses": [
      "redshift"
    ],
    "timelabels": [
      {
      }
    ]
  }
]
Done writing status object

Get detailed sync status of specific loadunits

datacoral collect historical-sync --slice-name <slice-name> --loadunits <loadunits> --status --detailed

Example

datacoral collect historical-sync --slice-name cdc_aaa01 --loadunits 'table_11, table_114, table_119' --status --detailed

Output

Obtained historical sync status for 'table_11'
Obtained historical sync status for 'table_114'
Writing out status object
[
  {
    "loadunit": "table_11",
    "status": "SUCCESS",
    "targetWarehouses": [
      "redshift"
    ],
    "timelabels": [
      {
        "startTime": "2020-11-07 01:43:02:740 +00:00",
        "version": null,
        "timelabel": "20201107011500",
        "executionContext": {},
        "endTime": "2020-11-07 01:43 +00:00",
        "sliceName": "cdc_aaa01",
        "durationInMillis": 0,
        "updateTime": "2020-11-07 01:43 +00:00",
        "status": "SUCCESS",
        "key": "backfillmanager|cdc_aaa01|table_11",
        "reason": null,
        "operation": "BACKFILLMANAGER"
      },
      ...
    ]
  },
  ...
  {
    "loadunit": "table_114",
    "status": "INPROGRESS",
    "targetWarehouses": [
      "redshift"
    ],
    "timelabels": [
      {
      }
    ]
  }
]
Done writing status object
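
The detailed status can be long, so it may help to list only the timelabel entries that have not succeeded. The Python sketch below works on the parsed detailed status and uses only the field names visible in the sample output (`loadunit`, `timelabels`, `timelabel`, `status`, `reason`). The function name is hypothetical, and treating anything other than `SUCCESS` as unfinished is an assumption.

```python
def unfinished_timelabels(detailed_status: list) -> list:
    """Return (loadunit, timelabel, status, reason) for entries not yet SUCCESS."""
    rows = []
    for unit in detailed_status:
        for entry in unit.get("timelabels", []):
            # An in-progress loadunit may report an empty timelabel object.
            if entry and entry.get("status") != "SUCCESS":
                rows.append((
                    unit["loadunit"],
                    entry.get("timelabel"),
                    entry.get("status"),
                    entry.get("reason"),
                ))
    return rows


detailed = [
    {"loadunit": "table_11", "status": "SUCCESS",
     "timelabels": [{"timelabel": "20201107011500", "status": "SUCCESS", "reason": None}]},
    {"loadunit": "table_114", "status": "INPROGRESS", "timelabels": [{}]},
]
print(unfinished_timelabels(detailed))
```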