Google Spreadsheet JSON Guide

Add a new connector

In the UI click on the Json Editor and add the loadunit form parameters as key value pairs to add the connector.

  • loadunitName : Edit the name of the loadunit here
  • spreadsheetId : Id of the Google Spreadsheet which lies in between d/ and /edit of the url
  • sheetName: Name of the Sheet, located at the bottom of the screen
  • range: Comma separated ranges of columns (data only) to fetch, expressed in A1 notation. Ex: A1:1
  • columnNameRange: Comma separated ranges in which column headers are located, expressed in A1 notation. Ex: A2:H
  • executionMode : Should be 'snapshot'
  • fieldOrder: List of Columns (specify in case there is no header row in the source sheet) ex. [\"col 1\",\"col 2\",\"col 3\"]
  • notificationEmail : Email address to send notification

Update existing connector

Step 1: Update the connector

Click on update in the top right corner of the loadunit list page

Step 2: Edit the configuration

You will be redirected to the JSON editor page. Here is a guide on how to edit this file.

A loadunit can be updated in one of the two ways

1) Extraction configuration

2) Load configuration

Use cases for configuration

Use caseExtract modeLoad mode
Need only the latest dataSnapshotReplace
Need all instances of past dataSnapshotAppend
Need all the data for analysisIncrementalAppend
Need all the data with the latest changesIncrementalMerge
note
  • Extraction modes can be paginated in case of higher volumes of data
  • Merge load mode will necessarily need a primary key, but its recommended to provide the timestamp column as well
  • If the table is large and expected to sync frequently, snapshot is not recommended

Extract configuration:

This pertains to the extraction mode, the frequency of extraction and pagination. The extraction configuration of the loadunit is found in inputParams > loadunits

The following parameters can be added or edited. The extractionmode parameter can take two values - snapshot and incremental

ParameterDescriptionSS(P)
scheduleParameter will take input in cron format, setoo
paginateTakes boolean input, true when paginatingm
pageSizeTakes numeric input indicating size of pagem
pageSortColUnique key (usually primary key)m
dataformatThe format of the extracted dataoo
columnlistComma-separated column names that needs to be includedoo
columnBlacklistComma-separated column names that needs to be excludedoo
extractionqueryA SQL query to extract at the sourceoo
verifyrecordsTakes a boolean input to verify if the number of records fetched is equal to the number of records in the sourceoo
lagInMinutesLag in minutes before starting the loading of dataoo
redshiftTimestampColParameter used to compare latest value of timestampCol at the source and redshift warehouseoo

Legend:

  • S - snapshot mode
  • S(P) - snapshot mode with pagination
  • m - mandatory parameter
  • o - optional parameter
note
  • Changing the schedule is likely to impact downstream dependencies
  • It is recommended to keep lagInMinutes as a multiple of schedule interval

Load configuration:

This pertains to load configuration onto the warehouse. The load configuration of the loadunit is found in loaderConfig > loadunit

The following parameters can be added or edited. There are three load modes : merge, update and append to learn more please visit features and capabilities tab.

ParameterDescriptionMergeReplaceAppend
targetTableName the target table in the warehouse in the format <connector-name>.<target-table-name>ooo
dataFormatFormat of the loaded data, ex:parquet, json, avro, orc, csv (warehouse specific)ooo
copyOptionsClick redshift, snowflake to learn moreooo
TruncateTargetTableTakes boolean input to indicate load mode. It is true for replace,false for append and merge modesmmm
IsUpdateBoolean, Only for merge load modem
PrimaryKeyColPrimary key of the sourcem
UpdateTSColTimestamp column which captures the updated timem
DeleteTSColTimestamp column which indicates record deletiono
softDeletesTakes boolean value, should be set as true to capture soft deleteso

Legend:

  • m - mandatory parameter
  • o - optional parameter
note
  • The new loadunit should be added in both inputParams and loaderConfig with all the mandatory fields
  • Timestamp column should be indexed and auto incremented

Ask a question

If you have questions or feedback, feel free to reach out at hello@datacoral.co or Request a demo