Google Spreadsheet JSON Guide
Add a new connector
In the UI click on the Json Editor and add the loadunit form parameters as key value pairs to add the connector.
- loadunitName : Edit the name of the loadunit here
- spreadsheetId : Id of the Google Spreadsheet which lies in between d/ and /edit of the url
- sheetName: Name of the Sheet, located at the bottom of the screen
- range: Comma separated ranges of columns (data only) to fetch, expressed in A1 notation. Ex: A1:1
- columnNameRange: Comma separated ranges in which column headers are located, expressed in A1 notation. Ex: A2:H
- executionMode : Should be 'snapshot'
- fieldOrder: List of Columns (specify in case there is no header row in the source sheet) ex. [\"col 1\",\"col 2\",\"col 3\"]
- notificationEmail : Email address to send notification
Update existing connector
Step 1: Update the connector
Click on update in the top right corner of the loadunit list page
Step 2: Edit the configuration
You will be redirected to the JSON editor page. Here is a guide on how to edit this file.
A loadunit can be updated in one of the two ways
1) Extraction configuration
2) Load configuration
Use cases for configuration
Use case | Extract mode | Load mode |
---|---|---|
Need only the latest data | Snapshot | Replace |
Need all instances of past data | Snapshot | Append |
Need all the data for analysis | Incremental | Append |
Need all the data with the latest changes | Incremental | Merge |
note
- Extraction modes can be paginated in case of higher volumes of data
- Merge load mode will necessarily need a primary key, but its recommended to provide the timestamp column as well
- If the table is large and expected to sync frequently, snapshot is not recommended
Extract configuration:
This pertains to the extraction mode, the frequency of extraction and pagination. The extraction configuration of the loadunit is found in inputParams
> loadunits
The following parameters can be added or edited.
The extractionmode
parameter can take two values - snapshot and incremental
Parameter | Description | S | S(P) |
---|---|---|---|
schedule | Parameter will take input in cron format, set | o | o |
paginate | Takes boolean input, true when paginating | m | |
pageSize | Takes numeric input indicating size of page | m | |
pageSortCol | Unique key (usually primary key) | m | |
dataformat | The format of the extracted data | o | o |
columnlist | Comma-separated column names that needs to be included | o | o |
columnBlacklist | Comma-separated column names that needs to be excluded | o | o |
extractionquery | A SQL query to extract at the source | o | o |
verifyrecords | Takes a boolean input to verify if the number of records fetched is equal to the number of records in the source | o | o |
lagInMinutes | Lag in minutes before starting the loading of data | o | o |
redshiftTimestampCol | Parameter used to compare latest value of timestampCol at the source and redshift warehouse | o | o |
Legend:
- S - snapshot mode
- S(P) - snapshot mode with pagination
- m - mandatory parameter
- o - optional parameter
note
- Changing the schedule is likely to impact downstream dependencies
- It is recommended to keep
lagInMinutes
as a multiple of schedule interval
Load configuration:
This pertains to load configuration onto the warehouse. The load configuration of the loadunit is found in loaderConfig
> loadunit
The following parameters can be added or edited. There are three load modes : merge, update and append to learn more please visit features and capabilities tab.
Parameter | Description | Merge | Replace | Append |
---|---|---|---|---|
targetTable | Name the target table in the warehouse in the format <connector-name>.<target-table-name> | o | o | o |
dataFormat | Format of the loaded data, ex:parquet, json, avro, orc, csv (warehouse specific) | o | o | o |
copyOptions | Click redshift, snowflake to learn more | o | o | o |
TruncateTargetTable | Takes boolean input to indicate load mode. It is true for replace,false for append and merge modes | m | m | m |
IsUpdate | Boolean, Only for merge load mode | m | ||
PrimaryKeyCol | Primary key of the source | m | ||
UpdateTSCol | Timestamp column which captures the updated time | m | ||
DeleteTSCol | Timestamp column which indicates record deletion | o | ||
softDeletes | Takes boolean value, should be set as true to capture soft deletes | o |
Legend:
- m - mandatory parameter
- o - optional parameter
note
- The new loadunit should be added in both inputParams and loaderConfig with all the mandatory fields
- Timestamp column should be indexed and auto incremented
Ask a question
If you have questions or feedback, feel free to reach out at hello@datacoral.co or Request a demo