Greenhouse Connector UI Setup Guide

Prerequisites

Generate Greenhouse Harvest API key

Before getting started, please make sure you have admin access to an active Greenhouse account.

The Greenhouse connector requires a Harvest API key to collect data. You can generate one in Greenhouse through the following steps:

  • Click on the "Configure" icon on the top right corner
  • Click on "Dev Center" on the left
  • Click on "API Credential Management"
  • Click on "Create New API Key"
  • Enter a description for the new API key, such as "Datacoral API Key", select "Harvest" as the Type, and click "Create"
  • Click on "Manage Permissions" and select all the objects that you need to ingest from Greenhouse to your warehouse
Important!

The API key will have permission to access only the selected objects. Please review the objects and select only the ones needed.

For example, if the offers object is considered confidential, please uncheck it under "Manage API Key Permissions" before generating the key.

  • Copy the newly generated Harvest API key
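
To make it concrete how the copied key is presented to Greenhouse, here is a minimal sketch that is not part of the Datacoral setup itself. It assumes the Harvest API's HTTP Basic authentication scheme, in which the API key is sent as the username with a blank password. The function name and the placeholder key are hypothetical.

```python
import base64


def build_harvest_auth_header(api_key: str) -> dict:
    """Return the HTTP Basic auth header for a Harvest API key.

    Assumption: the key is used as the username with an empty password.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    # Basic auth encodes "username:password"; the password is blank here.
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder key for illustration only; never hard-code a real key.
headers = build_harvest_auth_header("abc")
print(headers)
```

The Check Connection button in Step 2 validates the key for you; this sketch only shows what the key looks like on the wire.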

Step 1: Select Greenhouse Connector

  • From the main menu, click on Add connector
  • Find and select the Greenhouse connector

Step 2: Configure connection parameters

Fill in the following details:

  • Connector name: Set the name of the connector. This name cannot be changed later because it becomes the name of the schema
  • Destination warehouse: Choose the destination warehouse from the drop-down
  • Fill in the Harvest API key created above to connect to your Greenhouse account, then click on Check Connection to validate the key
  • Click on Next after successfully connecting to the source

Step 3: Configure source information

  • Interval: Set the frequency of data extraction
  • Sync Historical data: Loads the entire historical data set from the source as a one-time activity
    Note

    This functionality will be enabled in the future. For now, contact Datacoral Support to initiate a historical sync

  • Click on Fetch Source Metadata to retrieve the metadata and then click on Next

Step 4: Configure load units information

The list of loadunits is displayed along with the extraction mode and schedule of each.

The extraction mode is auto-detected based on the table size and the availability of a primary key and a timestamp column in the source table. Click on Edit to update the configuration of each loadunit.

  • Extraction mode: Can be snapshot, incrementalappend or incrementalupdate
  • Interval: The frequency of extraction, chosen from a set of discrete intervals starting at 5 minutes
  • Timestampcol: The timestamp column, which is auto-detected for the incrementalupdate extraction mode
  • Column Blacklist: Columns that need to be excluded from the destination warehouse should be added here
Important

When using the Column Blacklist feature in a loadunit, please make sure that the same data is also excluded from every other loadunit that contains it.

For example:

The jobs and job_openings loadunits have custom_fields and keyed_custom_fields as JSON properties, which may contain confidential data.

Excluding just these columns from the two loadunits will not suffice, because job_openings data is also present as the openings column in the jobs loadunit. The openings column should therefore be excluded as well.

Please refer to the links to the documentation of each object on the Greenhouse connector overview page.
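
The cross-loadunit check described in this example can be sketched in a few lines. This is only an illustration of the reasoning, not a Datacoral feature: the blacklist dictionary, the EMBEDDED_IN mapping, and the find_leaks function are all hypothetical names, and the only embedding relationship encoded is the one stated above (the openings column of jobs carries job_openings data).

```python
# Hypothetical map of loadunit name -> columns added to its Column Blacklist.
blacklist = {
    "jobs": {"custom_fields", "keyed_custom_fields"},
    "job_openings": {"custom_fields", "keyed_custom_fields"},
}

# Columns that embed another loadunit's data, taken from the example above:
# the openings column of jobs carries job_openings data.
EMBEDDED_IN = {
    ("jobs", "openings"): "job_openings",
}


def find_leaks(blacklist):
    """Return (loadunit, column) pairs that still expose blacklisted data."""
    leaks = []
    for (loadunit, column), source in EMBEDDED_IN.items():
        # Does the embedded loadunit have any blacklisted columns at all?
        source_has_exclusions = bool(blacklist.get(source))
        # Is the embedding column itself excluded from its loadunit?
        column_excluded = column in blacklist.get(loadunit, set())
        if source_has_exclusions and not column_excluded:
            leaks.append((loadunit, column))
    return leaks


print(find_leaks(blacklist))  # the openings column of jobs is flagged

blacklist["jobs"].add("openings")
print(find_leaks(blacklist))  # nothing is flagged once openings is excluded
```

Extending EMBEDDED_IN with other nested objects, based on the Greenhouse object documentation, would let the same check cover additional loadunits.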

Step 5: Edit data layouts

Update the data types as needed and click on Next

Step 6: Configure warehouse

For each of the load units on the left, you can choose the load mode and related options

  • Load Mode: Datacoral supports the following load modes

    • Replace: A wipe-and-load operation that replaces all the rows of the destination table with the results of the transformation query
    • Append: An insert operation where the results of the transformation query are inserted into the destination table. Rows already in the destination table are not updated
    • Merge: An upsert operation where the results of the transformation query indicate which destination table rows have to be inserted, updated, or deleted. This mode allows for efficient incremental updates to destination tables
  • Primary Key: This is mandatory for the Merge load mode

  • Copy options: Add the copy options (for more information, see the Redshift documentation and the Snowflake documentation)
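
To illustrate how the three load modes differ, and why Merge needs a primary key, here is a small in-memory sketch. It models tables as lists of dictionaries and is not how Datacoral or the warehouse implements loading. The function names, the sample rows, and the "_deleted" marker used to signal a deletion are all assumptions made for this example.

```python
def replace(destination, results):
    """Replace: wipe the destination and load the query results."""
    return list(results)


def append(destination, results):
    """Append: insert the query results; existing rows are left untouched."""
    return list(destination) + list(results)


def merge(destination, results, primary_key):
    """Merge: upsert the query results by primary key.

    Assumption: a result row with a truthy "_deleted" field removes the row.
    """
    by_key = {row[primary_key]: row for row in destination}
    for row in results:
        if row.get("_deleted"):
            by_key.pop(row[primary_key], None)
        else:
            # Insert a new key or overwrite the existing row for that key.
            by_key[row[primary_key]] = row
    return list(by_key.values())


destination = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
results = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]

print(replace(destination, results))      # only the two result rows remain
print(append(destination, results))       # four rows, id 2 appears twice
print(merge(destination, results, "id"))  # three rows, id 2 is updated
```

Without a primary key, merge has no way to match an incoming row to an existing one, which is why the key is mandatory for that mode.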

When done with the configuration changes, click on Update and then Next in the top right.

Connector Added

You have successfully added the connector once you land on the page shown after the final step. Click on the enable icon in the top right to activate it. Please open a support ticket to initiate a historical sync.

Questions?

Please contact Datacoral's Support Team. We'd be more than happy to answer any of your questions.