Installation of Datacoral is a two-step process. The first step is to prepare your AWS account through the creation of the appropriate roles and users. Datacoral pipeline and management resources are deployed in the second step of the installation.
Please budget one to two hours to complete the installation of Datacoral following the user guide. The actual installation time depends on your choice of warehouse and networking configuration.
Step 1: Set up your Datacoral account
You will receive an invitation email from Datacoral to begin the installation. If you are expecting an invitation and haven't received it yet, drop a line to us at firstname.lastname@example.org.
Clicking the "Get Started" button in the email takes you to a screen where you choose a secure password. This password allows you to log in and access the Datacoral Webapp.
Step 2: Configure your installation
Set up AWS account configuration
Configure your installation by choosing your AWS account name, AWS region, primary availability zone, installation name, and your warehouse type. Parameters to configure your installation are:
- AWS Account Name: Name of the account for easy identification of the workload. If installing Datacoral into your existing AWS account, please choose “default” as the option.
- AWS Region: Name of the AWS region where you want Datacoral installed.
- Primary Availability Zone: Primary availability zone, within the AWS region specified above, where Datacoral resources will be created (a list of availability zones can be found using this command).
- Installation Name: Name of the Datacoral installation.
- Warehouse Type: Select the type of warehouse you intend to initialize first. You can always add more warehouses after installation is completed.
Note: If you're choosing Existing Redshift Cluster, make sure to uncheck "Create a new VPC for me".
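As an illustration of how the availability zones for a region might be listed programmatically, here is a minimal Python sketch. It assumes the `boto3` library is installed and AWS credentials are configured. It is not the command the guide refers to, and the helper names `available_zone_names` and `zones_for_region` are hypothetical.

```python
def available_zone_names(ec2_client):
    """Return the sorted names of zones in the 'available' state.

    `ec2_client` is expected to behave like a boto3 EC2 client bound to
    the region you plan to install into.
    """
    response = ec2_client.describe_availability_zones(
        Filters=[{"Name": "state", "Values": ["available"]}]
    )
    return sorted(zone["ZoneName"] for zone in response["AvailabilityZones"])


def zones_for_region(region_name):
    """Convenience wrapper (hypothetical) that builds the client for you."""
    # boto3 is imported lazily so the helper above can be used and
    # tested without the library being installed.
    import boto3

    return available_zone_names(boto3.client("ec2", region_name=region_name))
```

Calling `zones_for_region` with your chosen region name returns the zone names you can pick from for the Primary Availability Zone field.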
Review your configuration before you begin preparing your AWS account. If required, feel free to go back and edit your chosen options. When ready, click Next.
Step 3: Prepare AWS Account
Clicking Next opens a new browser tab and redirects you to the AWS CloudFormation Create Stack page. If you are not already logged in, you will be required to log in to your AWS account as an Admin user.
The Create Stack page is pre-filled with the options supplied in the previous steps. You need to assign a password to the Datacoral Console User (this is the read-only administrative user Datacoral uses for monitoring purposes) and select the checkboxes at the bottom of the page before clicking “Create Stack”.
Note: Please don't change any configurations that have been pre-set for you, other than choosing a password for the Datacoral console user.
The Webapp will keep track of stack updates and correspondingly update its status. Once all the stacks are successfully created, you can move on to the next steps.
Step 4: Network Setup
Datacoral works with your existing network configuration, or we can set up a new network configuration during installation.
We recommend creating a new network configuration with a new VPC, both to isolate Datacoral resources and to simplify auditing and access management for the entire data pipeline. The Datacoral installation process can create a new VPC in your AWS account to house all the network-addressable resources that Datacoral creates and manages. For us to create a new VPC, you need to provide a CIDR block. Select a CIDR block that does not overlap with your existing network configuration, so that the VPCs can be peered. Make sure the CIDR block has 16 bits in its prefix (a /16 CIDR block); for example,
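The two CIDR requirements above (a /16 prefix and no overlap with your existing networks) can be checked before you enter the value. The sketch below uses Python's standard `ipaddress` module. The function name `check_vpc_cidr` is hypothetical and not part of Datacoral, and the address ranges in the usage note are illustrative, not taken from this guide.

```python
import ipaddress


def check_vpc_cidr(candidate, existing_cidrs):
    """Return a list of problems with `candidate`; an empty list means none found."""
    try:
        # strict=True rejects blocks with host bits set, such as 10.1.2.0/16.
        network = ipaddress.IPv4Network(candidate, strict=True)
    except ValueError as exc:
        return [f"not a valid CIDR block: {exc}"]

    problems = []
    if network.prefixlen != 16:
        problems.append(f"prefix is /{network.prefixlen}, expected /16")
    for other in existing_cidrs:
        # Overlapping ranges would prevent peering between the VPCs.
        if network.overlaps(ipaddress.IPv4Network(other, strict=False)):
            problems.append(f"overlaps existing network {other}")
    return problems
```

For instance, with a hypothetical existing network of `10.0.0.0/16`, calling `check_vpc_cidr("10.1.0.0/16", ["10.0.0.0/16"])` returns an empty list, while passing `10.0.0.0/16` as the candidate reports an overlap.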
Advanced: If "Create a new VPC for me" wasn't checked at the start of onboarding (this is the case when you have an existing Redshift cluster), you'll need to create and set up a new VPC. See Existing Network Configuration for setting up your networking to work with Datacoral.
Step 5: Warehouse Setup
You can initialize Datacoral with the appropriate data warehouse and/or data lake. The three supported configurations with Datacoral are:
- Amazon Redshift (create new cluster or use existing cluster)
- Amazon Athena
- Snowflake (use existing warehouse)
For Amazon Redshift, Datacoral can instantiate a new cluster and manage it as part of the Datacoral installation or connect to an existing Redshift cluster.
To instantiate a new Redshift cluster as part of the onboarding, select the node type best suited to your workload and the number of nodes. We default to dc2.large because of its cost/performance ratio. Note that if you want to
Advanced: See Configuring Existing Redshift Cluster for Datacoral for working with your own Redshift cluster.
To instantiate a new Athena warehouse, you only need to choose a name for the warehouse.
To use an existing Snowflake warehouse with the Datacoral installation, please follow the steps here to get the database name, user name, password, role, etc. You will need to provide these when adding the Snowflake warehouse as shown in the screen below.
Step 6: Initialize installation with the warehouse
Review your configuration before you begin preparation. If required, go back and edit your chosen options. When ready, click Initialize.
Installation status will be shown in the Webapp.
Step 7: Your installation is ready in your VPC
Once the Datacoral installation is successful, you can start adding ingest connectors to get data flowing into your warehouse.