Collect Slices help you capture data from multiple sources so that it can be stored in your data lake and organized in your data warehouse.
Datacoral provides Collect Slices for different kinds of sources:
- Databases: Bring data from your production databases, such as Postgres, MySQL, DynamoDB, and others, into one place.
- SaaS APIs: Connect any Marketing (e.g. Google Adwords, Mailchimp), Customer Success (e.g. Zendesk), Project Tracking (e.g. Asana, Jira), Data-as-a-Service (e.g. Crunchbase, Google Analytics), or Payments (e.g. Stripe) API.
- Events: Instrument any stream of events, such as events from a live application.
To get started with adding slices, look at the generic commands for working with slices here. They let you list all available slices, add new slices, and remove or update existing slices.
Irrespective of where the data comes from, the first thing Datacoral does is stage the incoming data in stage tables in a file-system-based database, such as S3/Athena.
Connectors can extract or receive data in different extractmodes:
- snapshot - each batch of data generated by the collect function is a full copy of the source loadunit.
- incremental-appends - each batch of data generated by the collect function is the set of additional rows that were added at the source.
- incremental-updates - each batch of data generated by the collect function contains the set of changes at the source: new rows inserted, existing rows updated, and existing rows deleted.
New batches of data are written to new partitions in the staging area, irrespective of the mode in which the collect function extracts or receives data.
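The three extractmodes can be illustrated with a small, self-contained Python sketch. This is not Datacoral's implementation: the function names, the `id` key, the `op` field, and the `batch=...` partition labels are all hypothetical. The incremental-updates case is simulated here by diffing two in-memory states, whereas a real connector would typically read changes from the source itself.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def snapshot_batch(source: List[Row]) -> List[Row]:
    """snapshot: the batch is a full copy of the source loadunit."""
    return [dict(r) for r in source]


def incremental_appends_batch(source: List[Row], last_seen_id: int) -> List[Row]:
    """incremental-appends: only rows added after the last row already collected.

    Assumes a monotonically increasing 'id' column (hypothetical).
    """
    return [dict(r) for r in source if r["id"] > last_seen_id]


def incremental_updates_batch(previous: List[Row], current: List[Row]) -> List[Row]:
    """incremental-updates: inserts, updates, and deletes between two states.

    Each change carries a hypothetical 'op' field naming the kind of change.
    """
    prev = {r["id"]: r for r in previous}
    curr = {r["id"]: r for r in current}
    changes: List[Row] = []
    for key, row in curr.items():
        if key not in prev:
            changes.append({"op": "insert", **row})
        elif row != prev[key]:
            changes.append({"op": "update", **row})
    for key in prev:
        if key not in curr:
            changes.append({"op": "delete", "id": key})
    return changes


# Every batch lands in its own partition of the staging area,
# whichever extractmode produced it (partition labels are made up).
previous = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
current = [{"id": 1, "name": "a2"}, {"id": 3, "name": "c"}]
staging: Dict[str, List[Row]] = {
    "batch=0001": snapshot_batch(previous),
    "batch=0002": incremental_updates_batch(previous, current),
}
```

Note that the staging dictionary only ever gains new keys: earlier batches are never rewritten, which mirrors the "new batch, new partition" behavior described above.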
Users can specify the loadmode for the source tables in the warehouse:
- replace mode replaces the contents of the source table with the content of the latest batch
- append mode appends all the rows of the batch to the source table
- merge mode applies changes in the batch to the specific rows in the source table
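As a rough illustration of how the three loadmodes differ, the following Python sketch applies a batch to an in-memory "source table". It is a sketch under stated assumptions rather than Datacoral's actual loading logic: `apply_batch`, the `id` key column, and the `op` field on change rows are hypothetical names chosen for the example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_batch(table: List[Row], batch: List[Row], loadmode: str, key: str = "id") -> List[Row]:
    """Return the new contents of a source table after loading one batch."""
    if loadmode == "replace":
        # The table becomes exactly the latest batch.
        return [dict(r) for r in batch]
    if loadmode == "append":
        # All rows of the batch are added to the existing rows.
        return table + [dict(r) for r in batch]
    if loadmode == "merge":
        # Changes are applied to the specific rows they refer to, matched by key.
        merged = {r[key]: dict(r) for r in table}
        for change in batch:
            row = {k: v for k, v in change.items() if k != "op"}
            if change.get("op") == "delete":
                merged.pop(row[key], None)
            else:  # insert or update
                merged[row[key]] = row
        return list(merged.values())
    raise ValueError(f"unknown loadmode: {loadmode}")


table = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
changes = [
    {"op": "update", "id": 1, "name": "a2"},
    {"op": "insert", "id": 3, "name": "c"},
    {"op": "delete", "id": 2},
]
merged_table = apply_batch(table, changes, "merge")
```

In this sketch, replace pairs naturally with snapshot batches, append with incremental-appends batches, and merge with incremental-updates batches, although the text above does not state that these pairings are required.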
See Querying Source Tables to learn how to query the source tables using Athena.