Datacoral Documentation

GitHub Collect Slice

Overview

GitHub is a web-based version-control and collaboration platform for software developers. Delivered as software-as-a-service (SaaS), GitHub launched in 2008 and is built on Git, the open-source distributed version-control system.

The Datacoral GitHub slice collects data from a GitHub account and loads repository statistics into a data warehouse such as Redshift.

Steps to add this slice to your installation

The steps to launch your slice are:

  1. Generate GitHub API keys
  2. Specify the slice config
  3. Add the GitHub slice

1. Generate GitHub API keys

Setup requirements

Before getting started, make sure you have the following:

  • Admin access in your GitHub account

Setup instructions

You can generate your auth_token (a GitHub personal access token) using the following steps:

  1. In your GitHub account, click your account name in the top right corner, then click Settings.
  2. In the left sidebar menu, navigate to Developer settings > Personal access tokens.
  3. If a token has never been generated for your account, click "Generate a personal access token".
  4. Once a token has been created for your account, it will appear on the page. Click Copy to copy the auth token to your clipboard.
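
Before passing the token to Datacoral, it can help to confirm that GitHub accepts it. The following is a minimal sketch and not part of the Datacoral tooling: it assumes the token is checked against GitHub's public REST endpoint https://api.github.com/user, and the helper names build_token_check_request and token_is_valid are hypothetical. GitHub's API rejects requests that lack a User-Agent header, so the sketch sends one alongside the token.

```python
import urllib.error
import urllib.request

# GitHub REST endpoint that returns the user who owns the token
GITHUB_USER_URL = "https://api.github.com/user"


def build_token_check_request(auth_token: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a request that identifies the token's owner."""
    if not auth_token or not user_agent:
        raise ValueError("auth_token and user_agent must both be non-empty")
    return urllib.request.Request(
        GITHUB_USER_URL,
        headers={
            "Authorization": f"token {auth_token}",
            "User-Agent": user_agent,
            "Accept": "application/vnd.github+json",
        },
    )


def token_is_valid(auth_token: str, user_agent: str) -> bool:
    """Return True if GitHub accepts the token (this makes a network call)."""
    request = build_token_check_request(auth_token, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Invalid or revoked tokens are rejected with an HTTP error such as 401
        return False
```

Calling token_is_valid("<your token>", "<your username>") should return True for a working token. Connection problems are not caught here and surface as urllib.error.URLError.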

2. Specify the slice config

To get a starting template, save the output of the describe --input-parameters command as follows:

 datacoral collect describe --slice-type github \
 --input-parameters > github_parameters_file.json

Necessary input parameters:

  • auth_token - the personal access token generated in step 1 above
  • user_agent - your GitHub username or application name

Example template:

 {
   "auth_token": "test",
   "user_agent": "test_username"
 }
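
A quick check of the parameters file before adding the slice can catch a missing or empty value early. This is a minimal sketch under the assumption that the file contains a single JSON object with the two parameters listed in this section; the function names missing_parameters and load_parameters are hypothetical and not part of the Datacoral CLI.

```python
import json

# The two input parameters this section lists as necessary
REQUIRED_PARAMETERS = ("auth_token", "user_agent")


def missing_parameters(params: dict) -> list:
    """Return the required parameters that are absent, non-string, or blank."""
    return [
        name
        for name in REQUIRED_PARAMETERS
        if not isinstance(params.get(name), str) or not params[name].strip()
    ]


def load_parameters(path: str) -> dict:
    """Load the parameters file and fail early if a required value is missing."""
    with open(path, encoding="utf-8") as handle:
        params = json.load(handle)
    missing = missing_parameters(params)
    if missing:
        raise ValueError(f"missing or empty parameters: {', '.join(missing)}")
    return params
```

For example, load_parameters("github_parameters_file.json") returns the parsed dictionary, or raises ValueError naming the parameters that still need values.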

3. Add the GitHub slice

 datacoral collect add --slice-type github --slice-name <slice-name> --parameters-file <params-file>

  • slice-name: Name of your slice. A schema with your slice-name is automatically created in your warehouse.
  • params-file: File path to your input parameters file, e.g. github_parameters_file.json
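
If the slice is added from a script rather than typed by hand, the command can be assembled programmatically. This is a minimal sketch that uses only the flags shown in this section; the wrapper functions build_add_command and add_github_slice are hypothetical, and running the command assumes the Datacoral CLI is installed and on your PATH.

```python
import shlex
import subprocess


def build_add_command(slice_name: str, params_file: str) -> list:
    """Assemble the `datacoral collect add` invocation as an argv list."""
    if not slice_name or not params_file:
        raise ValueError("slice_name and params_file must both be non-empty")
    return [
        "datacoral", "collect", "add",
        "--slice-type", "github",
        "--slice-name", slice_name,
        "--parameters-file", params_file,
    ]


def add_github_slice(slice_name: str, params_file: str) -> None:
    """Print and run the command; raises CalledProcessError if the CLI fails."""
    command = build_add_command(slice_name, params_file)
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if the file path contains spaces.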

Supported load units

  • repositories: captures all the attributes for Repositories which are associated with your account
  • milestones: captures all the attributes for Milestones which are associated with your account
  • commits: captures all the attributes for Commits which are associated with Repositories
  • issues: captures all the attributes for Issues which are associated with your account
  • pulls: captures all the attributes for Pulls which are associated with your account
  • organizations: captures all the attributes for Organizations which are associated with your account
  • members: captures all the attributes for Members which are associated with your organizations

Slice output

Output of this slice is stored in S3 and Redshift.

AWS S3: Data stored in AWS S3 is partitioned by date and time in the following bucket: s3://customer_installation.datacoral/<sliceName>

AWS Redshift: The schema name is the same as your slice-name. Tables produced by the slice are:

 - schema.repositories
 - schema.milestones
 - schema.commits
 - schema.issues
 - schema.organizations
 - schema.members
 - schema.pulls
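
The naming rules above can be sketched in a few lines, for instance to generate the list of tables to query after a load. The load unit names and the bucket placeholder are copied from this section; the helper names redshift_tables and s3_prefix are hypothetical, and the actual bucket name depends on your installation.

```python
# Load units supported by the GitHub slice, one Redshift table per unit
LOAD_UNITS = (
    "repositories", "milestones", "commits", "issues",
    "pulls", "organizations", "members",
)

# Placeholder bucket name taken from this section; replace with your own
BUCKET = "customer_installation.datacoral"


def redshift_tables(slice_name: str) -> list:
    """Return schema-qualified table names; the schema equals the slice name."""
    return [f"{slice_name}.{unit}" for unit in LOAD_UNITS]


def s3_prefix(slice_name: str) -> str:
    """Return the S3 location under which the slice's data is stored."""
    return f"s3://{BUCKET}/{slice_name}"
```

For a slice named github_prod, redshift_tables("github_prod") yields names such as github_prod.commits, and s3_prefix("github_prod") points at the slice's folder in the bucket.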

Questions? Interested?

If you have questions or feedback, feel free to reach out at hello@datacoral.co or request a demo.

Copyright © 2019 Datacoral