GitHub Collect Slice
Overview
GitHub is a web-based version-control and collaboration platform for software developers. GitHub is delivered through a software-as-a-service (SaaS) business model. It was started in 2008 and is built on Git, the open source version control system created by Linus Torvalds.
The Datacoral GitHub slice collects data from a GitHub account and enables data flow of repo statistics into a data warehouse, such as Redshift.
Steps to add this slice to your installation
The steps to launch your slice are:
- Generate GitHub API keys
- Specify the slice config
- Add the GitHub slice
1. Generate GitHub API keys
Setup requirements
Before getting started, make sure you have the following:
- Admin access in your GitHub account
Setup instructions
You can generate your auth_token using the following steps:
- In your GitHub account, click your account name in the top right corner, then click Settings.
- In the left sidebar menu, navigate to Developer settings > Personal access tokens.
- If a token has never been generated for your account, click "Generate a personal access token".
- Once a token has been created for your account, the token will appear. Click Copy to copy the auth token to your clipboard.
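Before placing the token in the slice config, it can help to confirm that GitHub accepts it. The following is a minimal sketch and not part of the Datacoral tooling: the helper names are hypothetical, and it assumes the GitHub REST API endpoint https://api.github.com/user, which returns the account that owns the token. GitHub's API expects a User-Agent header on requests, which is presumably why the slice config also asks for a user_agent value.

```python
import json
import urllib.request

# Assumed endpoint for a quick token check; not referenced by the slice itself.
GITHUB_USER_URL = "https://api.github.com/user"


def build_headers(auth_token: str, user_agent: str) -> dict:
    """Return HTTP headers for an authenticated GitHub API request."""
    if not auth_token or not user_agent:
        raise ValueError("auth_token and user_agent must both be non-empty")
    return {
        "Authorization": f"token {auth_token}",
        "User-Agent": user_agent,
        "Accept": "application/vnd.github+json",
    }


def fetch_login(auth_token: str, user_agent: str) -> str:
    """Call the GitHub API and return the login name that owns the token.

    Raises urllib.error.HTTPError (for example 401) if the token is rejected.
    """
    request = urllib.request.Request(
        GITHUB_USER_URL, headers=build_headers(auth_token, user_agent)
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["login"]
```

Calling fetch_login with your real token and a user agent of your choice should return your GitHub login name if the token is valid.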
2. Specify the slice config
To get a starting template, save the output of the describe --input-parameters command as follows:
datacoral collect describe --slice-type github \
--input-parameters > github_parameters_file.json
Required input parameters:
- auth_token: the auth token you copied in the setup instructions above
- user_agent: username or application name
Example template:
{
"auth_token": "test",
"user_agent": "test_username"
}
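If you prefer to generate the parameters file from a script instead of editing the template by hand, the sketch below shows one way to do it. It assumes only the two keys listed above, auth_token and user_agent; the function names are hypothetical and the validation is a convenience check, not something the datacoral CLI is known to require.

```python
import json
from pathlib import Path

# The two parameters this document lists for the GitHub slice.
REQUIRED_KEYS = ("auth_token", "user_agent")


def validate_parameters(params: dict) -> list:
    """Return a list of problems; an empty list means nothing is missing."""
    problems = []
    for key in REQUIRED_KEYS:
        value = params.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty parameter: {key}")
    return problems


def write_parameters_file(path: str, auth_token: str, user_agent: str) -> dict:
    """Validate the parameters and write them to `path` as JSON."""
    params = {"auth_token": auth_token, "user_agent": user_agent}
    problems = validate_parameters(params)
    if problems:
        raise ValueError("; ".join(problems))
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return params
```

For example, write_parameters_file("github_parameters_file.json", token, "test_username") produces a file with the same shape as the template above.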
3. Add the GitHub slice
datacoral collect add --slice-type github --slice-name <slice-name> --parameters-file <params-file>
- slice-name: name of your slice. A schema with your slice-name is automatically created in your warehouse.
- params-file: file path to your input parameters file, for example github_parameters_file.json
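When the slice is added from an automation script, the command can be assembled as an argument list so that the slice name and file path are passed without shell quoting problems. This is a sketch that uses only the flags shown in the command above; the function name and the slice name github_prod are hypothetical.

```python
import shlex


def build_add_command(slice_name: str, params_file: str) -> list:
    """Return the argument list for the `datacoral collect add` command."""
    if not slice_name or not params_file:
        raise ValueError("slice_name and params_file must both be non-empty")
    return [
        "datacoral", "collect", "add",
        "--slice-type", "github",
        "--slice-name", slice_name,
        "--parameters-file", params_file,
    ]


# Hypothetical slice name, used for illustration only.
command = build_add_command("github_prod", "github_parameters_file.json")
print(shlex.join(command))  # shows the command line that would be run
```

To actually run it, pass the list to subprocess.run(command, check=True) on a machine where the datacoral CLI is installed.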
Supported load units
- repositories: captures all the attributes for Repositories which are associated with your account
- milestones: captures all the attributes for Milestones which are associated with your account
- commits: captures all the attributes for Commits which are associated with Repositories
- issues: captures all the attributes for Issues which are associated with your account
- pulls: captures all the attributes for Pulls which are associated with your account
- organizations: captures all the attributes for Organizations which are associated with your account
- members: captures all the attributes for Members which are associated with your organizations
Slice output
Output of this slice is stored in S3 and Redshift.
AWS S3
Data stored in AWS S3 is partitioned by date and time in the following bucket:
s3://customer_installation.datacoral/<sliceName>
AWS Redshift
The schema name is the same as your slice-name. Tables produced by the slice are:
- schema.repositories
- schema.milestones
- schema.commits
- schema.issues
- schema.organizations
- schema.members
- schema.pulls
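Because the schema takes the slice name and each load unit maps to one table, the fully qualified table names can be derived from the slice name alone. The sketch below illustrates that mapping and builds a simple row-count query for a sanity check after the first load. The helper names and the slice name github_prod are hypothetical; running the query requires your own Redshift connection, which is not shown.

```python
# Load units listed in this document; each one becomes a table in the slice schema.
LOAD_UNITS = (
    "repositories",
    "milestones",
    "commits",
    "issues",
    "pulls",
    "organizations",
    "members",
)


def qualified_tables(slice_name: str) -> list:
    """Return schema-qualified table names for a slice."""
    return [f"{slice_name}.{unit}" for unit in LOAD_UNITS]


def count_query(slice_name: str, load_unit: str) -> str:
    """Return a row-count query for one load unit's table."""
    if load_unit not in LOAD_UNITS:
        raise ValueError(f"unknown load unit: {load_unit}")
    return f'SELECT COUNT(*) FROM "{slice_name}"."{load_unit}";'


# Hypothetical slice name, used for illustration only.
for table in qualified_tables("github_prod"):
    print(table)
```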
Questions? Interested?
If you have questions or feedback, feel free to reach out at hello@datacoral.co or request a demo.