GitHub Connector Overview
GitHub is a web-based version-control and collaboration platform for software developers. Delivered as software as a service (SaaS), GitHub launched in 2008 and is built on Git, an open-source version-control system.
The Datacoral GitHub slice collects data from a GitHub account and loads repository statistics into a data warehouse, such as Redshift or Snowflake.
Features & Capabilities
- Backfill: Full historical sync of all your data
- Data Extraction Modes: snapshot, incremental with pagination
- Data Load Modes: replace, append and merge
- Tables and Columns selection: Select individual schemas, tables and columns for replication in the Datacoral UI
- Data layout: Change the data type of your columns
- Customizations: Update the configurations easily using the UI
- Scheduling: Highly flexible scheduling system
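The three load modes listed above can be illustrated with a minimal sketch. This is not Datacoral's implementation: the function `load`, the in-memory lists standing in for warehouse tables, and the `id` key column are all assumptions made for illustration, and the semantics shown are the conventional meaning of replace, append and merge.

```python
def load(existing, incoming, mode, key="id"):
    """Return the table contents after loading `incoming` rows into `existing`.

    Hypothetical helper: rows are dicts, and `key` names the column that
    identifies a row when merging (an assumption, not a documented setting).
    """
    if mode == "replace":
        # Discard what was there and keep only the new extract.
        return list(incoming)
    if mode == "append":
        # Keep everything; new rows are added after the old ones.
        return list(existing) + list(incoming)
    if mode == "merge":
        # Upsert: rows with a matching key are overwritten, others are added.
        merged = {row[key]: row for row in existing}
        for row in incoming:
            merged[row[key]] = row
        return list(merged.values())
    raise ValueError(f"unknown load mode: {mode}")
```

Under these assumptions, replace pairs naturally with snapshot extracts, while append and merge suit incremental extracts where only new or changed rows arrive.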
Supported Loadunits
The GitHub connector automatically collects the following loadunits from the GitHub API and makes them available in your warehouse for analysis.
Loadunit | Default Extract mode | Description |
---|---|---|
clones | snapshot | captures all the attributes for Clones which are associated with Repositories (NOTE: The auth_token should have push permission for the Repository to get this data) |
collaborators | snapshot paginate | captures all the attributes for Collaborators which are associated with Repositories (NOTE: The auth_token should have push permission for the Repository to get this data) |
commits | snapshot paginate | captures all the attributes for Commits which are associated with Repositories |
contributors | snapshot paginate | captures all the attributes for Contributors which are associated with Repositories |
issues | snapshot paginate | captures all the attributes for Issues in all the repositories associated with your account |
members | snapshot paginate | captures all the attributes for Members which are associated with your organizations |
milestones | snapshot paginate | captures all the attributes for Milestones which are associated with Repositories |
organizations | snapshot paginate | captures all the attributes for Organizations which are associated with your account |
pulls | snapshot paginate | captures all the attributes for Pulls which are associated with Repositories. Both open and closed pulls are fetched by this loadunit |
repositories | snapshot paginate | captures all the attributes for Repositories which are associated with your account |
views | snapshot | captures all the attributes for Views which are associated with Repositories (NOTE: The auth_token should have push permission for the Repository to get this data) |
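As a rough illustration of what a "snapshot paginate" extract involves: the GitHub REST API returns list results in pages, selected with the `page` and `per_page` query parameters. The sketch below collects every page into one snapshot. The function `fetch_all` and the `fetch_page` callable are hypothetical stand-ins for whatever HTTP client makes the request; this is not the connector's actual code.

```python
def fetch_all(fetch_page, per_page=100):
    """Collect every record by requesting pages until one comes back short.

    `fetch_page` is a hypothetical callable taking `page` and `per_page`
    keyword arguments and returning a list of records for that page.
    """
    records = []
    page = 1  # GitHub's REST API numbers pages starting at 1
    while True:
        batch = fetch_page(page=page, per_page=per_page)
        records.extend(batch)
        if len(batch) < per_page:
            # A short (or empty) page means there is nothing left to fetch.
            break
        page += 1
    return records
```

Taking the page fetcher as an argument keeps the loop independent of any particular HTTP library and makes it easy to exercise with a fake data source.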
Note that the repositories loadunit takes two parameters:
- allowedRepositories: Accepts a list of repositories to include (regex strings)
- blockedRepositories: Accepts a list of repositories to exclude (regex strings)
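To make the include and exclude lists concrete, here is a minimal sketch of how such regex filtering could behave. Several details are assumptions, since the source does not specify them: that a pattern must match the whole repository name, that an empty allowedRepositories list means "include everything", and that blockedRepositories wins when a name matches both lists. The function name `select_repositories` is hypothetical.

```python
import re


def select_repositories(names, allowed=None, blocked=None):
    """Filter repository names by include and exclude regex lists.

    Assumed semantics (not confirmed by the connector documentation):
    full-name matching, empty `allowed` includes all, `blocked` takes priority.
    """
    allowed = [re.compile(p) for p in (allowed or [])]
    blocked = [re.compile(p) for p in (blocked or [])]
    selected = []
    for name in names:
        # Skip names that match none of the include patterns, if any are given.
        if allowed and not any(p.fullmatch(name) for p in allowed):
            continue
        # Skip names that match any exclude pattern.
        if any(p.fullmatch(name) for p in blocked):
            continue
        selected.append(name)
    return selected
```

For example, with `allowed=["api-.*"]` and `blocked=[".*-client"]`, a repository named `api-server` would be kept while `api-client` and `docs` would be skipped.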
Connector output
Output of this connector is stored in S3 and the data warehouse you chose.
AWS S3
Data stored in AWS S3 is partitioned by date and time:
s3://customer_installation.datacoral/<connector-name>
Data Warehouse
Schema: the schema name is the same as <connector-name>.
Tables produced by the connector are:
Next Steps
- Create a GitHub connector through the UI or CLI
- Schedule a Demo
Additional Information
Got a question?
Please contact Datacoral's Support Team. We'd be more than happy to answer any of your questions.