Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store actor.id and repo.id for events #10

Open
tisonkun opened this issue Jul 25, 2022 · 5 comments
Open

Store actor.id and repo.id for events #10

tisonkun opened this issue Jul 25, 2022 · 5 comments

Comments

@tisonkun
Copy link

Hi @alexey-milovidov,

Is it possible we store actor.id and repo.id as documented on GitHub Event docs?

So far, we store actor.login and repo.name only, which is convenient to use but fails to handle repo/user rename case.

The process can be we extend those column first and fill with data identified by event id.

@alexey-milovidov
Copy link
Member

Yes, I see that source data has these fields, so it's possible to recalculate the dataset and include these fields.

@tisonkun
Copy link
Author

tisonkun commented Aug 8, 2022

@alexey-milovidov I use Aliyun's ClickHouse service to implement this feature for my usage. Finally, I add actor.id, repo.id and id (event id). Also, the insertion can fail and in order to make the process idempotent, I created the table with the ReplacingMergeTree engine.

My create table statement is:

CREATE TABLE default.github_events
(
    `file_time` DateTime,
    `event_id` UInt64,
    `actor_id` UInt64,
    `repo_id` UInt64,
    `event_type` Enum8('CommitCommentEvent' = 1, 'CreateEvent' = 2, 'DeleteEvent' = 3, 'ForkEvent' = 4, 'GollumEvent' = 5, 'IssueCommentEvent' = 6, 'IssuesEvent' = 7, 'MemberEvent' = 8, 'PublicEvent' = 9, 'PullRequestEvent' = 10, 'PullRequestReviewCommentEvent' = 11, 'PushEvent' = 12, 'ReleaseEvent' = 13, 'SponsorshipEvent' = 14, 'WatchEvent' = 15, 'GistEvent' = 16, 'FollowEvent' = 17, 'DownloadEvent' = 18, 'PullRequestReviewEvent' = 19, 'ForkApplyEvent' = 20, 'Event' = 21, 'TeamAddEvent' = 22),
    `actor_login` LowCardinality(String),
    `repo_name` LowCardinality(String),
    `created_at` DateTime,
    `updated_at` DateTime,
    `action` Enum8('none' = 0, 'created' = 1, 'added' = 2, 'edited' = 3, 'deleted' = 4, 'opened' = 5, 'closed' = 6, 'reopened' = 7, 'assigned' = 8, 'unassigned' = 9, 'labeled' = 10, 'unlabeled' = 11, 'review_requested' = 12, 'review_request_removed' = 13, 'synchronize' = 14, 'started' = 15, 'published' = 16, 'update' = 17, 'create' = 18, 'fork' = 19, 'merged' = 20),
    `comment_id` UInt64,
    `body` String,
    `path` String,
    `position` Int32,
    `line` Int32,
    `ref` LowCardinality(String),
    `ref_type` Enum8('none' = 0, 'branch' = 1, 'tag' = 2, 'repository' = 3, 'unknown' = 4),
    `creator_user_login` LowCardinality(String),
    `number` UInt32,
    `title` String,
    `labels` Array(LowCardinality(String)),
    `state` Enum8('none' = 0, 'open' = 1, 'closed' = 2),
    `locked` UInt8,
    `assignee` LowCardinality(String),
    `assignees` Array(LowCardinality(String)),
    `comments` UInt32,
    `author_association` Enum8('NONE' = 0, 'CONTRIBUTOR' = 1, 'OWNER' = 2, 'COLLABORATOR' = 3, 'MEMBER' = 4, 'MANNEQUIN' = 5),
    `closed_at` DateTime,
    `merged_at` DateTime,
    `merge_commit_sha` String,
    `requested_reviewers` Array(LowCardinality(String)),
    `requested_teams` Array(LowCardinality(String)),
    `head_ref` LowCardinality(String),
    `head_sha` String,
    `base_ref` LowCardinality(String),
    `base_sha` String,
    `merged` UInt8,
    `mergeable` UInt8,
    `rebaseable` UInt8,
    `mergeable_state` Enum8('unknown' = 0, 'dirty' = 1, 'clean' = 2, 'unstable' = 3, 'draft' = 4),
    `merged_by` LowCardinality(String),
    `review_comments` UInt32,
    `maintainer_can_modify` UInt8,
    `commits` UInt32,
    `additions` UInt32,
    `deletions` UInt32,
    `changed_files` UInt32,
    `diff_hunk` String,
    `original_position` UInt32,
    `commit_id` String,
    `original_commit_id` String,
    `push_size` UInt32,
    `push_distinct_size` UInt32,
    `member_login` LowCardinality(String),
    `release_tag_name` String,
    `release_name` String,
    `review_state` Enum8('none' = 0, 'approved' = 1, 'changes_requested' = 2, 'commented' = 3, 'dismissed' = 4, 'pending' = 5)
)
ENGINE = ReplacingMergeTree
ORDER BY (event_type, repo_name, created_at, event_id)
SETTINGS index_granularity = 8192

BTW, I ever tried to find a ClickHouse.com provided service but it seems not yet available to developers :P

@alexey-milovidov
Copy link
Member

@tisonkun Thank you!
ClickHouse Cloud is currently available for early access by selected customers. It will be available in public beta in October.

@tisonkun
Copy link
Author

tisonkun commented Aug 16, 2022

It will be available in public beta in October.

I can't wait to give it a try 🤣 Other vendors always have a version lag to the upstream and it's painful to workaround features already provided in the nightly version.

@tisonkun
Copy link
Author

@alexey-milovidov FYI I shared the related code on https://github.com/korandoru/github-adventurer and wrote a Chinese blog in https://www.tisonkun.org/2022/08/16/github-adventurer-with-clickhouse/.

I submitted an early access request with the email address [email protected]. If only ClickHouse.com provides support for the latest version, I'm willing to migrate to it and write down my experience >_<

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants