fix: Expand CollaboratorsStream primary keys (#278)
The collaborators stream pulls one row per selected repo for each user who is
a known collaborator of that repo, so `id` (the user ID) alone isn't
sufficient to describe the grain: the same user appears once for every repo
they collaborate on. This PR adds `repo` and `org` to the primary keys so
that the key matches the stream's grain of one row per user × repo pair.
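To make the grain argument concrete, here is a minimal Python sketch that counts how often each primary key value occurs across emitted rows. The records, the `key_of` and `duplicate_keys` helpers, and the repo/org names are hypothetical and not part of tap-github; the sketch only assumes that each collaborator row carries `repo` and `org` fields, which the new primary key relies on.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def key_of(record: dict, primary_keys: Sequence[str]) -> tuple:
    """Build the composite key tuple for a record from its primary key fields."""
    return tuple(record[k] for k in primary_keys)


def duplicate_keys(records: Iterable[dict], primary_keys: Sequence[str]) -> List[tuple]:
    """Return every key value that appears on more than one record."""
    counts = Counter(key_of(r, primary_keys) for r in records)
    return sorted(k for k, n in counts.items() if n > 1)


# Hypothetical collaborator rows: user 101 collaborates on two repos,
# so the stream emits one row for each (user, repo) pair.
records = [
    {"id": 101, "login": "alice", "repo": "repo-a", "org": "example-org"},
    {"id": 101, "login": "alice", "repo": "repo-b", "org": "example-org"},
    {"id": 202, "login": "bob", "repo": "repo-a", "org": "example-org"},
]

print(duplicate_keys(records, ["id"]))                 # [(101,)]
print(duplicate_keys(records, ["id", "repo", "org"]))  # []
```

With `id` alone, user 101 collides with themselves; with the expanded key every row is distinct.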

My motivation is that I'm getting the error `Query error: UPDATE/MERGE
must match at most one source row for each target row ...` when trying
to extract this stream sequentially with `target-bigquery` and the
[upsert](https://hub.meltano.com/loaders/target-bigquery/#upsert-setting)
setting turned on. That error appears to come from the column set as the
primary key not being unique across rows.
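A simplified, hypothetical model of why upsert mode trips over this: SQL `MERGE` requires each target row to be matched by at most one source row. The `simulate_merge` function and `MergeError` class below illustrate that rule in plain Python; they are not target-bigquery's or BigQuery's implementation. In this model a first load into an empty table succeeds (unmatched duplicate rows are simply inserted), but loading the same rows again keyed on `id` alone fails, while keying on `id`, `repo`, and `org` succeeds.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


class MergeError(Exception):
    """Raised when a target row is matched by more than one source row."""


def simulate_merge(target: List[dict], source: List[dict], keys: Sequence[str]) -> List[dict]:
    """Upsert source rows into target, matching on `keys`, MERGE-style.

    Each target row may be matched by at most one source row; otherwise
    the update would be ambiguous and the merge fails.
    """
    def key(row: dict) -> tuple:
        return tuple(row[k] for k in keys)

    # Group source rows by their merge key.
    source_by_key: Dict[tuple, List[dict]] = defaultdict(list)
    for row in source:
        source_by_key[key(row)].append(row)

    result: List[dict] = []
    matched = set()
    for row in target:
        matches = source_by_key.get(key(row), [])
        if len(matches) > 1:
            raise MergeError(f"key {key(row)} matched {len(matches)} source rows")
        # WHEN MATCHED THEN UPDATE: replace the target row with the source row.
        result.append(dict(matches[0]) if matches else row)
        matched.add(key(row))

    # WHEN NOT MATCHED THEN INSERT: append source rows with no target match.
    for k, rows in source_by_key.items():
        if k not in matched:
            result.extend(dict(r) for r in rows)
    return result


# Hypothetical rows: one user collaborating on two repos.
rows = [
    {"id": 101, "repo": "repo-a", "org": "example-org"},
    {"id": 101, "repo": "repo-b", "org": "example-org"},
]

loaded = simulate_merge([], rows, ["id"])  # first load: both rows inserted

reload_error: Optional[str]
try:
    simulate_merge(loaded, rows, ["id"])  # re-load: each target row matches 2 source rows
    reload_error = None
except MergeError as exc:
    reload_error = str(exc)

reloaded = simulate_merge(loaded, rows, ["id", "repo", "org"])  # unique keys: succeeds
print(reload_error)   # key (101,) matched 2 source rows
print(len(reloaded))  # 2
```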
TrishGillett authored Jul 25, 2024
1 parent 12c845e commit 7fc78fe
Showing 1 changed file with 1 addition and 1 deletion: tap_github/repository_streams.py
```diff
@@ -747,7 +747,7 @@ def parse_response(self, response: requests.Response) -> Iterable[dict]:
 class CollaboratorsStream(GitHubRestStream):
     name = "collaborators"
     path = "/repos/{org}/{repo}/collaborators"
-    primary_keys = ["id"]
+    primary_keys = ["id", "repo", "org"]
     parent_stream_type = RepositoryStream
     ignore_parent_replication_key = True
     state_partitioning_keys = ["repo", "org"]
```