GTC-3288: Use Multiple Source URIs when Creating a Table Asset #696
Codecov Report
All modified and coverable lines are covered by tests ✅
@@ Coverage Diff @@
## develop #696 +/- ##
===========================================
- Coverage 76.56% 76.55% -0.02%
===========================================
Files 143 143
Lines 6704 6700 -4
===========================================
- Hits 5133 5129 -4
Misses 1571 1571
#### GitHub Container Registry (GHCR) Access Setup

To authenticate Docker with GitHub Container Registry (`ghcr.io`) for pulling/pushing images, follow these steps:

##### 1. Create a GitHub Personal Access Token (PAT)

1. Navigate to: GitHub → Settings → Developer Settings → Personal Access Tokens → Tokens (Classic)
2. Click **Generate new token (Classic)**
3. Configure:
   - **Note**: `docker-ghcr-access` (descriptive name)
   - **Expiration**: Set duration (or "No expiration" for CI/CD)
   - **Scopes**:
     - `read:packages` (required for pull)
     - `write:packages` (required for push)
4. Click **Generate token** and copy the token value

##### 2. Authenticate with Docker

```bash
echo "YOUR_GHCR_TOKEN" | docker login ghcr.io -u GITHUB_USERNAME --password-stdin
```
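
A variant of the login command that keeps the token out of the command line itself, following the pattern in GitHub's Container registry docs of putting the PAT in an environment variable first. This is a sketch: the helper name `ghcr_login` and the variable names `GHCR_TOKEN` and `GITHUB_USERNAME` are chosen here for illustration and are not part of this repository.

```bash
#!/usr/bin/env bash
# Sketch: log in to ghcr.io using a PAT stored in an environment variable.
# GHCR_TOKEN and GITHUB_USERNAME are hypothetical variable names for this example.
ghcr_login() {
  # Fail early with a clear message instead of sending an empty password.
  if [ -z "${GHCR_TOKEN:-}" ] || [ -z "${GITHUB_USERNAME:-}" ]; then
    echo "ghcr_login: set GHCR_TOKEN and GITHUB_USERNAME first" >&2
    return 1
  fi
  # Feed the token on stdin so it is not passed to docker as a command-line argument.
  printf '%s' "$GHCR_TOKEN" | docker login ghcr.io -u "$GITHUB_USERNAME" --password-stdin
}
```

Export both variables (for example from a CI secret), source the snippet, then run `ghcr_login`.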

#### Proceed with Setup
Thanks for the docs update!
Nice!
Testing these Batch scripts is difficult without doing an end-to-end test. Have you tested in dev though?
I'm testing it in …
Success! Please check out my …
This is needed for docker images that are stored in ghcr.io
This is needed to ensure a schema can be created. It can happen that the first source URI has only a header and no data.
c0bb341 to 6afc74f
It is possible that the first source URI passed in the `creation_options` is a file with only a header and no data. Now, we send up to five source URIs when creating a schema.
What is the current behavior?
If the first source URI has no data, no schema is created. Therefore, no table is created.
Seems reasonable. However, we expect the norm to be that source URIs have data so that a schema can be inferred.
Issue Number: GTC-3288
What is the new behavior?
Up to five source URIs are sent to `create_tabular_schema`. Then, each source URI is tried in turn. Once the schema SQL is created, we break out of the loop and continue the job.
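
The loop described above can be sketched in shell as follows. This is not the PR's batch script: `create_schema_from_sources`, `infer_schema`, and `MAX_SCHEMA_SOURCES` are hypothetical names, the sources are local CSV paths rather than URIs, and `infer_schema` is a toy stand-in that treats a header-only (or empty) file as yielding no schema and otherwise types every column as TEXT.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: try each source (at most five) and stop at the first
# one that yields a schema.
MAX_SCHEMA_SOURCES=5

# Placeholder for the real schema-inference step, which this PR does not show.
# A file with only a header (or nothing at all) yields no schema.
infer_schema() {
  local src="$1" rows
  [ -f "$src" ] || return 1
  # Count records with awk so a last line without a trailing newline still counts.
  rows=$(awk 'END { print NR }' "$src")
  [ "$rows" -gt 1 ] || return 1
  # Toy schema: every header column typed as TEXT.
  head -n 1 "$src" | awk -F',' '{ for (i = 1; i <= NF; i++) printf "%s TEXT%s", $i, (i < NF ? ", " : "\n") }'
}

# Print the first schema found and return 0, or return 1 if none of the
# first MAX_SCHEMA_SOURCES sources has data.
create_schema_from_sources() {
  local src schema tried=0
  for src in "$@"; do
    if [ "$tried" -ge "$MAX_SCHEMA_SOURCES" ]; then
      break
    fi
    tried=$((tried + 1))
    if schema=$(infer_schema "$src"); then
      printf '%s\n' "$schema"
      return 0  # schema found: stop trying further sources
    fi
  done
  return 1
}
```

With a header-only first file and a populated second file, `create_schema_from_sources header_only.csv data.csv` skips the first and prints the schema inferred from the second; a sixth source is never consulted.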
Test Procedure in Dev Environment
Test Prep
I copied the first three `csv` files from the originally failing task in `production` and uploaded them to `dev`:
Dataset
Request:
PUT http://gfw-data-api-elb-shared-dev-lb-10091095.us-east-1.elb.amazonaws.com:30253/dataset/gtc_3288_multi_src_tabular_schema/v1
Response:
201 Created
Version
Request:
PUT http://gfw-data-api-elb-shared-dev-lb-10091095.us-east-1.elb.amazonaws.com:30253/dataset/gtc_3288_multi_src_tabular_schema/v1
Payload:
Response:
202 Accepted
Assets
Request:
GET http://gfw-data-api-elb-shared-dev-lb-10091095.us-east-1.elb.amazonaws.com:30253/dataset/gtc_3288_multi_src_tabular_schema/v1/assets
Response:
NOTE: The asset failed because I didn't include schema type hints in the creation options. Therefore, confidence__cat was inferred as boolean. In the production creation_options, confidence__cat is explicitly set to TEXT.
Tasks
Request:
GET http://gfw-data-api-elb-shared-dev-lb-10091095.us-east-1.elb.amazonaws.com:30253/asset/2e15a153-44b9-4a45-bdab-f24137a4d733/tasks
Response:
NOTE: `create_table` succeeds. The only reason the last task fails is the incorrect schema data type mentioned above.
AWS Batch Log
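
The two GET requests in this test procedure (assets, then tasks) can be re-run from a shell roughly as follows. This is a sketch: the function names `get_assets` and `get_asset_tasks` and the `API_BASE` override are made up here, and only the URL patterns shown in the requests above are used. The dataset and version PUTs are not scripted because their payloads are not shown.

```bash
#!/usr/bin/env bash
# Sketch: re-run the read-only GET checks from the dev test procedure.
# Defaults to the dev load balancer URL used above; set API_BASE to point elsewhere.
API_BASE="${API_BASE:-http://gfw-data-api-elb-shared-dev-lb-10091095.us-east-1.elb.amazonaws.com:30253}"

# GET /dataset/{dataset}/{version}/assets: list the assets of a dataset version.
get_assets() {
  local dataset="$1" version="$2"
  curl -sS "${API_BASE}/dataset/${dataset}/${version}/assets"
}

# GET /asset/{asset_id}/tasks: list the tasks of one asset.
get_asset_tasks() {
  local asset_id="$1"
  curl -sS "${API_BASE}/asset/${asset_id}/tasks"
}
```

For this test: `get_assets gtc_3288_multi_src_tabular_schema v1`, then `get_asset_tasks 2e15a153-44b9-4a45-bdab-f24137a4d733`.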