Conversation
…te + envsubst
- Add profiles.yml.template with plaintext docker targets and ${VAR} placeholders for cloud targets
- Update test-warehouse.yml to use CI_WAREHOUSE_SECRETS JSON secret with envsubst substitution
- Split test-all-warehouses.yml into test-docker (pull_request, no secrets/approval) and test-cloud (with fork approval gate)
- BigQuery uses keyfile path instead of inline keyfile_json
- Guard secret extraction behind 'if [ -n "$SECRETS_JSON" ]' for fork PRs
Resolves: ELE-5262
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
📝 Walkthrough

Adds a docker-based test job, renames and narrows cloud test jobs with fork detection/approval gates, and replaces inline base64 profile generation with a Jinja2-templated Python renderer that consumes CI secrets to produce dbt profiles for CI runs.

Sequence Diagram(s)

sequenceDiagram
participant GH as GitHub Actions
participant Workflow as Workflow Jobs
participant Script as generate_profiles.py
participant Template as profiles.yml.j2
participant Secrets as CI Secrets
GH->>Workflow: trigger (push / PR / PR_target)
Workflow->>Workflow: run check-fork-status
alt fork approved
Workflow->>Workflow: approve-fork gate passes
else fork skipped
Workflow->>GH: skip cloud jobs
end
Workflow->>Secrets: read CI_WAREHOUSE_SECRETS (env)
Workflow->>Script: run generate_profiles.py --template Template --output ~/.dbt/profiles.yml --schema-name ...
Script->>Secrets: decode base64 JSON (if present)
Script->>Template: render Jinja2 with context
Script->>Workflow: write rendered profiles.yml
Workflow->>Workflow: run test-docker or test-cloud jobs using generated profiles
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
…ility Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 2
🧹 Nitpick comments (1)
.github/workflows/test-warehouse.yml (1)
135-136: Harden BigQuery keyfile permissions.

The keyfile is written to a predictable /tmp path with default umask. Use mktemp and chmod 600 to minimize exposure.

🔐 Proposed hardening

```diff
- echo "$DECODED" | jq -r '.BIGQUERY_KEYFILE' > /tmp/bigquery_keyfile.json
- export BIGQUERY_KEYFILE_PATH=/tmp/bigquery_keyfile.json
+ BIGQUERY_KEYFILE_PATH="$(mktemp /tmp/bigquery_keyfile.XXXXXX.json)"
+ echo "$DECODED" | jq -r '.BIGQUERY_KEYFILE' > "$BIGQUERY_KEYFILE_PATH"
+ chmod 600 "$BIGQUERY_KEYFILE_PATH"
+ export BIGQUERY_KEYFILE_PATH
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-warehouse.yml around lines 135 - 136, Replace the predictable /tmp path and default permissions with a secure temporary file: create a temp file via mktemp, write the decoded BIGQUERY_KEYFILE into that temp file (using the current jq pipeline), set strict permissions with chmod 600 on that temp file, and then export BIGQUERY_KEYFILE_PATH pointing to the mktemp-created file so the rest of the workflow uses the secured keyfile.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test-all-warehouses.yml:
- Line 116: The CI matrix omits "spark" from the warehouse-type array used by
the test-all-warehouses workflow; add "spark" to the array [snowflake, bigquery,
redshift, databricks_catalog, athena] so it becomes [snowflake, bigquery,
redshift, databricks_catalog, athena, spark], ensuring Spark receives CI
coverage consistent with the conditional logic in test-warehouse.yml and the
profiles.yml.template definitions.
In @.github/workflows/test-warehouse.yml:
- Line 137: The current step uses eval on DECODED which allows shell injection
via malicious JSON keys; replace the eval pipeline so it extracts key/value
pairs from DECODED (the variable DECODED and the current jq expression),
validate each key against the identifier regex /^[A-Z_][A-Z0-9_]*$/ and only
export keys that pass, and perform the export without eval (e.g., iterate over
jq output lines and call export with the validated key and properly quoted
value). Ensure you reference DECODED and the jq extraction in your change and
reject or skip any keys that fail the regex.
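The key-validation idea described in this prompt can be sketched in Python for illustration (the actual workflow step is shell; `safe_env_pairs` is a hypothetical helper, not part of the codebase):

```python
import json
import re

# Only keys that match the suggested shell-identifier pattern pass through.
_KEY_RE = re.compile(r"^[A-Z_][A-Z0-9_]*$")

def safe_env_pairs(decoded_json: str) -> dict:
    """Return key/value pairs whose keys are safe to export; skip the rest."""
    data = json.loads(decoded_json)
    return {key: str(value) for key, value in data.items() if _KEY_RE.match(key)}

# A malicious key such as 'X; rm -rf /' is silently skipped:
pairs = safe_env_pairs('{"SNOWFLAKE_USER": "ci", "X; rm -rf /": "boom"}')
```

Validating keys before exporting removes the need for `eval` entirely, which is the point of the suggested fix.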
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- .github/workflows/test-all-warehouses.yml
- .github/workflows/test-warehouse.yml
- integration_tests/profiles/profiles.yml.template
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 2
🧹 Nitpick comments (3)
.github/workflows/test-warehouse.yml (2)
126-127: Simplify SECRETS_JSON expression.

The format('{0}', secrets.CI_WAREHOUSE_SECRETS) is redundant; it just converts the secret to a string, which it already is. A direct reference with fallback is cleaner.

♻️ Proposed simplification

```diff
  # New lean JSON secret (base64 encoded)
- SECRETS_JSON: ${{ secrets.CI_WAREHOUSE_SECRETS && format('{0}', secrets.CI_WAREHOUSE_SECRETS) || '' }}
+ SECRETS_JSON: ${{ secrets.CI_WAREHOUSE_SECRETS || '' }}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-warehouse.yml around lines 126 - 127, The SECRETS_JSON environment expression is unnecessarily using format('{0}', ...) to coerce the secret to a string; update the SECRETS_JSON assignment to directly reference secrets.CI_WAREHOUSE_SECRETS with a fallback (i.e., use the OR/fallback expression) so SECRETS_JSON: ${{ secrets.CI_WAREHOUSE_SECRETS || '' }} (replace the current format(...) usage in the SECRETS_JSON line).
163-166: Fork PR path relies on undefined environment variables.

When neither SECRETS_JSON nor PROFILES_YML is present (fork PRs), envsubst runs but only SCHEMA_NAME is exported. Cloud target placeholders (e.g., ${SNOWFLAKE_PASSWORD}) will be replaced with empty strings, producing invalid YAML for those targets. This is fine if fork PRs only run docker targets, but the comment should clarify this limitation. Consider either:

- Using envsubst with an explicit variable list to only substitute known-safe variables, or
- Adding a comment clarifying that only docker targets are functional in this path.

📝 Option 1: Explicit variable substitution

```diff
  # Fork PRs: no secrets available, but docker targets are still runnable.
- envsubst < "${{ github.workspace }}/dbt-data-reliability/integration_tests/profiles/profiles.yml.template" > ~/.dbt/profiles.yml
+ # Only substitute SCHEMA_NAME; leave other placeholders intact to avoid empty credentials
+ envsubst '$SCHEMA_NAME' < "${{ github.workspace }}/dbt-data-reliability/integration_tests/profiles/profiles.yml.template" > ~/.dbt/profiles.yml
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-warehouse.yml around lines 163 - 166, The fork-PR path runs envsubst against the profiles template but only exports SCHEMA_NAME, causing cloud-target placeholders like ${SNOWFLAKE_PASSWORD} to be blank; update the workflow to either (A) call envsubst with an explicit variable list (e.g., envsubst '$SCHEMA_NAME $SAFE_VAR2 ...') so only known-safe variables are replaced, or (B) add a clear comment near the envsubst/else block stating that this fork PR branch only produces valid docker targets and cloud targets will be invalid because secrets (SECRETS_JSON/PROFILES_YML) are not available; change references: envsubst, SCHEMA_NAME, SECRETS_JSON, PROFILES_YML, and the fork PR path.

.github/workflows/test-all-warehouses.yml (1)
117-127: Verify fusion dbt-version entries don't duplicate base matrix.

Similar to test-docker, when inputs.dbt-version is "fusion", the include entries for snowflake/bigquery/databricks_catalog will match existing base matrix entries, potentially causing duplicate workflow runs. However, if the intent is to always run fusion in addition to the base version for these warehouses, this is correct behavior. Consider adding a brief comment clarifying whether fusion is meant to be additive or conditional.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-all-warehouses.yml around lines 117 - 127, The matrix include entries use inputs.dbt-version || 'fusion' for warehouse-type values (snowflake, bigquery, databricks_catalog) which can duplicate base matrix runs when inputs.dbt-version == "fusion"; either (A) make the includes conditional so they only add the fusion entries when inputs.dbt-version != 'fusion' (e.g., guard the include with a conditional expression or separate include for fusion-only) or (B) if fusion is intentionally additive, add a clarifying comment above the matrix include explaining that inputs.dbt-version "fusion" should run in addition to the base matrix; update references to inputs.dbt-version and the matrix include entries (warehouse-type: snowflake, bigquery, databricks_catalog) accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test-all-warehouses.yml:
- Around line 50-52: The include block unconditionally adds a postgres entry
which duplicates the base matrix when inputs.dbt-version is provided; remove the
include and instead set the base matrix's dbt-version value to use the fallback
expression (dbt-version: ${{ inputs.dbt-version || 'latest_pre' }}) so the
matrix only contains a single postgres entry and only uses "latest_pre" when no
input was supplied; update references to include, dbt-version, and
inputs.dbt-version accordingly.
In @.github/workflows/test-warehouse.yml:
- Around line 144-147: The script blindly writes the jq result of
'.BIGQUERY_KEYFILE' into BIGQUERY_KEYFILE_PATH which will contain the literal
"null" if the key is absent; change the flow to first extract the key into a
variable (e.g., capture jq -r '.BIGQUERY_KEYFILE' into a shell var), check that
the var is non-empty and not "null" before creating/writing the temp file and
setting BIGQUERY_KEYFILE_PATH, and only export BIGQUERY_KEYFILE_PATH when the
key is present; reference the existing symbols BIGQUERY_KEYFILE_PATH, DECODED,
and the jq invocation to locate and update the logic.
📒 Files selected for processing (3)
- .github/workflows/test-all-warehouses.yml
- .github/workflows/test-warehouse.yml
- integration_tests/profiles/profiles.yml.template
🚧 Files skipped from review as they are similar to previous changes (1)
- integration_tests/profiles/profiles.yml.template
…CRETS_JSON, explicit envsubst for fork path, fusion matrix comment Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/test-all-warehouses.yml (1)
117-124: ⚠️ Potential issue | 🟡 Minor

Fusion include expression does not always add Fusion runs.

"${{ inputs.dbt-version || 'fusion' }}" resolves to the input version when provided, so these rows won't schedule fusion in that case.

🔧 Proposed fix

```diff
-  - dbt-version: "${{ inputs.dbt-version || 'fusion' }}"
+  - dbt-version: "fusion"
     warehouse-type: snowflake
-  - dbt-version: "${{ inputs.dbt-version || 'fusion' }}"
+  - dbt-version: "fusion"
     warehouse-type: bigquery
...
-  - dbt-version: "${{ inputs.dbt-version || 'fusion' }}"
+  - dbt-version: "fusion"
     warehouse-type: databricks_catalog
```

Also applies to: 129-130
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-all-warehouses.yml around lines 117 - 124, The include entries intended to always schedule Fusion runs currently use dbt-version: "${{ inputs.dbt-version || 'fusion' }}" which picks the provided input and thus doesn't force a 'fusion' run; update the two include entries (the ones with warehouse-type: snowflake and warehouse-type: bigquery) to set dbt-version to the literal 'fusion' (e.g., dbt-version: 'fusion') so Fusion runs are always added; make the same change for the duplicate include lines referenced later (the other pair at the same pattern).
♻️ Duplicate comments (1)
.github/workflows/test-all-warehouses.yml (1)
50-52: ⚠️ Potential issue | 🟡 Minor

Postgres include can become redundant when inputs.dbt-version is provided.

With a non-empty inputs.dbt-version, this include entry resolves to the same (dbt-version, warehouse-type) combination already produced by the base matrix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-all-warehouses.yml around lines 50 - 52, The matrix include entry that sets dbt-version: "${{ inputs.dbt-version || 'latest_pre' }}" and warehouse-type: postgres can duplicate a base matrix row when inputs.dbt-version is provided; update the workflow so this postgres include is only added when inputs.dbt-version is empty/unset (or otherwise differs from the base matrix). Concretely, wrap or gate the include entry with a condition that checks inputs.dbt-version (or refactor the matrix generation to avoid adding the same (dbt-version, warehouse-type) pair twice) so the include for postgres is skipped when inputs.dbt-version is supplied.
📒 Files selected for processing (2)
- .github/workflows/test-all-warehouses.yml
- .github/workflows/test-warehouse.yml
Regarding the fusion include expression: fixing this would be a separate concern outside the scope of this refactoring PR. Happy to address it in a follow-up if needed.
…ipt + Jinja2 template

- Add generate_profiles.py (click CLI) for rendering profiles from secrets
- Convert profiles.yml.template to profiles.yml.j2 (Jinja2)
- Use service-account-json method for BigQuery (no temp keyfile needed)
- Elementary profile schema uses _elementary suffix via YAML merge keys
- Simplify test-warehouse.yml Write dbt profiles step

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…llback Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
integration_tests/profiles/generate_profiles.py (1)
17-27: Fail fast on missing secrets in cloud mode (keep null-undefined only for docker mode).

Using _NullUndefined unconditionally can hide missing/typoed secret keys. Consider switching to StrictUndefined when secrets_b64 is present.

🔧 Suggested refactor

```diff
-from jinja2 import BaseLoader, Environment, Undefined
+from jinja2 import BaseLoader, Environment, StrictUndefined, Undefined
@@
-    env = Environment(
+    undefined_cls = _NullUndefined if not secrets_b64 else StrictUndefined
+    env = Environment(
         loader=BaseLoader(),
-        undefined=_NullUndefined,
+        undefined=undefined_cls,
         keep_trailing_newline=True,
     )
```

Also applies to: 118-121
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/profiles/generate_profiles.py` around lines 17 - 27, The template Undefined subclass _NullUndefined currently silences missing variables for all runs; change behavior to use StrictUndefined when secrets_b64 is present so missing/typoed secret keys fail fast. Modify the code that constructs the Jinja environment (where _NullUndefined is used) to choose StrictUndefined if the variable secrets_b64 (or equivalent flag) is non-empty/present, otherwise keep using _NullUndefined for docker-only mode; also apply the same conditional swap where _NullUndefined is used again later (the occurrences around the second mention). Ensure you import StrictUndefined and preserve the existing __str__/__bool__ semantics only for the docker path.
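For context on what the suggested switch changes, Jinja2's two undefined behaviors can be demonstrated standalone (generic Jinja2 usage, not the project's code):

```python
from jinja2 import Environment, StrictUndefined, Undefined, UndefinedError

# The default (lenient) Undefined renders a missing variable as an empty string.
lenient = Environment(undefined=Undefined)
assert lenient.from_string("pw: {{ snowflake_password }}").render() == "pw: "

# StrictUndefined raises instead, surfacing typoed or missing secret keys early.
strict = Environment(undefined=StrictUndefined)
try:
    strict.from_string("pw: {{ snowflake_password }}").render()
    failed_fast = False
except UndefinedError:
    failed_fast = True
```

This is why the review suggests StrictUndefined only when secrets are present: cloud runs fail fast on a typo, while docker-only runs can still render with blank cloud placeholders.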
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/profiles/profiles.yml.j2`:
- Around line 50-55: The YAML template uses raw Jinja2 variables for
secret-backed credentials (e.g., snowflake_account, snowflake_user,
snowflake_password, snowflake_role, snowflake_database, snowflake_warehouse)
which can be mis-parsed by YAML; update the template to pipe each secret-backed
field through the toyaml filter (same approach already used for
bigquery_keyfile) so values are emitted as safe YAML scalars and avoid
boolean/number coercion—apply this change to the referenced credential blocks
and the other listed secret fields (lines for bigquery_keyfile and the ranges
noted) to ensure consistent use of | toyaml across all secrets.
📒 Files selected for processing (3)
- .github/workflows/test-warehouse.yml
- integration_tests/profiles/generate_profiles.py
- integration_tests/profiles/profiles.yml.j2
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
….Error Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
- dremio: add datalake, database, object_storage_source, object_storage_path, root_path, dremio_space, dremio_space_folder
- bigquery: add location (US) and priority (interactive)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…' error

dbt-dremio maps object_storage_source->datalake, object_storage_path->root_path, dremio_space->database, dremio_space_folder->schema. Having both causes duplicate key errors. Keep only canonical field names.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
… matrix entry Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…biguous strings Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…ields Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…n toyaml Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
refactor: replace CI_PROFILES_YML with committed Jinja2 template + Python renderer
Summary
Replaces the opaque base64-encoded CI_PROFILES_YML / CI_PROFILES_YML_FUSION GitHub secrets with a committed, reviewable Jinja2 template (profiles.yml.j2) and a lean JSON secret (CI_WAREHOUSE_SECRETS), rendered by a Python CLI script using click + Jinja2.

What changed

- integration_tests/profiles/profiles.yml.j2 (new): Jinja2 template with Docker targets (postgres, clickhouse, trino, dremio) in plaintext and cloud targets (snowflake, bigquery, redshift, databricks_catalog, athena) using {{ var }} placeholders. All secret-backed credential fields use the | toyaml filter for safe YAML serialization. The elementary profile uses a Jinja for-loop to alias all targets with a _elementary schema suffix.
- integration_tests/profiles/generate_profiles.py (new): Python CLI (click) that:
  - decodes CI_WAREHOUSE_SECRETS (base64 JSON) to populate cloud credentials
  - uses StrictUndefined when secrets are present (fail-fast on typos) and _NullUndefined for docker-only runs (cloud placeholders resolve to empty strings)
  - provides a toyaml filter: dicts → compact inline YAML {key: val, …}, Undefined → '', everything else → pass-through
- test-warehouse.yml: the "Write dbt profiles" step now calls generate_profiles.py instead of base64 -d | sed.
- test-all-warehouses.yml: split into two job groups:
  - test-docker: runs on pull_request only. No secrets needed, no fork approval gate. Matrix: postgres, clickhouse, trino, dremio. Uses an exclude pattern to limit latest_pre to postgres only.
  - test-cloud: keeps the existing check-fork-status + approve-fork pattern. Matrix: snowflake, bigquery, redshift, databricks_catalog, athena (spark removed). Fusion variants included for snowflake, bigquery, databricks_catalog. Uses secrets: inherit.

Updates since last revision

- _yaml_inline / toyaml filter: reduced from a complex type-aware implementation to a simple 3-branch function (dict → compact YAML, Undefined → empty string, everything else → pass-through). The previous version handled edge cases like YAML-ambiguous strings ("yes", "null") that don't appear in real credential fields.
- Applied the | toyaml filter to all secret-backed fields in the template, not just bigquery_keyfile. This ensures dicts are serialized correctly and Undefined values render cleanly in docker-only mode.
- Replaced the include pattern (which created duplicate postgres entries when inputs.dbt-version was provided) with an exclude pattern that limits latest_pre to only postgres.
- dremio: dbt-dremio maps object_storage_source → datalake, object_storage_path → root_path, dremio_space → database, dremio_space_folder → schema; having both caused "Got duplicate keys". Resolved by keeping only the canonical names: datalake, root_path, database.
- bigquery: added location: US and priority: interactive fields.
- Added auth_type: oauth to the databricks_catalog profile (required when using client_id/client_secret).
- Removed the spark target from the workflow matrix.
- Added isinstance(decoded, dict) validation and binascii.Error handling in the secret decoder.

CI status (latest run)
16/18 jobs pass. The 2 failures are pre-existing flaky issues unrelated to this PR:
- latest_official/bigquery: transient BigQuery infrastructure error ("Streaming data from ... is temporarily unavailable"). The profile loads correctly; the error occurs during model execution. fusion/bigquery passes on the same commit.
- latest_official/dremio: 1 test out of 178 fails with a seed error (test_sum_failed_row_count). 177/178 dremio tests pass, confirming the profile config is correct.

All other targets pass: postgres, clickhouse, trino, snowflake (fusion + latest_official), bigquery (fusion), redshift, databricks_catalog (fusion + latest_official), athena.
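Since CI_WAREHOUSE_SECRETS must be a base64-encoded JSON object (with BIGQUERY_KEYFILE as a nested object, not a string), the described defensive decoding (isinstance dict check, binascii.Error handling) can be sketched like this; decode_secrets is an illustrative name, not necessarily the actual function in generate_profiles.py:

```python
import base64
import binascii
import json

def decode_secrets(secrets_b64: str) -> dict:
    """Decode a base64-encoded JSON object, failing loudly on malformed input."""
    try:
        raw = base64.b64decode(secrets_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"secret is not valid base64: {exc}") from exc
    decoded = json.loads(raw)
    if not isinstance(decoded, dict):
        raise ValueError("secret must decode to a JSON object")
    return decoded

# Build a sample payload of the expected shape: nested keyfile object.
sample = {"SNOWFLAKE_ACCOUNT": "acct", "BIGQUERY_KEYFILE": {"type": "service_account"}}
payload = base64.b64encode(json.dumps(sample).encode()).decode()
```

Running the decoder locally against a candidate payload is a quick way to validate the secret before storing it in GitHub.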
Review & Testing Checklist for Human
- Decode the old CI_PROFILES_YML and compare field-by-field against profiles.yml.j2. Pay special attention to: snowflake (role/warehouse names?), redshift (port/dbname?), athena (s3 paths?), databricks (auth_type: oauth correct?).
- Create the CI_WAREHOUSE_SECRETS GitHub secret; it must be a base64-encoded JSON object with keys matching the template placeholders (e.g. SNOWFLAKE_ACCOUNT, BIGQUERY_KEYFILE, etc.). The BigQuery keyfile should be a nested JSON object (not a string).
- Cloud tests will fail until CI_WAREHOUSE_SECRETS is created. Consider creating the new secret first, then merging this PR, then deleting the old CI_PROFILES_YML / CI_PROFILES_YML_FUSION secrets.
- Verify internal PRs run test-docker (via pull_request) and test-cloud (via pull_request), and fork PRs run test-docker (via pull_request) and test-cloud (via pull_request_target with approval).
- Verify test-docker doesn't run twice for internal PRs (skipped on pull_request_target).
- The test-docker job doesn't pass secrets: inherit, so CI_WAREHOUSE_SECRETS will be empty. Confirm that generate_profiles.py produces a valid profiles.yml with empty cloud target fields and that docker tests only reference docker targets.

Notes
- A single template serves both dbt versions (no separate CI_PROFILES_YML_FUSION needed).
- Some values are hardcoded in the template: location: US, priority: interactive (BigQuery); datalake: S3Source, database: elementary_ci (Dremio). If these ever change, the template must be updated.
- The dremio profile uses only canonical field names (datalake, root_path, database, schema), no aliases. If the adapter's field mapping changes, this may need updating.
- The _yaml_inline function returns non-string values for pass-through cases (ints, bools, strings). This is intentional: Jinja2 handles the type coercion correctly when inserting into the YAML template.

Link to Devin run: https://app.devin.ai/sessions/fe5ef28048b84b139dfdd9c9d15b2765
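The renderer flow and simplified toyaml filter described in this PR can be sketched as follows. This is a minimal illustration: the template fragment is invented, secret keys are assumed to be lowercased to match template variables, and NullUndefined/toyaml are simplified stand-ins for the script's actual implementations:

```python
import base64
import json
from jinja2 import BaseLoader, Environment, Undefined

class NullUndefined(Undefined):
    """Missing variables (cloud secrets in docker-only runs) render as ''."""
    def __str__(self) -> str:
        return ""

def toyaml(value):
    """3-branch filter: dict -> compact inline YAML, Undefined -> '', else pass-through."""
    if isinstance(value, dict):
        return json.dumps(value)  # compact JSON is valid YAML flow style
    if isinstance(value, Undefined):
        return ""
    return value

# Illustrative template fragment; the real profiles.yml.j2 is much larger.
TEMPLATE = 'account: {{ snowflake_account | toyaml }}\nkeyfile: {{ bigquery_keyfile | toyaml }}\n'

def render_profiles(secrets_b64, schema_name):
    context = {"schema_name": schema_name}
    if secrets_b64:
        decoded = json.loads(base64.b64decode(secrets_b64))
        # Assumption: secret keys are lowercased to match template variables.
        context.update({k.lower(): v for k, v in decoded.items()})
    env = Environment(loader=BaseLoader(), undefined=NullUndefined,
                      keep_trailing_newline=True)
    env.filters["toyaml"] = toyaml
    return env.from_string(TEMPLATE).render(**context)
```

With secrets present, the keyfile dict is emitted as an inline YAML mapping; without secrets (docker-only mode), the cloud placeholders render as empty strings instead of raising.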
Requested by: @haritamar
Resolves: ELE-5262