Conversation

Copilot AI commented Jan 7, 2026

The sequential workflow hits 10 bank endpoints every 5 minutes, resulting in high GitHub Actions costs and brittle execution, where a single failure affects all banks.

Changes

Unified Fetcher (scripts/fetch-bank.ts)

  • Consolidates the 10 individual .js postprocessors into a single TypeScript script with a bank config map (sketched below)
  • Accepts the bank name as a CLI argument: deno run --allow-all scripts/fetch-bank.ts akbank
  • Logs structured errors to logs/failures.jsonl with timestamps and stack traces
  • An inline readJSON helper replaces the external dependency
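
A minimal sketch of the fetcher's shape; the endpoint URL, parser logic, and CSV columns below are hypothetical placeholders, while the real entries come from the existing .js postprocessors:

// Minimal sketch of scripts/fetch-bank.ts; bank definitions are placeholders.
type BankConfig = {
  url: string;
  parse: (body: string) => string[][]; // raw response body -> rows of CSV fields
};

const BANKS: Record<string, BankConfig> = {
  akbank: {
    url: "https://example.com/akbank-rates", // hypothetical endpoint
    parse: (body) =>
      JSON.parse(body).rates.map((r: { code: string; buy: number; sell: number }) => [
        r.code,
        String(r.buy),
        String(r.sell),
      ]),
  },
  // ...one entry per bank, ported from the existing .js postprocessors
};

const bank = Deno.args[0];
const config = BANKS[bank];
if (!config) {
  console.error(`Unknown bank: ${bank}`);
  Deno.exit(1);
}

try {
  const res = await fetch(config.url);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const rows = config.parse(await res.text());
  await Deno.writeTextFile(`${bank}.csv`, rows.map((r) => r.join(",")).join("\n") + "\n");
} catch (err) {
  const e = err instanceof Error ? err : new Error(String(err));
  // One JSON object per line, matching the { timestamp, bank, error, stack } shape.
  const entry = JSON.stringify({
    timestamp: new Date().toISOString(),
    bank,
    error: e.message,
    stack: e.stack ?? "",
  });
  await Deno.mkdir("logs", { recursive: true });
  await Deno.writeTextFile("logs/failures.jsonl", entry + "\n", { append: true });
  Deno.exit(1);
}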

Matrix Workflow (.github/workflows/flat.yml)

  • Schedule changed from */5 * * * * to */10 * * * * (50% cost reduction)
  • Matrix strategy with max-parallel: 5 and fail-fast: false (sketched below)
  • Concurrency control: group: ${{ github.workflow }}-${{ github.ref }}
  • Retry logic for git push conflicts (5 attempts with exponential backoff)
  • Summary job aggregates matrix results and displays success/failure counts
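
A sketch of the workflow's shape under these settings; the action versions, the cancel-in-progress choice, and the commit/retry details are assumptions rather than the exact file contents:

# Sketch only: versions and step details are illustrative.
on:
  schedule:
    - cron: "*/10 * * * *"
  workflow_dispatch:
    inputs:
      banks:
        description: "Comma-separated bank names (empty = all banks)"
        required: false
        default: ""

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true   # assumption: redundant runs are cancelled, not queued

jobs:
  fetch:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      max-parallel: 5
      matrix:
        bank: [akbank, ing, garanti, enpara, teb, hsbc, kuveytturk, isbank, ziraat, yapikredi]
    steps:
      - uses: actions/checkout@v4
      - uses: denoland/setup-deno@v1
      - name: Fetch
        continue-on-error: true
        run: deno run --allow-all scripts/fetch-bank.ts ${{ matrix.bank }}
      - name: Commit and push with retry
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add ${{ matrix.bank }}.csv
          git commit -m "Update ${{ matrix.bank }} data" || exit 0
          for i in 1 2 3 4 5; do          # 5 attempts
            git push && break
            git pull --rebase
            sleep $((2 ** i))             # exponential backoff
          done

  summary:
    needs: fetch
    if: always()                          # run even when fetch jobs fail
    runs-on: ubuntu-latest
    steps:
      - run: echo "Aggregate per-bank results into the job summary here"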

Automated Issue Management

  • Creates issues for failures, with duplicate detection via a label query (sketched below)
  • Issue body includes error details, the stack trace, and a link to the workflow run
  • Auto-closes issues when a bank recovers
  • Labels: automated, data-fetch
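
A sketch of the dedup-then-create flow as it might run inside actions/github-script; the issue title format and the BANK/FAILED environment variables are assumptions:

// `github` is the authenticated Octokit client actions/github-script provides,
// and `context` describes the current run. BANK and FAILED are assumed to be
// passed in through the step's env block.
const bank = process.env.BANK;
const failed = process.env.FAILED === "true";
const title = `Data fetch failed: ${bank}`;   // hypothetical title format

// Dedup check: query open issues carrying the automation labels.
const { data: open } = await github.rest.issues.listForRepo({
  owner: context.repo.owner,
  repo: context.repo.repo,
  state: "open",
  labels: "automated,data-fetch",
});
const existing = open.find((issue) => issue.title.includes(bank));

if (failed && !existing) {
  await github.rest.issues.create({
    owner: context.repo.owner,
    repo: context.repo.repo,
    title,
    body: `Run: ${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`,
    labels: ["automated", "data-fetch"],
  });
} else if (!failed && existing) {
  // The bank recovered: close its open issue.
  await github.rest.issues.update({
    owner: context.repo.owner,
    repo: context.repo.repo,
    issue_number: existing.number,
    state: "closed",
  });
}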

Code Quality

  • Fixed async forEach race conditions by replacing them with for...of loops (illustrated below)
  • Corrected the garanti and ziraat URLs to point at the actual endpoints
  • Exact bank-name matching in the workflow dispatch filter
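
For context on the forEach fix: Array.prototype.forEach fires async callbacks without awaiting them, so rows can be processed out of order and rejections go unhandled. An illustrative before/after, with rows and processRow standing in for the real data and handler:

// Before: callbacks run concurrently and unawaited.
rows.forEach(async (row) => {
  await processRow(row);
});

// After: each row is awaited before the next begins.
for (const row of rows) {
  await processRow(row);
}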

Example

Manual trigger for specific banks:

# Workflow dispatch input
banks: "akbank,ing,hsbc"

# Filter logic
if: ${{ github.event.inputs.banks == '' || 
       contains(fromJSON(format('["{0}"]', replace(github.event.inputs.banks, ',', '","'))), matrix.bank) }}
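
The same dispatch can also be triggered from the command line; a hypothetical invocation, assuming the workflow file is named flat.yml:

gh workflow run flat.yml -f banks="akbank,ing,hsbc"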

Error log structure:

{"timestamp": "2026-01-07T12:00:00.000Z", "bank": "akbank", "error": "HTTP 500", "stack": "..."}

The CSV output format and data processing logic are unchanged, so the result stays compatible with the existing archive.py.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • dl.deno.land
    • Triggering command: curl -s REDACTED (dns block)
    • Triggering command: curl --fail --location --progress-bar --output /home/REDACTED/.deno/bin/deno.zip REDACTED (dns block)
    • Triggering command: deno run --unsafely-ignore-certificate-errors --allow-all scripts/fetch-bank.ts hsbc (dns block)
  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https tcp --destination-port 53 -j ACCEPT (dns block)
  • www.hsbcyatirim.com.tr
    • Triggering command: deno run --unsafely-ignore-certificate-errors --allow-all scripts/fetch-bank.ts hsbc (dns block)

If you need me to access, download, or install something from one of these locations, the firewall allow list for this repository will need to be updated.

Original prompt

Problem Statement

The current flat-file workflow runs sequentially, hitting multiple bank endpoints every 5 minutes. This results in high GitHub Actions minute costs and limited debuggability. We need to refactor to a more cost-effective and maintainable solution.

Requirements

1. Matrix-Based Parallel Approach

  • Replace sequential fetching with GitHub Actions matrix strategy
  • Run fetches in parallel with fail-fast: false to continue even if one bank fails
  • Use max-parallel: 5 to limit concurrent jobs
  • Add concurrency control to cancel redundant runs

2. Unified Fetcher Script

  • Create a single TypeScript/Deno fetcher script that handles all banks
  • Configure banks via a central config object
  • Script should accept bank name as argument
  • Maintain current CSV output format (no Parquet conversion)
  • Handle errors gracefully with structured logging

3. Improved Error Handling & Debugging

  • Use continue-on-error: true for individual fetch steps
  • Log failures to structured error files
  • Upload failure logs as artifacts (retention: 7 days)
  • Create a summary job that aggregates results from all matrix jobs
  • Show success/failure counts in job summary

4. Automatic Issue Creation

  • Post GitHub issues when banks fail to fetch
  • Critical: Check for existing open issues to prevent duplicates
  • Only create issue if no open issue exists for that bank
  • Include timestamp, bank name, and error details
  • Label with 'automated' and 'data-fetch'
  • Close issues automatically when bank recovers

5. Schedule Optimization

  • Change cron schedule from */5 * * * * to */10 * * * * (every 10 minutes)
  • Keep workflow_dispatch for manual triggers
  • Add optional input to fetch specific banks only

6. Current Behavior to Preserve

  • Keep CSV file format (do NOT convert to Parquet)
  • Maintain existing file structure: {bank}.csv in root
  • Keep current data processing logic from existing .js files
  • Preserve Git commit behavior for data updates

Current Setup

The repository currently uses:

  • .github/workflows/flat.yml with sequential githubocto/flat@v3 actions
  • Individual JavaScript postprocessing files: akbank.js, ing.js, garanti.js, etc.
  • CSV output files: akbank.csv, ing.csv, etc.
  • Deno v1.x runtime

Banks to fetch:

  • akbank
  • ing
  • garanti
  • enpara
  • teb
  • hsbc
  • kuveytturk
  • isbank
  • ziraat
  • yapikredi

Implementation Details

Unified Fetcher Structure

Create scripts/fetch-bank.ts or similar that:

  1. Maintains a config object mapping bank names to their endpoints and parsers
  2. Accepts bank name as CLI argument
  3. Fetches data from the appropriate endpoint
  4. Processes using existing logic from individual .js files
  5. Outputs to CSV in the same format as current implementation
  6. Logs errors to logs/failures.jsonl with structured data

Workflow Structure

  1. Matrix job (fetch): Runs for each bank in parallel
  2. Summary job (summary): Aggregates results, creates issues if needed

Error Tracking

  • Log structure: { timestamp, bank, error, stack }
  • Artifact naming: failure-log-{bank}
  • Issue deduplication: Query existing issues before creating new ones

Success Criteria

  • ✅ Workflow runs every 10 minutes (50% cost reduction)
  • ✅ Banks fetched in parallel (faster execution)
  • ✅ Individual bank failures don't stop other banks
  • ✅ Failed fetches logged with full error details
  • ✅ Job summary shows which banks succeeded/failed
  • ✅ Issues created automatically for failures (no duplicates)
  • ✅ CSV files continue to work as before
  • ✅ Existing data format preserved

Files to Create/Modify

  • .github/workflows/flat.yml - Refactor to matrix strategy
  • scripts/fetch-bank.ts - New unified fetcher (or similar path)
  • Any bank config files needed
  • Error logging structure

Notes

  • Do NOT convert to Parquet - keep CSV format
  • Reuse existing parsing logic from .js files where possible
  • Ensure backward compatibility with existing data consumers
  • Consider the existing archive.py script that processes these CSVs

This pull request was created from Copilot chat.



Copilot AI changed the title from "[WIP] Refactor flat-file workflow to matrix-based parallel approach" to "Refactor to matrix-based parallel workflow with automated issue management" on Jan 7, 2026
Copilot AI requested a review from c6p January 7, 2026 08:03
Copilot finished work on behalf of c6p January 7, 2026 08:03