
@shbhmexe
Contributor

This PR makes small, safe corrections that improve stability and correctness without changing behavior or adding features:

  1. Restore legacy datelc file (cncfdm.py)
  • Problem: src/tests/gitdm-tests.py expects a datelc file for the -D path, but cncfdm only wrote datelc.csv.
  • Fix: Write both datelc.csv (modern, with header) and legacy datelc (fixed‑width, no header), as the historical gitdm.py did.
  • Impact: Tests and existing downstream scripts relying on datelc work again; no change to CSV output.
  2. Correct error path in ReadFileType() (ConfigFile.py)
  • Problem: Code called ConfigFile.croak, which referenced a function attribute that doesn’t exist; on malformed lines this would raise the wrong exception.
  • Fix: Call module‑level croak() directly.
  • Impact: Proper, user‑friendly error handling on bad file‑type config lines.
  3. CSV output hygiene (csvdump.py)
  • Problem: Author name was sanitized, then immediately overwritten, dropping quote/backslash sanitization; CSV files were left unclosed.
  • Fix: Sanitize once (quotes, backslashes, apostrophes), then email_encode; close CSV files after writing.
  • Impact: More robust CSVs on Windows and fewer chances of malformed fields; semantics unchanged.
  4. Indentation normalization (cncfdm.py, gitdm.py)
  • Problem: Mixed tabs/spaces in a small block under --numstat handling.
  • Fix: Convert those lines to spaces; no logic changes.
  • Impact: Avoids potential TabError/formatting issues; behavior identical.
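
For reviewers who want to see why the ConfigFile.croak call fails, here is a minimal sketch. It assumes ConfigFile.py defines both a function named ConfigFile and a module‑level croak(); the function bodies, the error message, and the three‑field line format are hypothetical stand‑ins, not the real gitdm code.

```python
import sys


def croak(message):
    """Module-level error helper: print the message and exit."""
    sys.stderr.write(message + '\n')
    sys.exit(1)


def ConfigFile(name):
    """Stand-in for the config reader; it has no 'croak' attribute."""
    return name


def read_file_type_buggy(line):
    # ConfigFile is a function here, so ConfigFile.croak is an attribute
    # lookup on that function and raises AttributeError.
    fields = line.split(None, 2)  # hypothetical line format
    if len(fields) != 3:
        ConfigFile.croak('Malformed filetype line "%s"' % line)
    return fields


def read_file_type_fixed(line):
    # Calling the module-level helper reports the intended error.
    fields = line.split(None, 2)  # hypothetical line format
    if len(fields) != 3:
        croak('Malformed filetype line "%s"' % line)
    return fields
```

With a malformed line, the buggy variant surfaces an AttributeError, while the fixed variant exits through croak() with the intended message.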

Verification

  • Changes are isolated and additive.
  • The -D flow now emits both datelc.csv and datelc (legacy), matching expectations in src/tests/gitdm-tests.py.
  • No core counting logic, parsing, or report math was changed.
  • CSV headers and rows are unchanged except for safer author-name sanitization (previously intended but accidentally bypassed).
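
The dual‑output behavior in the -D flow can be pictured roughly as below. This is only a sketch: the function name, the header labels, the column widths, and the date format are assumptions for illustration and are not taken from cncfdm.py.

```python
import csv
import io


def write_datelc(rows, csv_stream, legacy_stream):
    """Write (date, line_count) rows in both formats.

    csv_stream gets a header plus comma-separated rows;
    legacy_stream gets fixed-width rows with no header.
    """
    writer = csv.writer(csv_stream, lineterminator='\n')
    writer.writerow(['date', 'lines'])  # hypothetical header names
    for date, lines in rows:
        writer.writerow([date, lines])
        # Hypothetical widths: 12-character date column, 8-digit count.
        legacy_stream.write('%-12s %8d\n' % (date, lines))


if __name__ == '__main__':
    modern, legacy = io.StringIO(), io.StringIO()
    write_datelc([('2020-01-01', 10), ('2020-01-02', 25)], modern, legacy)
    print(modern.getvalue())
    print(legacy.getvalue())
```

Because both files are produced from the same rows in a single pass, the CSV output stays unchanged and the legacy file is purely additive.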

Risk assessment

  • Very low. The only observable output change is the presence of the legacy datelc file (additive) and correctly sanitized author names in ChangeSets CSVs.
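
The author‑name fix amounts to the pattern sketched here. The helper names and the choice of '.' as the replacement character are assumptions, and the email_encode step is left out because its behavior is not shown in this PR description.

```python
import csv
import io


def sanitize_buggy(raw):
    # First pass handles quotes and backslashes...
    name = raw.replace('"', '.').replace('\\', '.')
    # ...but the second assignment restarts from raw, discarding that work.
    name = raw.replace("'", '.')
    return name


def sanitize_fixed(raw):
    # Apply every replacement to the same evolving value, exactly once.
    name = raw
    for ch in ('"', '\\', "'"):
        name = name.replace(ch, '.')
    return name


def dump_rows(stream, rows):
    """Write rows with the stdlib csv writer; the caller owns the stream."""
    writer = csv.writer(stream, lineterminator='\n')
    for row in rows:
        writer.writerow(row)


if __name__ == '__main__':
    raw = 'Jo "J" O\'Neil\\x'
    # A with-block (or an explicit close()) ensures the file is closed.
    with io.StringIO() as out:
        dump_rows(out, [[sanitize_fixed(raw), 1]])
        print(out.getvalue())
```

Opening the real output files in a with‑block, or calling close() after the last row, covers the "files left unclosed" half of the fix.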

How to test

  • Run the regression tests from repo root:
    • python src/tests/gitdm-tests.py
  • Or run a quick smoke test for the -D path:
    • git --git-dir src/tests/testrepo log -p -M | ./gitdm -D
    • Verify that both datelc and datelc.csv are created and that datelc matches src/tests/expected-datelc, as expected by the tests.
  • No breaking changes.
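
The manual verification step could be scripted along these lines. This is a sketch only; the helper names are hypothetical, and it assumes the smoke test was run from the repo root so that the outputs land in the current directory.

```python
import filecmp
import os


def missing_outputs(directory, names=('datelc', 'datelc.csv')):
    """Return the expected output files that are absent from directory."""
    return [n for n in names
            if not os.path.isfile(os.path.join(directory, n))]


def outputs_match(actual_path, expected_path):
    """Compare two files byte for byte (shallow=False forces a content check)."""
    return filecmp.cmp(actual_path, expected_path, shallow=False)


if __name__ == '__main__':
    absent = missing_outputs('.')
    if absent:
        print('missing: ' + ', '.join(absent))
    elif outputs_match('datelc', 'src/tests/expected-datelc'):
        print('datelc matches expected output')
    else:
        print('datelc differs from expected output')
```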

…normalize indent

- cncfdm: write legacy fixed‑width `datelc` alongside CSV `datelc.csv` to keep tests and existing tooling working.
- ConfigFile: call module‐level `croak()` in `ReadFileType()` instead of `ConfigFile.croak` (which was invalid and would crash on malformed lines).
- csvdump: consistently sanitize and encode author names (quotes, backslashes, apostrophes) when emitting ChangeSets; close CSV files after writing.
- Minor: replace tab indentation with spaces in numstat blocks (cncfdm.py, gitdm.py) to avoid tab/space mixups.

These changes do not alter analytics logic or outputs except for: (a) re‑adding the legacy `datelc` file (additive), and (b) fixing author name sanitization in the CSV which was intended but previously overridden.

Signed-off-by: Shubham Shukla <[email protected]>