Restore legacy datelc, fix ConfigFile error path, and sanitize CSV author names #917

shbhmexe · 2025-11-23T17:16:12Z

This PR makes small, low-risk corrections that improve stability and correctness without adding features or changing the analytics logic:

Restore legacy datelc file (cncfdm.py)

Problem: src/tests/gitdm-tests.py expects a datelc file for the -D path, but cncfdm only wrote datelc.csv.
Fix: Write both datelc.csv (modern, with header) and legacy datelc (fixed‑width, no header) like the historical gitdm.py did.
Impact: Tests and existing downstream scripts relying on datelc work again; no change to CSV output.
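
The dual-output idea can be sketched as follows. This is an illustration only, not the code in cncfdm.py: the helper name `write_datelc`, the column names, and the field widths are all assumptions.

```python
import csv
import os
import tempfile


def write_datelc(rows, directory):
    """Write the same (date, added, total) rows in two formats.

    Hypothetical helper: the column names and field widths below are
    assumed for illustration and are not taken from cncfdm.py.
    """
    csv_path = os.path.join(directory, "datelc.csv")
    legacy_path = os.path.join(directory, "datelc")

    # Modern output: CSV with a header row.
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Date", "Added", "Total"])
        writer.writerows(rows)

    # Legacy output: fixed-width columns, no header.
    with open(legacy_path, "w") as f:
        for date, added, total in rows:
            f.write("%-10s %8d %8d\n" % (date, added, total))

    return csv_path, legacy_path


# Demo in a throwaway directory.
demo_dir = tempfile.mkdtemp()
csv_path, legacy_path = write_datelc([("2025-11-23", 12, 340)], demo_dir)
with open(legacy_path) as f:
    legacy_text = f.read()
with open(csv_path) as f:
    csv_text = f.read()
```

Writing both files from the same rows keeps the two outputs in step, so a consumer of either format sees the same data.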

Correct error path in ReadFileType() (ConfigFile.py)

Problem: Code called ConfigFile.croak, which referenced a function attribute that doesn’t exist; on malformed lines this would raise the wrong exception.
Fix: Call module‑level croak() directly.
Impact: Proper, user‑friendly error handling on bad file‑type config lines.
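
A minimal sketch of the failure mode, assuming that inside the module the name `ConfigFile` resolves to a plain function. The function bodies, the `read_file_type_*` names, and the error text are hypothetical stand-ins, not the project's code.

```python
import sys


def croak(message):
    """Module-level error helper: report the problem and exit."""
    sys.stderr.write(message + "\n")
    sys.exit(1)


def ConfigFile(name):
    """Stand-in for a module-level function (hypothetical body)."""
    return name


def read_file_type_buggy(line):
    if len(line.split()) < 2:
        # A plain function has no 'croak' attribute, so this raises
        # AttributeError instead of reporting the malformed line.
        ConfigFile.croak("Bad filetype line: " + line)


def read_file_type_fixed(line):
    if len(line.split()) < 2:
        # Call the module-level helper directly.
        croak("Bad filetype line: " + line)


def outcome(func, line):
    """Name the way func(line) ended, for comparison."""
    try:
        func(line)
    except AttributeError:
        return "AttributeError"
    except SystemExit:
        return "SystemExit"
    return "ok"
```

With a malformed line, the buggy variant ends in `AttributeError` while the fixed one prints the message and exits through `croak()`.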

CSV output hygiene (csvdump.py)

Problem: Author name was sanitized, then immediately overwritten, dropping quote/backslash sanitization; CSV files were left unclosed.
Fix: Sanitize once (quotes, backslashes, apostrophes) then email_encode; close CSV files after writing.
Impact: More robust CSVs on Windows and fewer chances of malformed fields; semantics unchanged.
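
The overwrite bug and its fix can be sketched like this. The `email_encode` body, the replacement character, the `Author` header, and the helper names are assumptions made for illustration; only the pattern (sanitize once, then encode, then close the file) comes from the description above.

```python
import csv
import os
import tempfile


def email_encode(text):
    """Hypothetical stand-in for the project's email_encode helper."""
    return text.replace("@", " at ")


def author_field_buggy(name):
    cleaned = name.replace('"', ".").replace("\\", ".")
    # Bug pattern: starting again from the raw name discards the
    # quote and backslash handling done on the previous line.
    cleaned = name.replace("'", ".")
    return email_encode(cleaned)


def author_field_fixed(name):
    cleaned = name
    for ch in ('"', "\\", "'"):
        cleaned = cleaned.replace(ch, ".")
    return email_encode(cleaned)


def write_authors(path, names):
    # The with-block closes the file even if a write raises.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Author"])
        for name in names:
            writer.writerow([author_field_fixed(name)])


demo_path = os.path.join(tempfile.mkdtemp(), "authors.csv")
write_authors(demo_path, ['J "Q" O\'Neil'])
with open(demo_path) as f:
    demo_lines = f.read().splitlines()
```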

Indentation normalization (cncfdm.py, gitdm.py)

Problem: Mixed tabs/spaces in a small block under --numstat handling.
Fix: Convert those lines to spaces; no logic changes.
Impact: Avoids potential TabError/formatting issues; behavior identical.
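
To show why mixed indentation is worth normalizing, here is a small sketch using an invented three-line snippet rather than the real --numstat block. Python 3 rejects a block whose lines mix a tab with spaces in a way that depends on the tab width.

```python
# One line indented with a tab, the next with eight spaces.
MIXED = "if True:\n\tx = 1\n        y = 2\n"


def compiles(source):
    """Return True if the source compiles, False on a TabError."""
    try:
        compile(source, "<sketch>", "exec")
    except TabError:
        return False
    return True


# Expanding tabs to spaces makes the indentation unambiguous.
# Caveat: expandtabs() would also rewrite tabs inside string
# literals, so it is only safe on a block without them.
normalized = MIXED.expandtabs(8)
```

Here `compiles(MIXED)` is `False` and `compiles(normalized)` is `True`.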

Verification

Changes are isolated and additive.
The -D flow now emits both datelc.csv and datelc (legacy), matching expectations in src/tests/gitdm-tests.py.
No core counting logic, parsing, or report math was changed.
CSV headers and rows are unchanged except for safer author-name sanitization (previously intended but accidentally bypassed).

Risk assessment

Very low. The only observable output changes are the presence of the legacy datelc file (additive) and correctly sanitized author names in ChangeSets CSVs.

How to test

Run the regression tests from repo root:
- python src/tests/gitdm-tests.py
Or run a quick smoke test of the -D path:
- git --git-dir src/tests/testrepo log -p -M | ./gitdm -D
- Verify that both datelc and datelc.csv are created and that datelc matches src/tests/expected-datelc, as expected by the tests.

No breaking changes.
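
The verification step of the smoke test could be automated along these lines. This is a sketch only: `check_datelc_outputs` is a hypothetical helper, not part of the repository, and the demo below runs against throwaway files rather than real gitdm output.

```python
import filecmp
import os
import tempfile


def check_datelc_outputs(workdir, expected_path):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for name in ("datelc", "datelc.csv"):
        if not os.path.isfile(os.path.join(workdir, name)):
            problems.append("missing " + name)
    legacy = os.path.join(workdir, "datelc")
    # Compare contents byte for byte, not just file metadata.
    if os.path.isfile(legacy) and not filecmp.cmp(
        legacy, expected_path, shallow=False
    ):
        problems.append("datelc differs from " + expected_path)
    return problems


# Demo: fake a work directory that holds only the legacy file.
demo_dir = tempfile.mkdtemp()
expected = os.path.join(demo_dir, "expected-datelc")
for path in (expected, os.path.join(demo_dir, "datelc")):
    with open(path, "w") as f:
        f.write("2025-11-23       12      340\n")
demo_problems = check_datelc_outputs(demo_dir, expected)
```

In the demo, `demo_problems` reports only the missing datelc.csv, since the fake legacy file matches the expected one.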

…normalize indent

- cncfdm: write legacy fixed‑width `datelc` alongside CSV `datelc.csv` to keep tests and existing tooling working.
- ConfigFile: call module‐level `croak()` in `ReadFileType()` instead of `ConfigFile.croak` (which was invalid and would crash on malformed lines).
- csvdump: consistently sanitize and encode author names (quotes, backslashes, apostrophes) when emitting ChangeSets; close CSV files after writing.
- Minor: replace tab indentation with spaces in numstat blocks (cncfdm.py, gitdm.py) to avoid tab/space mixups.

These changes do not alter analytics logic or outputs except for: (a) re‑adding the legacy `datelc` file (additive), and (b) fixing author name sanitization in the CSV, which was intended but previously overridden.

Signed-off-by: Shubham Shukla <[email protected]>
