Fix/deduplicate casting log message by DavidEiglspergerQC · Pull Request #983 · Quantco/glum

DavidEiglspergerQC · 2026-03-11T16:20:42Z

Problem:
align_df_categories logs at INFO every time a column is cast to Enum or its categories are re-aligned. Since it runs on every .predict() call (via _convert_from_df), any code path that calls predict in a loop (CV grid search, PD plots, SHAP, H² stats) produces hundreds of identical log lines.

Solution:
Track emitted columns in a module-level set and only log the first occurrence. Casting/alignment behavior is unchanged.

There might be more nuanced fixes for this. I only noticed the issue while running some workflows with a categorical feature and found it quite annoying that the logs are spammed with the same message, so I went for this quick fix. Happy to get your opinion on this.

Copilot

Pull request overview

Reduces repeated INFO logging from align_df_categories during prediction workflows by deduplicating “cast/align category” messages across calls, and documents the change for the next patch release.

Changes:

Add a module-level set to ensure align_df_categories emits cast/align INFO logs only once per column per process/session.
Update the changelog with an unreleased 3.2.1 entry describing the reduced log spam.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`src/glum/_utils.py`	Deduplicates `align_df_categories` INFO logs using a module-level emitted-columns set.
`CHANGELOG.rst`	Adds an unreleased entry describing the logging change.

MarcAntoineSchmidtQC

Looks good. Thanks!

stanmart · 2026-03-12T13:39:39Z

We might also want to think about making these messages DEBUG-level, but the current fix is good nevertheless, thank you!

stanmart

Oh wait, sorry, I missed that this is a module level global. This might get a bit confusing (log is emitted only once per session, not once per fit).

MarcAntoineSchmidtQC · 2026-03-12T15:33:39Z

> Oh wait, sorry, I missed that this is a module level global. This might get a bit confusing (log is emitted only once per session, not once per fit).

That's a good point. I think it should be displayed once per fit, but then the question is: Can we make the call _convert_from_df only once in these examples?

DavidEiglspergerQC · 2026-03-12T15:38:42Z

> > Oh wait, sorry, I missed that this is a module level global. This might get a bit confusing (log is emitted only once per session, not once per fit).
>
> That's a good point. I think it should be displayed once per fit, but then the question is: Can we make the call _convert_from_df only once in these examples?

Exactly. With an instance-level set we still get duplicated messages: for CV, for example, a fresh estimator is created for each fold/param combo, so each one gets its own empty set...
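The effect described above can be sketched with a toy class. `SketchEstimator` is a hypothetical stand-in, not glum's estimator, and `count_messages` only mimics the "fresh estimator per fold" pattern of cross-validation.

```python
import logging

logger = logging.getLogger("instance_dedup_sketch")


class SketchEstimator:
    """Hypothetical stand-in for an estimator; not glum's actual class."""

    def __init__(self):
        # Instance-level registry: every new estimator starts empty.
        self._logged_columns = set()
        self.messages_emitted = 0

    def predict(self, columns):
        for col in columns:
            if col not in self._logged_columns:
                self._logged_columns.add(col)
                self.messages_emitted += 1
                logger.debug("Aligning categories of column %s.", col)


def count_messages(n_folds, n_predicts):
    """Count messages when each fold builds a fresh estimator."""
    total = 0
    for _ in range(n_folds):
        est = SketchEstimator()  # fresh instance, so a fresh empty set
        for _ in range(n_predicts):
            est.predict(["region"])
        total += est.messages_emitted
    return total
```

With one estimator and many predict calls the message appears once; with five folds it appears five times, regardless of how often each estimator predicts.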

DavidEiglspergerQC · 2026-03-12T15:55:59Z

We now:

Use DEBUG level
Use instance level deduplication

This solves the whole problem if one runs the workflows at INFO level and "most" of the problem if one runs them at DEBUG level (no tons of duplicate logs e.g. for pd plots).
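To illustrate why moving the message to DEBUG removes it from INFO-level runs, here is a small sketch using only the standard `logging` module. `ListHandler`, `messages_seen`, and the message text are hypothetical helpers for this example.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record.getMessage())


def messages_seen(level):
    """Return the messages that reach a handler when the logger runs at `level`."""
    logger = logging.getLogger(f"debug_level_sketch.{level}")
    logger.setLevel(level)
    logger.propagate = False  # keep the example isolated from the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    logger.debug("Casting column region to Enum.")
    logger.removeHandler(handler)
    return handler.records
```

At `logging.INFO` the returned list is empty; at `logging.DEBUG` it contains the single cast message.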

The only "odd" thing is that at DEBUG level we still get duplicate logs for CV, since fresh estimators are created there for each fold/param combo, so each log appears number-of-folds times (screenshot not reproduced here).

stanmart

Thank you, this looks good to me. I don't mind duplicated debug messages too much. They are usually hidden by default, and whenever the user wants debug logging, extra info is usually not a problem.

DavidEiglspergerQC added 2 commits March 11, 2026 17:03

Deduplicate log messages

63819e2

Add changelog entry

038470a

DavidEiglspergerQC requested review from MatthiasSchmidtblaicherQC and Copilot March 11, 2026 16:20

Copilot started reviewing on behalf of DavidEiglspergerQC March 11, 2026 16:21

DavidEiglspergerQC marked this pull request as ready for review March 11, 2026 16:22

DavidEiglspergerQC requested review from MarcAntoineSchmidtQC, jtilly and stanmart as code owners March 11, 2026 16:22

Copilot AI reviewed Mar 11, 2026

Co-pilot feedback

78fe2a7

MarcAntoineSchmidtQC approved these changes Mar 12, 2026

stanmart requested changes Mar 12, 2026

change to instance level set and debg-level

cf24e54

stanmart approved these changes Mar 12, 2026

DavidEiglspergerQC merged commit 88547f7 into main Mar 12, 2026
24 checks passed

DavidEiglspergerQC deleted the fix/deduplicate-casting-log-message branch March 12, 2026 16:04

DavidEiglspergerQC merged 4 commits into main from fix/deduplicate-casting-log-message
