
Implement specialized Hurdle distribution #7810


Open: wants to merge 1 commit into main from hurdle_mixtures

Conversation

@ricardoV94 ricardoV94 (Member) commented Jun 4, 2025

It indirectly addresses the issue reported in pymc-devs/nutpie#163

The new objects have a logp that handles the discrete + continuous process correctly, without requiring the arbitrary truncation of the latter at epsilon. This provides a cheaper and more stable logp/logcdf.
For discrete variables we keep using a truncation.
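For a continuous base distribution, the point of avoiding the epsilon truncation can be sketched directly: the hurdle logp is an exact point mass at zero plus the weighted continuous density elsewhere. A minimal NumPy/SciPy illustration with a Gamma nonzero component (`hurdle_gamma_logp` is a hypothetical helper for exposition, not the PR's API):

```python
import numpy as np
from scipy import stats

def hurdle_gamma_logp(x, psi, alpha, beta):
    """Hurdle logp: point mass 1 - psi at zero, psi * Gamma density elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where(
        x == 0,
        np.log1p(-psi),  # exact mass at zero, no epsilon needed
        np.log(psi) + stats.gamma.logpdf(x, a=alpha, scale=1.0 / beta),
    )

print(hurdle_gamma_logp([0.0, 1.0], psi=0.5, alpha=2.0, beta=1.0))
```

For a discrete nonzero component the density branch would additionally need renormalization by the mass above zero, which is why a truncation is still used there.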

Also added special logic to truncate a Hurdle distribution, which solves bambinos/bambi#768. (Edit: this was not the desired behavior, so it has been reverted.)

CC @zwelitunyiswa


📚 Documentation preview 📚: https://pymc--7810.org.readthedocs.build/en/7810/

@ricardoV94 ricardoV94 requested a review from tomicapretto June 4, 2025 12:16
@ricardoV94 ricardoV94 force-pushed the hurdle_mixtures branch 2 times, most recently from 77ed668 to 9c65ca1 Compare June 4, 2025 14:02
@ricardoV94 ricardoV94 changed the title Implement specialized Hurdle distribution Implement specialized Hurdle distribution and allow truncating it Jun 4, 2025
@ricardoV94 ricardoV94 force-pushed the hurdle_mixtures branch 3 times, most recently from 3d5772c to bdb3f12 Compare June 4, 2025 14:15
@ricardoV94 ricardoV94 requested a review from lucianopaz June 4, 2025 14:15
This does not require the arbitrary truncation of continuous distribution in the logp/logcdf
dist
for dist in dists
if (
getattr(dist, "rv_type", None) is not None
@ricardoV94 ricardoV94 (Member, Author) commented:

This was too restrictive: a subclass also inherits the dispatch function and need not be in the registry explicitly.
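The dispatch point can be illustrated in miniature (hypothetical names, not PyMC's actual registry): with `functools.singledispatch`, an implementation registered for a base class is automatically found for its subclasses, so subclasses need no explicit registration.

```python
from functools import singledispatch

class BaseRV: ...
class SpecialRV(BaseRV): ...  # subclass, never registered explicitly

@singledispatch
def logp(rv):
    raise NotImplementedError(type(rv))

@logp.register(BaseRV)
def _(rv):
    return "base implementation"

print(logp(SpecialRV()))  # the subclass inherits the BaseRV dispatch
```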


codecov bot commented Jun 4, 2025

Codecov Report

Attention: Patch coverage is 93.18182% with 6 lines in your changes missing coverage. Please review.

Project coverage is 92.86%. Comparing base (c217727) to head (0917343).

Files with missing lines Patch % Lines
pymc/distributions/mixture.py 91.30% 6 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #7810      +/-   ##
==========================================
+ Coverage   91.77%   92.86%   +1.08%     
==========================================
  Files         107      107              
  Lines       18381    18407      +26     
==========================================
+ Hits        16870    17093     +223     
+ Misses       1511     1314     -197     
Files with missing lines Coverage Δ
pymc/distributions/moments/means.py 100.00% <100.00%> (ø)
pymc/distributions/truncated.py 99.48% <100.00%> (+0.03%) ⬆️
pymc/distributions/mixture.py 95.27% <91.30%> (+0.25%) ⬆️

... and 3 files with indirect coverage changes



# Creating a hurdle with the adjusted weights and the truncated distribution
# Should be equivalent to truncating the original hurdle distribution
return op.rv_op(adjusted_weights, zero_dist, truncated_dist, size=size)
@ricardoV94 ricardoV94 (Member, Author) commented Jun 4, 2025

Actually this is not quite correct; one must also adjust psi by the amount of mass truncated in the other dist...

Also, users may be more interested in a Hurdle with a truncated dist, instead of a truncated Hurdle, as it won't distort the meaning of the original psi.

This is something that could be added at the bambi level perhaps @tomicapretto?
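The psi adjustment alluded to above can be made explicit: if the whole hurdle is truncated at `upper`, the effective probability of a nonzero draw is the renormalized mass of the nonzero component below the bound. A sketch with an Exponential stand-in for the continuous dist (values and names are illustrative):

```python
import numpy as np
from scipy import stats

psi, upper = 0.7, 1.0

# Mass of the nonzero component that survives the truncation
mass_kept = stats.expon.cdf(upper)

# After truncating the whole hurdle, both branches are renormalized:
adjusted_psi = psi * mass_kept / ((1 - psi) + psi * mass_kept)
print(adjusted_psi)  # below psi: the truncation inflates the zero mass
```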

@zwelitunyiswa zwelitunyiswa commented Jun 4, 2025

@ricardoV94 In my use case I am interested in constraining the gamma distribution of the hurdle-gamma. The reason is that there are natural upper limits as they relate to body parts (i.e. the injury area cannot exceed the size of the body part or some proportion thereof). Therefore, any predictions above that threshold would not make clinical sense.

@ricardoV94 ricardoV94 (Member, Author) commented Jun 4, 2025

It's a subtle difference: Truncated(Hurdle(psi, Gamma)) vs Hurdle(psi, Truncated(Gamma)).

The generative process of the first is that you resample the whole hurdle process every time you get a draw outside the truncation bounds, which necessarily inflates the zeros above (1 - psi).

The second is the usual hurdle process: you either get zeros with probability (1 - psi), or you get non-zeros from a truncated Gamma.

I guess yours is more like the second?
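The two generative processes can be simulated directly to see the difference in the zero fraction. A sketch with an Exponential stand-in for the Gamma (illustrative only, not the PR's code):

```python
import numpy as np

rng = np.random.default_rng(42)
psi, upper, n = 0.7, 1.0, 100_000

def hurdle_draw(size):
    nonzero = rng.random(size) < psi
    return np.where(nonzero, rng.exponential(1.0, size=size), 0.0)

# Truncated(Hurdle(...)): rejection-resample the WHOLE process above the bound
draws = hurdle_draw(n)
while (bad := draws > upper).any():
    draws[bad] = hurdle_draw(int(bad.sum()))

# Hurdle(psi, Truncated(...)): only the nonzero component is resampled
trunc = rng.exponential(1.0, size=n)
while (bad := trunc > upper).any():
    trunc[bad] = rng.exponential(1.0, size=int(bad.sum()))
draws2 = np.where(rng.random(n) < psi, trunc, 0.0)

print((draws == 0).mean())   # noticeably above 1 - psi = 0.3
print((draws2 == 0).mean())  # stays at about 1 - psi
```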

@zwelitunyiswa zwelitunyiswa commented Jun 4, 2025

@ricardoV94 It is the Hurdle(psi, Truncated(Gamma)).

@ricardoV94 ricardoV94 (Member, Author) commented Jun 4, 2025

Then I suspect some functionality needs to go into Bambi. You can already create that distribution in PyMC, using the same strategy with which HurdleGamma is defined, but using a truncated Gamma as the dist.

Does Bambi allow using arbitrary PyMC distributions, or only a subset that it already understands?

Because otherwise the constrained thing in Bambi would do Truncated(Hurdle(Gamma)), not the one you want.

@zwelitunyiswa zwelitunyiswa commented Jun 5, 2025

Bambi does allow arbitrary non-mixture continuous PyMC distributions (the way @tomicapretto implemented it is very intuitive and flexible). Ah, so constrained would do Truncated(Hurdle(psi, Gamma)) and inflate 0 counts. Good to know, especially since that is very important in what I study. Bambi allows you to apply truncated priors, but erroneous data could overwhelm that prior... I should actually test that.

@ricardoV94 ricardoV94 (Member, Author) commented Jun 5, 2025

I removed the commit to truncate hurdles. Instead what you want seems to be this:

import pymc as pm
from pymc.distributions import Gamma, Truncated
from pymc.distributions.mixture import _Hurdle

class HurdleTruncatedGamma:
    def __new__(cls, name, psi, alpha=None, beta=None, mu=None, sigma=None, upper=None, **kwargs):
        return _Hurdle._create(
            name=name,
            nonzero_p=psi,
            nonzero_dist=Truncated.dist(Gamma.dist(alpha=alpha, beta=beta, mu=mu, sigma=sigma), upper=upper),
            **kwargs,
        )

    @classmethod
    def dist(cls, psi, alpha=None, beta=None, mu=None, sigma=None, upper=None, **kwargs):
        return _Hurdle._create(
            name=None,
            nonzero_p=psi,
            nonzero_dist=Truncated.dist(Gamma.dist(alpha=alpha, beta=beta, mu=mu, sigma=sigma), upper=upper),
            **kwargs,
        )
    
dist = HurdleTruncatedGamma.dist(psi=0.5, mu=20, sigma=10, upper=30, size=(1000,))
draws = pm.draw(dist, random_seed=48)
print((draws == 0).mean(), draws.max())  # 0.5 29.921184342002615

That means we may still need the ScalarLoop implemented in Numba, since I suspect that special operation will show up in the gradient of the truncated Gamma. Anyway, this PR already makes hurdles less expensive by default, just not this special Hurdle of a truncated variable.

@ricardoV94 ricardoV94 marked this pull request as draft June 4, 2025 15:59
@zwelitunyiswa zwelitunyiswa commented:


@ricardoV94 This is amazing. Thank you so much for this!

@ricardoV94 ricardoV94 changed the title Implement specialized Hurdle distribution and allow truncating it Implement specialized Hurdle distribution Jun 5, 2025
@ricardoV94 ricardoV94 marked this pull request as ready for review June 5, 2025 12:34
@zaxtax zaxtax (Contributor) left a comment

LGTM! We should probably add a test similar to the one that motivated this PR in the first place.

@ricardoV94 ricardoV94 (Member, Author) commented Jun 5, 2025

> LGTM! We should probably add a test similar to the one that motivated this PR in the first place.

We have the pre-existing hurdle tests; in a sense this is just a refactor/optimization. Can't think of anything reasonably obvious to test here?

@zaxtax zaxtax (Contributor) commented Jun 5, 2025

> We have the pre-existing hurdle tests; in a sense this is just a refactor/optimization. Can't think of anything reasonably obvious to test here?

What caused the error originally reported here? pymc-devs/nutpie#163 Does that have a test already?

@ricardoV94 ricardoV94 (Member, Author) commented:

> What caused the error originally reported here? pymc-devs/nutpie#163 Does that have a test already?

That was fixed some time ago in PyTensor: pymc-devs/pytensor#1137

The performance question in Numba is addressed by pymc-devs/pytensor#1445

Neither is PyMC specific.

@zaxtax zaxtax (Contributor) commented Jun 6, 2025

> That was fixed some time ago in PyTensor: pymc-devs/pytensor#1137
>
> The performance question in Numba is addressed by pymc-devs/pytensor#1445
>
> Neither is PyMC specific.

Feel free to merge whenever you feel comfortable! I think it's good to go

@lucianopaz lucianopaz (Member) left a comment

Looks good to me. I just left two very minor comments

@@ -56,8 +56,8 @@
 ]


-class MarginalMixtureRV(SymbolicRandomVariable):
+class _BaseMixtureRV(SymbolicRandomVariable):
     """A placeholder used to specify a distribution for a mixture sub-graph."""
@lucianopaz (Member) commented:

Should you also change the _print_name?

)

return mix_logp
mix_support_point = pt.sum(weights * support_point_components, axis=mix_axis)
@lucianopaz (Member) commented:

Why not use the logsumexp and have log scale weights here? Is it because the weights are already in the 0-1 range and taking the log won’t help with precision?

@ricardoV94 (Member, Author) commented:

We're not computing any log quantities, nor starting with any, so I don't think it would help. Also, the initial point is not so critical?
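The distinction can be shown with a small numeric illustration (values are made up): the mixture logp benefits from log-space because component densities can underflow when exponentiated, while the support point combines ordinary-scale weights and values, where a log transform buys nothing.

```python
import numpy as np
from scipy.special import logsumexp

# logp case: component log-densities far below what exp() can represent
log_weights = np.log([0.3, 0.7])
component_logps = np.array([-1000.0, -1002.0])
mix_logp = logsumexp(log_weights + component_logps)  # finite; naive exp() would give -inf

# support point case: weights in [0, 1], ordinary-scale component values
weights = np.array([0.3, 0.7])
support_points = np.array([0.0, 5.0])
mix_support_point = np.sum(weights * support_points)

print(mix_logp, mix_support_point)
```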
