
feat: custom modifiers #1188

Closed
lukasheinrich wants to merge 13 commits into master from custom_modifiers

Conversation

@lukasheinrich (Contributor) commented Nov 19, 2020

Description

This is a first example of how to put e.g. a "gaussian" bump on top of a simple model.
It gives a first look at how #850 could be solved.

cc @alexander-held

# %%
import pyhf
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats

# %%
class custom_modifier(object):
    def apply(self, pars):
        # shape is (n_modifiers, n_global_samples, n_alphas, n_global_bins)
        base = np.zeros(
            (1, len(self.config.samples), 1, sum(self.config.channel_nbins.values()))
        )

        # Gaussian bump: integrate a unit normal over the bin edges
        bins = np.linspace(-5, 5, 11)
        yields = 20 * (scipy.stats.norm.cdf(bins[1:]) - scipy.stats.norm.cdf(bins[:-1]))

        base[0, self.config.samples.index('signal'), 0, :] = yields
        return base


m = pyhf.Model(
    {
        'channels': [
            {
                'name': 'singlechannel',
                'samples': [
                    {
                        'name': 'signal',
                        'data': [0] * 10,
                        'modifiers': [
                            {'name': 'mu', 'type': 'normfactor', 'data': None}
                        ],
                    },
                    {
                        'name': 'background',
                        'data': [10] * 10,
                        'modifiers': [
                            {
                                'name': 'uncorr_bkguncrt',
                                'type': 'shapesys',
                                'data': [5] * 10,
                            }
                        ],
                    },
                ],
            }
        ]
    },
    custom_modifiers=[custom_modifier()],
)

bp = pyhf.tensorlib.astensor(m.config.suggested_init())
bp[m.config.poi_index] = 0.2
plt.bar(range(10), m.expected_actualdata(bp))

@matthewfeickert matthewfeickert marked this pull request as draft November 19, 2020 17:17
@alexander-held (Member):
A big question for solving #850 is what kind of language to use to specify custom expressions. I think for many use cases the expressions are simple enough that the language does not matter much. And it might be possible to add the language to the workspace in such a way that in the future new languages could be added?

@lukasheinrich (Contributor, Author):

yeah I think we need to solve 2 issues:

  • just technically allow the specification of new modifiers at the Python level
  • decide how to codify these into JSON; this goes in the direction of HiFa v2

a big question is also in what order these modifications happen. Here the custom modifier is an additive modifier, where the total rate in pyhf is computed as

lambda = (nominal + additive + ...) * multiplicative

but you could also imagine expressions that expect to operate on the yields once all other modifiers have been applied.
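The ordering question can be made concrete with a small NumPy sketch (the yields and modifier values here are made up purely for illustration):

```python
import numpy as np

nominal = np.array([10.0, 10.0, 10.0])  # nominal yields per bin
delta = np.array([0.0, 5.0, 0.0])       # additive bump, e.g. the Gaussian signal
k = 1.2                                 # multiplicative normfactor

# convention discussed above: additive modifiers first, then multiplicative
lam_additive_first = (nominal + delta) * k

# alternative: a modifier that acts on the yields after all other modifiers
lam_multiplicative_first = nominal * k + delta

print(lam_additive_first)        # [12. 18. 12.]
print(lam_multiplicative_first)  # [12. 17. 12.]
```

The two conventions disagree exactly in the bins the additive modifier touches, which is why the order has to be pinned down in the spec.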

@alexander-held (Member):

I think this is technically still in the scope of normal HistFactory https://root.cern/doc/v620/classRooStats_1_1HistFactory_1_1Measurement.html#ade80531b4588d22ad5e7339ce74e7027. The version implemented there just acts as a normfactor, so this question of order is avoided. Once you allow more custom variations the problem becomes more complicated indeed.

@lukasheinrich (Contributor, Author) commented Nov 19, 2020

this now supports declaring additional parameters in the modifiers, here e.g. the mean of the Gaussian:

import numpy as np
import scipy.stats
import pyhf


class custom_modifier(object):
    def __init__(self, name='name'):
        # declare an extra unconstrained parameter 'mean' for this modifier
        self.required_parsets = {
            'mean': {
                'paramset_type': pyhf.parameters.paramsets.unconstrained,
                'n_parameters': 1,
                'modifier': 'normfactor',
                'is_constrained': False,
                'is_shared': True,
                'inits': (0.0,),
                'bounds': ((-5, 5),),
                'fixed': False,
            }
        }

    def apply(self, pars):
        base = np.zeros(
            (1, len(self.config.samples), 1, sum(self.config.channel_nbins.values()))
        )

        bins = np.linspace(-5, 5, 11)
        mean = pars[self.config.par_slice('mean')][0]
        yields = 20 * (
            scipy.stats.norm(loc=mean).cdf(bins[1:])
            - scipy.stats.norm(loc=mean).cdf(bins[:-1])
        )
        base[0, self.config.samples.index('signal'), 0, :] = yields
        return base
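The effect of the extra `mean` parameter can be checked standalone, outside pyhf, using the same binning as the snippet above:

```python
import numpy as np
import scipy.stats

bins = np.linspace(-5, 5, 11)  # 10 unit-width bins, as in the modifier above


def bump(mean):
    # integrate a unit-variance Gaussian centred at `mean` over the bin edges
    norm = scipy.stats.norm(loc=mean)
    return 20 * (norm.cdf(bins[1:]) - norm.cdf(bins[:-1]))


# moving `mean` shifts the peak across the bins
print(np.argmax(bump(1.5)))   # 6, i.e. the bin [1, 2]
print(np.argmax(bump(-1.5)))  # 3, i.e. the bin [-2, -1]
```

Since the bounds on `mean` are (-5, 5), essentially the whole bump (total of ~20 events) stays inside the channel's range.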


@lukasheinrich lukasheinrich force-pushed the custom_modifiers branch 4 times, most recently from 545693d to 763c3a1 Compare November 20, 2020 14:20
@lukasheinrich lukasheinrich changed the title from "first example" to "custom modifiers" Nov 20, 2020
@matthewfeickert matthewfeickert added the feat/enhancement New feature or request label Nov 25, 2020
@lukasheinrich lukasheinrich force-pushed the custom_modifiers branch 15 times, most recently from 9dd6470 to 81a8734 Compare November 26, 2020 18:30
@lukasheinrich lukasheinrich force-pushed the custom_modifiers branch 4 times, most recently from b2a17cf to 30a696a Compare January 2, 2021 15:29
@kratsg (Contributor) left a comment


Overall, I think I do like a bulk of the refactoring. One thing that struck me is that we still kept around the op_code stuff -- which I suppose is fine. I guess a future iteration would somehow add a modify_rate() method that would know how to modify the rate that it gets passed in... and then at some point, we'd have to deal with priority/preference in some way.. but this is a rabbit hole anyway.

One thing I appreciate a lot, that was not so clear from the body of the PR, is that you've elided a lot of the logic from the megasamples back down to each of the individual modifier modules -- so that pdf.py isn't as scary as it used to be.

model (:class:`~pyhf.pdf.Model`): The Model instance.

"""
modifier_set = modifier_set or pyhfset

Suggested change
modifier_set = modifier_set or pyhfset
modifier_set = {**pyhfset, **(modifier_set or {})}

I would prefer that we're able to override the default set of modifiers rather than completely replacing it.
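For reference, the merge semantics of the suggested change in a toy example (the keys and string values here are illustrative stand-ins, not pyhf's actual builder/combined objects):

```python
pyhfset = {'normfactor': 'default_nf', 'shapesys': 'default_ss'}      # defaults
modifier_set = {'normfactor': 'custom_nf', 'mybump': 'custom_bump'}   # user-supplied

merged = {**pyhfset, **(modifier_set or {})}
# user entries override the defaults; unmentioned defaults survive
print(merged)
# {'normfactor': 'custom_nf', 'shapesys': 'default_ss', 'mybump': 'custom_bump'}
```

With `modifier_set = None` the merge falls back to the defaults unchanged, matching the behaviour of the original `or` expression.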

# run jsonschema validation of input specification against the (provided) schema
log.info(f"Validating spec against schema: {self.schema:s}")
utils.validate(self.spec, self.schema, version=self.version)
if validate:

this I'm unsure about. Should we require users to include their own custom schema rather than skipping validation entirely? but...

'shapefactor': shapefactor,
'shapesys': shapesys,
'staterror': staterror,
from .histosys import histosys_builder, histosys_combined

are we dropping __all__ from this? Presumably, someone would want to build their own custom modifier based off an existing one and we're not quite allowing it to be imported so easily. But maybe that's the point?

from .shapesys import shapesys_builder, shapesys_combined
from .staterror import staterr_builder, staterror_combined

pyhfset = {

not a huge fan here. we're relying on basically bookkeeping through the indices (which is scattered throughout the code) and knowing that builder is [0] and combined is [1]. We could either do a namedtuple here for each, so that modifiers['histosys'].builder and modifiers['histosys'].combined work instead, or we do pyhf.modifiers.builders['histosys'] and pyhf.modifiers.combined['histosys']. I think either would be a better design.
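A minimal sketch of the namedtuple variant (with stub functions standing in for the real builder/combined objects):

```python
from collections import namedtuple

# hypothetical container replacing the bare (builder, combined) tuple
ModifierSpec = namedtuple('ModifierSpec', ['builder', 'combined'])


def histosys_builder():
    ...  # stub for the real builder


def histosys_combined():
    ...  # stub for the real combined applicator


pyhfset = {'histosys': ModifierSpec(histosys_builder, histosys_combined)}

# attribute access replaces the [0]/[1] index bookkeeping
builder = pyhfset['histosys'].builder
combined = pyhfset['histosys'].combined
```

Since a namedtuple is still a tuple, this stays backwards compatible with any remaining index-based access during a transition.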


The other thing is this is exactly the registry model (without the decoration functionality) which is fine. I guess the question is whether you want the registry to be defined per-model or per-pyhf instantiation. This change allows a different modifier_set per-model which is probably a good thing (less "global" state, and more state-less).

'fixed': False,
'auxdata': (0.0,),
}
def required_parset(sample_data, modifier_data):

this should likely be called _required_parset to indicate it's not a public method to be exposed in this module. Same for all others.

)
assert result.shape[1] == 2
assert pytest.approx([0.0, 0.26418431]) == pyhf.tensorlib.tolist(result[:, 1])
assert pytest.approx([0.26418431, 0.0]) == pyhf.tensorlib.tolist(result[:, 1])

I guess: can we add one more assert here on the order of the parameters in the model, just so we can keep track moving forward as well? This will future-proof things in case the order somehow changes again.

}
with pytest.raises(pyhf.exceptions.InvalidModifier):
pyhf.pdf._ModelConfig(spec)
pyhf.pdf.Model(spec, validate=False) # don't validate to delay exception

Suggested change
pyhf.pdf.Model(spec, validate=False) # don't validate to delay exception
pyhf.pdf.Model(spec, validate=False) # skip validation exception

model = pyhf.Model(spec, poi_name='mypoi')
assert model.config.suggested_fixed() == [False, True]
assert model.config.poi_index == 1
assert model.config.suggested_fixed()[model.config.par_slice('mypoi')] == [True]

the order changed again, didn't it? why not just swap it around instead of combining the two? We won't catch a bug if both parameters are somehow "fixed" moving forward (so we lose coverage).

assert len(model.config.suggested_fixed()) == 5
assert model.config.suggested_fixed() == [False, True, True, True, False]
assert model.config.poi_index == 4
assert model.config.suggested_fixed()[model.config.par_slice('uncorr')] == [

same comment here about catching the bug.

]
),
rtol=1e-5,
rtol=2e-5,

why did this get relaxed (if it's a refactor)?


@balunas commented Aug 17, 2021

It looks like it's been quite some time since any progress was made here. This has been bumped back for a few releases; is there some technical limitation, or has it just not been prioritized? There are some analysis use cases this could be really helpful for, and we've been having to use some really inefficient workarounds instead. Is there a way we could help?

@IzaV10 commented Aug 17, 2021

Hi,

I tried running the simple example with the "gaussian" bump on top of a simple model. The branch was installed as pip install git+https://github.com/scikit-hep/pyhf.git@custom_modifiers but when running the code above (from the very first entry of the conversation) I get an error.

File "/<PATH TO VIRTUAL ENVIRONMENT>/pyhf-env/lib/python3.7/site-packages/pyhf/pdf.py", line 497, in __init__
    self.config = _ModelConfig(self.spec, **config_kwargs)
  File "/<PATH TO VIRTUAL ENVIRONMENT>/pyhf-env/lib/python3.7/site-packages/pyhf/pdf.py", line 171, in __init__
    f"Unsupported options were passed in: {list(config_kwargs.keys())}."
pyhf.exceptions.Unsupported: Unsupported options were passed in: ['custom_modifiers'].

I suspect that the way the class custom_modifier() is defined needs to be changed. Is there any documentation available on how to use this option?

Thank you for your help and time.

Cheers,
Iza

@lhenkelm lhenkelm mentioned this pull request Sep 23, 2021
1 task
@lukasheinrich (Contributor, Author):

tests pass (2 tiny floating point changes...); will clean up tomorrow

@lukasheinrich (Contributor, Author):

I don't really understand the CI failures (the tests pass on my machine). Any clues, @matthewfeickert?


[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

string-based constraints and more dedupe in builders

pyflakes

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
@lukasheinrich (Contributor, Author):

closed in favor of #1625


Labels

feat/enhancement New feature or request

6 participants