Skip to content

PR to record the prevalence of disease/condition, still births, neonatal death, maternal mortality#1455

Closed
RachelMurray-Watson wants to merge 747 commits intomasterfrom
rmw/log_prevalence_all_disease
Closed

PR to record the prevalence of disease/condition, still births, neonatal death, maternal mortality#1455
RachelMurray-Watson wants to merge 747 commits intomasterfrom
rmw/log_prevalence_all_disease

Conversation

@RachelMurray-Watson
Copy link
Collaborator

@RachelMurray-Watson RachelMurray-Watson commented Aug 15, 2024

The point prevalence is recorded for a number of modules and conditions within modules (Alri, BladderCancer, BreastCancer, CardioMetabolicDisorders ( chronic_ischemic_hd, chronic_kidney_disease, chronic_lower_back_pain, diabetes, hypertension), COPD, Depression, Diarrhoea, Epilepsy, Hiv, Labor (Intrapartum stillbirth), Malaria, Measles, NewbornOutcomes, OesophagealCancer, OtherAdultCancer, PostnatalSupervisor, PregnancySupervisor (Antenatal stillbirth), ProstateCancer, RTI, Schisto, TB, Demography (maternal_deaths, newborn_deaths).

Additional questions:
- Okay to calculate the prevalence of diarrhoea? It is a not really a disease in its own right, more of a symptom
- For some modules (RTI), may be more accurately described by calculating incidence, rather than prevalence. Is that useful/okay? Or should it be skipped?

Other notes:
- COPD is defined as ch_lung_function > 3.
- Have not included events in the CardioMetabolic Module (ever_heart_attack and ever_stroke) as would be a cumulative incidence living people who have had such events

@RachelMurray-Watson RachelMurray-Watson marked this pull request as ready for review August 22, 2024 09:19
force_cols=self.recognised_modules_names,
)
self._years_written_to_log += [year]
def write_to_log_prevalence_monthly(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Monthly prevalence logs seem reasonable, but bear in mind that some conditions, e.g. malaria can develop and resolve within 1 week. In these cases, we could think about using the clinical counter - which counts episodes of disease. It may not make too much difference but perhaps running a daily then monthly logger and checking if the prevalences vary considerably would be useful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made this change below


return health_values.loc[df.is_alive]

def report_prevalence(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we should log only active cases, this way we can compare with WHO reports / GBD etc. Also the way that we assign latent cases is not identical to other models, we don't have infections -> latent -> active so we would under-estimate the latent infections. Best to stick to symptomatic active cases only.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that's great to know, thank you!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the change

assert (df.dtypes == orig.dtypes).all()


def find_closest_recording(prevalence, target_date, log_value, column_name, multiply_by_pop):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of the usefulness of finding the closest reported prevalence value. Would a useful test perhaps be to check all registered modules are logging prevalence every month and the inverse of this (no prevalence reported if module not registered), or set the incidence to 0 for one disease and check logger not reporting anything above 0, assert prevalence values for 2010 within reasonable range, e.g. test for extreme or unlikely values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a long discussion about the test yesterday on our call Part II. We decided to include a dummy disease with which to compare prevalences (this will be in the new test file), and to see if the prevalence of what it reports matches with what has been reported it its own logging file.

I suppose by doing it the way that I was doing it, I was trying to see if the calculations themselves were working, as well as the general mechanics of logging. But do you think such a test is unnecessary? And that by showing e.g. with a dummy module and/or what you have suggested above, it would suffice?

Copy link
Collaborator

@tbhallett tbhallett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just adding here the comment made in-person: I like the way this is being done overall.... and we should add this new method to the Module base class so that it's formally part of the definition of a disease module (in the same way that report_dalys is.)

@RachelMurray-Watson
Copy link
Collaborator Author

so that it's formally part of the definition of a disease module

Grand! Have that done

@RachelMurray-Watson RachelMurray-Watson self-assigned this Sep 5, 2024
@RachelMurray-Watson
Copy link
Collaborator Author

just adding here the comment made in-person: I like the way this is being done overall.... and we should add this new method to the Module base class so that it's formally part of the definition of a disease module (in the same way that report_dalys is.)

(Based on conversation yesterday) - changed so that it is no longer in base class (as not all modules are disease modules), but there is an assertion checking to see that if something uses the healthburden module, it must have the report_prevalence function

@tbhallett
Copy link
Collaborator

tbhallett commented Dec 19, 2025

Thanks Rachel.
I could see a few things causing the failing tests, so have fixed those.
I've also proposed some changes to hopefully make this more an integrated part of the framework (using module name, allowing different sizes/shapes of returns etc.)
I've refactored to avoid duplicating the code for producing age/sex prevalence results -- but we could wind that back as there are a couple of exceptions, and I did it mostly to experiment with options relating to (1) below.

Some things to discuss:

  1. The measure of 'prevalence' being used mostly is {Number with disease in age-group} / {Number alive}. This is an unusual measure of 'prevalence' and most readers will expect it to be {Number with disease in age-group} / {Number alive in age-group}. There are probably good reasons why you've done your original approach, but it might be that we want to be outputting just the numerators so that the a 'denominator of choice' can be applied in the scripts. I haven't done this as I didn't want to break your scripts.

  2. Hard-coded logic re MMR in demography feels misplaced to me for an addition to master. This kind of thing is coming out the demography logs already, so the advantage of making a change to master to incorporate those metrics during the simulation seem small. If we need it to make the processing for future use-cases easier, then, it's an argument for ....

  3. The whole thing would actually probably fit most neatly in its own module. This would avoid having to make those additions in demography and it's all quite self-contained really. This would be a quick change, but again, didn't want to break your scripts.

None of this affects you using this for your own purposes. I was just reviewing it with a view to it coming into master so that others can use it in the way they expect and can be maintained along with the rest of the codebase. (You can still use it as you want by branching off from before I've added any changes).

So, I'm pausing here to discuss....

  • Whether these changes would disrupt your work flow. (In which case, just keep using your own versions)
  • Whether any of these design choices you've already considered and rejected for reasons I'm missing right now.

When we're aligned, I can make those last little changes (refactoring in its own class, updating what the prevalence metric actually is TBD), and get it merged (in the New Year).

@RachelMurray-Watson
Copy link
Collaborator Author

Thanks Rachel. I could see a few things causing the failing tests, so have fixed those. I've also proposed some changes to hopefully make this more an integrated part of the framework (using module name, allowing different sizes/shapes of returns etc.) I've refactored to avoid duplicating the code for producing age/sex prevalence results -- but we could wind that back as there are a couple of exceptions, and I did it mostly to experiment with options relating to (1) below.

Some things to discuss:

  1. The measure of 'prevalence' being used mostly is {Number with disease in age-group} / {Number alive}. This is an unusual measure of 'prevalence' and most readers will expect it to be {Number with disease in age-group} / {Number alive in age-group}. There are probably good reasons why you've done your original approach, but it might be that we want to be outputting just the numerators so that the a 'denominator of choice' can be applied in the scripts. I haven't done this as I didn't want to break your scripts.
  2. Hard-coded logic re MMR in demography feels misplaced to me for an addition to master. This kind of thing is coming out the demography logs already, so the advantage of making a change to master to incorporate those metrics during the simulation seem small. If we need it to make the processing for future use-cases easier, then, it's an argument for ....
  3. The whole thing would actually probably fit most neatly in its own module. This would avoid having to make those additions in demography and it's all quite self-contained really. This would be a quick change, but again, didn't want to break your scripts.

None of this affects you using this for your own purposes. I was just reviewing it with a view to it coming into master so that others can use it in the way they expect and can be maintained along with the rest of the codebase. (You can still use it as you want by branching off from before I've added any changes).

So, I'm pausing here to discuss....

  • Whether these changes would disrupt your work flow. (In which case, just keep using your own versions)
  • Whether any of these design choices you've already considered and rejected for reasons I'm missing right now.

When we're aligned, I can make those last little changes (refactoring in its own class, updating what the prevalence metric actually is TBD), and get it merged (in the New Year).

Thanks for making all these changes!!

  1. I can't remember the logic of why we did this (it may be that we were just interested in the population prevalence, but that doesn't explain the age group break down). I can definitely "just" output the other numbers, though, to allow more freedom in the end product.

  2. Okay, that's grand - I think there was a lot of back and forth as to how to include the various maternal/neonatal stats anyways, so removing it is no problem.

  3. I hadn't considered using it as it's own module, but if that is what is more convenient, I could do that (perhaps after Christmas).

…into rmw/log_prevalence_all_disease

# Conflicts:
#	src/tlo/methods/demography.py
… - the number of individuals of each age and sex with a specific disease/condition.

NB - no longer has a denominator, just raw numbers.

This replaces the "report_prevalence" that was previously in the healthburden module.
Included file with parameters

Redid test to not use HealthBurden Module any more
Fixed line formatting
as not called until after the initialisation
@tbhallett
Copy link
Collaborator

closing this PR in favour of #1772

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants