Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't compute log10(chl) for small chl #305

Merged
merged 2 commits into from
Oct 3, 2024

Conversation

mnlevy1981
Copy link
Collaborator

if chl <= chl_min, we already have log10(chl_min) stored as a parameter. Avoiding the computation of log10(chl) in those cases prevents the possibility of a floating point exception (if chl <= 0), and this approach replaces a max() statement with an if block so it shouldn't affect performance.

This requires adding chl_min to the optics_type (which already has log10chl_min and log10chl_max).

if chl <= chl_min, we already have log10(chl_min) stored as a parameter.
Avoiding the computation of log10(chl) in those cases prevents the possibility
of a floating point exception, and lets us replace a max() statement with an if
check so it shouldn't affect performance.

This requires adding chl_min to the optics_type (which already has log10chl_min
and log10chl_max).
@mnlevy1981
Copy link
Collaborator Author

Note that this fix will allow us to run CESM with DEBUG=TRUE and CHL_FROM_FILE=False (if MARBL is available to provide CHL). Previously, debug tests were failing because MARBL was returning chl=0 at some points and MOM6 was dutifully trying to compute log10(chl) anyway. The MOM_interface tag that includes this MOM6 update should add an SMS_D test for C1850MARBL_JRA to aux_mom_MARBL (and maybe prebeta?) to make sure we can continue to run in debug mode if needed.

@mnlevy1981 mnlevy1981 marked this pull request as ready for review September 27, 2024 22:17
@mnlevy1981
Copy link
Collaborator Author

testing: Verified this is bit-for-bit in a one month test run (with MARBL providing chlorophyll)

@klindsay28
Copy link
Collaborator

@mnlevy1981 , do you know why this is needed in lookup_ohlmann_swpen, but not lookup_ohlmann_opacity, which has the same log10chl computation?
Perhaps @fobryan3 has an idea about this.

@mnlevy1981
Copy link
Collaborator Author

@mnlevy1981 , do you know why this is needed in lookup_ohlmann_swpen, but not lookup_ohlmann_opacity, which has the same log10chl computation? Perhaps @fobryan3 has an idea about this.

Huh, that's really odd. I'll add a commit to update lookup_ohlmann_opacity() as well, but I don't understand why we're passing debug tests with the function written as-is. As Frank notes in his comment in the code, lookup_ohlmann_swpen() only uses surface chlorophyll while lookup_ohlmann_opacity() uses full depth:

    !!! Only the surface CHL is used above in setting optics%sw_pen_band for all schemes.
    !!! Seems inconsistent to use depth dependent CHL in opacity calculation.

So if anything I would expect to need to update opacity() but not swpen() -- if MARBL always returns positive chlorophyll at the surface but can have 0 chlorophyll at depth then we'd only hit log10(0) in the opacity computation.

@fobryan3
Copy link

fobryan3 commented Oct 1, 2024

I see the problem you are talking about - trying to evaluate log10(chl) when chl=0 - but am not totally following the thread above. I think Mike is on track to replace that min/max compound function with an if block would be the most straight forward. The table is in log space so we need to do a two step test for chl > 0 then log10(chl) > log10_chl_min .AND. log10(chl) < log10_chl_max. lookup_ohlmann_swpen is called before lookup_ohlmann_opacity. Is that why swpen triggers error?

@mnlevy1981
Copy link
Collaborator Author

I see the problem you are talking about - trying to evaluate log10(chl) when chl=0 - but am not totally following the thread above.

In the CESM test suite, setting DEBUG=TRUE causes runs with MARBL turned on to fail, and the error logs point to the chlorophyll lookup in lookup_ohlmann_swpen() as the culprit. The change in 5726d94 updates that subroutine to only do the lookup if chlorophyll is larger than the smallest value in the table, but leaves lookup_ohlmann_opacity() unchanged. Somehow that's sufficient to pass the debug tests, but I think the answer is to update lookup_ohlmann_opacity() as well and not spend too much time worrying about why the test passed even without the second update.

@marshallward
Copy link

Would min(log10(max(chl, chl_min)), log10(chl_max)) work? It would seem to avoid the chl=0 evaluation without introducing an if-block, which could impede performance.

log10(chl_max) would presumably be precomputed, although this could introduce bit differences.

previous commit only avoiding log10(0) in lookup_ohlmann_swpen() but we have
another log10(chl) in the opacity function
@mnlevy1981
Copy link
Collaborator Author

Would min(log10(max(chl, chl_min)), log10(chl_max)) work? It would seem to avoid the chl=0 evaluation without introducing an if-block, which could impede performance.

log10(chl_max) would presumably be precomputed, although this could introduce bit differences.

@marshallward do you prefer your proposed formulation to what is currently in this PR? The optics_type previously had log10chl_min and log10chl_max, and I added chl_min -- your proposal would let us drop both log10chl_min and log10chl_max, replacing the latter with chl_max, but it does add an additional log10() call in the instances where chl < chl_min

@klindsay28
Copy link
Collaborator

Would min(log10(max(chl, chl_min)), log10(chl_max)) work? It would seem to avoid the chl=0 evaluation without introducing an if-block, which could impede performance.

@marshallward, does an if-block have greater performance implications and min and/or max? I would have guessed that the implicit branches in min and max had behavior similar to an explicit if, but perhaps this just demonstrates my ignorance.

@marshallward
Copy link

The short answer is that min, max and log10 can be vectorized, whereas the if-block will prevent vectorization. And in many cases, both branches of an if-block will be initiated while the condition test is being checked, after which the correct path will be chosen.

The long answer is that it's better "most of the time" to use the branchless expressions. (e.g. log10 is quite slow, so it might be a consideration.) But it's almost never worse than the branched form.

In this case, there's a lot of if-blocks outside of your function call, so it's probably not very important. I probably should have held my tongue. 😝 Apologies for any distraction!

@klindsay28
Copy link
Collaborator

I probably should have held my tongue. 😝 Apologies for any distraction!

Thanks for the explanation. I have no complaints about learning from your knowledge and experience.

@mnlevy1981
Copy link
Collaborator Author

mnlevy1981 commented Oct 3, 2024

I ran pr_mom and got expected results: DIMCSL failed (#302), and the baseline comparison for that test failed in the dimscale_1_Tp comparisons but all other tests in the testlist were bit-for-bit (as were the other comparisons within the DIMSCL). @alperaltuntas if you approve this, I'll merge it and make a new tag so I can finish ESCOMP/MOM_interface#192

68	PASS
	5	RUN (PASS)
      1 DIMCSL_Ld1.TL319_t232.G_JRA.derecho_intel.mom-cfc_mods
      1 ERI.TL319_t232.G_JRA.derecho_intel.mom-debug
      1 ERS.TL319_t232_wg37.GW_JRA.derecho_intel
      1 SMS_D.TL319_t232.G_JRA_RYF.derecho_intel
      1 SMS_Ld2.ne30pg3_t232.BLT1850.derecho_intel.mom-bcompset
	4	BASELINE (PASS)
      1 ERI.TL319_t232.G_JRA.derecho_intel.mom-debug
      1 ERS.TL319_t232_wg37.GW_JRA.derecho_intel
      1 SMS_D.TL319_t232.G_JRA_RYF.derecho_intel
      1 SMS_Ld2.ne30pg3_t232.BLT1850.derecho_intel.mom-bcompset
6	FAIL
	2	COMPARE (FAIL)
      1 DIMCSL_Ld1.TL319_t232.G_JRA.derecho_intel.mom-cfc_mods
      1 ERI.TL319_t232.G_JRA.derecho_intel.mom-debug
	3	MEMCOMP (FAIL)
      1 DIMCSL_Ld1.TL319_t232.G_JRA.derecho_intel.mom-cfc_mods
      1 ERI.TL319_t232.G_JRA.derecho_intel.mom-debug
      1 SMS_Ld2.ne30pg3_t232.BLT1850.derecho_intel.mom-bcompset
	1	BASELINE (FAIL)
      1 DIMCSL_Ld1.TL319_t232.G_JRA.derecho_intel.mom-cfc_mods

Sidenote: is the ERI comparison failure known? It occurred while generating the baselines as well:

2	FAIL
	2	COMPARE (FAIL)
      1 DIMCSL_Ld1.TL319_t232.G_JRA.derecho_intel.mom-cfc_mods
      1 ERI.TL319_t232.G_JRA.derecho_intel.mom-debug

Copy link
Collaborator

@klindsay28 klindsay28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mnlevy1981 mnlevy1981 merged commit ad7cf38 into NCAR:dev/ncar Oct 3, 2024
10 checks passed
@mnlevy1981 mnlevy1981 deleted the avoid_zero_chl branch October 30, 2024 21:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants