@@ -175,10 +175,7 @@ _copy!!(::Number, src::Number) = src
"""
    prepare_pullback_cache(f, x...)

- WARNING: experimental functionality. Interface subject to change without warning!
-
- Returns a `cache` which can be passed to `value_and_gradient!!`. See the docstring for
- `Mooncake.value_and_gradient!!` for more info.
+ Returns a cache used with [`value_and_pullback!!`](@ref). See that function for more info.
"""
function prepare_pullback_cache(fx...; kwargs...)
@@ -200,18 +197,46 @@ end
"""
    value_and_pullback!!(cache::Cache, ȳ, f, x...)

- WARNING: experimental functionality. Interface subject to change without warning!
+ !!! info
+     If `f(x...)` returns a scalar, you should use [`value_and_gradient!!`](@ref), not this
+     function.
+
+ Computes a 2-tuple. The first element is `f(x...)`, and the second is a tuple containing the
+ pullback of `f` applied to `ȳ`. Its first element is the component of the pullback
+ associated with any fields of `f`, the second is w.r.t. the first element of `x`, etc.
+
+ There are no restrictions on what `y = f(x...)` is permitted to return. However, `ȳ` must be
+ an acceptable tangent for `y`. This means that, for example, it must be true that
+ `tangent_type(typeof(y)) == typeof(ȳ)`.
+
+ As with all functionality in Mooncake, if `f` modifies itself or `x`, `value_and_pullback!!`
+ will return both to their original state as part of the process of computing the pullback.
+
+ !!! info
+     `cache` must be the output of [`prepare_pullback_cache`](@ref), and (fields of) `f` and
+     `x` must be of the same size and shape as those used to construct the `cache`. This is
+     to ensure that the pullback can be written to the memory allocated when the `cache` was
+     built.
+
+ !!! warning
+     `cache` owns any mutable state returned by this function, meaning that mutable
+     components of values returned by it will be mutated if you run this function again with
+     different arguments. Therefore, if you need to keep the values returned by this function
+     around over multiple calls to this function with the same `cache`, you should take a
+     copy (using `copy` or `deepcopy`) of them before calling again.
+
+ # Example Usage
+ ```jldoctest
+ f(x, y) = sum(x .* y)
+ x = [2.0, 2.0]
+ y = [1.0, 1.0]
+ cache = Mooncake.prepare_pullback_cache(f, x, y)
+ Mooncake.value_and_pullback!!(cache, 1.0, f, x, y)
+
-
- Like other methods of `value_and_pullback!!`, but makes use of the `cache` object returned
- by [`prepare_pullback_cache`](@ref) in order to avoid having to re-allocate various tangent
- objects repeatedly. You must ensure that `f` and `x` are the same types and sizes as those
- used to construct `cache`.
+ # output
+
-
- Warning: `cache` owns any mutable state returned by this function, meaning that mutable
- components of values returned by it will be mutated if you run this function again with
- different arguments. Therefore, if you need to keep the values returned by this function
- around over multiple calls to this function with the same `cache`, you should take a copy
- (using `copy` or `deepcopy`) of them before calling again.
+ (4.0, (NoTangent(), [1.0, 1.0], [2.0, 2.0]))
+ ```
"""
function value_and_pullback!!(cache::Cache, ȳ, f::F, x::Vararg{Any,N}) where {F,N}
    tangents = tuple_map(set_to_zero!!, cache.tangents)
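The tangent-type requirement in the new docstring is easiest to see with a non-scalar output. The following sketch is illustrative rather than part of the change (`g` and all values are made up), and only uses the calls documented above:

```julia
using Mooncake

# `g` returns a vector, so `value_and_gradient!!` does not apply and the
# pullback interface is the right tool.
g(x) = x .* x

x = [2.0, 3.0]
cache = Mooncake.prepare_pullback_cache(g, x)

# `ȳ` must be an acceptable tangent for `y = g(x)`. For a `Vector{Float64}`
# output, `tangent_type(Vector{Float64}) == Vector{Float64}`, so a plain
# vector of matching length works.
ȳ = [1.0, 1.0]
y, (dg, dx) = Mooncake.value_and_pullback!!(cache, ȳ, g, x)
# y == [4.0, 9.0]; dg == NoTangent(); dx == 2 .* x .* ȳ == [4.0, 6.0]
```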
@@ -222,10 +247,7 @@
"""
223
248
prepare_gradient_cache(f, x...)
224
249
225
- WARNING: experimental functionality. Interface subject to change without warning!
226
-
227
- Returns a `cache` which can be passed to `value_and_gradient!!`. See the docstring for
228
- `Mooncake.value_and_gradient!!` for more info.
250
+ Returns a cache used with [`value_and_gradient!!`](@ref). See that function for more info.
229
251
"""
230
252
function prepare_gradient_cache (fx... ; kwargs... )
231
253
rule = build_rrule (fx... ; kwargs... )
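Both docstrings state that Mooncake undoes any mutation of `f` or `x` before returning. A minimal sketch of that contract, assuming the behaviour stated in the docstrings (the mutating function `h!` is hypothetical):

```julia
using Mooncake

# `h!` mutates its argument before reducing it.
function h!(x)
    x .*= 2
    return sum(x)
end

x = [1.0, 2.0]
cache = Mooncake.prepare_gradient_cache(h!, x)
v, (dh, dx) = Mooncake.value_and_gradient!!(cache, h!, x)

# v == 6.0 and dx == [2.0, 2.0], yet x is back to [1.0, 2.0]: the mutation
# performed by `h!` has been undone as part of computing the gradient.
```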
@@ -236,20 +258,42 @@ function prepare_gradient_cache(fx...; kwargs...)
end

"""
-     value_and_gradient!!(cache::Cache, fx::Vararg{Any, N}) where {N}
+     value_and_gradient!!(cache::Cache, f, x...)
+
+ Computes a 2-tuple. The first element is `f(x...)`, and the second is a tuple containing the
+ gradient of `f` w.r.t. each argument. Its first element is the gradient w.r.t. any
+ differentiable fields of `f`, the second is w.r.t. the first element of `x`, etc.
+
+ Assumes that `f` returns a value of type `Union{Float16, Float32, Float64}`.
+
+ As with all functionality in Mooncake, if `f` modifies itself or `x`, `value_and_gradient!!`
+ will return both to their original state as part of the process of computing the gradient.

- WARNING: experimental functionality. Interface subject to change without warning!
+ !!! info
+     `cache` must be the output of [`prepare_gradient_cache`](@ref), and (fields of) `f` and
+     `x` must be of the same size and shape as those used to construct the `cache`. This is
+     to ensure that the gradient can be written to the memory allocated when the `cache` was
+     built.

- Like other methods of `value_and_gradient!!`, but makes use of the `cache` object returned
- by [`prepare_gradient_cache`](@ref) in order to avoid having to re-allocate various tangent
- objects repeatedly. You must ensure that `f` and `x` are the same types and sizes as those
- used to construct `cache`.
+ !!! warning
+     `cache` owns any mutable state returned by this function, meaning that mutable
+     components of values returned by it will be mutated if you run this function again with
+     different arguments. Therefore, if you need to keep the values returned by this function
+     around over multiple calls to this function with the same `cache`, you should take a
+     copy (using `copy` or `deepcopy`) of them before calling again.

- Warning: `cache` owns any mutable state returned by this function, meaning that mutable
- components of values returned by it will be mutated if you run this function again with
- different arguments. Therefore, if you need to keep the values returned by this function
- around over multiple calls to this function with the same `cache`, you should take a copy
- (using `copy` or `deepcopy`) of them before calling again.
+ # Example Usage
+ ```jldoctest
+ f(x, y) = sum(x .* y)
+ x = [2.0, 2.0]
+ y = [1.0, 1.0]
+ cache = Mooncake.prepare_gradient_cache(f, x, y)
+ Mooncake.value_and_gradient!!(cache, f, x, y)
+
+ # output
+
+ (4.0, (NoTangent(), [1.0, 1.0], [2.0, 2.0]))
+ ```
"""
function value_and_gradient!!(cache::Cache, f::F, x::Vararg{Any,N}) where {F,N}
    coduals = tuple_map(CoDual, (f, x...), tuple_map(set_to_zero!!, cache.tangents))
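The `!!! warning` about cache ownership is worth demonstrating. A sketch of why the `deepcopy` matters when results must survive a second call, assuming the ownership semantics described in the warning (all values illustrative):

```julia
using Mooncake

f(x, y) = sum(x .* y)
x, y = [2.0, 2.0], [1.0, 1.0]
cache = Mooncake.prepare_gradient_cache(f, x, y)

_, grads = Mooncake.value_and_gradient!!(cache, f, x, y)
saved = deepcopy(grads)  # snapshot before the cache's buffers are reused

x .= [5.0, 5.0]
Mooncake.value_and_gradient!!(cache, f, x, y)

# `grads` aliases tangent storage owned by `cache`, so its arrays now reflect
# the second call (the gradient w.r.t. `y` reads [5.0, 5.0]); `saved` still
# holds the first call's values, [1.0, 1.0] and [2.0, 2.0].
```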