
Adding stats functions #694

Draft · no-dpr wants to merge 7 commits into main
Conversation

no-dpr commented Jan 17, 2025

fixes #574

Implements a size property on lazy tensors for use with the stats functions.
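
Example usage (illustrative; this assumes the final API dispatches through the Statistics generics, per the review below):

using Finch, Statistics

A = Tensor(Dense(Element(0.0)), [1.0, 2.0, 3.0])
mean(A)  # expected: 2.0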

willow-ahrens (Collaborator) left a comment:

please rename the field

(on src/interface/lazy.jl, outdated)
willow-ahrens (Collaborator) left a comment:

Great work! This is looking good! I've added some comments about packaging. This PR needs some tests as well; you could add them to interface_tests.jl. Be sure to leave a link to the GitHub issue to document precisely what the tests are testing.
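
For example, a test along these lines could go in interface_tests.jl (a sketch; the testset name, tensor format, and choice of functions are illustrative):

using Test, Finch, Statistics

# Tests for the stats functions added in this PR (see issue #574).
@testset "stats functions (issue #574)" begin
    data = rand(4, 4)
    A = Tensor(Dense(Dense(Element(0.0))), data)
    @test mean(A) ≈ mean(data)
    @test var(A) ≈ var(data)
end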

src/Finch.jl (outdated)

@@ -54,6 +54,7 @@ export lazy, compute, fused, tensordot, @einsum
 export choose, minby, maxby, overwrite, initwrite, filterop, d

 export fill_value, AsArray, expanddims, tensor_tree
+export mean

Instead of exporting our own mean, we should use the mean from the Statistics standard library. We also don't want to depend on Statistics explicitly; instead, we should use an "optional dependency", where the mean, std, and var definitions are loaded only when the user loads both Finch and Statistics. Finch currently takes this approach with SparseArrays; you can see it in ext/SparseArraysExt.jl. We would also need to handle Statistics the same way we handle SparseArrays in the Project.toml. More on this concept here: https://pkgdocs.julialang.org/v1/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)
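
For reference, the wiring might look roughly like this (a sketch, not this PR's code; the extension name FinchStatisticsExt and the mean body are illustrative):

# Project.toml
[weakdeps]
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[extensions]
FinchStatisticsExt = "Statistics"

# ext/FinchStatisticsExt.jl
module FinchStatisticsExt

using Finch
using Statistics

# Only loaded when the user has both Finch and Statistics in the environment;
# extends the stdlib generic rather than exporting a separate mean from Finch.
function Statistics.mean(arr::Finch.AbstractTensor)
    # same lazy/compute pattern as the norm method below (illustrative body)
    Finch.compute(sum(Finch.lazy(arr)))[] / prod(size(arr))
end

end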

function LinearAlgebra.norm(arr::AbstractTensorOrBroadcast, p::Real = 2)
    compute(norm(lazy(arr), p))[]
end

"""
expanddims(arr::AbstractTensor, dims)
expanddims(arr::AbstractTensor, dims)
This line should stay indented

@@ -7,12 +7,13 @@ const AbstractArrayOrBroadcasted = Union{AbstractArray,Broadcasted}
 mutable struct LazyTensor{T, N}
     data
     extrude::NTuple{N, Bool}
+    shape::NTuple{N, Int}

For better or worse, Finch array dimensions may use different types independently. So you could have a matrix with row dimension UInt8 and column dimension BigInt. The way we handle this is to parameterize the type of LazyTensor based on the type of the dimensions:

mutable struct LazyTensor{T, N, Shape <: Tuple}
    data
    extrude::NTuple{N, Bool}
    shape::Shape
    fill_value::T
end

Note that the dimension tuple is not constrained to be length N by the type annotations. That's okay; we can check that it's correct when we construct the LazyTensor, as sketched below.
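
That check might look like this (a sketch; the inner constructor is illustrative, not part of this review):

mutable struct LazyTensor{T, N, Shape <: Tuple}
    data
    extrude::NTuple{N, Bool}
    shape::Shape
    fill_value::T
    # Enforce at construction that the shape tuple really has N entries,
    # since Shape <: Tuple alone doesn't guarantee it.
    function LazyTensor{T, N, Shape}(data, extrude, shape, fill_value) where {T, N, Shape <: Tuple}
        length(shape) == N || throw(ArgumentError("expected $N dimensions, got $(length(shape))"))
        new{T, N, Shape}(data, extrude, shape, fill_value)
    end
end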


 function Base.show(io::IO, tns::LazyTensor)
-    join(io, [x ? "1" : "?" for x in tns.extrude], "×")
+    join(io, [string(x) for x in tns.shape], "×")

I think you may also be able to get away with join(io, tns.shape, "×"), but I'm not sure, try it?
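
For what it's worth, join prints each element with print, so the simpler form should behave the same; a quick REPL check:

julia> join(stdout, (2, 3, 4), "×")
2×3×4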

 end

 function Base.map!(dst, f, src::LazyTensor, args...)
     res = map(f, src, args...)
-    return LazyTensor(identify(reformat(dst, res.data)), res.extrude, res.fill_value)
+    return LazyTensor(identify(reformat(dst, res.data)), res.extrude, res.size, res.fill_value)

Almost certainly, this should be size(res) or res.shape, depending on whether or not we know that res is a LazyTensor.


Successfully merging this pull request may close these issues: mean, std, and var functions (#574).