
Fix ShardLevel Structure#765

Closed
AdrianGushin wants to merge 71 commits into finch-tensor:wma/shard_levels from AdrianGushin:shard_dev
Closed

Fix ShardLevel Structure#765
AdrianGushin wants to merge 71 commits intofinch-tensor:wma/shard_levelsfrom
AdrianGushin:shard_dev

Conversation

@AdrianGushin (Collaborator)

This pull request adds some fixes for the ShardLevel structure and adds associated test cases. It also updates the CPU struct to facilitate multiple distinct cores.

@willow-ahrens willow-ahrens left a comment


Almost there! This is looking great, only small changes requested.

Project.toml Outdated
DataStructures = "0.18"
Distributions = "0.25"
HDF5 = "0.17"
InteractiveUtils = "1.11.0"

InteractiveUtils shouldn't be a Finch dep, we need to remove before merging.

Project.toml Outdated
NPZ = "15e1cf62-19b3-5cfa-8e77-841668bca605"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
TensorMarket = "8b7d4fe7-0b45-4d0d-9dd8-5cc9b23b4b77"
(No newline at end of file)
TensorMarket = "8b7d4fe7-0b45-4d0d-9dd8-5cc9b23b4b77"

TensorMarket should be a test dep, but I don't think it's an extra, right? Do we need to move it to the test Project.toml?
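If TensorMarket does move to the test dependencies, a test/Project.toml along these lines would do it (a sketch: the TensorMarket UUID is copied from the diff above, and the Test stdlib UUID is standard; any other test deps would also need to move):

```toml
# Sketch of a test/Project.toml with TensorMarket as a test-only dep.
[deps]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
TensorMarket = "8b7d4fe7-0b45-4d0d-9dd8-5cc9b23b4b77"
```

With this in place, TensorMarket can be dropped from the main Project.toml's [deps] so it is no longer a hard dependency of Finch itself.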

@@ -1,8 +1,10 @@
using InteractiveUtils

Let's remove this; it was just for debugging.


A datatype representing a device on which tasks can be executed.
"""


Does this break the documentation for the abstract device type?

end,
)
VirtualCPU(value(n, Int))
VirtualCPU(value(n, Int), literal(id))

It's also okay to just use id without wrapping it as a literal, as long as you wrap it when needed; whatever's convenient.

FinchNotation.finch_leaf(mem::VirtualCPULocalMemory) = virtual(mem)
function virtualize(ctx, ex, ::Type{CPULocalMemory})
VirtualCPULocalMemory(virtualize(ctx, :($ex.device), CPU))
function virtualize(ctx, ex, ::Type{CPULocalMemory{id}}) where {id}

Instead of keying CPULocalMemory on id, let's put the whole CPU type in the type parameter, so that further changes to CPU parameterization don't need to affect the local memory.


Then we can recursively virtualize the CPU.
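One way to read this pair of suggestions is the following sketch (not the actual Finch definitions; the field name and the virtualize signature are assumptions based on the diff above):

```julia
# Sketch: parameterize CPULocalMemory on the whole device type rather
# than on the id, so changes to CPU's parameters don't touch it.
struct CPULocalMemory{Device}
    device::Device
end

CPULocalMemory(device::CPU) = CPULocalMemory{typeof(device)}(device)

# Virtualization can then recurse into the stored device type, instead
# of hard-coding CPU (or an id) in the method signature:
function virtualize(ctx, ex, ::Type{CPULocalMemory{Device}}) where {Device}
    VirtualCPULocalMemory(virtualize(ctx, :($ex.device), Device))
end
```

The payoff is that adding or changing CPU's own type parameters requires no edits here: whatever concrete CPU type is stored flows through `Device` automatically.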

global_memory(device::CPU) = CPUSharedMemory(device)
local_memory(device::CPU{id}) where {id} = CPULocalMemory{id}(device)
shared_memory(device::CPU{id}) where {id} = CPUSharedMemory{id}(device)
global_memory(device::CPU{id}) where {id} = CPUSharedMemory{id}(device)

Same thing here: I think it makes more sense to key on the CPU than the id. Feel free to disagree here.

function transfer(task::MemoryChannel, arr::MultiChannelBuffer)
if task.device == arr.device
temp = arr.data[task.t]
@assert isa(temp, Vector)

This is good for debugging, but I don't think this will always be the case; we might have buffer types other than Vector.
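If the intent is only to catch non-array buffers, one option is to loosen the assertion rather than drop it (a sketch; AbstractVector is an assumption about which buffer types should be allowed, and the rest of the function body is elided):

```julia
# Sketch: accept any array-like buffer instead of asserting Vector.
function transfer(task::MemoryChannel, arr::MultiChannelBuffer)
    if task.device == arr.device
        temp = arr.data[task.t]
        @assert temp isa AbstractVector  # looser than isa(temp, Vector)
        # ... rest of transfer unchanged
    end
end
```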


@testset "Finch" begin
include("modules/checkoutput_testsetup.jl")
include("suites/constructors_tests.jl")

Wow, it's crazy that this wasn't already included.

end

@test C[4,4] == 12
end

Good test! Let's also add a test, perhaps in representation.jl, which generates some reference output for a ShardLevel kernel.

@willow-ahrens (Member)

We can merge this to main once we have some more tests.

@willow-ahrens willow-ahrens left a comment


Hi Adrian! This all looks good. The only requirement now is a test which calls check_output to compare the generated ShardLevel code against a reference output; see elsewhere in the code where this function is used for examples. You'll need to follow the instructions in the contributing guide to generate new 64-bit references, and run the "fixbot" action to generate new 32-bit references.
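The requested test might look something like this sketch, modeled loosely on how check_output is used elsewhere in the Finch test suite (the reference filename, the level nesting, and the kernel are hypothetical placeholders, not the actual test):

```julia
# Sketch: compare generated ShardLevel code against a stored reference.
# "shard_kernel.txt" and the tensor layout below are hypothetical.
A = Tensor(Dense(Shard(Element(0.0))))
check_output(
    "shard_kernel.txt",
    @finch_code begin
        for j = _, i = _
            A[i, j] += 1.0
        end
    end,
)
```

The first run against a fresh reference file follows the contributing guide's regeneration steps; after that, any change to the lowered ShardLevel code shows up as a diff against the stored output.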

@willow-ahrens (Member)

I've added you to the repo as a collaborator, you can re-open the PR using finch-tensor as the remote and it will automatically run tests.

@codecov

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 77.46479% with 16 lines in your changes missing coverage. Please review.

Files with missing lines                   Patch %    Lines
src/architecture.jl                        63.33%     11 Missing ⚠️
src/tensors/levels/shard_levels.jl         89.74%     4 Missing ⚠️
src/tensors/levels/sparse_list_levels.jl   0.00%      1 Missing ⚠️

Files with missing lines                   Coverage            Δ
src/lower.jl                               87.86% <ø>          (ø)
src/tensors/levels/sparse_dict_levels.jl   88.41% <100.00%>    (ø)
src/tensors/levels/sparse_list_levels.jl   92.41% <0.00%>      (ø)
src/tensors/levels/shard_levels.jl         64.59% <89.74%>     (ø)
src/architecture.jl                        64.08% <63.33%>     (ø)

@willow-ahrens willow-ahrens left a comment


A summary of failing tests and next steps to merge this:

  1. Windows tests are failing because we need to run fixbot (Contributing.md)
  2. Ubuntu latest is failing for unrelated reasons (my bad, julia 1.12.3 is much stricter)
  3. I'd like to see the CPU tagging and SparseDict, etc. fixes split out and merged to main first.

I'll leave 1 to you. I'll work on solving 2 ASAP. I won't be able to get to 3 until sometime tomorrow evening, so feel free to do that earlier! I use the "GitLens Search & Compare" VS Code extension for stuff like that.

- uses: actions/checkout@v3
with:
token: ${{ secrets.WILLOW_BOT_TOKEN }}
token: ${{ secrets.ADRIAN_BOT_TOKEN }}

You must have run this from your fork. If WILLOW_BOT_TOKEN is not working, I should fix that rather than have you use your own token on your fork, so that others can also trigger the action from the Actions pane without generating their own tokens. I have verified that you have write permissions, so in the future, if you make this a branch on the finch-tensor/Finch.jl repo, you should be able to run FixBot manually on this branch from the Actions tab without changing this secret.
