
Conversation

@tkarna (Contributor) commented Nov 18, 2025

Implements #16.

  • Adds a Workload object and execution engine utility functions that can execute it (a rough interface sketch follows below).
  • Adds two CPU examples that demonstrate its usage:
    • One allocates input data with NumPy; the other uses MLIR helper functions to allocate/fill/deallocate input memrefs.
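
For orientation, here is a minimal sketch of what such a Workload interface might look like. Only payload_module, get_input_arrays, and get_complexity appear in the review below; everything else is an illustrative assumption, not the PR's actual code.

    from abc import ABC, abstractmethod


    class Workload(ABC):
        """A payload computation plus what is needed to run and assess it."""

        @abstractmethod
        def payload_module(self):
            """Build and return the payload IR as an mlir.ir.Module."""
            ...

        @abstractmethod
        def get_input_arrays(self, execution_engine) -> list:
            """Allocate and return the input buffers for one run."""
            ...

        @abstractmethod
        def get_complexity(self) -> list:
            """Return (flop_count, memory_reads, memory_writes) estimates."""
            ...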

@rolfmorel (Contributor) left a comment

Here's a partial pass through it. Will try to complete the first pass through tomorrow!

        pass

    @abstractmethod
    def get_input_arrays(self, execution_engine) -> list:

-> abc.Sequence[ir.Value]?
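
Spelled out, the suggested annotation would read roughly as follows (a sketch; Sequence here is collections.abc.Sequence):

    from abc import ABC, abstractmethod
    from collections.abc import Sequence

    from mlir import ir


    class Workload(ABC):
        @abstractmethod
        def get_input_arrays(self, execution_engine) -> Sequence[ir.Value]:
            ...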


If each Workload were explicitly associated with a Target (or more than one), i.e. had a Target member, and each Target kept track of its associated execution_engine, this signature would become simpler.
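
A hypothetical sketch of that shape (all names here are illustrative, not from the PR):

    class Target:
        """Describes one device and owns its execution engine."""

        def __init__(self, execution_engine):
            self.execution_engine = execution_engine


    class Workload(ABC):
        def __init__(self, target: Target):
            self.target = target

        @abstractmethod
        def get_input_arrays(self) -> list:
            # The engine now comes from self.target.execution_engine,
            # so the parameter drops out of the signature.
            ...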

        pass

    @abstractmethod
    def get_complexity(self) -> list:

If the different "dimensions" of the returned thing are known, and have sensible names, I would go with a NamedTuple.
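
Since get_complexity returns (flop_count, memory_reads, memory_writes) later in this diff, a sketch of that suggestion (the Complexity name is assumed; a NamedTuple is still a tuple, so callers that unpack the result keep working):

    from typing import NamedTuple


    class Complexity(NamedTuple):
        flop_count: int
        memory_reads: int
        memory_writes: int

    # Construction mirrors the tuple return seen later in the diff:
    #     return Complexity(flop_count, memory_reads, memory_writes)
    # and consumers can write c.memory_reads instead of c[1].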

    print(schedule_module)


def lower_payload(

Shouldn't this just be a non-abstractmethod method on the Workload class?
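
That is, something along these lines (parameter names taken from the snippet below; the body is elided):

    class Workload(ABC):
        def lower_payload(self, dump_kernel=None, dump_schedule: bool = False):
            # Concrete rather than @abstractmethod: the lowering recipe is
            # shared by all workloads, so subclasses need not reimplement it.
            ...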

    dump_schedule: bool = False,
):
    if not dump_kernel or dump_kernel != "initial":
        with context, location:

In my mind there are very few cases where you have a reference to a mlir.ir.Context and haven't entered its context manager yet. The story for mlir.ir.Location is similar (there you might actually want to enter and exit nested Locations based on the scope of the IR you're building).

Have a look at #20 (comment) and #20 (comment) . My take is that the function which acts as the process's main() should set up and enter a mlir.ir.Context and set a mlir.ir.Location.unknown(). If we do that, the rest of the code should barely need to think about these any more.
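
A minimal sketch of that setup (assuming the upstream mlir Python bindings; main and run_workloads are illustrative names):

    from mlir import ir


    def main():
        # Enter one Context and set an unknown Location for the whole run;
        # IR-building code below this point never mentions either again.
        with ir.Context(), ir.Location.unknown():
            run_workloads()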

    if not dump_kernel or dump_kernel != "initial":
        with context, location:
            # invoke transform interpreter directly
            transform_interpreter.apply_named_sequence(

Now that we have llvm/llvm-project#168223, this can probably be simplified. (E.g. do we need apply_transform_schedule?)

        return (flop_count, memory_reads, memory_writes)

    def payload_module(self):
        with self.context, self.location:

Suggested change (delete the line):
-        with self.context, self.location:

A = f.arguments[0]
B = f.arguments[1]
C = f.arguments[2]
a_tensor = bufferization.ToTensorOp(tensor_t, A, restrict=True)

#10 (review):

As a general principle we can set for ourselves right at the start of the project: in case the upstream bindings already have a workable snake_case version of an op, let's use that over the CamelCaseOp version. The crux of the argument is that this makes the Python code look closer to a terse version of the MLIR textual format.
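
Applied to the line above, that would read as follows (assuming the bindings in use ship the snake_case builder for this op; names as in the snippet):

    # CamelCase op class, as currently in the PR:
    a_tensor = bufferization.ToTensorOp(tensor_t, A, restrict=True)

    # snake_case builder, closer to the textual `bufferization.to_tensor`:
    a_tensor = bufferization.to_tensor(tensor_t, A, restrict=True)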

c_tensor = bufferization.ToTensorOp(
    tensor_t, C, restrict=True, writable=True
)
add = linalg.add(a_tensor, b_tensor, outs=[c_tensor])

Is there a reason to prefer doing the linalg.add on tensors rather than on memrefs? If anything, the IR would be even simpler.
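
For comparison, a sketch of the memref form, reusing the names from the snippet above; with memref outs, linalg.add writes in place and the ToTensorOp round-trip disappears:

    # linalg named ops accept memref operands directly; with memref outs
    # the op produces no SSA results and writes into C in place.
    linalg.add(A, B, outs=[C])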

    schedule_module.operation.attributes[
        "transform.with_named_sequence"] = (ir.UnitAttr.get())
    with ir.InsertionPoint(schedule_module.body):
        named_sequence = transform.NamedSequenceOp(

Suggested change:
-        named_sequence = transform.NamedSequenceOp(
+        named_sequence = transform.named_sequence(

Comment on lines +141 to +143

    if dump_kernel == "bufferized":
        transform.YieldOp()
        return schedule_module

Do we want this in the committed version?

An alternative to consider is inserting transform.print %mod if some verbosity flag has been set.
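
A sketch of that alternative (assuming the transform dialect's print op is exposed in these bindings; verbose and mod, the handle behind %mod, come from the surrounding schedule builder):

    if verbose:
        # Prints the payload module when the interpreter runs,
        # instead of returning the half-built schedule early.
        transform.PrintOp(target=mod)
    transform.YieldOp()
    return schedule_module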
