Getting "op exceeded stack allocation limit" from linalg_ext.fft #22776

### What happened?

The `linalg_ext.fft` operation produces two tensors: the real part and the imaginary part of the result. If the tensors are large enough and one of the output tensors is not used later in the program, compilation fails with `error: 'func.func' op exceeded stack allocation limit of 32768 bytes for function.` This error is surprising for users. The exact same program compiles with smaller tensors and only fails once the tensors are large enough, even when the size of the 1D FFT is unchanged.

For example, here is the last part of an IRFFT (inverse real FFT, which takes a complex input and returns a real output):
```
%5:2 = iree_linalg_ext.fft ins(%c1, %cst_8, %cst_7 : index, tensor<1xf32>, tensor<1xf32>) outs(%4#0, %4#1 : tensor<32xf32>, tensor<32xf32>) : tensor<32xf32>, tensor<32xf32>
%6:2 = iree_linalg_ext.fft ins(%c2, %cst_6, %cst_5 : index, tensor<2xf32>, tensor<2xf32>) outs(%5#0, %5#1 : tensor<32xf32>, tensor<32xf32>) : tensor<32xf32>, tensor<32xf32>
%7:2 = iree_linalg_ext.fft ins(%c3, %cst_4, %cst_3 : index, tensor<4xf32>, tensor<4xf32>) outs(%6#0, %6#1 : tensor<32xf32>, tensor<32xf32>) : tensor<32xf32>, tensor<32xf32>
%8:2 = iree_linalg_ext.fft ins(%c4, %cst_2, %cst_1 : index, tensor<8xf32>, tensor<8xf32>) outs(%7#0, %7#1 : tensor<32xf32>, tensor<32xf32>) : tensor<32xf32>, tensor<32xf32>
%9:2 = iree_linalg_ext.fft ins(%c5, %cst_0, %cst : index, tensor<16xf32>, tensor<16xf32>) outs(%8#0, %8#1 : tensor<32xf32>, tensor<32xf32>) : tensor<32xf32>, tensor<32xf32>
util.return %9#0 : tensor<32xf32>
```
Notice that it returns `%9#0` and drops `%9#1`, because the imaginary part is not needed. Once the input sizes get large enough, this fails with `error: 'func.func' op exceeded stack allocation limit`. The reason is that the imaginary output of that last FFT gets marked as readonly. Here is what the `outs` tensors of that last FFT turn into:
```
%0 = hal.interface.binding.subspan layout(#pipeline_layout3) binding(0) alignment(64) offset(%c0) flags(Indirect) : !iree_tensor_ext.dispatch.tensor<readwrite:tensor<32xf32>>
%1 = hal.interface.binding.subspan layout(#pipeline_layout3) binding(1) alignment(64) offset(%c256) flags("ReadOnly|Indirect") : !iree_tensor_ext.dispatch.tensor<readonly:tensor<32xf32>>
```

We've been working around this by using the imaginary output in some non-trivial way, which essentially fools the compiler into not marking it readonly. I discovered that applying `util.optimization_barrier` to `%9#1` also works around the issue.
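As an illustration, here is a sketch of that `util.optimization_barrier` workaround applied to the last lines of the IRFFT example above. The result name `%10` is hypothetical and its value is intentionally left unused; the exact placement in the attached repro may differ. This hides the symptom rather than fixing the underlying problem.

```
%9:2 = iree_linalg_ext.fft ins(%c5, %cst_0, %cst : index, tensor<16xf32>, tensor<16xf32>) outs(%8#0, %8#1 : tensor<32xf32>, tensor<32xf32>) : tensor<32xf32>, tensor<32xf32>
// Workaround (observed to avoid the stack allocation error): pass the
// otherwise-unused imaginary part %9#1 through an optimization barrier.
// %10 is a hypothetical name; its value is deliberately never used.
%10 = util.optimization_barrier %9#1 : tensor<32xf32>
util.return %9#0 : tensor<32xf32>
```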

@MaheshRavishankar suggested in issue https://github.com/iree-org/iree/issues/22695 that the right fix would be to vectorize the FFT operation.

@hanhanW Moving the conversation here from https://github.com/iree-org/iree/issues/22473

Repro case is attached here: [irfft.zip](https://github.com/user-attachments/files/23779602/irfft.zip)

### Steps to reproduce your issue

`iree-compile --iree-hal-target-device=local --iree-hal-local-target-device-backends=llvm-cpu --iree-llvmcpu-target-cpu=host -o /dev/null irfft.mlir`


### What component(s) does this issue relate to?

Compiler

### Version information

3.9.0 (I also tried 3.8.0 and 3.5.0)

### Additional context

_No response_
