Segmentation fault when running tritonbench flash attention with --causal #18

@yjk21

Description

Describe the bug

I'm running the benchmarking command from the ws branch, but with the --causal flag added, i.e.:

TORCH_CUDA_ARCH_LIST=9.0a cuda-gdb --args python run.py --op flash_attention --only triton_tutorial_flash_v2_ws,triton_tutorial_flash_v2_tma_ws,triton_tutorial_flash_v2 --num-inputs 1 --seq-len 4096 --metrics tflops --batch 8 --n-heads 16 --d-head 128 --causal

I'm seeing a segfault here:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007d54b9f6b32e in mlir::detail::IROperandBase::insertInto<mlir::IRObjectWithUseList<mlir::OpOperand> > (useList=0x5b34c5d18750, this=0x5b34c5cef190) at /root/.triton/llvm/llvm-b5cc222d-ubuntu-x64/include/mlir/IR/UseDefLists.h:101
101           nextUse->back = &nextUse;

Without the flag, everything works as intended (WAI).
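For context, the crash site in UseDefLists.h is MLIR's intrusive use-def list: each operand is spliced into a per-value linked list, and `back` holds the address of the pointer that points at the operand. A minimal sketch of that insertion pattern (hypothetical `Operand`/`UseList` types, not MLIR's actual classes) shows why a corrupted list head turns the write on line 101 into a segfault:

```cpp
#include <cassert>

// Hypothetical stand-ins for MLIR's OpOperand / IRObjectWithUseList.
struct Operand {
    Operand* nextUse = nullptr;   // next operand using the same value
    Operand** back = nullptr;     // address of the pointer that points at us
};

struct UseList {
    Operand* firstUse = nullptr;

    // Mirrors the shape of IROperandBase::insertInto: splice `op` in at
    // the head of the list. If `firstUse` is dangling or corrupted, the
    // write `op->nextUse->back = &op->nextUse` (line 101 in the trace)
    // dereferences invalid memory and segfaults.
    void insertInto(Operand* op) {
        op->nextUse = firstUse;
        if (op->nextUse)
            op->nextUse->back = &op->nextUse;  // crash site when nextUse is bogus
        firstUse = op;
        op->back = &firstUse;
    }
};
```

This suggests the segfault is a symptom of an invalid operand or value reaching the use list during the --causal lowering path, rather than a bug in the list code itself.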

Environment details

Tritonbench at 3a5dccb159834968567a2e45e561dc1aeaa8f8a8
Meta triton at 67f51cc1420cabeb6bf4d28c1813e38ea9a92e20

Metadata

Labels: bug (Something isn't working)