Originally a task in Add P-eagle support in training.
Task
Attention mask construction for non-causal $[L, 2L]$ (position-length) patterns within groups
- Attention mask construction for parallel patterns, using flex attention
- Use simple concatenation without slicing to verify correctness first
- If needed: flex attention supports slicing `BlockMask`s for optimization
- Test that the attention masks line up with the author's implementation
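The "simple concatenation without slicing" baseline above can be sketched as a dense boolean mask built by brute-force iteration. The exact P-eagle pattern is not spelled out in this note, so the predicate below is an assumption for illustration: causal attention over the base tokens $[0, L)$, and for the parallel tokens $[L, 2L)$, causal attention to the base prefix plus non-causal (bidirectional) attention within fixed-size groups. The `group` parameter and both function names are hypothetical.

```python
import numpy as np

def mask_mod(q_idx: int, kv_idx: int, L: int, group: int) -> bool:
    """Hypothetical mask predicate (an assumed P-eagle-style pattern).

    - Base tokens [0, L): standard causal attention over base tokens only.
    - Parallel tokens [L, 2L): causal attention to the base prefix, plus
      non-causal attention within their fixed-size group.
    """
    if q_idx < L:
        # Base token: attend causally to earlier base tokens.
        return kv_idx <= q_idx
    if kv_idx < L:
        # Parallel token attending to the base prefix, causally.
        return kv_idx <= q_idx - L
    # Parallel token attending to parallel tokens: bidirectional
    # within the same group of size `group`.
    return (q_idx - L) // group == (kv_idx - L) // group

def build_mask(L: int, group: int) -> np.ndarray:
    """Dense [2L, 2L] boolean mask built by plain iteration — the
    'concatenation without slicing' correctness baseline."""
    n = 2 * L
    return np.array([[mask_mod(q, k, L, group) for k in range(n)]
                     for q in range(n)])
```

With flex attention, the same predicate would become the `mask_mod` function passed to `create_block_mask` (signature `(b, h, q_idx, kv_idx) -> bool`); comparing that against this dense baseline is one way to do the "verify correctness first" and "line up with the author's implementation" checks before introducing `BlockMask` slicing.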