
SUBTASK 3: P-Eagle Attention Mask Implementation #330

@shanjiaz

Description

Originally a task in Add P-eagle support in training.

Task

Attention mask construction for non-causal $[L, 2L]$ (position-length) patterns within groups.

  • Attention mask construction for parallel patterns, using flex attention
    1. Use simple concatenation without slicing to verify correctness first
    2. If needed, use flex attention's support for slicing block masks as an optimization
  • Test that attention masks line up with author’s implementation
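As a concrete illustration of step 1 (simple concatenation, no slicing), here is a hedged sketch of a flex-attention-style mask predicate for a `2L`-length sequence. The exact pattern P-Eagle needs is not spelled out in this issue, so the attention rule below is an assumption, and the name `peagle_mask_mod` is hypothetical:

```python
# Sketch of a flex-attention-style mask predicate for a concatenated
# [prefix (length L) | draft (length L)] sequence of total length 2L.
# ASSUMED pattern (not confirmed by this issue): prefix queries are
# causal over the prefix, and the draft query at offset i attends to
# the first i + 1 prefix tokens plus its own position.

L = 4  # group length, illustrative only


def peagle_mask_mod(b, h, q_idx, kv_idx):
    # b and h are unused: the same mask applies to every batch and head.
    # Written branchlessly so the same expression could be lowered to
    # tensor ops for torch.nn.attention.flex_attention.create_block_mask.
    prefix_causal = (q_idx < L) & (kv_idx <= q_idx)
    draft = (q_idx >= L) & ((kv_idx <= q_idx - L) | (kv_idx == q_idx))
    return prefix_causal | draft


# Dense boolean reference mask, built by brute force -- handy for the
# "line up with the author's implementation" comparison in this task.
dense = [[bool(peagle_mask_mod(0, 0, q, k)) for k in range(2 * L)]
         for q in range(2 * L)]
```

With PyTorch's flex attention (2.5+), a tensor-valued version of this predicate would be passed to `create_block_mask(mask_mod, B, H, Q_LEN, KV_LEN)`, and the resulting block mask is the object that can later be sliced for the step-2 optimization; the dense brute-force mask above is only a correctness reference.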

Metadata

Labels

enhancement (New feature or request), in-progress (Issue is already assigned and in-progress), keep-open
